Last Updated: March 25, 2026 at 15:30
Repositories in Domain-Driven Design
Repositories provide a clean abstraction that makes aggregates appear as if they are held in a simple in-memory collection, while handling all the complexity of persistence behind the scenes. By defining repository interfaces in the domain layer and implementing them in the infrastructure layer, the domain remains pure and testable, unaware of databases, SQL, or connection strings. Repositories operate at the aggregate root level, ensuring that retrieval and persistence respect the consistency boundaries that aggregates define. When implemented well, repositories become invisible helpers that free developers to focus on business logic rather than infrastructure concerns.

Introduction
In the previous articles of this series, we have built a foundation of tactical DDD patterns. We explored entities and value objects—the building blocks of domain models. We examined aggregates, which cluster these building blocks into consistency boundaries. And we looked at factories, which encapsulate the complex creation of these objects.
But there is a question that arises once objects are created and aggregates are formed: how do we retrieve them? And how do we save them back to persistent storage?
Consider an order aggregate in an e-commerce system. When a customer views their order history, the system must retrieve the relevant orders from the database. When a customer updates their shipping address, the system must save that change. These operations—retrieval and persistence—are fundamental to any application.
Yet there is a tension here. The domain model represents business concepts in a pure, expressive way. It should not be concerned with database tables, SQL queries, or connection strings. Those are infrastructure concerns. Mixing them with domain logic leads to the kind of scattered, hard-to-maintain code that DDD seeks to avoid.
Repositories resolve this tension. They provide a clean abstraction that allows the domain layer to work with aggregates as if they were simple in-memory collections, while hiding all the complexity of persistence behind the scenes.
This article explores repositories in depth. We will examine what repositories are, why they exist, how they differ from other persistence patterns, and how to implement them effectively while maintaining the separation between domain and infrastructure.
Part One: What Is a Repository?
A Concrete Example First
Before defining what a repository is, let us look at one in context.
Imagine you are working with an order aggregate. The order has an order number, a customer, a list of items, a status, and a total. In your domain code, you want to retrieve an order by its order number. You want to save an order after changes are made. You do not want to write SQL statements or worry about database connections.
A repository gives you exactly that. In your domain layer or application layer, the calling code simply fetches an order, changes it, and saves it.
The repository acts like a collection of aggregates. You ask it to find an aggregate by its identifier. You ask it to save an aggregate after modifications. You never see SQL. You never see database tables. You simply work with your domain objects.
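The usage described above might look like the following minimal Python sketch. The Order class, the InMemoryOrderRepository class, and the sample order number and addresses are assumptions made for illustration; a dictionary stands in for the database so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    order_number: str
    shipping_address: str

    def change_shipping_address(self, new_address: str) -> None:
        if not new_address.strip():
            raise ValueError("shipping address must not be empty")
        self.shipping_address = new_address


class InMemoryOrderRepository:
    """Hypothetical stand-in: a dict plays the role of the database."""

    def __init__(self) -> None:
        self._orders: dict[str, Order] = {}

    def find_by_order_number(self, order_number: str) -> Optional[Order]:
        return self._orders.get(order_number)

    def save(self, order: Order) -> None:
        self._orders[order.order_number] = order


order_repository = InMemoryOrderRepository()
order_repository.save(Order("ORD-1001", "1 Old Street"))

# The calling code reads like collection access; no SQL is visible.
order = order_repository.find_by_order_number("ORD-1001")
order.change_shipping_address("22 New Avenue")
order_repository.save(order)
```

The last three statements are the part that matters: they would look the same if the repository were backed by a relational database instead of a dictionary.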
Definition
A repository is a pattern that mediates between the domain layer and the persistence infrastructure. It provides an abstraction that makes aggregates appear as if they are held in a simple, in-memory collection, while handling all the details of storage, retrieval, and mapping behind the scenes.
The repository pattern has two primary responsibilities:
First, retrieval. The repository provides methods to find aggregates by their identifiers and by other business-relevant criteria. These methods return fully constructed aggregates—roots with all their internal entities and value objects intact—ready for use in the domain.
Second, persistence. The repository provides methods to save aggregates after changes. When an aggregate is saved, the repository ensures that all changes to the root and its internal objects are persisted together, maintaining the consistency boundaries defined by the aggregate.
Repository vs. Collections
A useful way to think about repositories is to compare them to collections in memory.
If you were building a simple in-memory application, you might store orders in a list. You would retrieve an order by iterating through the list until you found the one with the matching order number. You would add new orders to the list. You would replace existing orders when they changed.
A repository provides the same interface—methods like add, findById, save, remove—but underneath, it handles the complexity of talking to a real database. This abstraction allows the domain layer to remain blissfully unaware of how the data is stored.
Part Two: The Problem Repositories Solve
The Temptation to Scatter Persistence Logic
In traditional layered architectures, persistence logic is often placed in dedicated DAO or repository classes—and that is good practice. However, the boundary between where persistence ends and domain logic begins can still become blurred.
Even with a dedicated persistence layer, domain logic can leak into the wrong places. A service class might call a repository to retrieve orders, then loop through them to filter or calculate something that belongs in the domain model. Transaction boundaries may be managed at the service layer in ways that affect domain behavior. And because the domain objects themselves are often anemic—holding data but no behavior—the services end up orchestrating both persistence and business rules.
This blurring creates more subtle problems than scattered SQL.
First, the domain model remains anemic. When persistence is cleanly separated but the domain objects have no behavior, services become the default home for business logic. The domain layer becomes a collection of data containers rather than an expression of business rules.
Second, invariants become implicit rather than explicit. Instead of being enforced by the aggregate root, rules are scattered across services. A rule that should apply to an order may only be checked in one service but missed in another.
Third, testing becomes a compromise. Even with repositories, testing domain logic often requires mocking repositories and setting up test data. If the domain objects themselves contained the logic, they could be tested in isolation without any persistence infrastructure at all.
The repository pattern in DDD is not just about separating SQL from domain code. It is about creating a clean abstraction that allows the domain layer to work with aggregates as if they were simple in-memory objects, while keeping all persistence concerns—whether SQL, ORM, or NoSQL—behind an interface. The goal is not merely to avoid scattered queries, but to ensure that the domain layer remains pure, testable, and focused entirely on business rules.
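To make the contrast with an anemic model concrete, here is a small sketch of an aggregate root that enforces its own rules. The specific invariants (items may only be added to an open order, an empty order cannot be shipped) and all class and field names are assumptions chosen for illustration, not rules taken from a real system.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class OrderItem:
    sku: str
    unit_price: Decimal
    quantity: int


@dataclass
class Order:
    order_number: str
    status: str = "OPEN"
    items: list[OrderItem] = field(default_factory=list)

    def add_item(self, sku: str, unit_price: Decimal, quantity: int) -> None:
        # Invariants live on the aggregate root, not in a service class.
        if self.status != "OPEN":
            raise ValueError("items can only be added to an open order")
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.items.append(OrderItem(sku, unit_price, quantity))

    def total(self) -> Decimal:
        return sum((i.unit_price * i.quantity for i in self.items), Decimal("0"))

    def ship(self) -> None:
        if not self.items:
            raise ValueError("an empty order cannot be shipped")
        self.status = "SHIPPED"


# Exercising the rules needs no repository, mock, or database.
order = Order("ORD-1")
order.add_item("BOOK", Decimal("12.50"), 2)
order.ship()
try:
    order.add_item("PEN", Decimal("1.00"), 1)
    rejected = False
except ValueError:
    rejected = True
```

Because the rules sit on the aggregate, every caller gets them, and the test at the bottom involves no persistence at all.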
The Repository Solution
Repositories solve these problems by centralizing persistence logic in one place. All retrieval and storage operations for a given aggregate root go through a single repository. The domain layer depends on the repository's abstraction, not on the underlying database.
This centralization brings several benefits.
The domain layer remains pure. It focuses entirely on business rules, without any knowledge of how data is stored.
Persistence logic is consolidated. Queries and mapping logic live in one place, making them easier to maintain and modify.
Testing becomes simpler. The domain layer can be tested with a mock repository that returns in-memory aggregates, without touching a real database.
Database changes become manageable. If the schema or the database technology changes, only the repository implementation needs to be updated. The domain layer remains untouched.
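The testing benefit can be sketched as follows. ChangeShippingAddress is a hypothetical application service and FakeOrderRepository is a hand-written test double; both names are assumptions. The service only calls find_by_id and save, so any object offering those two methods can be passed in.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    shipping_address: str

    def change_shipping_address(self, new_address: str) -> None:
        self.shipping_address = new_address


class ChangeShippingAddress:
    """Application service that knows only the repository's methods."""

    def __init__(self, orders) -> None:
        self._orders = orders

    def execute(self, order_id: str, new_address: str) -> None:
        order = self._orders.find_by_id(order_id)
        if order is None:
            raise LookupError(f"order {order_id} not found")
        order.change_shipping_address(new_address)
        self._orders.save(order)


class FakeOrderRepository:
    """Test double backed by a dict; records which orders were saved."""

    def __init__(self, *orders: Order) -> None:
        self._orders = {o.order_id: o for o in orders}
        self.saved_ids: list[str] = []

    def find_by_id(self, order_id: str):
        return self._orders.get(order_id)

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order
        self.saved_ids.append(order.order_id)


fake = FakeOrderRepository(Order("42", "1 Old Street"))
ChangeShippingAddress(fake).execute("42", "22 New Avenue")
```

The test can then check both the resulting state and the fact that save was called, without a database connection anywhere in sight.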
Part Three: Repository vs. Other Patterns
Repository vs. Factory
As we discussed in the previous article, factories and repositories serve different purposes and are sometimes confused.
A factory creates new aggregates that did not previously exist. It takes raw inputs and produces a new instance, ensuring that all creation invariants are satisfied.
A repository retrieves existing aggregates that have already been persisted. It takes an identifier and returns the aggregate that exists in storage.
The lifecycle is clear: factories create, repositories retrieve and persist. When a new aggregate is created by a factory, it is then saved using a repository. When an aggregate is later needed again, it is retrieved by the repository.
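The lifecycle can be sketched in a few lines. The order-number format, the rule that an order needs at least one item, and the class names are all assumptions made for this example.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    order_number: str
    customer_id: str
    item_skus: tuple[str, ...]


class OrderFactory:
    """Creates orders that did not exist before and checks creation rules."""

    def __init__(self) -> None:
        self._sequence = itertools.count(1)

    def create(self, customer_id: str, item_skus: list[str]) -> Order:
        if not item_skus:
            raise ValueError("an order needs at least one item")
        order_number = f"ORD-{next(self._sequence):04d}"
        return Order(order_number, customer_id, tuple(item_skus))


class InMemoryOrderRepository:
    """Stores and returns orders that already exist; a dict is the storage."""

    def __init__(self) -> None:
        self._orders: dict[str, Order] = {}

    def save(self, order: Order) -> None:
        self._orders[order.order_number] = order

    def find_by_order_number(self, order_number: str) -> Optional[Order]:
        return self._orders.get(order_number)


factory = OrderFactory()
repository = InMemoryOrderRepository()

new_order = factory.create("CUST-7", ["BOOK", "PEN"])  # factory creates
repository.save(new_order)                              # repository persists
loaded = repository.find_by_order_number(new_order.order_number)  # repository retrieves
```

Note that the repository never generates an order number or checks creation rules; that work stays with the factory.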
Repository vs. DAO (Data Access Object)
In many traditional architectures, the Data Access Object (DAO) pattern is used to separate persistence logic. A DAO typically provides CRUD operations—create, read, update, delete—for a database table.
A repository is similar in purpose but different in scope and philosophy.
A DAO typically operates at the table level. It works with database records, often returning data transfer objects (DTOs) rather than domain objects. The domain layer must then map these DTOs to its own objects.
A repository operates at the aggregate level. It works with aggregates—complete clusters of domain objects—and returns them directly to the domain layer. The repository handles the mapping from database tables to aggregates internally, keeping the domain layer free of that complexity.
Another distinction is intent. A DAO is a technical pattern focused on data access. A repository is a domain pattern focused on providing the illusion that aggregates are simply held in memory. The repository's interface speaks the language of the domain—methods like findByOrderNumber rather than selectById.
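The difference in scope shows up clearly in code. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table layout and the class names OrderTableDao and SqlOrderRepository are assumptions for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    order_number: str
    status: str
    item_skus: list[str]


class OrderTableDao:
    """Table-level access: one table, raw rows out."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def select_by_id(self, order_number: str):
        return self._conn.execute(
            "SELECT order_number, status FROM orders WHERE order_number = ?",
            (order_number,),
        ).fetchone()


class SqlOrderRepository:
    """Aggregate-level access: a complete Order out, mapping done inside."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def find_by_order_number(self, order_number: str) -> Optional[Order]:
        root = self._conn.execute(
            "SELECT order_number, status FROM orders WHERE order_number = ?",
            (order_number,),
        ).fetchone()
        if root is None:
            return None
        items = self._conn.execute(
            "SELECT sku FROM order_items WHERE order_number = ? ORDER BY sku",
            (order_number,),
        )
        return Order(root[0], root[1], [row[0] for row in items])


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_number TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE order_items (order_number TEXT NOT NULL, sku TEXT NOT NULL);
    INSERT INTO orders VALUES ('ORD-1', 'OPEN');
    INSERT INTO order_items VALUES ('ORD-1', 'BOOK'), ('ORD-1', 'PEN');
""")

row = OrderTableDao(conn).select_by_id("ORD-1")                  # a tuple from one table
order = SqlOrderRepository(conn).find_by_order_number("ORD-1")   # a whole aggregate
```

A caller of the DAO still has to fetch the items and assemble the order; a caller of the repository receives the assembled aggregate.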
Repository vs. Unit of Work
A unit of work tracks changes to aggregates during a business operation and coordinates the persistence of all those changes at once. It ensures that either all changes are saved or none are.
A repository typically works with a unit of work. The repository retrieves aggregates. The unit of work tracks which aggregates have been changed. When the operation completes, the unit of work tells the repositories to persist the changes.
In many implementations, the repository and unit of work are separate patterns that collaborate. The repository focuses on retrieval and individual save operations. The unit of work coordinates the transaction across multiple repositories.
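One way this collaboration could look is sketched below, entirely in memory. This is a simplified illustration rather than a production design: the method names register_changed, commit, and write are assumptions, a dictionary stands in for a table, and rollback is simulated by restoring a snapshot, where a real implementation would normally rely on a database transaction.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    order_id: str
    status: str = "OPEN"


class UnitOfWork:
    """Collects changed aggregates and writes all of them or none."""

    def __init__(self) -> None:
        self._pending = []  # (repository, aggregate) pairs awaiting commit

    def register_changed(self, repository, aggregate) -> None:
        self._pending.append((repository, aggregate))

    def commit(self) -> None:
        # Snapshot every affected store so a failure can be undone.
        snapshots = {id(r): (r, dict(r.rows)) for r, _ in self._pending}
        try:
            for repository, aggregate in self._pending:
                repository.write(aggregate)
        except Exception:
            for repository, rows in snapshots.values():
                repository.rows = rows
            raise
        finally:
            self._pending.clear()


class OrderRepository:
    """In-memory stand-in; `rows` plays the role of a database table."""

    def __init__(self, unit_of_work: UnitOfWork) -> None:
        self.rows: dict[str, Order] = {}
        self._uow = unit_of_work

    def find_by_id(self, order_id: str) -> Optional[Order]:
        stored = self.rows.get(order_id)
        return copy.deepcopy(stored) if stored is not None else None

    def save(self, order: Order) -> None:
        self._uow.register_changed(self, order)  # deferred until commit

    def write(self, order: Order) -> None:
        if order.status not in {"OPEN", "PAID"}:
            raise ValueError(f"unknown status: {order.status!r}")
        self.rows[order.order_id] = copy.deepcopy(order)


uow = UnitOfWork()
orders = OrderRepository(uow)
orders.rows = {"A": Order("A"), "B": Order("B")}

first, second = orders.find_by_id("A"), orders.find_by_id("B")
first.status, second.status = "PAID", "BOGUS"
orders.save(first)
orders.save(second)
try:
    uow.commit()
    committed = True
except ValueError:
    committed = False  # the write for "B" failed, so "A" is rolled back too
```

The repository still exposes save, but the moment of writing is decided by the unit of work, which is what gives the operation its all-or-nothing character.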
Part Four: Repository Abstraction
The Domain Interface
A repository is defined by its interface, which belongs to the domain layer. The interface describes what operations are available, using the language of the domain.
For an order aggregate, the repository interface might include:
- findById(orderId): Retrieves an order by its unique identifier.
- findByCustomer(customerId): Retrieves all orders for a given customer.
- save(order): Persists an order and all its internal objects.
- delete(order): Removes an order from storage.
Notice that these methods do not mention databases, SQL, or any technical details. They speak purely in domain terms. This interface becomes a contract that the domain layer depends on, but the implementation of that contract lives elsewhere.
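In Python, such a contract could be written with typing.Protocol, as in the sketch below. The method names are snake_case renderings of the ones listed above; the Order fields and the InMemoryOrderRepository class are assumptions added so the example can run.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Order:
    order_id: str
    customer_id: str


class OrderRepository(Protocol):
    """Domain-layer contract: method names use domain language only."""

    def find_by_id(self, order_id: str) -> Optional[Order]: ...
    def find_by_customer(self, customer_id: str) -> list[Order]: ...
    def save(self, order: Order) -> None: ...
    def delete(self, order: Order) -> None: ...


class InMemoryOrderRepository:
    """One possible implementation; it satisfies the protocol structurally."""

    def __init__(self) -> None:
        self._orders: dict[str, Order] = {}

    def find_by_id(self, order_id: str) -> Optional[Order]:
        return self._orders.get(order_id)

    def find_by_customer(self, customer_id: str) -> list[Order]:
        return [o for o in self._orders.values() if o.customer_id == customer_id]

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order

    def delete(self, order: Order) -> None:
        self._orders.pop(order.order_id, None)


repository: OrderRepository = InMemoryOrderRepository()
repository.save(Order("1", "CUST-7"))
repository.save(Order("2", "CUST-7"))
repository.save(Order("3", "CUST-9"))
repository.delete(Order("2", "CUST-7"))
```

An abstract base class would work equally well; the protocol form is used here only because it lets an implementation satisfy the contract without importing it.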
The Infrastructure Implementation
The implementation of the repository lives in the infrastructure layer. It takes the abstract interface and provides concrete logic that translates domain operations into database operations.
The implementation is responsible for:
- Connecting to the database
- Constructing queries to retrieve aggregate data
- Mapping database records back to domain objects
- Translating changes to domain objects into database updates
- Managing transactions, often in coordination with a unit of work
The domain layer never sees this implementation. It depends only on the interface. This separation is what allows the domain to remain pure and independent of infrastructure.
A Simple Example of Separation
Conceptually, the separation looks like this:
Domain layer (interface):
interface OrderRepository { findById(id); save(order); }
Infrastructure layer (implementation):
class SqlOrderRepository implements OrderRepository { findById(id) { ... SQL query ... } }
Application or domain layer (usage):
order = orderRepository.findById(orderId);
order.changeShippingAddress(address);
orderRepository.save(order);
The code that uses the repository knows only the repository interface and the domain objects it works with (entities and value objects). It does not know that the implementation uses SQL, or that it talks to a particular database vendor. This allows the implementation to be swapped out without affecting the domain logic.
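A runnable version of the conceptual separation might look like this. It is a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database, and the orders table, its columns, and the sample data are invented for the example.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    order_id: str
    shipping_address: str

    def change_shipping_address(self, address: str) -> None:
        self.shipping_address = address


class SqlOrderRepository:
    """Infrastructure-side implementation; table and columns are assumed."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS orders ("
            "order_id TEXT PRIMARY KEY, shipping_address TEXT NOT NULL)"
        )

    def find_by_id(self, order_id: str) -> Optional[Order]:
        row = self._conn.execute(
            "SELECT order_id, shipping_address FROM orders WHERE order_id = ?",
            (order_id,),
        ).fetchone()
        return Order(*row) if row else None

    def save(self, order: Order) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO orders (order_id, shipping_address) "
                "VALUES (?, ?)",
                (order.order_id, order.shipping_address),
            )


order_repository = SqlOrderRepository(sqlite3.connect(":memory:"))
order_repository.save(Order("42", "1 Old Street"))

# Usage: identical in shape to the conceptual three lines above.
order = order_repository.find_by_id("42")
order.change_shipping_address("22 New Avenue")
order_repository.save(order)
```

All SQL is confined to SqlOrderRepository. Replacing it with a different class that offers the same two methods would leave the usage lines unchanged.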
Part Five: Persistence Boundaries
One Repository per Aggregate Root
In DDD, repositories are defined for aggregate roots, not for every entity. The order repository handles the order aggregate root. It does not provide a separate repository for order items, because order items belong to the order aggregate and are always accessed through the order root.
This rule reinforces the aggregate boundary. Because external code should never directly manipulate internal entities, the repository should not provide direct access to them. The aggregate root controls access, and the repository operates at the aggregate root level.
What the Repository Retrieves
When a repository retrieves an aggregate, it retrieves the entire aggregate—the root and all its internal entities and value objects—in a single operation. The aggregate must be fully loaded and ready for use.
This means the repository must handle the complexity of loading multiple database tables and assembling them into a complete object graph. This can be challenging, especially with deep or complex aggregates. But it is essential for maintaining the consistency boundary: if the aggregate is only partially loaded, invariants that span internal objects cannot be enforced.
What the Repository Persists
When a repository saves an aggregate, it persists the entire aggregate—all changes to the root and all internal objects—in a single transaction. This ensures that the aggregate's consistency boundary is respected.
In practice, this often means the repository must handle inserts, updates, and deletes across multiple tables. Some objects may be new and need to be inserted. Others may have changed and need to be updated. Still others may have been removed and need to be deleted. The repository must determine what has changed and issue the appropriate database operations.
This is where the unit of work pattern becomes valuable. The unit of work tracks changes during a transaction and tells the repository exactly which objects need to be persisted.
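The sketch below shows one simple way to load and save a whole aggregate across two tables in a single transaction, using Python's sqlite3 module. The schema and class names are assumptions. Instead of tracking individual changes, this version replaces all item rows on every save, which is a deliberately naive strategy that trades efficiency for simplicity.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OrderItem:
    sku: str
    quantity: int


@dataclass
class Order:
    order_id: str
    status: str
    items: list[OrderItem] = field(default_factory=list)


SCHEMA = """
CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE order_items (
    order_id TEXT NOT NULL, sku TEXT NOT NULL, quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, sku)
);
"""


class SqlOrderRepository:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def save(self, order: Order) -> None:
        # One transaction covers the root and its items; an error undoes both.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO orders (order_id, status) VALUES (?, ?)",
                (order.order_id, order.status),
            )
            self._conn.execute(
                "DELETE FROM order_items WHERE order_id = ?", (order.order_id,)
            )
            self._conn.executemany(
                "INSERT INTO order_items (order_id, sku, quantity) VALUES (?, ?, ?)",
                [(order.order_id, i.sku, i.quantity) for i in order.items],
            )

    def find_by_id(self, order_id: str) -> Optional[Order]:
        root = self._conn.execute(
            "SELECT order_id, status FROM orders WHERE order_id = ?", (order_id,)
        ).fetchone()
        if root is None:
            return None
        rows = self._conn.execute(
            "SELECT sku, quantity FROM order_items WHERE order_id = ? ORDER BY sku",
            (order_id,),
        )
        return Order(root[0], root[1], [OrderItem(sku, qty) for sku, qty in rows])


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
repo = SqlOrderRepository(conn)

order = Order("42", "OPEN", [OrderItem("BOOK", 1), OrderItem("PEN", 3)])
repo.save(order)
order.items = [OrderItem("BOOK", 2)]  # one item changed, one removed
repo.save(order)
loaded = repo.find_by_id("42")

# A failing save (duplicate item key) must leave the stored aggregate untouched.
order.status = "SHIPPED"
order.items = [OrderItem("BOOK", 1), OrderItem("BOOK", 1)]
try:
    repo.save(order)
    failed = False
except sqlite3.IntegrityError:
    failed = True
after_failure = repo.find_by_id("42")
```

The second half is the important part: the status change and the item changes either land together or not at all, which is the consistency boundary the aggregate promises.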
Part Six: Domain vs. Infrastructure
The Layer Separation
One of the core principles of Domain-Driven Design is the separation of domain logic from infrastructure concerns. The domain layer should contain business rules, entities, value objects, aggregates, domain services, and repository interfaces. The infrastructure layer should contain database access, external service clients, file systems, and repository implementations.
This separation is not merely stylistic. It is essential for maintaining a clean, testable, maintainable system.
What Belongs in the Domain Layer
The domain layer defines the repository interface. This interface expresses the persistence needs of the domain in domain terms. It does not concern itself with how those needs are fulfilled.
The domain layer uses the repository through its interface. Application services or domain services that need to retrieve or save aggregates depend on the repository interface, not on a concrete implementation.
The domain layer contains no knowledge of:
- Database connection strings
- SQL or query syntax
- ORM frameworks
- Table names or column mappings
- Transaction management details
What Belongs in the Infrastructure Layer
The infrastructure layer implements the repository interface. It contains all the persistence details that the domain layer does not need to know about.
The infrastructure layer handles:
- Establishing and managing database connections
- Writing SQL queries or using an ORM
- Mapping between database tables and domain objects
- Managing transactions, often in coordination with a unit of work
- Handling database-specific errors and exceptions
Why the Separation Matters
This separation delivers several important benefits.
Testability. Domain logic can be tested in isolation using mock repositories. Tests can verify business rules without the complexity of setting up a real database.
Flexibility. The persistence mechanism can be changed without affecting the domain layer. Switching from SQL to a document database, or from one ORM to another, requires changes only in the infrastructure layer.
Clarity. The domain layer remains focused on business rules. A developer reading domain code sees only domain concepts, not infrastructure noise.
Maintainability. Persistence logic is centralized in repositories, making it easier to find, understand, and modify.
Part Seven: Practical Implementation Considerations
Retrieval Methods
Repository interfaces should provide retrieval methods that make sense for the domain. These are not simply CRUD methods. They reflect the actual ways the business needs to find aggregates.
For an order repository, useful retrieval methods might include:
- findByOrderNumber(orderNumber)
- findByCustomer(customerId)
- findByCustomerAndStatus(customerId, status)
- findOpenOrders()
Each method name expresses a business need. The repository interface becomes self-documenting, revealing the queries that matter to the domain.
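As an illustration, the retrieval methods named above could be implemented against an in-memory store like this. The OrderStatus values, the Order fields, and the sample data are assumptions; a database-backed implementation would expose the same method names and translate each into a query.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OrderStatus(Enum):
    OPEN = "open"
    SHIPPED = "shipped"


@dataclass
class Order:
    order_number: str
    customer_id: str
    status: OrderStatus


class InMemoryOrderRepository:
    def __init__(self, orders: list[Order]) -> None:
        self._orders = {o.order_number: o for o in orders}

    def find_by_order_number(self, order_number: str) -> Optional[Order]:
        return self._orders.get(order_number)

    def find_by_customer(self, customer_id: str) -> list[Order]:
        return [o for o in self._orders.values() if o.customer_id == customer_id]

    def find_by_customer_and_status(
        self, customer_id: str, status: OrderStatus
    ) -> list[Order]:
        return [o for o in self.find_by_customer(customer_id) if o.status is status]

    def find_open_orders(self) -> list[Order]:
        return [o for o in self._orders.values() if o.status is OrderStatus.OPEN]


repository = InMemoryOrderRepository([
    Order("ORD-1", "CUST-7", OrderStatus.OPEN),
    Order("ORD-2", "CUST-7", OrderStatus.SHIPPED),
    Order("ORD-3", "CUST-9", OrderStatus.OPEN),
])

open_numbers = [o.order_number for o in repository.find_open_orders()]
shipped_for_customer = repository.find_by_customer_and_status(
    "CUST-7", OrderStatus.SHIPPED
)
```

There is no generic "query by arbitrary filter" method here; each method exists because the business asks that specific question.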
The Challenge of Complex Queries
One tension in repository design is handling complex queries that return only a subset of aggregate data.
Consider a dashboard that needs to show a list of orders with only a few fields—order number, customer name, total, and status. Retrieving full aggregates for hundreds of orders just to show a summary view is inefficient.
There are several approaches to this tension.
One approach is to use separate queries that return read-only projections directly from the database. These queries bypass the repository and the domain model entirely, going straight to the database and returning simple data transfer objects.
Another approach is to provide specialized retrieval methods on the repository that return only the required data, though this can blur the line between the repository pattern and simple data access.
The important principle is that writes go through the domain model, ensuring invariants are enforced. Reads can sometimes be optimized by bypassing the full aggregate loading when full consistency is not required for display purposes.
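The first approach, a read-only projection that bypasses the repository, could be sketched as follows for the dashboard example. The schema, the OrderSummary class, the list_order_summaries function, and the choice to store prices in cents are all assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderSummary:
    """Read-only row for a dashboard; not an aggregate, enforces no rules."""

    order_number: str
    customer_name: str
    total_cents: int
    status: str


def list_order_summaries(conn: sqlite3.Connection) -> list[OrderSummary]:
    # One query returns exactly the four fields the view needs.
    rows = conn.execute("""
        SELECT o.order_number, c.name,
               SUM(i.unit_price_cents * i.quantity), o.status
        FROM orders o
        JOIN customers c ON c.customer_id = o.customer_id
        JOIN order_items i ON i.order_number = o.order_number
        GROUP BY o.order_number, c.name, o.status
        ORDER BY o.order_number
    """)
    return [OrderSummary(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_number TEXT PRIMARY KEY,
                         customer_id INTEGER, status TEXT);
    CREATE TABLE order_items (order_number TEXT, sku TEXT,
                              unit_price_cents INTEGER, quantity INTEGER);
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES ('ORD-1', 1, 'OPEN'), ('ORD-2', 1, 'SHIPPED');
    INSERT INTO order_items VALUES
        ('ORD-1', 'BOOK', 1250, 2), ('ORD-1', 'PEN', 100, 3),
        ('ORD-2', 'MUG', 900, 1);
""")

summaries = list_order_summaries(conn)
```

Because OrderSummary is frozen and has no behavior, nothing can be changed through it; any modification still has to go through the aggregate and its repository.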
Handling Concurrency
Repositories must handle concurrent access to aggregates. When two users modify the same aggregate at the same time, the second user's changes should not silently overwrite the first user's changes.
The standard approach is optimistic concurrency control. The aggregate carries a version number or timestamp. When the repository saves an aggregate, it checks that the version currently in the database still matches the version the aggregate had when it was loaded. If they do not match, the save fails, and the application must reload the aggregate and retry the operation.
This approach works well when conflicts are rare. It keeps the domain model simple while ensuring that lost updates do not occur.
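A version check of this kind is commonly implemented by putting the expected version into the WHERE clause of the update. The sketch below does this with Python's sqlite3 module; the ConcurrencyError exception, the table layout, and the sample data are assumptions for the example.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


class ConcurrencyError(Exception):
    """Raised when the stored version no longer matches the loaded one."""


@dataclass
class Order:
    order_id: str
    shipping_address: str
    version: int


class SqlOrderRepository:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def find_by_id(self, order_id: str) -> Optional[Order]:
        row = self._conn.execute(
            "SELECT order_id, shipping_address, version "
            "FROM orders WHERE order_id = ?",
            (order_id,),
        ).fetchone()
        return Order(*row) if row else None

    def save(self, order: Order) -> None:
        with self._conn:
            cursor = self._conn.execute(
                "UPDATE orders SET shipping_address = ?, version = version + 1 "
                "WHERE order_id = ? AND version = ?",
                (order.shipping_address, order.order_id, order.version),
            )
            if cursor.rowcount == 0:
                # No row matched: someone else saved a newer version first.
                raise ConcurrencyError(f"order {order.order_id} was modified")
        order.version += 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT PRIMARY KEY, "
    "shipping_address TEXT NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO orders VALUES ('42', '1 Old Street', 1)")
conn.commit()
repo = SqlOrderRepository(conn)

first_copy = repo.find_by_id("42")   # two users load the same version
second_copy = repo.find_by_id("42")

first_copy.shipping_address = "22 New Avenue"
repo.save(first_copy)                # succeeds, version 1 becomes 2

second_copy.shipping_address = "9 Other Road"
try:
    repo.save(second_copy)           # still holds version 1
    conflict = False
except ConcurrencyError:
    conflict = True
```

The second save is rejected rather than silently overwriting the first. The caller would then reload the order, reapply the change if it still makes sense, and save again.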
Part Eight: Common Misconceptions
Repositories Are Just DAOs
Repositories are often mistaken for DAOs, but they serve different purposes. A DAO is a technical pattern focused on data access. A repository is a domain pattern focused on providing an in-memory collection abstraction. The repository interface speaks the language of the domain. The repository implementation may use DAOs internally, but the repository itself is a domain concept.
Every Entity Needs a Repository
Only aggregate roots have repositories. Internal entities are accessed through their aggregate root and are persisted as part of the aggregate. Providing a repository for an internal entity would break the aggregate boundary and allow external code to bypass the aggregate root's invariants.
Repositories Must Return Fully Loaded Aggregates
In most cases, repositories should return fully loaded aggregates. However, there are valid exceptions. When displaying lists or summaries, loading full aggregates may be inefficient. In these cases, consider separate queries for read-only projections rather than forcing the repository to return partial aggregates.
Repositories Should Handle All Query Needs
Repositories are designed for retrieving aggregates by identity and for simple queries that return aggregates. For complex reporting or analytical queries, a separate query mechanism is often more appropriate. This is why CQRS, which separates commands that change state from queries that read state, is a common companion to DDD.
Part Nine: Practical Example
Order Repository in Context
Let us walk through a complete example of how a repository is used.
A customer places an order through an e-commerce system. The application service coordinates the operation:
- A new order is created using an OrderFactory. The factory takes the customer, items, and addresses, and returns a new Order aggregate with a generated order number and a calculated total.
- The application service asks the OrderRepository to save the new order. The repository takes the order aggregate and persists it to the database, inserting records into the orders table, order_items table, and any other related tables.
- Later, the customer views their order history. The application service calls orderRepository.findByCustomer(customerId). The repository executes a query that joins orders and order_items, assembles the data into full Order aggregates, and returns them.
- The customer changes their shipping address. The application service retrieves the order using orderRepository.findById(orderId), calls order.changeShippingAddress(newAddress) on the aggregate, and then calls orderRepository.save(order). The repository updates the shipping address in the database.
Throughout this process, the domain code—the order aggregate and the factory—never touches database concepts. The repository handles all persistence complexity, and the application service coordinates the flow without worrying about SQL.
Conclusion
Repositories are a fundamental pattern in Domain-Driven Design. They provide a clean separation between the domain layer and the infrastructure layer, allowing business logic to remain pure and expressive while persistence concerns are handled in a dedicated, maintainable place.
By abstracting persistence behind a domain-friendly interface, repositories offer several benefits. The domain layer remains testable and independent of database technology. Persistence logic is centralized and easy to modify. Aggregate boundaries are reinforced, because repositories operate at the aggregate root level.
The pattern is simple in concept but requires careful implementation. Repository interfaces should speak the language of the domain. Implementations should handle the complexity of loading and saving aggregates while respecting consistency boundaries. The separation between domain and infrastructure must be maintained.
When repositories are implemented well, they become invisible helpers. The domain layer works with aggregates as if they were simple in-memory objects. The infrastructure layer quietly handles the persistence details. And the overall system becomes cleaner, more testable, and easier to evolve over time.
About N Sharma
Lead Architect at StackAndSystem
N Sharma is a technologist with over 28 years of experience in software engineering, system architecture, and technology consulting. He holds a Bachelor’s degree in Engineering, a DBF, and an MBA. His work focuses on research-driven technology education: explaining software architecture, system design, and development practices through structured tutorials designed to help engineers build reliable, scalable systems.
Disclaimer
This article is for educational purposes only. Assistance from AI-powered generative tools was taken to format and improve language flow. While we strive for accuracy, this content may contain errors or omissions and should be independently verified.
