Last Updated: March 25, 2026 at 15:30

Repositories in Domain-Driven Design

Repository Abstraction, Persistence Boundaries, and Domain vs. Infrastructure

Repositories provide a clean abstraction that makes aggregates appear as if they are held in a simple in-memory collection, while handling all the complexity of persistence behind the scenes. By defining repository interfaces in the domain layer and implementing them in the infrastructure layer, the domain remains pure and testable, unaware of databases, SQL, or connection strings. Repositories operate at the aggregate root level, ensuring that retrieval and persistence respect the consistency boundaries that aggregates define. When implemented well, repositories become invisible helpers that free developers to focus on business logic rather than infrastructure concerns.

Introduction

In the previous articles of this series, we built a foundation of tactical DDD patterns. We explored entities and value objects — the building blocks of domain models. We examined aggregates, which cluster these building blocks into consistency boundaries enforced by the aggregate root. And we looked at factories, which encapsulate the complex creation of aggregates.

But a question arises once aggregates are created: how do we retrieve them later? And how do we persist changes back to storage?

Consider an order aggregate in an e-commerce system. When a customer views their order history, the system must retrieve the relevant orders from the database. When a customer updates their shipping address, the system must save that change. These operations are fundamental to any real application.

Yet there is a tension here. The domain model represents business concepts in a pure, expressive way. It should not be concerned with database tables, SQL queries, or connection strings — those are infrastructure concerns. Mixing them with domain logic produces scattered, hard-to-maintain code.

Repositories resolve this tension. They provide a clean abstraction that allows the domain layer to work with aggregates as if they were simple in-memory collections, while hiding all persistence complexity behind the scenes.

This article explores repositories in depth: what they are, why they exist, how they differ from other persistence patterns, which parts of a DDD application call them, and how to implement them while maintaining a clean separation between domain and infrastructure.

Part One: What Is a Repository?

A Concrete Example First

Imagine you are working with an order aggregate. The order has an order number, a customer, a list of items, a status, and a total. In your application code, you want to retrieve an order by its order number and save it after changes are made, without writing SQL or managing database connections.

A repository gives you exactly that. Application code working with a repository looks conceptually like this:

// The concrete implementation is injected by the application's composition root
// — never instantiated directly here
order = orderRepository.findByOrderNumber("ORD-12345");
order.changeShippingAddress(newAddress);
orderRepository.save(order);

The repository acts like a collection of aggregates. You ask it to find an aggregate by a domain-meaningful identifier. You ask it to save an aggregate after modifications. You never see SQL. You never see database tables. You simply work with your domain objects.

Notice that orderRepository is received via dependency injection — the application code depends on the OrderRepository interface, not on any concrete implementation. The concrete SqlOrderRepository is wired together at the application's composition root (the entry point of the application), never instantiated inside domain or application service code. This is what makes the abstraction work.

Definition

A repository is a pattern that mediates between the domain layer and the persistence infrastructure. It provides an abstraction that makes aggregates appear as if they are held in a simple, in-memory collection, while handling all the details of storage, retrieval, and mapping behind the scenes.

The repository pattern has two primary responsibilities.

Retrieval. The repository provides methods to find aggregates by their identifiers and by other business-relevant criteria. These methods return fully reconstituted aggregates — roots with all their internal entities and value objects loaded — ready for use in the domain.

Persistence. The repository provides methods to save aggregates after changes. When an aggregate is saved, the repository ensures that all changes to the root and its internal objects are persisted together, maintaining the consistency boundary defined by the aggregate.

Repository vs. In-Memory Collection

A useful mental model is to compare a repository to a simple in-memory collection. If you were building a toy application, you might store orders in a list, retrieve them by iterating until you find the matching order number, and replace entries when orders change.

A repository provides the same conceptual interface — methods like add, findById, save, remove — but underneath it handles the complexity of talking to a real database. This abstraction allows the domain layer to remain unaware of how data is physically stored.
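To make the collection analogy concrete, here is a minimal sketch of a Map-backed repository. The `Order` shape, the `InMemoryOrderRepository` name, and the method set are assumptions chosen for illustration, not an interface the article prescribes.

```typescript
// Hypothetical Order shape, reduced to what the sketch needs.
interface Order {
  readonly orderNumber: string;
  status: "OPEN" | "SHIPPED";
}

// Collection-like contract that calling code depends on.
interface OrderRepository {
  findByOrderNumber(orderNumber: string): Order | undefined;
  save(order: Order): void;
  remove(orderNumber: string): void;
}

// Map-backed implementation: no database, same contract.
class InMemoryOrderRepository implements OrderRepository {
  private readonly orders = new Map<string, Order>();

  findByOrderNumber(orderNumber: string): Order | undefined {
    const stored = this.orders.get(orderNumber);
    // Hand out a copy so that changes only become visible after save(),
    // which mirrors how a database-backed repository behaves.
    return stored ? { ...stored } : undefined;
  }

  save(order: Order): void {
    this.orders.set(order.orderNumber, { ...order });
  }

  remove(orderNumber: string): void {
    this.orders.delete(orderNumber);
  }
}
```

A database-backed class could implement the same `OrderRepository` contract, and code written against the interface would not need to change.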

Part Two: The Problem Repositories Solve

When Persistence Logic Leaks

In architectures without a clear repository abstraction, persistence logic tends to spread. Service classes retrieve data and then perform business calculations that belong in the domain model. Transaction boundaries get managed in ways that inadvertently affect domain behaviour. Domain objects become anemic — holding data but no behaviour — because the services have absorbed all the logic.

This produces several compounding problems.

The domain model remains hollow. When domain objects carry no behaviour, services become the home for business rules. The domain layer becomes a collection of data containers rather than an expression of how the business actually operates.

Invariants become scattered. Instead of being enforced by the aggregate root, rules are spread across services. A rule that must apply to an order may be checked in one service and overlooked in another.

Testing becomes entangled with infrastructure. Testing business logic requires setting up databases, populating test data, and mocking infrastructure. If domain objects contained the logic, they could be tested in complete isolation.

The Repository Solution

Repositories centralise persistence logic. All retrieval and storage operations for a given aggregate root go through one place. The domain layer depends on the repository's interface, not on the underlying database technology.

This centralisation delivers concrete benefits. The domain layer remains pure, focused entirely on business rules. Persistence logic lives in one place and is easy to maintain. Domain logic becomes testable using a simple in-memory repository implementation — no database required. And when the storage technology changes, only the repository implementation needs updating; the domain layer is untouched.

The goal of the repository pattern in DDD is not simply to avoid scattered SQL. It is to ensure the domain layer remains pure, independently testable, and free to focus entirely on business rules — regardless of whether the persistence mechanism is a relational database, a document store, or anything else.

Part Three: Who Uses Repositories in DDD?

This is a question the pattern's description often leaves implicit. The answer is precise and worth stating directly.

Application Services Use Repositories

Application services are the primary callers of repositories. An application service coordinates a use case from start to finish. It loads aggregates via repositories, passes them to domain services or invokes methods directly on them, and then saves the results back through repositories. It also owns the transaction boundary.

// Application service — the correct place to use repositories
class PlaceOrderService {
  constructor(orderRepository, customerRepository, pricingService) { ... }

  placeOrder(command) {
    customer = customerRepository.findById(command.customerId);
    order = OrderFactory.create(customer, command.items);
    pricingService.applyPricing(order, customer); // domain service
    orderRepository.save(order);
  }
}

Domain Services Do Not Use Repositories

Domain services contain domain logic that spans multiple aggregates — but they do not call repositories. Domain services receive the aggregates they need as method parameters, supplied by the application service that loaded them. This is a firm rule.

If a domain service calls a repository, it has mixed infrastructure concerns into the domain layer. Every test for that service now requires a repository mock. Transaction control becomes fragmented. The domain layer is no longer pure.

The correct pattern is:

// Application service loads aggregates, then passes them to the domain service
sourceAccount = accountRepository.findById(command.sourceId);
destinationAccount = accountRepository.findById(command.destinationId);
transferService.transfer(sourceAccount, destinationAccount, command.amount);
accountRepository.save(sourceAccount);
accountRepository.save(destinationAccount);

The domain service transferService has no knowledge of repositories. It receives fully loaded aggregates and works purely with domain logic.
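As an illustration of a domain service that stays free of repositories, the sketch below shows one possible shape for the transfer logic. The `Account` class, its method names, and the specific rules (positive amounts, sufficient funds, distinct accounts) are assumptions made for the example.

```typescript
// Hypothetical Account aggregate; names and rules are illustrative only.
class Account {
  readonly id: string;
  private balanceInCents: number;

  constructor(id: string, balanceInCents: number) {
    this.id = id;
    this.balanceInCents = balanceInCents;
  }

  get balance(): number {
    return this.balanceInCents;
  }

  withdraw(amountInCents: number): void {
    if (amountInCents <= 0) throw new Error("Amount must be positive");
    if (amountInCents > this.balanceInCents) throw new Error("Insufficient funds");
    this.balanceInCents -= amountInCents;
  }

  deposit(amountInCents: number): void {
    if (amountInCents <= 0) throw new Error("Amount must be positive");
    this.balanceInCents += amountInCents;
  }
}

// Domain service: no constructor dependencies and no repository.
// It works only with the aggregates handed to it.
class TransferService {
  transfer(source: Account, destination: Account, amountInCents: number): void {
    if (source.id === destination.id) {
      throw new Error("Cannot transfer to the same account");
    }
    // withdraw() validates the amount and the funds before changing state,
    // so a rejected transfer leaves both accounts untouched.
    source.withdraw(amountInCents);
    destination.deposit(amountInCents);
  }
}
```

Because `TransferService` takes no dependencies, it can be exercised with plain objects and no mocks.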

Aggregates Do Not Use Repositories

Aggregates have no knowledge of repositories whatsoever. An aggregate's responsibility is to enforce its own invariants, manage its internal state, and raise domain events. It does not know how it is stored, retrieved, or persisted. Injecting a repository into an aggregate would couple the domain model to persistence, force every test that instantiates the aggregate to supply a repository stand-in, and violate the Single Responsibility Principle.

The Dependency Direction Is One Way

Composition Root
  → injects concrete repository into Application Service

Application Service
  → calls Repository interface (to load and save aggregates)
  → calls Domain Service (passing loaded aggregates)
      → Domain Service calls methods ON aggregates
          → Aggregates manage their own state

Nothing flows upward. Aggregates do not call domain services. Aggregates do not call repositories. Domain services do not call repositories. The application service is the sole orchestrator of persistence.

Part Four: Repository Abstraction

The Domain Interface

A repository is defined by its interface, which belongs to the domain layer. The interface describes what operations are available, expressed entirely in the language of the domain.

For an order aggregate, the interface might include:

  1. findById(orderId) — retrieves an order by its unique identifier
  2. findByCustomer(customerId) — retrieves all orders for a given customer
  3. findByCustomerAndStatus(customerId, status) — retrieves orders matching a status
  4. save(order) — persists an order and all its internal objects

These methods mention no databases, no SQL, no technical details. The interface becomes a contract that the domain and application layers depend on, while the implementation lives in the infrastructure layer.

Note the absence of a blanket delete(order) method. In most business domains, aggregates are not hard-deleted — they are cancelled, archived, or soft-deleted to preserve audit history. Whether a delete method is appropriate depends on the specific domain. When in doubt, prefer a status transition on the aggregate over physical deletion.

The Infrastructure Implementation

The implementation lives in the infrastructure layer. It translates domain operations into database operations and is responsible for:

  1. Establishing and managing database connections
  2. Constructing queries to retrieve aggregate data
  3. Mapping database records back to fully reconstituted domain objects
  4. Translating changes to domain objects into the appropriate inserts, updates, and deletes
  5. Handling database-specific errors and exceptions

The domain layer never sees this implementation. It depends only on the interface. This is what allows the persistence mechanism to be swapped without touching domain logic.

Reconstitution vs. Construction

An important distinction that repository implementations must handle correctly is the difference between constructing a new aggregate and reconstituting an existing one from persistence.

When a factory constructs a new aggregate, creation-time invariants are enforced. Business rules about initial state are validated. A newly created Order might enforce that it must have at least one item before being saved.

When a repository loads an aggregate from the database, it is reconstituting an object that already exists and was previously valid. Running creation-time validation again would be incorrect — the aggregate was already validated when it was first created. Loading a historical order from the database should not re-trigger rules about minimum items or initial state.

Implementations typically handle this through a separate reconstitution path: a private constructor, a static factory method, or ORM-managed hydration that bypasses creation-time logic. Getting this wrong causes subtle bugs where loading an aggregate accidentally fires business logic that should only run once, at creation.
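One way to express the two paths is a private constructor with two static entry points. This is a sketch under assumptions: the `create` and `reconstitute` names, the `OrderItem` shape, and the "PENDING" initial status are all illustrative.

```typescript
// Hypothetical item shape for the example.
interface OrderItem {
  readonly sku: string;
  readonly quantity: number;
}

class Order {
  readonly id: string;
  readonly items: readonly OrderItem[];
  readonly status: string;

  // Private: callers must choose one of the two named paths below.
  private constructor(id: string, items: readonly OrderItem[], status: string) {
    this.id = id;
    this.items = items;
    this.status = status;
  }

  // Creation path, intended for a factory: enforces creation-time invariants.
  static create(id: string, items: readonly OrderItem[]): Order {
    if (items.length === 0) {
      throw new Error("A new order needs at least one item");
    }
    return new Order(id, [...items], "PENDING");
  }

  // Reconstitution path, intended only for repository implementations:
  // rebuilds previously persisted state without re-running creation rules.
  static reconstitute(id: string, items: readonly OrderItem[], status: string): Order {
    return new Order(id, [...items], status);
  }
}
```

The same separation can be achieved through ORM hydration; the static-method form simply makes the distinction visible in the class itself.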

A Conceptual View of the Separation

// Domain layer — the interface
interface OrderRepository {
  findById(orderId): Order
  findByCustomer(customerId): List<Order>
  save(order): void
}

// Infrastructure layer: the implementation
class SqlOrderRepository implements OrderRepository {
  findById(orderId) { /* SQL query + reconstitution */ }
  save(order) { /* inserts, updates, deletes across tables */ }
}

// Application layer: usage via injected interface
order = orderRepository.findById(orderId); // receives OrderRepository interface
order.changeShippingAddress(newAddress);
orderRepository.save(order);

The application code depends only on the interface. The concrete SQL implementation is invisible to it.

Part Five: Persistence Boundaries

One Repository per Aggregate Root

Repositories are defined for aggregate roots, not for every entity. An OrderRepository handles the Order aggregate root. It does not expose a separate repository for OrderItem, because order items belong to the order aggregate and are always accessed through the order root.

This rule directly reinforces the aggregate boundary. Providing a repository for an internal entity would allow external code to bypass the aggregate root's invariants — exactly what aggregate boundaries exist to prevent.
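The sketch below shows why items are reached only through the root. The quantity cap is a hypothetical business rule invented for the example, as are the method names.

```typescript
// Hypothetical internal value of the Order aggregate.
interface OrderItem {
  readonly sku: string;
  readonly quantity: number;
}

class Order {
  static readonly MAX_TOTAL_QUANTITY = 10; // hypothetical business rule
  readonly id: string;
  private readonly items: OrderItem[] = [];

  constructor(id: string) {
    this.id = id;
  }

  // All item changes pass through the root, so the rule is always checked.
  addItem(sku: string, quantity: number): void {
    if (quantity <= 0) throw new Error("Quantity must be positive");
    if (this.totalQuantity() + quantity > Order.MAX_TOTAL_QUANTITY) {
      throw new Error("Order exceeds the maximum total quantity");
    }
    this.items.push({ sku, quantity });
  }

  totalQuantity(): number {
    return this.items.reduce((sum, item) => sum + item.quantity, 0);
  }

  // Returns a copy, so callers cannot push into the internal array.
  listItems(): readonly OrderItem[] {
    return [...this.items];
  }
}

// The only repository for this aggregate works with whole Order objects.
// There is deliberately no OrderItemRepository.
interface OrderRepository {
  findById(id: string): Order | undefined;
  save(order: Order): void;
}
```

If a separate item repository existed, code could insert an eleventh unit directly and the check in `addItem` would never run.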

Loading Aggregates

When a repository retrieves an aggregate, the intent is to return it fully loaded — the root along with all its internal entities and value objects — so that invariants spanning the full object graph can be enforced.

In practice, many ORM-based implementations use lazy loading for collections within an aggregate. This means the aggregate is not necessarily loaded in a single database round trip. This is acceptable as a performance trade-off, but it must be configured deliberately. An aggregate that is inadvertently partially loaded, for example when a collection that was never loaded appears empty, can lead to invariant violations that are extremely difficult to diagnose. Loading behaviour should be tested explicitly, not assumed.

Persisting Aggregates

When a repository saves an aggregate, all changes to the root and its internal objects must be persisted together in a single transaction. This ensures the aggregate's consistency boundary is respected — partial saves that leave the aggregate in an inconsistent state are not acceptable.

In practice this means the repository must determine what has changed and issue the appropriate inserts, updates, and deletes across potentially multiple tables. This is where the unit of work pattern becomes valuable.

Repository vs. Unit of Work

The unit of work tracks all objects that have been loaded or modified during a business operation and coordinates their persistence at the end of the operation in a single transaction. In many mature implementations, such as Entity Framework's DbContext and Hibernate's Session, the unit of work manages an identity map and change tracker internally. When the unit of work is flushed, typically at commit, it writes all tracked changes to the database. In these implementations, calling repository.save(order) registers the aggregate with the unit of work's change tracker; the actual SQL is usually deferred until that flush rather than issued by the save call itself.

This is an important nuance: the unit of work does not simply "tell repositories to persist changes." In many implementations it bypasses individual repository save methods entirely, issuing persistence directly based on its own change tracking. Understanding this prevents confusion when debugging persistence behaviour in ORM-based systems.
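The deferred-write behaviour can be sketched with a hand-rolled unit of work. This is a simplified illustration, not how any particular ORM is implemented: `UnitOfWork`, `TrackedRepository`, and the `Flush` callback (standing in for a database transaction) are all hypothetical names.

```typescript
// Hypothetical aggregate shape; only an id is needed for tracking.
interface Aggregate {
  readonly id: string;
}

// Stand-in for a database transaction: receives the whole batch at once.
type Flush = (changed: Aggregate[]) => void;

class UnitOfWork {
  // One entry per aggregate id, so repeated saves do not duplicate writes.
  private readonly dirty = new Map<string, Aggregate>();
  private readonly flush: Flush;

  constructor(flush: Flush) {
    this.flush = flush;
  }

  registerDirty(aggregate: Aggregate): void {
    this.dirty.set(aggregate.id, aggregate);
  }

  commit(): void {
    const batch = Array.from(this.dirty.values());
    if (batch.length === 0) return; // nothing tracked, nothing to write
    this.flush(batch); // all tracked changes are written together
    this.dirty.clear();
  }
}

// save() only registers the aggregate; nothing is written until commit().
class TrackedRepository {
  private readonly unitOfWork: UnitOfWork;

  constructor(unitOfWork: UnitOfWork) {
    this.unitOfWork = unitOfWork;
  }

  save(aggregate: Aggregate): void {
    this.unitOfWork.registerDirty(aggregate);
  }
}
```

In this arrangement the application service would call `commit()` once at the end of the use case, which is where the transaction boundary sits.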

Part Six: Domain vs. Infrastructure

The Layer Separation

One of the core principles of DDD is separating domain logic from infrastructure concerns.

The domain layer contains: business rules, entities, value objects, aggregates, domain services, factories, and repository interfaces.

The infrastructure layer contains: database access, external service clients, file systems, message brokers, and repository implementations.

The domain layer has no knowledge of:

  1. Database connection strings
  2. SQL or query syntax
  3. ORM frameworks or their configuration
  4. Table names or column mappings
  5. Transaction management mechanics

Why This Separation Matters

Testability. Domain logic and application services can be tested using simple in-memory repository implementations. No database setup is required. Tests run fast and remain stable.

Flexibility. Switching from a relational database to a document store, or from one ORM to another, requires changes only in the infrastructure layer. The domain layer is untouched.

Clarity. Developers reading domain code see only domain concepts. The persistence mechanism does not intrude.

Maintainability. Persistence logic is centralised in one place per aggregate root, making it easy to find, understand, and modify.

Part Seven: Practical Implementation Considerations

Retrieval Methods Should Speak Domain Language

Repository interface methods should reflect the actual ways the business needs to find aggregates — not generic CRUD verbs. The interface becomes self-documenting when method names express business intent:

  1. findByOrderNumber(orderNumber)
  2. findByCustomer(customerId)
  3. findByCustomerAndStatus(customerId, status)
  4. findOpenOrders()

Each name expresses a business need. A developer reading the interface immediately understands what the domain requires.

The Challenge of Complex Queries and Read Models

A tension arises when complex queries need only a subset of aggregate data. A dashboard showing a list of orders with four fields — order number, customer name, total, and status — does not need full order aggregates loaded for hundreds of records.

The appropriate solution is a read model (also called a query model in CQRS terminology). Rather than forcing the repository to return partial aggregates or adding reporting methods to a domain repository, a separate, dedicated read path queries the database directly and returns lightweight data transfer objects optimised for display. This read path bypasses the domain model entirely.

The governing principle is: writes go through the domain model so that invariants are always enforced. Reads can bypass the domain model when full consistency is not required for display. This separation is the foundation of CQRS — Command Query Responsibility Segregation — which is a common and natural companion to DDD in systems with complex query requirements.
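A read path of this kind might look like the following sketch. The row columns, the `OrderSummary` DTO, and the `fetchRows` function (standing in for a direct database query) are assumptions made for illustration.

```typescript
// Hypothetical row shape, as a table or view might return it.
interface OrderRow {
  order_number: string;
  customer_name: string;
  total_cents: number;
  status: string;
  internal_notes: string; // present in storage, not needed by the dashboard
}

// Flat DTO for display: no behaviour and no invariants.
interface OrderSummary {
  orderNumber: string;
  customerName: string;
  total: string;
  status: string;
}

// Query-side class. It never constructs an Order aggregate.
class OrderSummaryQuery {
  private readonly fetchRows: () => OrderRow[];

  constructor(fetchRows: () => OrderRow[]) {
    this.fetchRows = fetchRows;
  }

  listByStatus(status: string): OrderSummary[] {
    return this.fetchRows()
      .filter((row) => row.status === status)
      .map((row) => ({
        orderNumber: row.order_number,
        customerName: row.customer_name,
        total: (row.total_cents / 100).toFixed(2),
        status: row.status,
      }));
  }
}
```

In a real system the filtering would normally happen in the query itself; it is done in memory here only to keep the sketch self-contained.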

Handling Concurrency

When two processes modify the same aggregate concurrently, the second should not silently overwrite the first's changes.

The standard approach is optimistic concurrency control. The aggregate carries a version number or timestamp. When the repository saves an aggregate, it checks that the version in the database matches the version that was loaded. If they differ, another process has modified the aggregate in the meantime — the save fails, and the application reloads the aggregate and retries.

This approach works well when aggregate boundaries are small and conflicts are rare — exactly the conditions that good aggregate design produces.
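The version check can be sketched with an in-memory store. The names below are hypothetical; in a relational implementation the same check is commonly expressed as an update that includes the loaded version in its WHERE clause.

```typescript
class ConcurrencyError extends Error {}

// Hypothetical aggregate snapshot carrying the version it was loaded with.
interface VersionedOrder {
  id: string;
  shippingAddress: string;
  version: number;
}

class VersionedOrderRepository {
  private readonly rows = new Map<string, VersionedOrder>();

  add(order: VersionedOrder): void {
    this.rows.set(order.id, { ...order });
  }

  findById(id: string): VersionedOrder | undefined {
    const row = this.rows.get(id);
    return row ? { ...row } : undefined;
  }

  save(order: VersionedOrder): void {
    const current = this.rows.get(order.id);
    // Reject the write if the stored version moved on since this copy was loaded.
    if (!current || current.version !== order.version) {
      throw new ConcurrencyError(`Order ${order.id} was modified concurrently`);
    }
    this.rows.set(order.id, { ...order, version: order.version + 1 });
  }
}
```

The caller that receives the error can reload the aggregate, reapply its change, and save again.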

Part Eight: Common Misconceptions

"Repositories Are Just DAOs"

A DAO is a technical pattern focused on data access, typically operating at the table level and returning data transfer objects. A repository is a domain pattern focused on providing an in-memory collection abstraction over aggregates. The repository interface speaks the language of the domain. A repository implementation may use DAOs internally, but the repository itself is a domain concept — the distinction is one of level of abstraction and intent.

"Every Entity Needs a Repository"

Only aggregate roots have repositories. Internal entities are accessed through their aggregate root and persisted as part of the aggregate. Providing a repository for an internal entity breaks the aggregate boundary and allows external code to bypass the invariants the root is meant to enforce.

"Repositories Must Return Fully Loaded Aggregates in One Query"

The intent is to return a fully usable aggregate, but the technical implementation may span multiple queries, especially with ORM lazy loading. What must not happen is returning an aggregate that is silently incomplete in ways that cause invariant violations at runtime. Whether this is achieved in one query or several is an implementation detail — what matters is that the aggregate is fully usable when returned.

"Repositories Should Handle All Query Needs"

Repositories are designed for retrieving aggregates by identity and for queries that return complete aggregates for the purpose of executing domain operations. Complex reporting, analytical queries, and display-oriented projections belong in a separate read model. Using repositories for all query needs leads to bloated interfaces and the temptation to return partial aggregates — both of which erode the pattern's value.

"Domain Services Should Call Repositories When They Need Data"

This is the most operationally damaging misconception. Domain services must not call repositories. All aggregate loading is the application service's responsibility. The application service loads what is needed and passes it to the domain service. This keeps the domain layer free of infrastructure dependencies, keeps transaction control in one place, and keeps domain services testable without any persistence infrastructure.

Part Nine: Practical Example

Order Repository in Context

A customer places an order through an e-commerce application. The application service coordinates the operation:

  1. The application service receives a PlaceOrderCommand from the controller.
  2. It retrieves the Customer aggregate via customerRepository.findById(customerId).
  3. It creates a new Order aggregate using OrderFactory.create(customer, items, shippingAddress). The factory enforces creation-time invariants.
  4. It calls orderRepository.save(order). The repository inserts records into the orders and order_items tables within a single transaction.

Later, the customer changes their shipping address:

  1. The application service retrieves the order via orderRepository.findById(orderId). The repository reconstitutes the full aggregate — bypassing creation-time logic — and returns it ready for use.
  2. The application service calls order.changeShippingAddress(newAddress). The aggregate enforces the invariant that a shipped order cannot have its address changed.
  3. The application service calls orderRepository.save(order). The repository issues the appropriate update.

Throughout this process, the domain code — the Order aggregate and the factory — never touches database concepts. The repository handles all persistence complexity. The application service coordinates the flow. Domain logic remains pure, testable, and free of infrastructure concerns.
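The address-change flow above can be sketched as follows. The class and method names follow the article where it gives them; the `ChangeShippingAddressService` name, the status values, and the error messages are assumptions.

```typescript
type OrderStatus = "PLACED" | "SHIPPED";

// Aggregate: owns the rule, knows nothing about storage.
class Order {
  readonly id: string;
  private status: OrderStatus;
  private shippingAddress: string;

  constructor(id: string, status: OrderStatus, shippingAddress: string) {
    this.id = id;
    this.status = status;
    this.shippingAddress = shippingAddress;
  }

  get address(): string {
    return this.shippingAddress;
  }

  changeShippingAddress(newAddress: string): void {
    if (this.status === "SHIPPED") {
      throw new Error("A shipped order cannot have its address changed");
    }
    this.shippingAddress = newAddress;
  }
}

// Domain-layer contract; the implementation is supplied from outside.
interface OrderRepository {
  findById(id: string): Order | undefined;
  save(order: Order): void;
}

// Application service: load, invoke domain behaviour, save.
class ChangeShippingAddressService {
  private readonly orders: OrderRepository;

  constructor(orders: OrderRepository) {
    this.orders = orders;
  }

  execute(orderId: string, newAddress: string): void {
    const order = this.orders.findById(orderId);
    if (!order) throw new Error(`Order ${orderId} not found`);
    order.changeShippingAddress(newAddress); // the invariant is enforced here
    this.orders.save(order); // reached only if the change was allowed
  }
}
```

If the aggregate rejects the change, `save` is never called, so an invalid state cannot reach storage through this path.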

Conclusion

Repositories are a foundational pattern in Domain-Driven Design. They provide a clean separation between the domain layer and the infrastructure layer, allowing business logic to remain expressive and pure while persistence concerns are handled in a dedicated, maintainable place.

Used correctly, the pattern is precise about who may call a repository: application services load and save aggregates; domain services receive aggregates as parameters and never call repositories; aggregates have no knowledge of persistence at all. This single rule, consistently applied, keeps the domain layer free of infrastructure dependencies and makes business logic independently testable.

Repository interfaces belong in the domain layer and speak the language of the domain. Implementations belong in the infrastructure layer and handle all persistence complexity — including the critical distinction between constructing a new aggregate and reconstituting an existing one. Complex read requirements belong in dedicated read models, not in repositories.

When repositories are implemented with these principles in place, they become invisible helpers. The domain layer works with aggregates as if they were simple in-memory objects. The infrastructure layer quietly handles the persistence details. And the system as a whole becomes cleaner, more testable, and easier to evolve over time.

About N Sharma

Lead Architect at StackAndSystem

N Sharma is a technologist with over 28 years of experience in software engineering, system architecture, and technology consulting. He holds a Bachelor’s degree in Engineering, a DBF, and an MBA. His work focuses on research-driven technology education—explaining software architecture, system design, and development practices through structured tutorials designed to help engineers build reliable, scalable systems.

Disclaimer

This article is for educational purposes only. Assistance from AI-powered generative tools was taken to format and improve language flow. While we strive for accuracy, this content may contain errors or omissions and should be independently verified.
