Repositories and IQueryable, the paging case.
By Jérémie Chassaing on Monday, January 19, 2009, 11:50 - Domain Driven Design
Edit : My opinion on this subject has changed… You can read the full story in Back on Repositories and Paging. Introducing reporting.
The technique is still useful for writing query services, but I would not recommend implementing it on a repository.
When it comes to repositories, people have a hard time figuring out how to respect the DDD vision while getting the most out of current ORM technologies (Linq and ORM) and not writing too much code – we’re so lazy.
The war over whether IRepository<T> generic repositories are a good idea is raging outside, and I took some time to choose my side. Here are the points to consider :
- The repository is a contract between the domain and the infrastructure
- The implementation details should not leak outside
In my opinion, the first point indicates that the repository should be tailored to the domain needs. It cannot be generic, or it is not a contract at all.
When writing a contract, details matter !
This doesn’t mean that we cannot use generic tools to access data behind the interface curtain. Linq DataContext and Table<T> are very sharp tools for implementing repositories. And there is a very good post by Greg Young about that.
There is still a point to be discussed though :
Should the repository methods return IEnumerable<T> or IQueryable<T> ?
The IQueryable<T> is part of the framework, and cleanly integrated in the language.
The problem is that its implementation depends heavily on the underlying provider. And it is a really serious leak !
So let’s state the question differently :
- Why would we need IQueryable ?
- Because we can add new query clauses, and they will be executed directly in the database.
- What kind of clause would you add ?
- Don’t know… clauses…
- Would it be business specifications ?
- No, these should already be in the repository.
- So ?
- Sorting and Paging ! These are presentation concerns !
- Here’s the point.
Paging is not a recent concern for programmers, and there are never enough tools to implement it properly. The main problem is that paging once you’ve got all the data is less than effective. And this is what will happen with an IEnumerable approach.
But let’s ask two last questions.
Why is paging useful ? Is it really a presentation concern ?
We need paging to navigate through large collections of objects, and if a collection can grow so large that it cannot be retrieved in a single query, it becomes a domain concern !
- When your object collection is known at design time to stay within small bounds but you still want to page it for presentation clarity, there is no real penalty in fetching everything and displaying only a few items.
- But when your collection can grow big, you SHOULD provide a mechanism to retrieve only a range of it, for presentation purposes or simply for batching purposes.
The problem is that if we leak IQueryable, the user can do far more than paging, and problems can arise. So I suggest using a new interface IPaged<T> that would provide everything needed for paging :
public interface IPaged<T> : IEnumerable<T>
{
    ///<summary>
    /// Get the total entity count.
    ///</summary>
    int Count { get; }

    ///<summary>
    /// Get a range of persisted entities.
    ///</summary>
    IEnumerable<T> GetRange(int index, int count);
}
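And here is a simple implementation on an IQueryable :
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class Paged<T> : IPaged<T>
{
    // The wrapped query; it never leaves this class as an IQueryable.
    private readonly IQueryable<T> source;

    public Paged(IQueryable<T> source)
    {
        this.source = source;
    }

    public IEnumerator<T> GetEnumerator()
    {
        return source.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    // Translated by the provider into a count query.
    public int Count
    {
        get { return source.Count(); }
    }

    // Translated by the provider into a query for the requested range only.
    public IEnumerable<T> GetRange(int index, int count)
    {
        return source.Skip(index).Take(count);
    }
}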
Then your repository can return IPaged collections like this without leaking implementation details :
public IPaged<Customer> GetCustomers();
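To make the idea concrete, here is a minimal sketch of a repository that returns IPaged and of a consumer that walks it in batches. The names ICustomerRepository, InMemoryCustomerRepository and the Customer shape are assumptions made for illustration, and an in-memory list stands in for a real Linq provider; IPaged<T> and Paged<T> are repeated from above only so the sample compiles on its own.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public interface IPaged<T> : IEnumerable<T>
{
    int Count { get; }
    IEnumerable<T> GetRange(int index, int count);
}

public class Paged<T> : IPaged<T>
{
    private readonly IQueryable<T> source;
    public Paged(IQueryable<T> source) { this.source = source; }
    public IEnumerator<T> GetEnumerator() { return source.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
    public int Count { get { return source.Count(); } }
    public IEnumerable<T> GetRange(int index, int count)
    {
        return source.Skip(index).Take(count);
    }
}

// Hypothetical entity, reduced to what the sample needs.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical domain contract: no IQueryable is visible to callers.
public interface ICustomerRepository
{
    IPaged<Customer> GetCustomers();
}

// Hypothetical implementation; a list stands in for a provider-backed table.
public class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly List<Customer> customers;

    public InMemoryCustomerRepository(IEnumerable<Customer> customers)
    {
        this.customers = customers.ToList();
    }

    public IPaged<Customer> GetCustomers()
    {
        // An explicit order keeps the ranges stable from one call to the next.
        return new Paged<Customer>(customers.AsQueryable().OrderBy(c => c.Id));
    }
}

public static class Program
{
    public static void Main()
    {
        var repository = new InMemoryCustomerRepository(
            Enumerable.Range(1, 25).Select(i => new Customer { Id = i, Name = "Customer " + i }));

        IPaged<Customer> customers = repository.GetCustomers();
        const int pageSize = 10;
        int total = customers.Count;

        // Each iteration asks the source for one range only.
        for (int index = 0; index < total; index += pageSize)
        {
            List<Customer> page = customers.GetRange(index, pageSize).ToList();
            Console.WriteLine("Fetched {0} customers starting at index {1}", page.Count, index);
        }
    }
}
```

With 25 customers and a page size of 10, the loop issues three range queries, the last one returning the remaining 5 customers.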
This seems to be a major step in understanding the repository pattern and its underlying war. And you, which side are you on ?
Comments
I like it! Looks like something I might have to steal shamelessly for my ongoing platform!
@chris Of course you can use it. If you put it into practice and find new ideas, post them here !
The other area that I would throw out there is sorting. One could argue that sorting is a presentation concern, but obviously, sorting is going to be much faster when performed in the database and I think your same justifications for paging would equally apply to sorting.
@Shaun > Exactly ! The post would have been a bit messy if I added that, but it's going to be the subject of the next one.
Moreover I would say that paging without sorting cannot give meaningful results. You always page at least on an implicit order.
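As a small illustration of that last point, one possible way to make the order explicit is to let the range method accept only an ordered query. This is a sketch under assumptions: StablePaging is a hypothetical helper, not part of the post's IPaged<T>, and an in-memory array stands in for a database table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StablePaging
{
    // Hypothetical helper: taking IOrderedQueryable<T> forces the caller
    // to choose an order before a range can be requested.
    public static IEnumerable<T> GetRange<T>(IOrderedQueryable<T> ordered, int index, int count)
    {
        return ordered.Skip(index).Take(count);
    }
}

public static class Program
{
    public static void Main()
    {
        IQueryable<string> names =
            new[] { "delta", "alpha", "charlie", "bravo", "echo" }.AsQueryable();

        // Second page of two items, in alphabetical order.
        IEnumerable<string> secondPage = StablePaging.GetRange(names.OrderBy(n => n), 2, 2);

        Console.WriteLine(string.Join(", ", secondPage)); // charlie, delta
    }
}
```

Passing an unordered IQueryable<string> to GetRange would not compile, which turns the implicit order into a visible part of the contract.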
I'm intrigued to know your use case for paging in your domain (as opposed to paging in reporting, which would bypass the repository anyway)
@Michael > When your collection can be very large, it's always convenient to have a means of getting it in chunks (also called pages). Imagine you have a collection that can contain 1 million rows, and you need to process everything.
If you only have a GetAll method, you'll either have to load every entity into memory and then process them, which is not efficient, or use progressive fetching to operate on the fly, in which case the GetAll query will take very long to execute.
With paging, you can get a page of 100 entities, process it, and repeat. This leads to cheaper database access.
False dichotomy, IMO...
- Why would we need IQueryable ?
- Because we can add new query clauses, and they will be executed directly in the database.
If you expose an IEnumerable and the underlying implementation is actually an IQueryable, you can still add new query clauses and they will still be executed in the database. There's no need to expose IQueryable.
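A short sketch may help when weighing this argument. In C#, the static type of the variable decides which extension method a clause binds to, so a clause added to a value typed as IEnumerable<T> goes through Enumerable and runs in memory, unless the caller first converts back with AsQueryable(). The sample below uses an in-memory sequence as a stand-in for a database-backed query, so it only shows which of the two paths is taken.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    public static void Main()
    {
        IQueryable<int> query = Enumerable.Range(1, 100).AsQueryable();

        // The same object, exposed through the narrower interface.
        IEnumerable<int> exposed = query;

        // Binds to Enumerable.Where: the filter is applied in memory
        // while enumerating, and the result is no longer an IQueryable.
        var filteredInMemory = exposed.Where(n => n % 2 == 0);
        Console.WriteLine(filteredInMemory is IQueryable<int>); // False

        // AsQueryable() hands back the underlying IQueryable, so this Where
        // binds to Queryable.Where and is added to the provider's expression tree.
        var filteredByProvider = exposed.AsQueryable().Where(n => n % 2 == 0);
        Console.WriteLine(filteredByProvider is IQueryable<int>); // True
    }
}
```

So the clauses can still reach the provider, but only when the consumer knows, or guesses, that an IQueryable hides behind the IEnumerable.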
@Jérémie: Thanks for your thoughts, but I'm not yet convinced :-) I can just see it being abused for the purposes of displaying grid-views, etc.
If you're processing 1 million rows, then it's going to be an expensive operation regardless. If it's happening in your domain, then there's probably deadlines and transactions to take care of, and you'd better have a lot of memory because you want to process that sucker ASAP. The question is, should your domain (outside of the repository) need to know what a "page" is? Is it part of the language of the domain?
Again, I'm happy to be convinced by a realistic use case as I've only got my own experience to go on, but the one you gave was still quite vague.
@Michael > If the domain contains a better, and above all less arbitrary, concept for grouping data in small chunks, it is highly recommended to expose it in the model and use it in the application. I totally agree on this.
But I think that if no such concept has emerged yet, it's still a good idea to provide a way to always retrieve data in small enough chunks. After all, when your application uses paging, it is because the quantity of data can grow big and it becomes a true concern, not only for presentation but also for your business. You should provide a way to work efficiently with that load of data. Paging can be a solution.
@Jérémie/@Michael: I think a more general concept does exist, in many languages called a "slice". Python for example allows you to slice arrays, strings, etc.
>>> a="mystring"
>>> a[2:3]
's'
>>> a[2:]
'string'
>>> a[2:6]
'stri'
>>> a[2:6:2]
'sr'
In Python the slice takes the start/end offsets and an optional stride, but the general concept can be altered to accept a count instead of an end offset.
If instead of IPaged<T>, it implemented
public interface ISliceable<T> : IEnumerable<T>
{
    IEnumerable<T> GetSlice(int startingOffset, int count);
}
you no longer have a "page" concept but a more fundamental and familiar concept that can be applied to implement paging, array buffer windows, etc.
However, you still have an implicit order implied in retrieving a slice.
Note: It could be made more general if needed by implementing
public interface ISliceable<TKey, T> : IEnumerable<T>
{
    IEnumerable<T> GetSlice(TKey startingKey, int count);
}
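A minimal sketch of how the keyed variant could look in practice, assuming an integer key. CustomersById and the Customer shape are hypothetical names introduced for this example, and an in-memory sequence stands in for a provider-backed table.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public interface ISliceable<TKey, T> : IEnumerable<T>
{
    IEnumerable<T> GetSlice(TKey startingKey, int count);
}

// Hypothetical entity, reduced to its key.
public class Customer
{
    public int Id { get; set; }
}

// Hypothetical implementation keyed on Customer.Id.
public class CustomersById : ISliceable<int, Customer>
{
    private readonly IQueryable<Customer> source;

    public CustomersById(IQueryable<Customer> source)
    {
        this.source = source;
    }

    public IEnumerator<Customer> GetEnumerator()
    {
        return source.OrderBy(c => c.Id).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    // Returns up to 'count' customers whose key is at or after 'startingKey',
    // in key order. The order is now explicit instead of implied.
    public IEnumerable<Customer> GetSlice(int startingKey, int count)
    {
        return source.Where(c => c.Id >= startingKey)
                     .OrderBy(c => c.Id)
                     .Take(count);
    }
}

public static class Program
{
    public static void Main()
    {
        // Keys 10, 20, ..., 100.
        var customers = new CustomersById(
            Enumerable.Range(1, 10).Select(i => new Customer { Id = i * 10 }).AsQueryable());

        var slice = customers.GetSlice(35, 3);
        Console.WriteLine(string.Join(", ", slice.Select(c => c.Id))); // 40, 50, 60
    }
}
```

To fetch the next slice, a caller would pass a key just past the last one received (61 in this sample) rather than an offset.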
One last comment: on your existing implementation of IPaged<T> you have a 'Count' property which by its very existence implies either some means of counting the total number of elements or a naive implementation of iteration of all existing elements. Either one adds complexity and goes against the general approach of IEnumerable<T> being lazy. If a count is needed let the programmer make the call to Enumerable.Count<T> so it is explicit.
@Jérémie:
I agree, I think a separate interface, perhaps ICountable, is appropriate.
Between the recent discussions of Repositories and their implementations I ended up posting the implementation I have for my Fluent NHibernate project on my blog, if you get a chance let me know what you think about the post.
http://dotnetchris.wordpress.com/20...
@ Jérémie >
>> When your collection can be very large, it's always convenient to have means to get it by chunks (also called pages).
Your example sounds to me like you're alluding to batch processing in chunks, in which case there is some overlap between the concept of a chunk in a batch processing context and paging as used in exploring information. However, I think one crucial difference between the concepts is that a chunk entails processing to perform over every item, where a page doesn't.
One way I have implemented paging is with the following interfaces
//Responsible for paging e.g. count, current page, iteration between pages
Pager<T>
+Count
+CurrentPage
+Next()
....
//something that can be paged
Pageable<T>
+ Count
+ GetPage(PageSize, PageNumber)
//represents the predicate - typically used in UI screens that provide ad hoc search capability
QueryObject<T>
GreaterThan
Like
....
QueryObjectFactory
+ QueryObject Create(Repository, IsPageable)
I'm using NHibernate so one of the QueryObject implementations I have looks like
QueryObjectNHibImpl : QueryObject, Pageable
QueryObject<T> Like(.....)
{ Criteria.Add(.....); return this; }
At the UI level I do something like
QueryObject<Product> TheQuery = QueryObjectFactory.Create(ShipmentRepository, true);
Pager<Product> ThePager = PagerFactory.Create(DefaultPageSize, TheQuery);
Wow that's more than I wanted to write in your comments section. I think I should turn this into
a blog post. And then you can tell me what you really think.
Cheers,
Aeden
@Aeden> I've been thinking about all that recently, and I think the solution is perhaps in command query separation... I'll post about that soon !
Comment edited for code clarity
Regarding the underlying war, I too have been going through these ORM vs DAL vs Repository perils for over a year now and still have not come to a mature set of wisdoms (so to speak). I think much of it is due to the many interpretations out there that have somehow set in as "the" way, albeit conflicting. The REAL truth in my opinion is the perspective of it. That is, the Repository pattern (as defined by the benevolent Martin Fowler) really is a controller in essence. Period. "How" it is implemented could be many ways contingent upon available technologies of the day (read: of the day; it's 2010). Patterns also tend to evolve, get optimized (e.g. classical OOP sub-classing versus more modern generics (static polymorphism) and inferencing) within a technology space, or either deprecate or become "buried" by the constant growth and change of the IT landscape. After accepting that, the other distinction is in the application or "instancing" of a pattern. In one case, I could have a "repository" involving an entire data source, in another case just a single entity, and in yet another case a set or branch of related entities. The point of the repository is control based on needs.
So since it's 2010, with us developers being incredibly lazy and those fancy architects evangelizing about reuse, one should even question whether they need a dedicated repository implementation, or whether there is one hidden behind a much more sophisticated technology (or composite pattern), right under their nose! I saw the light on this when I saw someone post about the Specification pattern and how its semantics could be expressed in .NET 3.5, where they pointed out that IQueryable and Expression<T> already provide (albeit generically, but elegantly loosely coupled) what the pattern documents! So extending those facilities as much as possible makes perfect sense, and keeps one aligned with the technology's evolutionary path. Similarly, if I were doing it in Java, I would use their primary equivalent facilities.
Or, take the service-oriented approach, and let your entity-centric services act as your repository. Note that services do NOT have to be "hosted"! As an example of a "non-hosted" service, consider the new C# 4.0 compiler-as-a-service stuff Microsoft is going to release. Thomas Erl is a godsend on the SOA/SOC/SOE topics.
For me, keeping the approach I choose aligned with the current technology spaces is more important than anything. If I use (or will use) a workflow technology, then I want it to play nicely. If I decide to later provide a services layer, then I also want it to play nicely. Or perhaps I decide to migrate to a cloud solution (Azure/System.Data.Services, S3/EC2, etc), will my solution remain resilient?
Nonetheless I digress, I did something a little different for the pagination case. I chose to boil it down a little further such that I instead defined a very generic IPageable interface (similar to the IQueryable concept) and stuffed the implementation detail down in the IQueryable and/or IEnumerable interface as general-purpose extensions:
public interface IPageable
{
    int DefaultPageSize { get; }
    int PageSize { get; set; }
    bool SupportsPaging { get; set; }
}

public static class PagedEnumerableExtensions
{
    public static IEnumerable<T> AsEnumerable<T>(this IQueryable<T> source, int pageSize)
    {
        foreach (var r in AsEnumerable(source, pageSize, 0))
        {
            yield return r;
        }
    }

    public static IEnumerable<T> AsEnumerable<T>(this IQueryable<T> source, int pageSize, int skip)
    {
        if (pageSize > 0)
        {
            T[] arr = null;
            int count = source.Count();
            int index = skip > 0 ? skip - 1 : 0;
            int seekIndex = pageSize > 0 ? index - (index % pageSize) : index;
            arr = source.Skip(seekIndex).Take(pageSize).ToArray();
            if (arr == null || arr.Length == 0)
            {
                yield break;
            }
            for (; index < count; ++index)
            {
                seekIndex = pageSize > 0 ? index % pageSize : index;
                if (index > 0 && index % pageSize == 0)
                {
                    arr = source.Skip(index).Take(pageSize).ToArray();
                    if (arr == null || arr.Length == 0)
                    {
                        yield break;
                    }
                }
                T item = arr[seekIndex];
                yield return item;
            }
        }
        else
        {
            foreach (T item in source)
            {
                yield return item;
            }
        }
    }
}
Then I let the IPageable interface act as a provider and plug it into whomever is going to be pageable - even if they have nothing to do with IQueryable or IEnumerable. Then I am free to compose it like this (some details intentionally left out for brevity):
public interface IJob
{
    IRunInfo RunInfo { get; set; }
    JobStatus Status { get; }
    void Cancel();
}

public class DataJob : IJob, IPageable
{
    public int DefaultPageSize { get; set; }
    public int PageSize { get; set; }
    public bool SupportsPaging { get; set; }

    public DataJob()
    {
        DefaultPageSize = 1000;
        PageSize = DefaultPageSize;
        SupportsPaging = true;
    }

    // The IJob members (RunInfo, Status, Cancel) are among the details left out.

    public void Run()
    {
        // TODO: Refactor to your specific data source
        DataContext dc = new DataContext();
        IQueryable<Person> items = dc.Persons;
        int index = 0;
        // Satisfy providing pagination via the IQueryable extensions since
        // in this case we are operating on that kind of data.
        foreach (var i in items.AsEnumerable(PageSize, index))
        {
            // Processing...
        }
    }
}
It is close to yours, just a different alternative.
Good blog, and good job on your rounded mentality about coming to the right solutions regarding layered, distributed, scalable data access and so forth. There are not enough people out there contributing at this quality.
Hey, I'm not too sure what this will do if you call the GetRange method twice. This will mean the queries will get stacked weirdly, I think...
Cheers
@Luke> Since enumerators maintain their own state, this is actually not an issue.
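A quick sketch of why two calls do not interfere: Skip and Take each return a new query object and leave the source untouched, so every range is an independent query with its own enumerator. The sample uses an in-memory sequence as a stand-in for a provider-backed query.

```csharp
using System;
using System.Linq;

public static class Program
{
    public static void Main()
    {
        IQueryable<int> source = Enumerable.Range(0, 10).AsQueryable();

        // Two ranges built from the same source, as two GetRange calls would do.
        IQueryable<int> first = source.Skip(0).Take(3);
        IQueryable<int> second = source.Skip(3).Take(3);

        Console.WriteLine(string.Join(",", first));  // 0,1,2
        Console.WriteLine(string.Join(",", second)); // 3,4,5

        // Enumerating the first range again gives the same result:
        // nothing was stacked onto it by building the second one.
        Console.WriteLine(string.Join(",", first));  // 0,1,2
    }
}
```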