Think Before Coding


Tuesday, February 17, 2009

Implement Linq to Objects in C# 2.0

I’m still working mainly with Visual Studio 2005 at work, and I was really missing Linq to Objects features. And I’m sure I’m not the only one.

There are workarounds when compiling C#2.0 code using Visual Studio 2008 since it’s using the C#3.0 compiler internally, but it won’t work in VS2005.

 

How does Linq to Objects work ?

Linq to Objects works by chaining operations on the IEnumerable<> interface.

When you write the following Linq query statement :

   var paperBackTitles = from book in books

           where book.PublicationYear == 2009

           select book.Title;

The compiler translates it to :

  IEnumerable<string> paperBackTitles = books

   .Where(book => book.PublicationYear == 2009)

   .Select(book => book.Title);

 

In Linq to Objects the lambdas are used as simple delegates, based on the following definitions :

 public delegate TResult Func<T,TResult>(T arg);

 public delegate TResult Func<T1,T2,TResult>(T1 arg1, T2 arg2);

 //...

 public delegate bool Predicate<T>(T arg);

 

 public delegate void Action<T>(T arg);

 public delegate void Action<T1,T2>(T1 arg1, T2 arg2);

 //...

So the preceding code is equivalent to :

  IEnumerable<string> paperBackTitles = books

   .Where(delegate(Book book){return book.PublicationYear == 2009;})

   .Select(delegate(Book book){return book.Title;});

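But the IEnumerable<> interface doesn’t provide those methods. They are actually extension methods defined in the static Enumerable class. The signatures below are slightly simplified: in the framework, Where takes a Func<T, bool> rather than a Predicate<T>.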

  public static class Enumerable

  {

    public static IEnumerable<T> Where<T>(

               this IEnumerable<T> source,

               Predicate<T> predicate);

    public static IEnumerable<TResult> Select<T, TResult>(

               this IEnumerable<T> source,

               Func<T, TResult> projection);

    //...

  }

 

The translation is immediate :

  IEnumerable<string> paperBackTitles =

     Enumerable.Select(

       Enumerable.Where(books,

          delegate(Book book)

           { return book.PublicationYear == 2009; }),

          delegate(Book book) { return book.Title; });

Once we’re here, there’s nothing that cannot be implemented in C#2.0.
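One detail is worth spelling out before going further. Predicate<T> and Action<T> already ship with .NET 2.0, but the Func delegates and the multi-argument Action delegates only arrived with .NET 3.5, so a 2.0 project has to declare them itself. Here is a minimal sketch of those declarations; the namespace and the DelegateDemo class are only illustrative names, not something from the framework :

```csharp
using System;

namespace Linq20Delegates   // hypothetical namespace, chosen for this sketch
{
    // Not available before .NET 3.5, so they are declared by hand.
    public delegate TResult Func<TResult>();
    public delegate TResult Func<T, TResult>(T arg);
    public delegate TResult Func<T1, T2, TResult>(T1 arg1, T2 arg2);
    public delegate void Action<T1, T2>(T1 arg1, T2 arg2);

    public static class DelegateDemo
    {
        // Builds a Func from a Predicate using C# 2.0 anonymous methods only.
        public static Func<int, string> MakeDescriber(Predicate<int> isEven)
        {
            return delegate(int n) { return isEven(n) ? "even" : "odd"; };
        }

        public static void Main()
        {
            // System.Predicate<T> is part of .NET 2.0, nothing to declare.
            Predicate<int> isEven = delegate(int n) { return n % 2 == 0; };

            // Func<int, string> resolves to the delegate declared above,
            // because types of the enclosing namespace win over using directives.
            Func<int, string> describe = MakeDescriber(isEven);

            Console.WriteLine(describe(2009));   // prints "odd"
        }
    }
}
```

Declaring the delegates with the same names and shapes as the 3.5 ones should make a later migration mostly a matter of deleting this file.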

 

What do we need ?

There are plenty of things in Linq to Objects, and I prefer to say right now that we will not get the integrated query syntax !

Implementing the static Enumerable class is not very difficult. Let’s provide an implementation for Where and Select :

  public static class Enumerable

  {

    public static IEnumerable<T> Where<T>(

                     IEnumerable<T> source,

                     Predicate<T> predicate)

   {

      if (source == null)

        throw new ArgumentNullException("source");

      if (predicate == null)

        throw new ArgumentNullException("predicate");

 

      return WhereIterator(source, predicate);

   }

 

   private static IEnumerable<T> WhereIterator<T>(

                            IEnumerable<T> source,

                            Predicate<T> predicate)

   {

      foreach (T item in source)

        if (predicate(item))

          yield return item;

   }

 

   public static IEnumerable<TResult> Select<T, TResult>(

                            IEnumerable<T> source,

                            Func<T, TResult> projection)

   {

      if (source == null)

        throw new ArgumentNullException("source");

      if (projection == null)

        throw new ArgumentNullException("projection");

 

      return SelectIterator(source, projection);

   }

 

   private static IEnumerable<TResult> SelectIterator<T, TResult>(

                              IEnumerable<T> source,

                              Func<T, TResult> projection)

   {

      foreach (T item in source)

         yield return projection(item);

   }

 

    //...

}

Notice that I split each method into a part that checks the arguments and another that performs the actual iteration. This is because the body of an iterator only runs when the sequence is actually enumerated, and at that point it would be really hard to find out why the code throws. By checking the arguments in a non-iterator method, the exception is thrown at the actual method call.
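To make the difference visible, here is a small sketch that compares both styles. LazyWhere and EagerWhere are names invented for this comparison; EagerWhere follows the split used above, while LazyWhere puts the check inside the iterator block :

```csharp
using System;
using System.Collections.Generic;

namespace DeferredCheckDemo   // hypothetical namespace, chosen for this sketch
{
    public static class Demo
    {
        // The null check is part of the iterator block,
        // so it only runs on the first MoveNext() call.
        public static IEnumerable<T> LazyWhere<T>(
                         IEnumerable<T> source,
                         Predicate<T> predicate)
        {
            if (source == null)
                throw new ArgumentNullException("source");

            foreach (T item in source)
                if (predicate(item))
                    yield return item;
        }

        // The null check runs immediately, the iteration stays deferred.
        public static IEnumerable<T> EagerWhere<T>(
                         IEnumerable<T> source,
                         Predicate<T> predicate)
        {
            if (source == null)
                throw new ArgumentNullException("source");

            return WhereIterator(source, predicate);
        }

        private static IEnumerable<T> WhereIterator<T>(
                         IEnumerable<T> source,
                         Predicate<T> predicate)
        {
            foreach (T item in source)
                if (predicate(item))
                    yield return item;
        }

        public static void Main()
        {
            Predicate<int> isEven = delegate(int n) { return n % 2 == 0; };

            // No exception yet: nothing inside LazyWhere has run.
            IEnumerable<int> lazy = LazyWhere<int>(null, isEven);
            Console.WriteLine("LazyWhere returned without throwing");

            try
            {
                foreach (int n in lazy) { }
            }
            catch (ArgumentNullException)
            {
                Console.WriteLine("LazyWhere threw during enumeration");
            }

            try
            {
                EagerWhere<int>(null, isEven);
            }
            catch (ArgumentNullException)
            {
                Console.WriteLine("EagerWhere threw at the call site");
            }
        }
    }
}
```

With the lazy version, the stack trace points at the foreach loop, which may be far away from the line that passed the null.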

 

Since C#2.0 doesn’t support extension methods, we’ll have to find something so that the code doesn’t look as ugly as the final translation above.

 

Simulating extension methods in C#2.0

Extension methods are just syntactic sugar and are simply converted to a static method call by the compiler :

  books.Where(predicate)

  // is translated to

  Enumerable.Where(books, predicate)

If we can wrap the books variable in a kind of C++ smart pointer providing the Where method, the trick is done.

To do this, we will use a small struct that encapsulates the IEnumerable<> interface :

  public struct Enumerable<T> : IEnumerable<T>

  {

     private readonly IEnumerable<T> source;

 

     public Enumerable(IEnumerable<T> source)

     {

        this.source = source;

     }

 

 

     public IEnumerator<T> GetEnumerator()

     {

        return source.GetEnumerator();

     }

 

     IEnumerator IEnumerable.GetEnumerator()

     {

        return GetEnumerator();

     }

 

     public Enumerable<T> Where(Predicate<T> predicate)

     {

        return new Enumerable<T>(

            Enumerable.Where(source, predicate)

        );

     }

 

     public Enumerable<TResult> Select<TResult>(

                       Func<T, TResult> projection)

     {

        return new Enumerable<TResult>(

            Enumerable.Select(source, projection)

         );

     }

 

     //...

}

The return type is Enumerable<> so that calls can be chained.

 

We can add a small helper to make the smart pointer creation shorter :

  public static class Enumerable

  {

    public static Enumerable<T> From<T>(

                    IEnumerable<T> source)

    {

       return new Enumerable<T>(source);

    }

    //...

}

Now we can write :

  IEnumerable<string> paperBackTitles =

    Enumerable.From(books)

     .Where(delegate(Book book){return book.PublicationYear == 2009;})

     .Select<string>(delegate(Book book){return book.Title;});

We just have to extend the Enumerable class and the Enumerable<> struct with more methods to get a full Linq to Objects implementation in C# 2.0.
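As an illustration of what such an extension could look like, here is a sketch that adds a Take operator following the same pattern: an argument check, a private iterator, and a chaining method on the struct. Take is my choice for the example and the namespace is hypothetical; Where is repeated only so that the program compiles on its own :

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

namespace Linq20Sketch   // hypothetical namespace, chosen for this sketch
{
    public static class Enumerable
    {
        public static Enumerable<T> From<T>(IEnumerable<T> source)
        {
            return new Enumerable<T>(source);
        }

        public static IEnumerable<T> Where<T>(
                         IEnumerable<T> source, Predicate<T> predicate)
        {
            if (source == null) throw new ArgumentNullException("source");
            if (predicate == null) throw new ArgumentNullException("predicate");
            return WhereIterator(source, predicate);
        }

        private static IEnumerable<T> WhereIterator<T>(
                         IEnumerable<T> source, Predicate<T> predicate)
        {
            foreach (T item in source)
                if (predicate(item))
                    yield return item;
        }

        // New operator: returns at most count items from the start.
        public static IEnumerable<T> Take<T>(IEnumerable<T> source, int count)
        {
            if (source == null) throw new ArgumentNullException("source");
            return TakeIterator(source, count);
        }

        private static IEnumerable<T> TakeIterator<T>(
                         IEnumerable<T> source, int count)
        {
            if (count <= 0) yield break;
            foreach (T item in source)
            {
                yield return item;
                // Stop as soon as enough items were produced,
                // without pulling one more item from the source.
                if (--count == 0) yield break;
            }
        }
    }

    public struct Enumerable<T> : IEnumerable<T>
    {
        private readonly IEnumerable<T> source;

        public Enumerable(IEnumerable<T> source) { this.source = source; }

        public IEnumerator<T> GetEnumerator() { return source.GetEnumerator(); }

        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

        public Enumerable<T> Where(Predicate<T> predicate)
        {
            return new Enumerable<T>(Enumerable.Where(source, predicate));
        }

        // Chaining counterpart of the new static operator.
        public Enumerable<T> Take(int count)
        {
            return new Enumerable<T>(Enumerable.Take(source, count));
        }
    }

    public static class Program
    {
        public static void Main()
        {
            int[] numbers = new int[] { 1, 2, 3, 4, 5, 6, 7, 8 };

            foreach (int n in Enumerable.From(numbers)
                         .Where(delegate(int x) { return x % 2 == 0; })
                         .Take(2))
            {
                Console.WriteLine(n);   // prints 2, then 4
            }
        }
    }
}
```

Every new operator costs two methods (one static, one on the struct), which is the price to pay for the missing extension methods.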

Wednesday, January 21, 2009

Mixing IEnumerable and IQueryable

Marcel posted a comment on the previous post saying that even when returning IEnumerable, the new query clauses would be executed in the database… But they are not.

Suppose the repository uses Linq internally and returns the result as IEnumerable. On the calling side, consider something like this :

var selectedEntities = repository.GetAll().Where(x => x.Selected);

Here GetAll returns an IEnumerable (that is actually an IQueryable).

The Where extension method selected is the one defined on Enumerable. Be careful: extension methods are static methods, so no virtual call is involved here. The static type of the expression decides which extension method is selected.

 

Check in your debugger: selectedEntities is an instance of the internal Enumerable.WhereIterator class.

 

So when enumerating it, it enumerates its source and returns every item that passes the predicate.

When the source is enumerated, it uses Linq to SQL to get the items and creates a query that returns all rows from the database.

The where clause was not executed in the database.

So the Linq provider did not leak outside of the repository.
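The binding rule can be checked without a database. The sketch below is only an approximation of the scenario: an in-memory list exposed through AsQueryable() stands in for the Linq to SQL table, and GetAll is a hypothetical stand-in for the repository method :

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StaticBindingDemo
{
    // Hypothetical repository method: built on an IQueryable,
    // but exposed to callers as a plain IEnumerable.
    public static IEnumerable<int> GetAll()
    {
        return new List<int> { 1, 2, 3, 4 }.AsQueryable();
    }

    public static void Main()
    {
        IEnumerable<int> all = GetAll();

        // The runtime object is still a query...
        Console.WriteLine(all is IQueryable<int>);            // True

        // ...but the compiler only sees IEnumerable<int>,
        // so this call binds to Enumerable.Where.
        IEnumerable<int> filtered = all.Where(x => x > 2);
        Console.WriteLine(filtered is IQueryable<int>);       // False

        // With a static type of IQueryable<int>, the very same call
        // binds to Queryable.Where and stays a query.
        IQueryable<int> query = (IQueryable<int>)all;
        IEnumerable<int> composed = query.Where(x => x > 2);
        Console.WriteLine(composed is IQueryable<int>);       // True
    }
}
```

Only a caller that deliberately casts the result back to IQueryable gets the provider again; ordinary code written against IEnumerable stays in memory.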

Monday, January 19, 2009

Repositories and IQueryable, the paging case.

Edit : My opinion on this subject has changed… You can read the full story in Back on Repositories and Paging. Introducing reporting.

The technique is still useful for writing query services, but I would not recommend implementing it on a repository.

 

When it comes to repositories, people have a hard time figuring out how to respect the DDD vision while making the most of current technologies (Linq and ORMs) and not writing too much code – we’re so lazy.

The war over whether repositories should be generic (IRepository<T>) or not is raging outside, and I took some time to choose my side. Here are the points to consider :

  • The repository is a contract between the domain and the infrastructure
  • The implementation details should not leak outside

In my opinion, the first point indicates that the repository should be tailored to the domain needs. It cannot be generic, or it is not a contract at all.

When writing a contract, details matter !

This doesn’t mean that we cannot use generic tools to access data behind the interface curtain. The Linq DataContext and Table<T> are very sharp tools to implement repositories. And there is a very good post by Greg Young about that.

 

There is still a point to be discussed though :

Should the repository methods return IEnumerable<T> or IQueryable<T> ?

The IQueryable<T> is part of the framework, and cleanly integrated in the language.

The problem is that its implementation depends heavily on the underlying provider. And it is a really serious leak !

So let’s state the question differently :

- Why would we need IQueryable ?
- Because we can add new query clauses, and they will be executed directly in the database.

- What kind of clause would you add ?
- Don’t know… clauses…

- Would it be business specifications ?
- No, these should already be in the repository.

- So ?
- Sorting and Paging ! These are presentation concerns !

- Here’s the point.

Paging is not a recent concern for programmers, and there are never enough tools to implement it properly. The main problem is that paging once you’ve got all the data is less than effective. And this is what will happen with an IEnumerable approach.

But let’s ask two last questions.

Why is paging useful ? Is it really a presentation concern ?

We need paging to navigate through large collections of objects, and if a collection can grow so large that it cannot reasonably be retrieved in a single query, paging becomes a domain concern !

  • When your object collection is known at design time to stay in small bounds but you still want to page it for presentation clarity, there is no real penalty to fetch all and display only a few.
  • But when your collection can grow big, you SHOULD provide a mechanism to retrieve only a range of it, for presentation purposes or simply for batching purposes.

The problem is that if we leak IQueryable, the caller can do far more than paging, and problems can arise. So I suggest using a new interface, IPaged<T>, that provides everything needed for paging :

public interface IPaged<T> : IEnumerable<T>

    {

        ///<summary>

        /// Get the total entity count.

        ///</summary>

        int Count { get; }

 

        ///<summary>

        /// Get a range of persisted entities.

        ///</summary>

        IEnumerable<T> GetRange(int index, int count);

    }

 

And here is a simple implementation on an IQueryable :

public class Paged<T> : IPaged<T>

    {

        private readonly IQueryable<T> source;

 

        public Paged(IQueryable<T> source)

        {

            this.source = source;

        }

 

 

        public IEnumerator<T> GetEnumerator()

        {

            return source.GetEnumerator();

        }

 

        IEnumerator IEnumerable.GetEnumerator()

        {

            return GetEnumerator();

        }

 

        public int Count

        {

            get { return source.Count(); }

        }

 

        public IEnumerable<T> GetRange(int index, int count)

        {

            return source.Skip(index).Take(count);

        }

    }

Then your repository can return IPaged collections like this without leaking implementation details :

public IPaged<Customer> GetCustomers();
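On the calling side, paging then only needs Count and GetRange. Here is a sketch of how a caller could walk through the pages. It repeats the interface and class above so that it compiles on its own, and it uses an in-memory list with AsQueryable() as a stand-in for a real table; the customer names and the page size are made up for the example :

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public interface IPaged<T> : IEnumerable<T>
{
    int Count { get; }
    IEnumerable<T> GetRange(int index, int count);
}

public class Paged<T> : IPaged<T>
{
    private readonly IQueryable<T> source;

    public Paged(IQueryable<T> source) { this.source = source; }

    public IEnumerator<T> GetEnumerator() { return source.GetEnumerator(); }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    public int Count { get { return source.Count(); } }

    public IEnumerable<T> GetRange(int index, int count)
    {
        return source.Skip(index).Take(count);
    }
}

public static class PagingDemo
{
    // Formats one page as a comma separated line.
    public static string FormatPage(IPaged<string> paged, int page, int pageSize)
    {
        string[] items = paged.GetRange(page * pageSize, pageSize).ToArray();
        return "Page " + (page + 1) + ": " + string.Join(", ", items);
    }

    public static void Main()
    {
        // In-memory stand-in for the table behind the repository (assumption).
        IQueryable<string> customers =
            new List<string> { "Ann", "Bob", "Cid", "Dee", "Eve" }.AsQueryable();

        IPaged<string> paged = new Paged<string>(customers);

        const int pageSize = 2;
        int pageCount = (paged.Count + pageSize - 1) / pageSize;   // rounds up

        for (int page = 0; page < pageCount; page++)
            Console.WriteLine(FormatPage(paged, page, pageSize));
        // Page 1: Ann, Bob
        // Page 2: Cid, Dee
        // Page 3: Eve
    }
}
```

Against a real database, the query passed to Paged<T> should probably carry a deterministic ordering, otherwise two consecutive pages are not guaranteed to be consistent with each other.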

 

This seems to be a major step in understanding the repository pattern, and its underlying war. And you, which side are you on ?