Think Before Coding

Monday, March 5, 2012

Caching done right

I was trolling on Twitter on Saturday when I saw a tweet by Nate Kohari and some answers :

 

image

I immediately thought :

If you have one problem and use cache to solve it, you now have two problems.

 

Where’s the problem ?

 

The time to retrieve the data is not negligible, due to the frequency of requests and/or the time taken by calculation and data access. So we put the data in a cache so that we don’t have to endure this time on each call.

 

But then comes the problem of cache expiration:

  • We can use a duration... but what is the acceptable delay ?
  • We can check the original data to see whether it has changed. It’s more accurate, but incurs a new data access.

Moreover, checking whether it changed is often not enough: we also need to find what changed.

 

And deriving what happened from state is basically reverse engineering. I’m an engineer. I prefer forward engineering.

 

 

Let’s do it forward

 

It’s actually easy, it’s the whole point of CQRS.

 

Let’s build a system that raises Domain Events, and we can denormalize events to a Persistent View Model.

 

We just have to listen to events and change the representation we want to send to the users:

  • The events contain everything we need to do fine grained updates easily.
  • We can compute denormalizations asynchronously if they are time consuming
  • We can store it in a relational database, a document database, or in memory
  • We can choose any form of denormalization since it’s a denormalization (object graph, but also flat string, json, html …)
  • It will be up to date quickly because it is updated when the original data changes
  • The first client that makes a request after a change will not endure a cache miss that can be long to process, since computing is done on change and not on request.
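
The points above can be sketched in a few lines. This is only an illustration under assumptions: the ProductRenamed and ProductPriceChanged events, the ProductView row and the in-memory dictionary standing in for the Persistent View Model are hypothetical names, not taken from a real system.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical domain events: each one carries exactly what changed.
public class ProductRenamed
{
    public Guid ProductId { get; set; }
    public string NewName { get; set; }
}

public class ProductPriceChanged
{
    public Guid ProductId { get; set; }
    public decimal NewPrice { get; set; }
}

// Denormalized row, shaped for the screen that displays it.
public class ProductView
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

// Updates the view when events arrive, so reads never trigger a computation.
public class ProductViewDenormalizer
{
    // In-memory stand-in for the Persistent View Model (assumption:
    // it could just as well be a table or a document store).
    private readonly Dictionary<Guid, ProductView> views =
        new Dictionary<Guid, ProductView>();

    public void Handle(ProductRenamed e)
    {
        GetOrCreate(e.ProductId).Name = e.NewName;
    }

    public void Handle(ProductPriceChanged e)
    {
        GetOrCreate(e.ProductId).Price = e.NewPrice;
    }

    // Read side: a plain lookup, null when the product is unknown.
    public ProductView Find(Guid productId)
    {
        ProductView view;
        return views.TryGetValue(productId, out view) ? view : null;
    }

    private ProductView GetOrCreate(Guid id)
    {
        ProductView view;
        if (!views.TryGetValue(id, out view))
        {
            view = new ProductView();
            views.Add(id, view);
        }
        return view;
    }
}
```

Each handler touches only the fields its event talks about, which is the fine grained update mentioned in the list; there is no expiration policy to choose because nothing is ever recomputed on read.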

A good way to Keep It Simple, Stupid!

Sunday, February 26, 2012

NuRep your local NuGet+symbols+source repository

As some of you already know, I'm a proponent of DRY : Do Repeat Yourself, but code reuse has some value and nuget is a good way to manage it.

 

So far, the advantages I see in using a package manager are:

  • easy get/update of external projects and their dependencies
  • on demand dependency update (instead of forced dependency update)
  • makes it easier to modularize dependencies

All of this works out of the box when using OSS projects published on nuget.org, and you can host your own packages using NuGet.Server.

 

The debugging story is also quite good. symbolsource.org can host symbols and source packages and be used as source server. You can directly step in your favorite OSS source code without having to compile it yourself. They even provide private repositories.

 

But sending your company's source code to an external service is not always compatible with internal policy.

 

In this configuration, using your own NuGet packages leads to a poor dev experience, because you have no way to step into your own code : the package is compiled in Release and potentially not from your latest code version, so pointing the debugger at the source code in another directory will not give good results.

 

You need a source server.

 

NuRep to the rescue

 

NuRep is a nuget repository based on NuGet.Server but it is also a symbols + code server.

 

When creating your NuGet package, specify the -Symbols flag, and nuget will create a .nupkg and a .symbols.nupkg that you can push to NuRep (http://myserver/nurep/api/v2/package).

 

Then configure Visual Studio's symbol servers (Tools / Options / Debugging / Symbols / add http://myserver/nurep/symbols ).

Don't forget to enable source server support in the Debugging / General options, and to disable Just My Code.

 

That's it.

 

Now you'll step into the exact code that was used to compile the nuget version.

 

Have fun !

Friday, December 2, 2011

I love SQL Server and cultures... NOT !

When developing a large system, all is not unicorns and rainbows.

Until now, everybody was working on a single SQL dev server, and friction was high.

 

That’s why I’m working on SQL scripts management with mercurial and powershell to the rescue, so that any developer can trash his own SQLExpress instance, and rebuild everything needed in a single command. (I’ll maybe blog about all that later).

 

We have loads of stored procs.. I know people don’t like it, but it acts as a strong sanity layer when the database schema is so ugly your eyes bleed when you look at it.

 

Yesterday, I ran a stored proc and got the following error :

The conversion of a varchar data type to a datetime data type resulted in an out-of-range value.

 

Why the f**k.

 

The procedure uses a scalar function :

ALTER FUNCTION [dbo].[DateMaxValue]()
RETURNS datetime
AS
BEGIN
RETURN '9999-12-31 23:59:59.998'
END

 

It’s working on other servers... why doesn’t it work here ?

After several tries, I try with the date ‘9999-12-01’ and I get the following date:

Year: 9999

Month: 01

Day: 12

 

Yes.. the date is interpreted as YYYY-dd-MM on a French server.

 

Even when you use the YYYY-??-?? format, SQL Server still tries to mess with culture month/day ordering !

 

You can use the SET DATEFORMAT dmy or SET DATEFORMAT mdy to change this, but it will apply only in current session, and you cannot use it in a stored proc.

 

You can change the server culture, but it won’t change anything. The dmy/mdy setting is ultimately in the Login culture.

 

You read it right :

  • For an English Login the function above works.
  • For a French Login the function above fails miserably.

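To my knowledge there is no way to switch on strict date parsing for this kind of literal inside a stored proc or function. What should help is writing the literal in a form that SQL Server documents as independent of DATEFORMAT for datetime : unseparated ('99991231 23:59:59.998') or ISO 8601 with a T ('9999-12-31T23:59:59.998').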

 

So generate your logins with scripts, and enforce the same culture for all logins.

 

It’s just profoundly broken.

Tuesday, June 21, 2011

Event Sourcing and CQRS, Dispatch options 2

In the comments on part one, Clement suggested a more efficient solution than registering handlers in the constructor.

 

The proposed solution is to have a RegisterAllEvents virtual method in which event handler registration would occur. It is an instance method, so it has access to this, but it will be called only once per class. The registration uses Expression<Action<T>> to access the expression tree and extract the method info of the handler. This enables type checking, makes R# happy – no unused methods – and makes reflection not too painful.
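
Here is a minimal sketch of that idea, reconstructed from the description only; it is not Clement's actual code. The Aggregate base class, the Counter aggregate and the Added event are hypothetical, and the per-class cache uses a ConcurrentDictionary instead of a hand written lock.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

public abstract class Aggregate
{
    // One handler table per concrete aggregate class, built on first instantiation.
    private static readonly ConcurrentDictionary<Type, Dictionary<Type, MethodInfo>> HandlersByClass =
        new ConcurrentDictionary<Type, Dictionary<Type, MethodInfo>>();

    private readonly Dictionary<Type, MethodInfo> handlers;
    private Dictionary<Type, MethodInfo> registering; // only set while the table is built

    protected Aggregate()
    {
        handlers = HandlersByClass.GetOrAdd(GetType(), _ => BuildTable());
    }

    private Dictionary<Type, MethodInfo> BuildTable()
    {
        registering = new Dictionary<Type, MethodInfo>();
        RegisterAllEvents();
        var table = registering;
        registering = null;
        return table;
    }

    // Runs on an instance (so 'this' is available), but normally once per class.
    protected abstract void RegisterAllEvents();

    protected void Register<T>(Expression<Action<T>> handler)
    {
        // The lambda body is a method call; keep only its MethodInfo.
        var call = (MethodCallExpression)handler.Body;
        registering.Add(typeof(T), call.Method);
    }

    protected void Apply<T>(T @event)
    {
        handlers[typeof(T)].Invoke(this, new object[] { @event });
    }
}

// Hypothetical event and aggregate, for illustration only.
public class Added
{
    public int Amount { get; set; }
}

public class Counter : Aggregate
{
    public int Total { get; private set; }

    public void Add(int amount)
    {
        Apply(new Added { Amount = amount });
    }

    protected override void RegisterAllEvents()
    {
        Register<Added>(e => OnAdded(e));
    }

    private void OnAdded(Added e)
    {
        Total += e.Amount;
    }
}
```

Because the handler is referenced inside a lambda, the compiler checks its signature and the method no longer looks unused. The price is a reflection Invoke on every Apply, which a real implementation would probably replace with a compiled delegate.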

 

Good solution.

 

I didn't go that far because with Event Sourcing, you usually keep aggregates in memory, so aggregates are instantiated once per service lifetime.
I just crafted a small performance test :

 

 


using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace AggregatePerfTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var watch = new Stopwatch();

            const int count = 10000000;
            Guid id = Guid.NewGuid();

            // registration repeated in every constructor call
            watch.Start();
            for (int i = 0; i < count; i++)
                new AggregateRegisteredOncePerInstance(id);

            watch.Stop();

            Console.WriteLine(watch.Elapsed.TotalMilliseconds);

            watch.Reset();
            watch.Start();

            // registration simulated once per class
            for (int i = 0; i < count; i++)
                new AggregateRegisteredOncePerClass(id);

            watch.Stop();

            Console.WriteLine(watch.Elapsed.TotalMilliseconds);
        }
    }

    public class AggregateRegisteredOncePerClass
    {
        private readonly Guid id;

        private static readonly object ClassInitLock = new object();
        private static bool initialized;

        public AggregateRegisteredOncePerClass(Guid id)
        {
            this.id = id;

            lock (ClassInitLock)
            {
                if (!initialized)
                {
                    initialized = true;
                    // registration happens only once here
                }
            }
        }

        public Guid Id
        {
            get { return id; }
        }
    }

    public class AggregateRegisteredOncePerInstance
    {
        private readonly Guid id;
        private readonly Dictionary<Type, dynamic> handlers =
            new Dictionary<Type, dynamic>(5);

        public AggregateRegisteredOncePerInstance(Guid id)
        {
            this.id = id;
            Register<int>(OnSomethingHappened);
            Register<double>(OnSomethingHappened);
            Register<float>(OnSomethingHappened);
            Register<long>(OnSomethingHappened);
        }

        public Guid Id
        {
            get { return id; }
        }

        public void DoSomething()
        {
            Apply(1);
        }

        private void OnSomethingHappened(int message) { }
        private void OnSomethingHappened(double message) { }
        private void OnSomethingHappened(float message) { }
        private void OnSomethingHappened(long message) { }

        protected void Apply<T>(T @event)
        {
            handlers[typeof(T)](@event);
        }

        protected void Register<T>(Action<T> handler)
        {
            handlers.Add(typeof(T), handler);
        }
    }
}

The code is straightforward, I just created two aggregate classes :

  • one with registration in .ctor based on this post code
  • one without any registration at all, considering that doing it once is the same as not doing it for large numbers, but I added a lock section with a boolean check to simulate what will be done on each instance creation.

I created 10.000.000 instances for each, and you get:

  • 3978ms for the one with .ctor registrations,
  • 377 ms for the one without.

It's true that it makes a difference. But how many aggregates do you have in your system ?

 

With 10.000 aggregates, that's about 4ms (3978ms / 1.000), so you're still under 8ms. I think you can afford that.

 

This is then a trade-off between performance and simplicity :

  • If you have very large numbers, go for expression tree parsing, class lock management etc.
  • In any other situation I recommend using registration in .ctor that makes the code easy to implement in approximately 5min.

Tuesday, June 14, 2011

DDDx 2011

I’m just back from DDDx 2011, and it was great !

The event happened Friday 10 at Skills Matter in London, with great speakers, coffee and food.

 

You can see all the talks on the Skills Matter website. Congratulations to the team that released the videos on the web in less than an hour.

 

It was also the occasion to meet IRL DDD practitioners I usually find on twitter.

 

You can also register for next year now for only £50.

 

So Hurry up !

Thursday, June 9, 2011

Time

How do we usually manage time in applications ?

Timers, threads, concurrency locks…

If we want to practice Domain Driven Design, we’re surely at the wrong level of abstraction.

 

What is time, btw ?

Tricky question. We know what time is, but… giving a definition is not that easy.

What defines time ? The second ?

Not really. It is used as a measure of time, but it doesn’t seem sufficient.

 

Let’s have a look at Wikipedia’s definition of time  :

Time is a part of the measuring system used to sequence events, to compare the durations of events and the intervals between them, and to quantify rates of change such as the motions of objects. […]

Now we have something interesting : Time is what happens between events.

But what is this thing between events.

The definition of the measure unit surely can give us further insight.

Let’s have a look at Wikipedia’s definition of the second :

[…] Since 1967, the second has been defined to be

the duration of 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium-133 atom.

The second is defined as a count of transitions between states of an atom of cesium.

We measure time by considering that the ~time~ between those state transitions is constant.

What if it’s not ?

It’s not that important if other events seem synchronized with those events. We’ll come back to this later.

 

Events

Let’s step back a bit.

 

How do you feel time passing ?

By looking at your watch ?

 

Maybe, but how could you explain that an hour sometimes seems so long, and sometimes passes in a flash ?

 

Time seems slow and empty when you’re bored.

Time seems fast and full when you’re busy with interesting things.

When you’re bored, it’s because few interesting things happen.

 

You can deduce from this that your personal state transitions are those interesting things that happen.

These are meaningful events. Things that happen and change you deeply.

 

Of course a lot of things happen between those meaningful events : you’re moving, thinking. Your blood flows through your body, but these are just maintenance moves. You don’t change deeply.

Maybe some things happen between state transition of a cesium atom, but since we cannot notice it and give it a meaning for now, it has no influence.

 

But when a meaningful event happens, you change. You’re not the same before and after.

This is what time is about, and this is why it’s one way.

Before –> Event –> After

Events define time by causality

This perception of meaningful events is surely a reason why people say that time passes faster when you’re old. In your first 6 years, any event around you is meaningful. Any event makes you change since you have no previous knowledge. Then as time goes by, you integrate knowledge and filter out things you already know, things you’ve already seen. When you’re old, a year can more easily seem the same as the year before.

But some people continue to enjoy and learn as much as they can to still have a long now.

 

When does your system change ?

Your system never changes for no reason.

It’s always because a meaningful event happened.

This event can be a user interaction, a call from an external system, a sensor trigger…

And when things change because it’s midnight ?

It simply means that midnight is a meaningful event in your system.

 

Where are those meaningful events in your code ? Hidden in infrastructure code ?

 

I hear Greg Young say :

Make the implicit explicit !

And it’s simple :

Use Domain Events.

 

Once you’ve introduced Domain Events in your domain model, you have made Events and so Time explicit in your domain.

There is no change in the domain that is not due to an Event.

The events appear everywhere in the Ubiquitous Language :

  • When the client HasMoved to a new location, send him a welcome kit.
  • When a RoomHasBeenOverbooked try to relocate the customer
  • Every day at midnight = MidnightOccured, change last minute prices.
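
To make the midnight rule concrete, here is a small sketch. Only the MidnightOccured name comes from the list above; the EventBus, the LastMinutePricing class and the 20% discount are assumptions made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Midnight is modelled as an ordinary domain event.
public class MidnightOccured
{
    public DateTime Date { get; set; }
}

// Hypothetical minimal in-process event bus.
public class EventBus
{
    private readonly Dictionary<Type, List<Action<object>>> handlers =
        new Dictionary<Type, List<Action<object>>>();

    public void Subscribe<T>(Action<T> handler)
    {
        List<Action<object>> list;
        if (!handlers.TryGetValue(typeof(T), out list))
        {
            list = new List<Action<object>>();
            handlers.Add(typeof(T), list);
        }
        list.Add(e => handler((T)e));
    }

    public void Publish<T>(T @event)
    {
        List<Action<object>> list;
        if (!handlers.TryGetValue(typeof(T), out list)) return;
        foreach (var handler in list) handler(@event);
    }
}

// Domain rule: every day at midnight, change last minute prices.
public class LastMinutePricing
{
    private readonly decimal basePrice;

    public LastMinutePricing(decimal basePrice)
    {
        this.basePrice = basePrice;
        CurrentPrice = basePrice;
    }

    public decimal CurrentPrice { get; private set; }
    public DateTime? LastChangedOn { get; private set; }

    public void Handle(MidnightOccured e)
    {
        // Assumption for the example: last minute means 20% off the base price.
        CurrentPrice = basePrice * 0.8m;
        LastChangedOn = e.Date;
    }
}
```

The timer or scheduler that publishes MidnightOccured stays in infrastructure; the domain only sees the event, which also means a test can make midnight happen whenever it wants.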

I’m sure you can find examples in your own domain. If your domain is business related, it has to deal with time because business is about time and money.

 

Time is now part of your Ubiquitous language and you have an implementation for it.

And that’s huge.

Monday, January 24, 2011

Switching keyboard language in WP7

It was bugging me that I could not switch WP7 keyboard language.

I write most of my emails in French, but my blog and tweets are in English.

I’ve seen that some people were also asking it for the soon to come update.

 

But you can actually already do it easily.

 

Here’s a short how to.

 

Go to Settings/Keyboard. Then tap on Keyboard language.

 

You can select multiple languages here with the checkboxes !

 

That’s all.

 

Then when you open any application with a keyboard you can notice the language selector near the space bar :

imageimage

You can also view all selectable languages with tap&hold :

 

image

 

That’s it.

Friday, January 21, 2011

SmtpListener

I posted a sample SmtpListener to my repository a few days ago.

It’s infrastructure stuff.. what does it have to do with what I’m usually talking about here ?

 

It’s about Reactive Programming.

 

Sometimes you have to integrate with legacy systems that’ll send you things through email.

Ok, I know, it kind of sucks as an integration mechanism, but still… you have to do it this way.

 

The usual way to receive email in an application is to set up a mailbox on a mail server (think Exchange), and pull it periodically to see if there’s something new there.

 

There are two bad things here :

  • People tend to use their enterprise mail server for this. The mail server is often vital for your business and screwing it up with a bug can have a big impact on your organization.
  • Emails are pushed to you... why would you pull them ? You can push them directly on your service bus !

So, I prototyped a small smtp listener that could easily be integrated with whatever you want.

 

You need to add an MX entry in your dns zone configuration so that other servers find it, and get a valid certificate if you want to use secured TLS connections (I’ve disabled certificate verification for demo purposes).

 

But as you can see, the code is very simple since the .Net framework already has what’s needed.

 

The TcpListener is used to receive connections that’ll provide a TcpClient with underlying streams.

The SslStream class is used to encapsulate tcp streams to add Ssl encryption.

I’m using the reactive framework to convert the Begin/EndAcceptTcpClient methods to an Observable to avoid writing the accept loop myself.

 

Then the implementation of the protocol is very easy : you can find an overview and a sample on wikipedia.

The RFCs can easily be found :

RFC 5321: Simple Mail Transfer Protocol

RFC 3207: SMTP Service Extension for Secure SMTP over Transport Layer Security
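
To give an idea of how small the command dialogue is, here is a sketch of a single SMTP conversation. It is not the code from the repository : the SmtpSession and ReceivedMail names are made up, it works on a TextReader/TextWriter pair instead of a socket, and it only understands HELO/EHLO, MAIL FROM, RCPT TO, DATA and QUIT from RFC 5321 (no TLS, no extensions, no size limits).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public class ReceivedMail
{
    public string From;
    public List<string> To = new List<string>();
    public string Body;
}

public static class SmtpSession
{
    // Runs one SMTP conversation and returns the messages that were received.
    public static List<ReceivedMail> Run(TextReader input, TextWriter output, string hostName)
    {
        output.NewLine = "\r\n"; // SMTP lines end with CRLF
        var mails = new List<ReceivedMail>();
        var current = new ReceivedMail();
        output.WriteLine("220 " + hostName + " ready");

        string line;
        while ((line = input.ReadLine()) != null)
        {
            var command = line.ToUpperInvariant();
            if (command.StartsWith("HELO", StringComparison.Ordinal) ||
                command.StartsWith("EHLO", StringComparison.Ordinal))
            {
                output.WriteLine("250 " + hostName);
            }
            else if (command.StartsWith("MAIL FROM:", StringComparison.Ordinal))
            {
                current = new ReceivedMail { From = line.Substring(10).Trim() };
                output.WriteLine("250 OK");
            }
            else if (command.StartsWith("RCPT TO:", StringComparison.Ordinal))
            {
                current.To.Add(line.Substring(8).Trim());
                output.WriteLine("250 OK");
            }
            else if (command == "DATA")
            {
                output.WriteLine("354 End data with <CR><LF>.<CR><LF>");
                current.Body = ReadData(input);
                mails.Add(current);
                current = new ReceivedMail();
                output.WriteLine("250 OK");
            }
            else if (command == "QUIT")
            {
                output.WriteLine("221 Bye");
                break;
            }
            else
            {
                output.WriteLine("500 Command not recognized");
            }
        }
        return mails;
    }

    // Reads message lines until the single "." terminator.
    private static string ReadData(TextReader input)
    {
        var body = new StringBuilder();
        string line;
        while ((line = input.ReadLine()) != null && line != ".")
        {
            // Undo dot-stuffing: a leading "." was doubled by the sender.
            if (line.StartsWith(".", StringComparison.Ordinal)) line = line.Substring(1);
            body.AppendLine(line);
        }
        return body.ToString();
    }
}
```

In a listener, the reader and writer would wrap the stream of an accepted TcpClient (or the SslStream on top of it). Like the original sample, this is not attack proof : it has no timeouts and accepts lines of any length.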

 

Of course you can use it freely, and propose changes to make it better since it’s not attack proof.

This code is not resistant to known potential SMTP attacks, including dangling connections, long lines etc..

Thursday, January 20, 2011

Code code code

There are some sample/proof-of-concept codes on code.thinkbeforecoding.com (hosted by bitbucket.org)

 

You can have a look at it and contribute if you want.

 

Enjoy !

Tuesday, October 19, 2010

Duck : Delete Update Create Killer

I recently had a remark from Frederic Fadel from Aspectize, telling me something like this about Event Sourcing :

Why would you like to write SQL to write data to your read model when our product can do it for you ?

I acknowledge that their product is fancy. You simply declare your db data schema, your UI and services and bind it all together.

But it doesn’t fit well with CQRS and Event Sourcing. And I want to do Event Sourcing for domain and business reasons, not technical reasons.

But he was right on this point :

I don’t want to write SQL to denormalize my events to my queryable storage.

What are my options ?

  • Writing SQL by hand, but testability is low, and you’ll get a mix of C# to get data from the events, and SQL for the Update queries.
  • Using an ORM. When using NHibernate you don’t really write SQL. Testability won’t be great anyway.

The problem with ORMs

ORMs are usually better at getting data than at changing it. You can do it, but let’s look at what happens.

The ORM loads data from your Db into entities that will be tracked by an identity tracker. Then you change the values in memory. Then the ORM will find what needs to be sent back to the server and make a query to the Db so that the change happens.

But what I need to do is a lot simpler. Just emit some INSERT, UPDATE or DELETE based on current table values and event data.

With an ORM, what happens if the data is changed between loading and saving ? I’ll have to manage some versioning and/or transaction. And I’ll make two roundtrips to the server needlessly.

Here comes Duck

Duck is a kind of ORM oriented toward Delete Update Create.

Don’t ask Duck to load data in memory, it simply can’t.

You simply express how data should change based on current row content and values that you’ll pass.

It avoids the first roundtrip to the database, and makes for shorter code to express the change.

Let’s see how to use it

First, you should declare a class that has the structure of your table with public get/set properties, and marked with a Table attribute :

[Table]
class Species
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public string BinomialName { get; set; }
    public bool IsEndangered { get; set; }
    public int Population { get; set; }
}

It contains current observed species at an observatory.

Then a simple insert. Let’s say that a new species has been registered at the observatory :

var duck = new DuckContext(connectionString);
var speciesId = Guid.NewGuid();
duck.In<Species>()
    .Insert(() =>
        new Species
        {
            Id = speciesId,
            Name = "Mallard",
            BinomialName = "Anas platyrhynchos",
            IsEndangered = false,
            Population = 50
        });

Nothing tricky here..

The observatory noticed a population decay, the species is endangered :

duck.In<Species>()
    .Where(r => r.Id == speciesId)
    .Update(r => new Species {
        Population = r.Population - 40,
        IsEndangered = true});

Here, the use of the current value of Population will not load the current value. It will generate the following statement :

UPDATE Species
SET
    Population = Population - 40,
    IsEndangered = 1
WHERE
    Id = @p0

I chose  to create a new Row from the old one rather than change the original one for two reasons :

  • It makes rows immutable and you don’t have to think about execution order between fields assignments. It’s the way SQL works
  • Linq Expressions cannot represent statement blocks and assignments in C#3, Duck would have been .Net only…

The -40 is directly in the query here because we used a constant. If we were using a variable, the query would contain a parameter.

Now the species has disappeared, it should be removed from the observed species (though it could be just an update somewhere else) :

duck.In<Species>()
    .Where(r => r.Id == speciesId)
    .Delete();

Testability

To run your test you just have to use the InMemoryDuckContext… you have then access to Table<T>() that’ll enable you to set up your data and verify after execution that things changed as expected. I’ll talk a bit more about it soon.

Try it now, it’s OSS

You can grab the code at bitbucket and try it now :

http://bitbucket.org/thinkbeforecoding/duck

It’s in F# ! Writing an AST analyzer in F# is far easier, more concise and more expressive than in C#. You’ll just have to reference Duck in your project, there’s no direct F# dependency.

Next episode will be about how to mix it with Rx (Reactive Framework) to declare your event handling logic.

Hope you like it, and don’t hesitate to give feedback and suggestions.

Monday, June 14, 2010

DDD Exchange 2010

I could not attend this year’s edition, which seemed really great with Eric Evans, Greg Young, Udi Dahan, Ian Cooper and Gojko Adzic.

The videos from the events should be soon somewhere around here.

And you can already find transcripts of the talks on Gojko ‘I post faster than people talk’s blog :

Eric Evans: Domain driven design redefined

Udi Dahan: the biggest mistakes teams make when applying DDD

Greg Young: Evolution of DDD: CQRS and Event Sourcing

 

If you also missed it, don’t make the same mistake next year, and register now for £50.00 (instead of £250.00) until the end of the week.

Sunday, April 25, 2010

Event Sourcing and CQRS, Events Deserialization

So we have our events serialized in our event store. Deserializing events is not an issue, until we start to make them evolve and need to manage several versions.

Since we never modify what has been logged, we’ll have to deal with old versions anyway.

A simple way to do it is to maintain every version of the events in the project, and make the aggregate root accept all of them. But it will soon load the aggregate root with a lot of code and make it bloated.

This is why you usually introduce a converter that will convert any version of the event to the latest one (usually you provide methods to update to the next version, and iterate until the latest version so that this part of the code is incremental). This is a convenient way to address the problem, but you still have classes v1, v2 … vn that you keep in your project only for versioning purposes even if you don’t use them anymore in your production code.

Events as documents

It is easy to deserialize an event as either an object or a document; you only need to split two responsibilities in your deserialization process :

  • Stream reading
  • Object building

The deserializer will be in charge of reading the data : it reads the bits, gets the meaning from context, and tells the Object Builder about object types, field names and values.

For its part, the object builder will instantiate the objects and set field values depending on names.

You can provide two distinct Object Builders. The strongly typed one will instantiate concrete .net types and set fields using reflection. The document builder will instantiate objects that are only property bags.

When deserializing an event in its latest version, you can use the strongly typed one directly, but when reading a previous version of the event, you can deserialize it as a document and give it to the converter.

The converter will then add/remove properties from the document to make it up to date, and the document will be used to create a concrete .net type of the last event version.

Here the process is quite the same, you should provide a document reader that will use the strongly typed object builder to instantiate the event.

There’s no need to keep every version of your Event Classes now since you can manipulate old versions as documents.
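
A minimal sketch of such a converter working on documents could look like this. All the names (EventDocument, EventUpgrader, AddStep, UpgradeToLatest) are hypothetical, and the document is reduced to a dictionary of fields plus a type name and a version number.

```csharp
using System;
using System.Collections.Generic;

// A document is just a property bag: field name -> value.
public class EventDocument
{
    public string Type;
    public int Version;
    public Dictionary<string, object> Fields = new Dictionary<string, object>();
}

public class EventUpgrader
{
    // One step per (event type, version): converts version n to n + 1 in place.
    private readonly Dictionary<string, Action<EventDocument>> steps =
        new Dictionary<string, Action<EventDocument>>();

    public void AddStep(string type, int fromVersion, Action<EventDocument> step)
    {
        steps.Add(type + "/" + fromVersion, step);
    }

    // Applies steps one after another until none exists for the current version.
    public EventDocument UpgradeToLatest(EventDocument document)
    {
        Action<EventDocument> step;
        while (steps.TryGetValue(document.Type + "/" + document.Version, out step))
        {
            step(document);
            document.Version++;
        }
        return document;
    }
}
```

A made-up BookLent event that renamed a field in v2 and gained one in v3 would then be registered like this:

```csharp
var upgrader = new EventUpgrader();
upgrader.AddStep("BookLent", 1, d =>
{
    d.Fields["BorrowerName"] = d.Fields["Borrower"];
    d.Fields.Remove("Borrower");
});
upgrader.AddStep("BookLent", 2, d => d.Fields["LoanDays"] = 14);
```

Each step only knows about two adjacent versions, which keeps the upgrade code incremental; the upgraded document is then handed to the strongly typed builder.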

Using dynamic in C#4

Document manipulation can make things a bit messy since it can be hard to understand the original structure of the object. This is where you can use the DLR DynamicObject class to make the property bag (the document) a dynamic object that you’ll be able to use as any standard .net object.

This way, in the converter you can manipulate old versions of the events as .net objects without having to keep all those old classes that won’t be used anymore.
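
As an illustration, a property bag can be wrapped in a DynamicObject with very little code. The DynamicDocument class below is a hypothetical name, and the field names used afterwards are invented for the example.

```csharp
using System.Collections.Generic;
using System.Dynamic;

// Wraps a property bag so that fields can be used like ordinary members.
public class DynamicDocument : DynamicObject
{
    private readonly Dictionary<string, object> fields;

    public DynamicDocument(Dictionary<string, object> fields)
    {
        this.fields = fields;
    }

    // document.Name reads the "Name" field; a missing field fails at run time.
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return fields.TryGetValue(binder.Name, out result);
    }

    // document.Name = value adds or replaces the "Name" field.
    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        fields[binder.Name] = value;
        return true;
    }

    // Regular method, used to drop a field that a newer version no longer has.
    public bool Remove(string name)
    {
        return fields.Remove(name);
    }
}
```

A conversion step then reads almost like code written against the old class:

```csharp
dynamic doc = new DynamicDocument(fields);
doc.BorrowerName = doc.Borrower;
doc.Remove("Borrower");
```

The trade-off is that a typo in a field name is only detected when the converter runs, so this kind of code deserves a test per version.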

Saturday, April 17, 2010

Event Sourcing and CQRS, Bounded Contexts

Once again, I prefer a new post to a long comment reply. This one is about an important concept of Domain Driven Design, Bounded Contexts.

Hendry Luk asked :

Just 1 question, you represent borrower in events as a simple full-name string.
Is there any reason or just for sake of simplicity for example?
Supposed I'm using borrowerId, how would that work in other BC, say
LateBookNotifier (let's assume its a separate BC). How does this BC shows the
name of the borrower? Does it communicate directly with command BC using ACL?
Or does it also subscribe to BorrowerRegistered event as well (hence every BC
would have duplicate data of each of the borrowers, just like they do each of
the books)?

The short answer is ‘Yes, it was just for sake of simplicity’. In a real world scenario, borrowers would probably be entities, and thus would have an identity. The borrower would even probably be an Aggregate Root.

The Borrower Aggregate Root would encapsulate state needed to perform commands on this Aggregate.

Bounded Contexts Communications

I can see the following contexts here :

  • Inventory : Manage books availability and state (the book has been damaged, there are notes written on it etc..)
  • Relationships : Manage contact by email and phone with borrowers, and track the care they take of your books and whether they return them on schedule.

Since we are using CQRS (and even more, Event Sourcing), aggregates in these contexts don’t need more state than what’s needed to make decisions.

So a Book in the Inventory Context will probably not need more than the Id of the borrower and the date at which it was borrowed.
We can then call the ReturnToShelf command on the Book that will publish a ReturnedLateToShelf { Book : bookId, By : borrowerId, After : 20 days, LateBy : 6 days  }.
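
Here is a rough sketch of that Book aggregate. The 14 day loan period, the Lend method and the ReturnedToShelf event for on-time returns are assumptions added for the example, and the command returns the event instead of publishing it to keep the code short.

```csharp
using System;

// Hypothetical event for a return within the loan period.
public class ReturnedToShelf
{
    public Guid Book;
}

public class ReturnedLateToShelf
{
    public Guid Book;
    public Guid By;
    public int AfterDays;
    public int LateByDays;
}

public class Book
{
    private const int LoanPeriodInDays = 14; // assumption for the example

    private readonly Guid id;

    // The only state needed to decide: who borrowed the book, and when.
    private Guid borrowerId;
    private DateTime borrowedOn;

    public Book(Guid id)
    {
        this.id = id;
    }

    public void Lend(Guid borrower, DateTime on)
    {
        borrowerId = borrower;
        borrowedOn = on;
    }

    // Returns the event describing what happened.
    public object ReturnToShelf(DateTime returnedOn)
    {
        int afterDays = (returnedOn.Date - borrowedOn.Date).Days;
        if (afterDays <= LoanPeriodInDays)
            return new ReturnedToShelf { Book = id };

        return new ReturnedLateToShelf
        {
            Book = id,
            By = borrowerId,
            AfterDays = afterDays,
            LateByDays = afterDays - LoanPeriodInDays
        };
    }
}
```

Note that the aggregate holds no name, no email and no twitter account for the borrower : the id is enough to decide whether the return is late, and other contexts resolve the rest.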

A Handler at the Relationships Boundary will catch the event, and call a CheckExcuseForLateReturn on the Borrower Aggregate Root (based on its id). The command will check the borrower’s record to see if it’s acceptable. It will simply publish a LateReturnGentlyAccepted if the borrower is usually on time, but will publish a KindnessLimitReached in the other case.

Another handler will catch it, and call SendAngryMessage on the Messaging Service. The role of the Messaging Service is to tweet borrowers to let them know they should not forget to return your books. How does this service know the twitter account of the borrower ? When the handler (the one that calls SendAngryMessage) catches a BorrowerRegistered event or a BorrowerTwitterAccountChanged message, it tells the service, which can maintain a list of accounts in any desired storage (SQL, NoSql, in memory.. ?). The SendAngryMessage can now tweet ‘Hey you filthy @borrower, you better return my book today or I shall share all the pics from your last party…’

Done.

Where does data live ?

There’s usually a huge concern about data duplication in all contexts. Is the info duplicated in so many places ?

There will be two main places :

  • The Persistent View Model used to see and edit borrower’s details
  • The Persistent View Model used by the messaging service to Query borrower’s twitter accounts. Here, no other borrower’s data is needed except its id and account name.

The Borrower Aggregate Root and Book Aggregate Root in the two main Domain Bounded Contexts will not need to keep track of this kind of data. They won’t need it in their decision process.

If you pursue this idea, to answer further to Leonardo, you’ll notice that strings will probably never be used as state inside a Domain Bounded Context. They can appear as an identity key, or just pass through a command and be republished in the following event. But since strings are rarely – if ever – a good way to represent information on which you’ll have to make a decision, they should almost never be stored in an aggregate root’s current state. This is another reason why most domain models can fit in memory : names, descriptions and other documents usually represent the biggest part of the data in a system, and the remaining data is usually small. These documents and names are useless to run domain internal logic (except validation rules, but not state change rules) so they can simply be logged in events and persisted in the Query’s View Models. Only state needed to make state change decisions will stay in memory.

Thursday, February 25, 2010

Event Sourcing and CQRS, Snapshots !

Leonardo had a question about reloading huge amounts of events.

It’s true that some Aggregate Roots have very long lifetimes with lots of events, and it can become a problem.

 

There are two things involved to resolve this problem :

Snapshots

Ok, the philosophy of event sourcing is to store changes instead of state, but we’ll still need state in our Aggregate Roots, and getting it from scratch can be long.

Take a snapshot every n events (you’ll see that n can be quite high), and store it alongside events, with the version of the aggregate root.

 

To reload the Aggregate Root, simply find the snapshot, give it to the Aggregate Root, then replay the events that happened after the snapshot.

 

You only need the last snapshot for each Aggregate Root, no need to log all past snapshots.
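
The reload can be sketched as follows. The Account aggregate, its single AmountDeposited event and the convention that the version equals the number of events applied so far are assumptions chosen to keep the example small.

```csharp
using System;
using System.Collections.Generic;

public class AmountDeposited
{
    public decimal Amount;
}

public class AccountSnapshot
{
    public int Version;      // number of events already folded into the state
    public decimal Balance;
}

public class Account
{
    public int Version { get; private set; }
    public decimal Balance { get; private set; }

    public AccountSnapshot TakeSnapshot()
    {
        return new AccountSnapshot { Version = Version, Balance = Balance };
    }

    private void RestoreFrom(AccountSnapshot snapshot)
    {
        Version = snapshot.Version;
        Balance = snapshot.Balance;
    }

    private void Replay(AmountDeposited e)
    {
        Balance += e.Amount;
        Version++;
    }

    // Starts from the last snapshot when there is one (pass null to rebuild
    // from scratch), then replays only the events logged after it.
    public static Account Load(AccountSnapshot snapshot, IList<AmountDeposited> log)
    {
        var account = new Account();
        if (snapshot != null) account.RestoreFrom(snapshot);
        for (int i = account.Version; i < log.Count; i++)
            account.Replay(log[i]);
        return account;
    }
}
```

Loading with a null snapshot gives the same state as loading with a valid one, only slower, which is what makes it safe to throw a snapshot away.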

 

When you want to change the state stored in an Aggregate Root, you won’t be able to use the last snapshot since it will not contain the expected state. But you can still replay events from scratch when that happens, so you lose nothing, and simply take a new snapshot with the new state.

 

In memory domain

Usually with an ORM, you reload entities from the storage on every unit of work.

 

But in the case of Event Sourcing, your Aggregate Roots only need to retain state that will be used to take business decisions. You’ll never query state from Aggregate Roots. A large part of the entity state and especially the part that has the biggest memory footprint is usually stored only for queries, like names, descriptions and things like that.

 

In an Aggregate Root in an Event Sourcing environment, a name or description can simply be checked for validity and put in an event, but doesn’t need to be kept in the in-memory entity state – the Aggregate Root fields.

 

You’ll notice that your big domain state can fit in memory once you’ve trimmed it this way.

 

Now that your model is in memory, there is no need to reload every event on each unit of work. It happens only once when the Aggregate Root is needed the first time.

 

We’ll see soon how you can use this to make your event serialization even faster to have very high business peak throughput.

Thursday, December 10, 2009

Business Errors are Just Ordinary Events

Error handling has always been something quite difficult to grasp in software design and still is.

Exceptions are now widespread in languages, and they help a lot in managing corner cases where something fails badly.

But should we use Exceptions to manage business errors ?

The business errors

What do we call business errors actually ?

Broken Invariants

What if an invariance rule is broken ?

The situation should never happen : There is a bug. A bug is not a business error, correct it and deploy.

The situation can happen sometimes : This is not an invariant, but a rare state. It should be handled as any other state change.

Invalid commands

What if we receive an invalid command ?

The command data is meaningless : There’s a bug, you should always validate that command data is not just garbage.

The command leads to an invalid state : The user nonetheless requested to perform the command.

In this case the event will be ‘the request was rejected’. The event can be handled by sending an email back to the customer, or a support request can be started so that the support can call the customer and manage the problem. All this is part of the business process anyway.
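A minimal sketch of a rejection recorded as an ordinary event, assuming a hypothetical Hotel aggregate with hypothetical BookingAccepted and BookingRejected events:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical events, for illustration only.
public class BookingAccepted
{
    public string Customer { get; set; }
}

public class BookingRejected
{
    public string Customer { get; set; }
    public string Reason { get; set; }
}

public class Hotel
{
    private int freeRooms;

    public List<object> UncommittedEvents { get; private set; }

    public Hotel(int freeRooms)
    {
        this.freeRooms = freeRooms;
        UncommittedEvents = new List<object>();
    }

    public void Book(string customer)
    {
        if (freeRooms == 0)
        {
            // No exception here: the rejection is a fact that other parts
            // of the business process (email, support call) can react to.
            UncommittedEvents.Add(new BookingRejected
            {
                Customer = customer,
                Reason = "No room left"
            });
            return;
        }

        freeRooms--;
        UncommittedEvents.Add(new BookingAccepted { Customer = customer });
    }
}
```

A handler subscribed to BookingRejected can then send the email or open the support request.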

Corner cases create business opportunities

I often see discussions around account validation for credit, to make the transaction fail when your account goes below zero.

But that’s not what happens in real life. The transaction is accepted, then the bank charges you because your account is in the red zone.

I’m currently working in the hotel business. When a booking is received and there’s no room left, should I reject the booking ? Another client can cancel soon, or I can move the customer to another hotel nearby, but just saying ‘there’s no room left’ is not a good business answer ! Overbooking management has even become a strategic practice in the business.

To fully manage your customers you should embrace the whole business lifecycle in your system. This includes support and corner case management. Part of it will be done by hand, the other part automatically, but you should not just report an exception in a trace log.

These critical situations are usually the ones in which your customer needs you more than in any other case. You should design your fault handling strategy with care and make it a full concern of your business.

Udi Dahan's post on CQRS

Udi Dahan wrote a new post on CQRS today : Clarified CQRS

It is essentially the content of the presentation he gave here in Paris and in other places.

You should read it if you want to understand the deep reasons to use CQRS and see how to change your mindset to use it.

Tuesday, November 17, 2009

Udi Dahan talks on CQRS in Paris

Udi Dahan gave a very good talk yesterday evening at Zenika. There were only a few attendees… perhaps because it was on a Monday evening. Anyway, there was barely enough room in the Italian restaurant where we moved afterwards.

I won’t make a full report, just talk about some interesting points.

First of all, the session focused mainly on why you should do CQRS and not how. Second point, the talk was not about event sourcing, but you already know that you can do CQRS without event sourcing.

Something we should accept : Stale Data

The paradigm of the usual architecture best practices has a serious flaw : when you show data to your users, it’s already stale.

Is it important ? Yes.

Is it a problem ? Not really.

The world has worked with stale data for years, and it was handled rather gracefully until now. Computers have reduced the time span, but by the time the data appears on the screen, it’s stale.

Tell it to your users, they will accept it. Find with them what is acceptable. 1 second, 10 seconds, 1 minute, 1 hour, 1 day ? The users are used to it in their own business. Do it too.

Queries

What’s the purpose of queries ? To show data. Not objects.

So why should the data from the database come across 5 layers through 3 model transformations ? It’s a bit overkill to display data.

Why not just this : The UI reads data from the database and displays it ?

No DTOs, no ORM, no business rules executed on each query.

You simply define a Persistent ViewModel (thanks Udi, I like this description of the Q side), and display it directly on screen. It should be as simple as one database table per UI view.

Of course you need a way to keep the Persistent ViewModel up to date, but we’ll see that later.

Commands

On the other side, there are commands.

It should be done in 3 phases :

Validation

Is the input potentially good ? Structured correctly, no missing field, everything fits in ranges ?

This can be done without knowing the current state, and outside of the entities’ command handling.

Rules

Should we do this ?

Here, the decision is taken using current state.

It leads to a discussion about UI design. In order to handle the user command as well as you can, you have to capture the user intent in the command.

In CRUD applications, the new data is sent by the UI layer. You have to extract the user intent from that data to know if you can process the data.

There is a huge difference between UserMovesToNewAddress and CorrectTheMisspellingInUserAddress from a business point, but in a CRUD application you would probably end with the same Update data…

State change

What’s the new state ?

It’s the easy part once the rules are applied.
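The three phases can be sketched as follows. This is only an illustration: the MoveToNewAddress command, the validator and the User class are hypothetical, and returning null when the rules say no is a simplification (a rejection event could be raised instead):

```csharp
using System;

// Hypothetical command that captures the user intent.
public class MoveToNewAddress
{
    public string UserId { get; set; }
    public string NewAddress { get; set; }
}

public class UserMovedToNewAddress
{
    public string NewAddress { get; set; }
}

public static class MoveToNewAddressValidator
{
    // Phase 1, validation: only the command data is needed, no state.
    public static bool IsValid(MoveToNewAddress command)
    {
        return command != null
            && !string.IsNullOrWhiteSpace(command.UserId)
            && !string.IsNullOrWhiteSpace(command.NewAddress);
    }
}

public class User
{
    private string address;
    private readonly bool accountClosed;

    public User(string address, bool accountClosed)
    {
        this.address = address;
        this.accountClosed = accountClosed;
    }

    public UserMovedToNewAddress Handle(MoveToNewAddress command)
    {
        // Phase 2, rules: should we do this, given the current state ?
        if (accountClosed || command.NewAddress == address)
            return null;

        // Phase 3, state change: the easy part.
        var @event = new UserMovedToNewAddress { NewAddress = command.NewAddress };
        address = @event.NewAddress;
        return @event;
    }
}
```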

Domain Model

What aren’t they for ?

Validation : commands are validated before the model is called. Do not bloat your domain model with this.

Queries : entity relationships for reading are unnecessary. You can do eager loading on your Aggregate Roots safely, they’ll never be used for queries that need only partial information.

What are they for ?

Answer to the question : should we do what this valid command is asking ?

If the answer is yes, change the state !

Maintain the query model up to date

There are two main ways to maintain query model up to date.

You can use something like views or ETL to transform data from the domain data to the shape required by the query side.

If you prefer, or when your domain persistence is not compatible with this option (OODB, Event Storage..), you can publish events from your command side, and provide handlers on the query side that will maintain the view state in the relational database (or a cube… or whatever). A denormalization will happen here.
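A minimal sketch of such a query side handler, assuming hypothetical BookRegistered and BookLent events and a dictionary standing in for the view table (a real handler would issue INSERT/UPDATE statements against the query database):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical events published by the command side.
public class BookRegistered
{
    public string BookId { get; set; }
    public string Title { get; set; }
}

public class BookLent
{
    public string BookId { get; set; }
    public string Borrower { get; set; }
}

// One row of the Persistent ViewModel: exactly what the screen displays.
public class BookListRow
{
    public string Title { get; set; }
    public string Status { get; set; }
}

public class BookListDenormalizer
{
    // Stand-in for a database table keyed by book id.
    public Dictionary<string, BookListRow> Rows { get; private set; }

    public BookListDenormalizer()
    {
        Rows = new Dictionary<string, BookListRow>();
    }

    public void Handle(BookRegistered @event)
    {
        Rows[@event.BookId] = new BookListRow
        {
            Title = @event.Title,
            Status = "Available"
        };
    }

    public void Handle(BookLent @event)
    {
        // Fine grained update: only the impacted column changes.
        Rows[@event.BookId].Status = "Lent to " + @event.Borrower;
    }
}
```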

What do we gain from this ?

Asynchronous model

The model is deeply asynchronous, it’s not a matter of tweaking things with threads. It’s asynchronous from the ground up, at domain level.

Your user sends a command, and your design is good if you can answer : “thank you, we will come back to you soon…”. Take the time needed to fulfill your user’s wish, he will be happy !

Scalability

By relaxing the rules, the system becomes more scalable.

Domain persistence choice

The domain is accessed only to process rules and state changes. There is no need to join tables or filter rows. So you can easily use a non-relational database.

Possible options are an OODB or an Event Storage (for event sourcing).

You can still use a RDBMS with or without an ORM if you’re more familiar with these technologies.

But the persistence mechanism becomes an implementation detail of the Command side that will not interfere with your queries.

Conclusion

Oops, I said it was not a complete report… but it actually is. Every point was interesting !

After the talk we had a discussion about forecasting and other interesting subjects. Perhaps more on this later.

There was a video camera in the room, so I think the guys from Zenika will try to put it on the internet when they have time. I’ll add the link when available.

If you were there and have a picture of the event, I would be glad to put it in the blog :D

Monday, November 16, 2009

Udi Dahan talks on CQRS at Zenika

I’ll be at Udi Dahan’s talk this evening (19h) at Zenika in Paris.

Tell me if you’re planning to be there too !

I’ll surely post about it in the following days.

Thursday, November 5, 2009

Event Sourcing and CQRS, Serialization

Be sure to read the three preceding parts of the series:

Event Sourcing and CQRS, Now !  
Event Sourcing and CQRS, Let’s use it
Event Sourcing and CQRS; Dispatch-options

Today, we’ll study a required part of the event storage : Serialization/Deserialization

The easy way

The .NET Framework has several serialization technologies that can be used here : Binary serialization, XML serialization, or even the DataContract serialization introduced with WCF.

The penalty

The particularity of Event Sourcing is that we will never delete or update stored events. They’ll be logged, insert only, once and forever.

So the log grows. grows. grows.

The size of each stored event will greatly influence the growth rate of the log.

Xml Serialization

If your system frequently processes lots of events, forget about XML. Far too verbose, you’ll pay the Angle Bracket Tax.

Binary Serialization

But binary serialization still costs a lot: even if compact, it will contain type names and field names…

Raw Serialization

You could write serialization/deserialization code into your type.

The type can choose a format, so no extra type/field name is needed. This kind of serialization is very compact – it contains only required bits – but you cannot read data back without the deserialization code.

It can be ok if you plan to have a small, fixed number of well documented events. It becomes unmanageable if your event type count grows with time and versions.

Avoid it

Let’s consider how data are stored in a database.

A database contains tables. Tables have a schema. When storing a row, no need to repeat column names on each cell. The data layout is defined by the table schema and will be the same on each row.

We cannot do the same since events have different schemas, but we work with a limited set of events that will occur many times.

Split schema and data

We can thus store schemas aside, and specify the row data schema on each row. The event data will then be stored as raw bits corresponding to the specified schema.

This way you can design tools to explore your log file with a complete event representation without needing the original event class, and you get a very compact serialization. Have your cake and eat it too !
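To make the idea concrete, here is a rough sketch and not the implementation announced for the next post. EventSchema, FieldType and SchemaSerializer are hypothetical names, and only int and string fields are supported. Each row stores a schema id followed by raw values; field names live in the schema only:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public enum FieldType : byte { Int32, String }

// Hypothetical schema description, stored once and aside from the rows.
public class EventSchema
{
    public int Id { get; set; }
    public string EventName { get; set; }
    public string[] FieldNames { get; set; }
    public FieldType[] FieldTypes { get; set; }
}

public static class SchemaSerializer
{
    // Writes the schema id, then the raw values in schema order.
    public static byte[] Write(EventSchema schema, object[] values)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(schema.Id);
            for (var i = 0; i < schema.FieldTypes.Length; i++)
            {
                if (schema.FieldTypes[i] == FieldType.Int32)
                    writer.Write((int)values[i]);
                else
                    writer.Write((string)values[i]);
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads a row back using only the stored schemas, no event class.
    public static Dictionary<string, object> Read(
        IDictionary<int, EventSchema> schemas, byte[] row)
    {
        using (var reader = new BinaryReader(new MemoryStream(row)))
        {
            var schema = schemas[reader.ReadInt32()];
            var result = new Dictionary<string, object>();
            for (var i = 0; i < schema.FieldNames.Length; i++)
            {
                if (schema.FieldTypes[i] == FieldType.Int32)
                    result[schema.FieldNames[i]] = reader.ReadInt32();
                else
                    result[schema.FieldNames[i]] = reader.ReadString();
            }
            return result;
        }
    }
}
```

A log explorer only needs the schema table to display any row, which is the point of splitting schema and data.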

Stay tuned, the code comes tomorrow…

Tuesday, November 3, 2009

Event Sourcing and CQRS, Dispatch options.

As seen in the previous post, I used dynamic to replay events.

The main reason to use it was to avoid long code using reflection in the infrastructure that would have made it hard to read.

I’ll show several ways to do this dispatch, with pros and cons for each.

Dynamic

The proposed solution was using dynamic.

+ Pros : there is no reflection code involved, code is very simple.
- Cons : all state change (Apply) methods must have the same name.

I ran no performance tests, so I cannot judge whether perf is better or not. It seems that the DLR has a rather good cache when the same type is encountered several times, but only measurements can tell.
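As a reminder, the dynamic option boils down to something like the following sketch. The Book class and events here are simplified stand-ins, not the code from the previous post. The overload of Apply is picked at run time from the actual event type:

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in events, for illustration only.
public class BookLent
{
    public string Borrower { get; set; }
}

public class BookReturned { }

public class Book
{
    private string borrower;

    public bool IsLent
    {
        get { return borrower != null; }
    }

    public void Replay(IEnumerable<object> events)
    {
        // Typing the loop variable as dynamic defers overload resolution
        // to run time; an event without a matching Apply overload throws
        // a RuntimeBinderException.
        foreach (dynamic @event in events)
            Apply(@event);
    }

    // All state change methods must share the same name.
    private void Apply(BookLent @event)
    {
        borrower = @event.Borrower;
    }

    private void Apply(BookReturned @event)
    {
        borrower = null;
    }
}
```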

Handlers registration

This is the current implementation in Mark Nijhof’s sample.

The base class maintains a dictionary of Type/Action<T> association to dispatch events based on type.

Since an Action<T> delegate must have a target instance, the delegate must be constructed from within the instance, in the .ctor.

    public class AggregateRoot<TId>

    {

        readonly Dictionary<Type, Action<object>> handlers =

              new Dictionary<Type, Action<object>>();

 

        protected void Register<T>(Action<T> handler)

        {

            handlers.Add(typeof(T),e => handler((T)e));

        }

 

        protected void Replay(IEnumerable<object> events)

        {

            foreach (var @event in events)

                handlers[@event.GetType()](@event);

        }

        // rest of the aggregate root class

    }

Here is code that uses it :

 

    public class Book : AggregateRoot<BookId>

    {

        private readonly BookId id;

        public Book(BookId id,IEnumerable<object> events) : this(id)

        {

            Replay(events);

        }

 

        public Book(BookId id,string title, string isbn) : this(id)

        {

            var @event = new BookRegistered(id, title, isbn);

            OnBookRegistered(@event);

            Append(@event);

        }

 

        private Book(BookId id)

        {

            this.id = id;

            Register<BookRegistered>(OnBookRegistered);

            Register<BookLent>(OnBookLent);

            Register<BookReturned>(OnBookReturned);

        }

 

        private void OnBookRegistered(BookRegistered @event) { /**/ }

        private void OnBookLent(BookLent @event) { /**/ }

        private void OnBookReturned(BookReturned @event) { /**/ }

    }

+Pros : Still no reflection,
            Meaningful method names
-Cons : Additional plumbing code, 
            Private constructor to avoid repetition
            Registration occurs at each instantiation

Convention Based Method Naming

This is the way advocated by Greg Young.

If your event is called BookRegistered, assume the method will be called OnBookRegistered, and find it by reflection. You can implement a cache at class level to avoid reflection on each dispatch.

 

    public abstract class AggregateRoot<TId> : IAggregateRoot<TId>

    {

        private static readonly Dictionary<Type, IEventDispatcher> Handlers =

               new Dictionary<Type, IEventDispatcher>();

        private static readonly object HandlersLock = new object();

 

 

        protected void Replay(IEnumerable<object> events)

        {

            var dispatcher = GetDispatcher();

            dispatcher.Dispatch(this, @events);

        }

 

        private IEventDispatcher GetDispatcher()

        {

            IEventDispatcher handlers;

            var type = GetType();

            lock (HandlersLock)

            {

                if (!Handlers.TryGetValue(type, out handlers))

                {

                    handlers = EventDispatcher.Create(type);

                    Handlers.Add(type, handlers);

                }

            }

            return handlers;

        }

        // ... rest of the code here

    }

The dispatcher code :

    internal interface IEventDispatcher

    {

        void Dispatch(object target, IEnumerable<object>events);

    }

    internal class EventDispatcher<T> : IEventDispatcher

    {

        private readonly Dictionary<Type, IEventHandler<T>> handlers;

 

        public EventDispatcher()

        {

            var h = from m in typeof(T)

              .GetMethods(BindingFlags.Instance | BindingFlags.NonPublic)

                    let parameters = m.GetParameters()

                    where parameters.Length ==1

                    && m.Name == "On" + parameters[0].ParameterType.Name

                    select EventHandler.Create<T>(m);

 

            handlers = h.ToDictionary(i => i.EventType);

        }

 

        public void Dispatch(object target, IEnumerable<object> events)

        {

            var typedTarget = (T)target;

            foreach (var @event in events)

            {

                var handler = handlers[@event.GetType()];

                handler.Call(typedTarget, @event);

            }

        }

    }

 

    internal static class EventDispatcher

    {

        public static IEventDispatcher Create(Type type)

        {

 

            return (IEventDispatcher)Activator.CreateInstance(

               typeof(EventDispatcher<>).MakeGenericType(type));

        }

    }

and the event handler :

    internal interface IEventHandler<T>

    {

        void Call(T target, object argument);

        Type EventType { get; }

    }

    internal class EventHandler<TEntity, TEvent> : IEventHandler<TEntity>

    {

        private readonly Action<TEntity, TEvent> handler;

 

        public EventHandler(MethodInfo methodInfo)

        {

            handler = (Action<TEntity, TEvent>)Delegate.CreateDelegate(

                  typeof(Action<TEntity, TEvent>), methodInfo, true);

        }

 

 

        public void Call(TEntity target, object argument)

        {

            handler(target, (TEvent)argument);

        }

 

        public Type EventType

        {

            get { return typeof(TEvent); }

        }

    }

 

    internal static class EventHandler

    {

        public static IEventHandler<T> Create<T>(MethodInfo methodInfo)

        {

            var eventType = methodInfo.GetParameters()[0].ParameterType;

 

            return (IEventHandler<T>)Activator.CreateInstance(

                  typeof(EventHandler<,>)

                  .MakeGenericType(typeof(T), eventType),

                  methodInfo

                  );

        }

    }

The trick here is to create a static delegate with two parameters from an instance method info that take one parameter (and one implicit this target).

This way, the delegate is not tied to a specific instance and can be used on any target.

As you can see, this option requires more code ! I did not want to start with that.

+Pros : Convention based names mean no manual mapping, mapping is implicit
            Binding is made at class level instead of instance level

-Cons : Only unit tests can tell when you mess with names
            Not immune to event name change, should have good unit tests !

Apply then Append

I also had a remark that if I forget Append after Apply, I’ll get in trouble.

In the Handlers registration and Convention Based Method Naming options, the dispatch can be done by the base class, so I could tell the base class to dispatch then Append the event to UncommittedEvents.

This way you end with something like :

            var @event = new BookLent(/**/);

            Play(@event);

where Play dispatches the event to the right method and appends it.

This way you cannot forget.

My problem with this, especially in the Convention Based Method Naming scenario, is that nobody references the event application methods anymore. Resharper will report them as unused methods, and you won’t know they are broken unless you run unit tests.

Moreover, you pay the cost of a dynamic dispatch when you know your event type.

Perhaps something like this could be better :

            var @event = new BookLent(/**/);

            Play(@event).With(OnBookLent);

The implementation is not very complicated :

    public class AggregateRoot<TId>

    {

        private readonly UncommittedEvents uncommittedEvents;

 

        protected EventPlayer<TEvent> Play<TEvent>(TEvent @event)

        {

            return new EventPlayer<TEvent>(@event, uncommittedEvents);

        }

        // ... rest of the code here

    }

 

    public struct EventPlayer<TEvent>

    {

        private readonly TEvent @event;

        private readonly UncommittedEvents uncommittedEvents;

        internal EventPlayer(TEvent @event, UncommittedEvents uncommittedEvents)

        {

            this.@event = @event;

            this.uncommittedEvents = uncommittedEvents;

        }

 

        public void With(Action<TEvent> handler)

        {

            handler(@event);

            uncommittedEvents.Append(@event);

        }

    }

This way, methods are referenced at least once with type check.

My mind is still not set… What do you prefer ?
