Event Sourcing and CQRS, Serialization
By Jérémie Chassaing on Thursday, November 5, 2009, 14:00 - Domain Driven Design
Be sure to read the three preceding parts of the series:
Event Sourcing and CQRS, Now !
Event Sourcing and CQRS, Let’s use it
Event Sourcing and CQRS, Dispatch options
Today, we’ll study a required part of event storage: serialization/deserialization.
The easy way
The .NET framework has several serialization technologies that can be used here: binary serialization, XML serialization, or even DataContract serialization, introduced with WCF.
The penalty
A key property of Event Sourcing is that we never delete or update stored events. They are logged, insert-only, once and forever.
So the log grows. Grows. Grows.
The storage size of each event therefore has a large influence on the growth rate of the log.
Xml Serialization
If your system frequently processes lots of events, forget about XML. It is far too verbose: you’ll pay the Angle Bracket Tax.
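A quick way to see the tax: the sketch below uses Python's `xml.etree.ElementTree` and `struct` as stand-ins for the .NET serializers, and the event name `MoneyDeposited` and its fields are hypothetical. It encodes the same event as XML and as raw integers and prints both sizes.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical event, used only for illustration.
event_name = "MoneyDeposited"
event = {"account_id": 42, "amount": 1500}

def to_xml(name: str, fields: dict) -> bytes:
    """XML: every field name is written twice, in the opening and closing tags."""
    root = ET.Element(name)
    for key, value in fields.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root)

def to_raw(fields: dict) -> bytes:
    """The same two values as little-endian 32-bit integers, with no names at all."""
    return struct.pack("<ii", fields["account_id"], fields["amount"])

xml_bytes = to_xml(event_name, event)
raw_bytes = to_raw(event)
print(xml_bytes.decode())
print(len(xml_bytes), "bytes as XML vs", len(raw_bytes), "bytes raw")
```

Only the integer values carry information; everything else in the XML output is names and angle brackets, repeated on every stored event.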
Binary Serialization
Binary serialization still costs a lot: even though it is more compact, it still contains type names and field names…
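To show where the bytes go, here is a simplified model of a self-describing binary format, written in Python. It is not the actual .NET BinaryFormatter wire format, and the helper names and the event are hypothetical: each record carries its type name and every field name as length-prefixed strings, followed by the 32-bit values.

```python
import struct

def write_str(buf: bytearray, s: str) -> None:
    """Append a UTF-8 string prefixed by a 1-byte length."""
    data = s.encode("utf-8")
    buf += struct.pack("<B", len(data)) + data

def self_describing(type_name: str, fields: dict) -> bytes:
    """Simplified self-describing record: type name, field count,
    then (field name, 32-bit integer value) for every field."""
    buf = bytearray()
    write_str(buf, type_name)
    buf += struct.pack("<B", len(fields))
    for name, value in fields.items():
        write_str(buf, name)
        buf += struct.pack("<i", value)
    return bytes(buf)

record = self_describing("MoneyDeposited", {"account_id": 42, "amount": 1500})
payload = 2 * 4  # only two 32-bit integers are actual event data
print(len(record), "bytes stored for", payload, "bytes of payload")
```

Even in binary, the names dominate: here 34 of the 42 bytes describe the data rather than being the data, and they are repeated on every event.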
Raw Serialization
You could write serialization/deserialization code directly into your event types.
Each type can choose its own format, so no extra type or field names are needed. This kind of serialization is very compact (it contains only the required bits), but you cannot read the data back without the deserialization code.
It can be OK if you plan to have a small, fixed number of well-documented events. It becomes unmanageable if your number of event types grows with time and versions.
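A minimal Python sketch of this approach, assuming a hypothetical `MoneyDeposited` event whose layout (two little-endian 32-bit integers) is chosen by the type itself:

```python
import struct
from dataclasses import dataclass

@dataclass(frozen=True)
class MoneyDeposited:
    """Hypothetical event type that owns its binary layout."""
    account_id: int
    amount: int

    # Layout chosen by the type itself: two little-endian signed 32-bit integers.
    # Not annotated, so dataclass treats it as a plain class attribute, not a field.
    _LAYOUT = struct.Struct("<ii")

    def serialize(self) -> bytes:
        return self._LAYOUT.pack(self.account_id, self.amount)

    @classmethod
    def deserialize(cls, data: bytes) -> "MoneyDeposited":
        account_id, amount = cls._LAYOUT.unpack(data)
        return cls(account_id, amount)

data = MoneyDeposited(42, 1500).serialize()
print(len(data), data.hex())  # 8 2a000000dc050000
print(MoneyDeposited.deserialize(data))
```

Without `deserialize`, `2a000000dc050000` is just eight bytes: nothing in them says which event they belong to or what each field means, which is exactly the maintenance problem once event types multiply.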
Avoid it
Let’s consider how data are stored in a database.
A database contains tables. Tables have a schema. When storing a row, there is no need to repeat column names in each cell. The data layout is defined by the table schema and is the same for every row.
We cannot do exactly the same, since events have different schemas, but we work with a limited set of event types, each of which will occur many times.
Split schema and data
We can thus store schemas separately and tag each row with the identifier of its schema. The event data is then stored as raw bits laid out according to that schema.
This way you can design tools to explore your log file with a complete representation of each event, without needing the original event class, and you get a very compact serialization. Have your cake and eat it too!
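A minimal Python sketch of the idea under assumptions of its own (it is not the code announced below): schemas live in a hypothetical in-memory table `SCHEMAS` that would in practice be stored once, aside from the log; each row starts with a 2-byte schema id; fields are fixed-width integers; the event names are made up. Note that `decode` needs only the schema table, never the event classes.

```python
import struct
from dataclasses import dataclass

@dataclass(frozen=True)
class Schema:
    """Stored once, aside from the log: event name plus ordered (field name, struct code) pairs."""
    schema_id: int
    event_name: str
    fields: tuple

    @property
    def layout(self) -> struct.Struct:
        return struct.Struct("<" + "".join(code for _, code in self.fields))

# Hypothetical schema table; "i" is a little-endian signed 32-bit integer.
SCHEMAS = {
    1: Schema(1, "MoneyDeposited", (("account_id", "i"), ("amount", "i"))),
    2: Schema(2, "AccountClosed", (("account_id", "i"),)),
}

HEADER = struct.Struct("<H")  # each row starts with a 2-byte schema id

def encode(schema_id: int, *values: int) -> bytes:
    """Row = schema id + raw values; no type or field names are repeated."""
    return HEADER.pack(schema_id) + SCHEMAS[schema_id].layout.pack(*values)

def decode(row: bytes) -> tuple:
    """Generic reader: rebuilds a readable event from the schema table alone."""
    (schema_id,) = HEADER.unpack_from(row)
    schema = SCHEMAS[schema_id]
    values = schema.layout.unpack_from(row, HEADER.size)
    return schema.event_name, dict(zip((name for name, _ in schema.fields), values))

log = [encode(1, 42, 1500), encode(2, 42)]
for row in log:
    print(len(row), decode(row))
```

Each row costs only the schema id plus the payload (10 and 6 bytes here), while a log explorer built on `decode` still shows event and field names.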
Stay tuned, the code comes tomorrow…
Comments
What about JSON serialization? This is what I currently use.
@Gilligan> I forgot that one: JSON, or M serialization.
It would be equivalent to XML serialization (text), but less verbose since there are no repeated opening/closing tags.
You still use 1 byte per digit when serializing integers, so a 2x ratio in hexadecimal and a bit more in decimal, and you still have field names, spaces, commas, colons, curly braces, quotes...
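To put numbers on this reply, here is a small Python sketch with a hypothetical event. It compares compact JSON with the raw integers, then computes how many text digits one byte of integer data needs: 8 / log2(16) = 2 in hexadecimal and 8 / log2(10), about 2.41, in decimal.

```python
import json
import math
import struct

event = {"account_id": 42, "amount": 1500}  # hypothetical event

# Compact JSON (no spaces) still spells out every field name and digit as text.
as_json = json.dumps(event, separators=(",", ":")).encode("ascii")
as_raw = struct.pack("<ii", event["account_id"], event["amount"])
print(as_json, len(as_json), "bytes vs", len(as_raw), "bytes raw")

# Text characters needed per byte of integer data.
hex_digits_per_byte = 8 / math.log2(16)
decimal_digits_per_byte = 8 / math.log2(10)
print(hex_digits_per_byte, decimal_digits_per_byte)
```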
True. I see your point. You are talking about the most efficient and least-dependent methods for serializing/deserializing events. The schema-in-database technique would definitely be useful for logs that receive lots of events!
@Gilligan> I know that Greg Young talked about compact serialization with a schema ID, but I'm not sure he stored the schema in the database itself.
Hey Jérémie,
Are you still planning to post code about your serialization technique?
Sounds like you may want to use Google’s Protocol Buffers? See http://code.google.com/apis/protoco...
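For comparison with the schema/data split in the post, here is a sketch of how Protocol Buffers encodes one integer field on the wire. It is written in plain Python from the published encoding rules, not with the protobuf library: the field name never appears, only its number from the `.proto` schema, and integers are base-128 varints.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_varint_field(field_number: int, value: int) -> bytes:
    """Field key = (field_number << 3) | wire_type, where wire type 0 means varint."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)

print(encode_varint_field(1, 150).hex())  # 089601
```

Like the schema-id approach, the meaning of each number lives in a schema kept outside the data; unlike it, each field still pays a small key byte.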