Caching done right
I was trolling on Twitter on Saturday when I saw a tweet by Nate Kohari and some answers:
I immediately thought:
If you have one problem and use a cache to solve it, you now have two problems.
Where’s the problem?
Retrieving the data takes a non-negligible amount of time, because requests are frequent and/or because the calculation and data access are slow. So we put the data in a cache so that we don’t have to pay this cost on each call.
But then comes the problem of cache expiration:
- We can use a duration, but what is an acceptable delay?
- We can check the original data to see whether it has changed. It’s more accurate, but it incurs a new data access.
Moreover, checking whether it changed is often not enough: we also need to find out what changed.
And deriving what happened from state is basically reverse engineering. I’m an engineer. I prefer forward engineering.
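To make the first option concrete, here is a minimal sketch of duration-based expiration in Python. `TtlCache`, `load`, and the fake clock are hypothetical names invented for this illustration; they do not come from any particular library.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

V = TypeVar("V")


class TtlCache(Generic[V]):
    """Hypothetical duration-based cache: an entry expires after ttl_seconds."""

    def __init__(
        self,
        load: Callable[[str], V],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load          # slow access to the original data
        self._ttl = ttl_seconds    # the "acceptable delay"
        self._clock = clock        # injectable so the demo is deterministic
        self._entries: Dict[str, Tuple[float, V]] = {}

    def get(self, key: str) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            # Fresh according to the clock, possibly stale in reality.
            return entry[1]
        value = self._load(key)
        self._entries[key] = (now, value)
        return value


# Demo with a fake clock (assumption: the source is a plain dict).
source = {"price": 10}
now = [0.0]
cache = TtlCache(load=source.__getitem__, ttl_seconds=60, clock=lambda: now[0])

first = cache.get("price")      # miss: loads 10 from the source
source["price"] = 12            # the original data changes
stale = cache.get("price")      # hit: the cache still answers 10
now[0] = 61.0                   # the duration elapses
refreshed = cache.get("price")  # expired: reloads 12
```

Between the change and the expiry, every reader gets the old value, and the cache only knows that its time is up, not what changed.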
Let’s do it forward
It’s actually easy: it’s the whole point of CQRS.
Let’s build a system that raises Domain Events; we can then denormalize those events into a Persistent View Model.
We just have to listen to events and change the representation we want to send to the users:
- The events contain everything we need to do fine-grained updates easily.
- We can compute denormalizations asynchronously if they are time-consuming.
- We can store the result in a relational database, a document database, or in memory.
- We can choose any form for the view, since it’s a denormalization (an object graph, but also a flat string, JSON, HTML…).
- It will be up to date quickly, because it is updated when the original data changes.
- The first client that makes a request after a change will not suffer a potentially long cache miss, since the computing is done on change and not on request.
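The idea above could be sketched as follows, in Python. This is only an illustration under stated assumptions: the event types (`ProductRenamed`, `PriceChanged`), the `EventBus`, and the `ProductListView` are all hypothetical names, the bus is synchronous and in-memory, and a real system would likely dispatch asynchronously and persist the view.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, Dict, List


# Hypothetical domain events: each one says exactly what changed.
@dataclass(frozen=True)
class ProductRenamed:
    product_id: str
    new_name: str


@dataclass(frozen=True)
class PriceChanged:
    product_id: str
    new_price: int


class EventBus:
    """Minimal synchronous publish/subscribe, keyed by event type."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[type, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable[[Any], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: Any) -> None:
        for handler in self._handlers[type(event)]:
            handler(event)


class ProductListView:
    """Denormalized view model: product_id -> row ready to send to the user."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Any]] = {}

    def on_renamed(self, event: ProductRenamed) -> None:
        # Fine-grained update: only the name is touched.
        self.rows.setdefault(event.product_id, {})["name"] = event.new_name

    def on_price_changed(self, event: PriceChanged) -> None:
        # Fine-grained update: only the price is touched.
        self.rows.setdefault(event.product_id, {})["price"] = event.new_price


bus = EventBus()
view = ProductListView()
bus.subscribe(ProductRenamed, view.on_renamed)
bus.subscribe(PriceChanged, view.on_price_changed)

# The write side raises events; the view is updated on change, not on request.
bus.publish(ProductRenamed("p1", "Kettle"))
bus.publish(PriceChanged("p1", 25))
bus.publish(PriceChanged("p1", 30))

row = view.rows["p1"]  # a read is a plain lookup, with nothing to expire
```

There is no expiration policy here: the view never has to guess whether the data changed, because each event tells it what changed.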
A good way to Keep It Simple, Stupid!