A lot of posts talk about how caching can improve performance and how one can use tools such as memcached for the same. It is a great tool, and I am certain it does wonders, but I do wonder if it and its usage is getting too much press at the cost of other caching options.
This post talks about the various dimensions that should be carefully examined before deciding upon the caching strategy. I would like to argue that the most difficult part of caching is not necessarily learning or selecting the caching tool but the detailed analysis of the application, its architecture and its data flow, that actually precedes the caching tool selection. I also submit that once this analysis is conducted and the various dimensions as described below well understood, the actual cache design then becomes a relatively simpler and lesser risky activity. Before getting into the dimensions of cache design, there is some important groundwork to be done.
Note : I am referring to data caching. I am not referring to optimisation of opcodes / byte codes / instruction sets etc. which is a different class of caching.
Know your application well :
Let me emphasise this. If you want to decide on the most appropriate caching strategy, you need to have a sufficiently good understanding of your application. This is relatively easy if you are enhancing the caching capabilities in an existing application or are rebuilding an existing application. But if you are building a completely new application, some of the data listed below may not be easily available and you may need to project a little bit after understanding the requirements and a high level data / logic flow. One of the things you really need to have either a clear understanding or at least some reasonable projection of, are the various data elements in the application, how frequently will they be accessed and modified and given the processing flow, which elements make sense to be cached. You will also need to have a good understanding of how critical is it to have the most current data (eg will a 1 min. stale data be acceptable) and the transactional semantics around the data processing (are strict ACID semantics a must ?).
Note that for the rest of this post when I mention data, I shall be referring to cachable data.
Know your performance targets :
Unless you are clear about your performance targets, it will be quite difficult to work out the appropriate strategy. I would suggest you should have clear targets defined (for a specified hardware) before proceeding down this path. Some times the target specifies very ambitious hardware which you may not have access to in early stages of the development. Try to convert these targets into appropriate targets for the hardware you shall be working with (with some reasonable margin of safety).
Know your architectural guidelines / preferences :
Will you be using a single server or multiple ? When handling future growth would you prefer to scale up or scale out ? What languages will you be using ? Would you prefer to use a single database instance or would you want to create vertically partitioned shards ? As we shall see, these are extremely important considerations before getting into cache design. Sometimes the decision making is not sequential, so you may end up revisiting these topics once you get into your cache design.
Dimensions
We shall now explore some of the various dimensions that influence cache design.
More…

