Questioning Disk Solutions - Part 4 of 4
Posted by: Steven Calvert in Infrastructure on May 29, 2009
In our final chapter, we discuss subsystem cache and the effect it can, and cannot, have on disk performance.
Question 4: How much Cache do I need?
Most storage subsystems contain some cache memory. The term (coined by Lyle R. Johnson of IBM back in 1967, if Wikipedia is to be believed) refers to an area of memory within a device from which fetching the desired data takes less time than fetching it from its ultimate location. In our case, this is usually some form of memory DIMM rather than the disk spindle itself.
Note that there is "cache" and "non-volatile cache", and in some storage subsystems there is a difference between the two. "Cache" is usually used only to store data that has already been committed to disk, and serves subsequent read requests for the same data. "Non-volatile cache", however, can also store incoming writes, since in the event of a failure the data can remain within this memory area without having been committed to disk. How this is achieved varies by subsystem: some simply have internal batteries to retain data within the memory chips themselves, while others de-stage the data to a dedicated disk or USB device before the subsystem shuts down completely. It is this non-volatility that makes such cache expensive, and that keeps cache sizes small compared to host memory.
However, it is usually the non-volatile cache that delivers the greatest performance benefit. When the host writes data, the data can be written to the cache and an acknowledgement sent back to the host immediately; the data is then committed to disk at a time that suits the subsystem controller. This helps to mask the significant write delays on RAID5 and RAID6 arrays caused by parity calculations. Likewise, any data committed via this cache can hang around in case it is needed again. But because so much data passes through, it cannot stay in cache for long before it is replaced.
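To make the "hang around until replaced" behaviour concrete, here is a minimal sketch that replays a list of block addresses through a cache and reports how many reads were served from it. It assumes a simple least-recently-used (LRU) replacement policy, which is an assumption for illustration only; real subsystem controllers use their own, often proprietary, policies. The function name `hit_ratio` and both sample workloads are hypothetical.

```python
from collections import OrderedDict


def hit_ratio(trace, cache_blocks):
    """Replay block addresses through an LRU cache holding `cache_blocks` blocks.

    Returns the fraction of accesses that were served from cache.
    LRU is an assumed policy, not that of any particular subsystem.
    """
    cache = OrderedDict()  # block address -> None, ordered oldest to newest
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            cache[block] = None
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(trace) if trace else 0.0


# Hypothetical workloads, for illustration only.
rereads = [0, 1, 2, 3] * 25    # a small working set read over and over
streaming = list(range(100))   # every block touched exactly once

for size in (2, 4, 8):
    print(size, hit_ratio(rereads, size), hit_ratio(streaming, size))
```

In this toy model the repeated working set only benefits once the cache is large enough to hold all of it, and growing the cache beyond that point changes nothing. The streaming workload never hits at any size.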
So is a bigger cache always better? Not always. While too little cache can be a detriment, too much will not hurt performance, but it will add to the cost of the storage subsystem. As for when a cache is too big: some applications are considered "cache-unfriendly", in that they never access the same data twice, or they are very write-intensive. Cache is useful for burst writes and for coping with peak loads; when writes are sustained, however, the cache will eventually fill up with uncommitted writes, and from then on data can only be accepted as fast as it can be written to the underlying disk. At that point cache provides very little benefit to the host. To tell what cache size will be of benefit, and how much is an optimal trade-off between cost and capacity, the application and the way it accesses disk need to be well understood. Again, short of a finger-in-the-air approach, the only way to determine an accurate cache size is through either application testing or using existing application workload data to simulate response times for different cache sizes. Failing that, simply go for what you can afford.
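The difference between burst and sustained writes can be sketched with a very rough model. The code below steps through time in one-second ticks: each second the disks drain a fixed amount from the cache, and incoming writes are accepted only while free cache space remains. The function name `stalled_seconds`, the 1024 MB cache and the 100 MB/s disk rate are all made-up figures for illustration, not measurements from any real subsystem.

```python
def stalled_seconds(writes_mb, cache_mb, disk_mb_per_s):
    """Count the seconds in which the host had writes waiting on a full cache.

    writes_mb     -- MB written by the host in each one-second tick
    cache_mb      -- size of the write cache (hypothetical figure)
    disk_mb_per_s -- rate at which the disks can commit data from cache
    """
    dirty = 0.0    # uncommitted data currently held in cache (MB)
    backlog = 0.0  # data the host is still waiting to have accepted (MB)
    stalled = 0
    for incoming in writes_mb:
        # The disks commit some cached data first, freeing space.
        dirty = max(0.0, dirty - disk_mb_per_s)
        # New writes (plus anything left waiting) go into the free space.
        pending = backlog + incoming
        accepted = min(pending, cache_mb - dirty)
        dirty += accepted
        backlog = pending - accepted
        if backlog > 0:
            stalled += 1  # the host is now limited to disk speed
    return stalled


# Hypothetical loads: a 10-second burst followed by light traffic,
# versus a full minute at the same peak rate.
burst = [150.0] * 10 + [20.0] * 50
sustained = [150.0] * 60

print("burst:", stalled_seconds(burst, cache_mb=1024, disk_mb_per_s=100))
print("sustained:", stalled_seconds(sustained, cache_mb=1024, disk_mb_per_s=100))
```

With these figures the burst is absorbed entirely by the cache, while the sustained load fills it partway through the minute and is throttled to disk speed from then on. Feeding in per-second write volumes taken from an existing application, and varying `cache_mb`, gives a crude first look at how much cache a workload can actually use.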
Summary
So when ordering a storage subsystem, ask yourself the following questions...
How many spindles do I need?
What are my RAID requirements?
What size disk do I need?
How much cache do I need?
And finally...
How much capacity do I need?
