Normally when someone specifies a storage requirement, they tend to specify only the desired capacity, and perhaps a preferred method of host attachment. Sometimes that requirement can be as basic as "I'd like 2TB of storage, please". But how effective is this at fulfilling actual real world requirements? Taking the above quote as an example, we can look at two extreme solutions:
- Two 1TB SATA Disks in an external SAS enclosure.
- Fourteen 146GB 15K Spindles in an external SAN attached storage disk subsystem with 8GB of cache.
One of these solutions costs around £200 from a high street store, the other can cost over a hundred times more than that from a dedicated storage reseller. So which is the correct solution? Chances are that neither solution is correct, but when performance issues with the host system are encountered it is usually the big black unknown void of disk storage where the finger points first. So how do you go about sizing a storage solution that meets people's expectations in both capacity and performance? What questions need to be asked? In this article I'll be covering some of the high-level basics of storage specification and highlight the questions that should be asked when looking for storage. To some of you I'm sure that this will be fairly common knowledge, but to others who are new to storage or fancy a more concise introduction to storage, this may prove useful.
Question 1: How many disks do I require?
When it comes to creating a storage solution to meet or exceed specific performance levels, it's the number of disks (or “spindles”), and not necessarily their capacity, that is the most important factor to consider. It is worth remembering that each disk is a physical unit: a spinning platter with a read/write head that moves above its surface. In a computing world where even the speed of light can be considered slow and problematic at times, the effort required to move a physical piece of metal around at sub-sonic speeds is practically antiquated. Even in an ideal world, discounting the movement of the disk head and using an unrealistically optimized sequential read or write, the maximum data transfer rate from a single 15,000rpm SCSI disk is around 80-100MB/s. Each read or write request passed to or from that disk spindle is an operation, and the number of these operations the disk can service per second (IOPS, Input/Output Operations Per Second) is also limited. Putting the maths around track lengths and rotational speeds aside, in an ideal world you can manage to squeeze around 200 IOPS from a single spindle. However, if we start to introduce aggressive seek times so that the head actually has to move back and forth across the platter, which is normal behaviour during more random disk operations in the real world, these figures can halve immediately.
Disk devices are surprisingly slow; it's only with a more layered structure, command queuing, and heavy cache use that we can make a dent in these performance figures. While attempts can be made at the OS level to further tune the layout of data across these disks, only a gain of around 5-10% in performance can be expected unless the configuration was very bad to begin with. The easiest and most effective method to improve performance in a disk subsystem is to increase the number of spindles the required data is spread across.
Take our previously mentioned example. The two SATA disks with their slower spin speeds (5,400 or 7,200rpm) and more basic data transfer protocols will manage approximately a hundred or so IOPS and maybe 40MB/s. The fourteen SCSI based disks with their faster spindles (15,000rpm), however, have the capacity for a total of 2,800 IOPS (14 x 200) and a throughput of 1,400MB/s (14 x 100MB/s) under ideal conditions. This example may be a little extreme, but it shows that despite the same stated requirement there is a dramatic performance difference. So when choosing the capacity of the disks used, what you're essentially defining is a compromise between cost and the number of spindles. While a couple of 450GB disks may be cheaper in terms of “megabytes per dollar” than 146GB disks, the 146GB disks will give you more “spindles per dollar” because each one is cheaper. Also, by buying the smaller disks you force administrators to spread data across more spindles, rather than placing all the data on as few spindles as possible, thus ensuring some form of guaranteed performance.
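The arithmetic behind these figures can be sketched in a few lines of Python. This is a rough illustration only: the per-spindle numbers are the ideal 15,000rpm figures quoted above, the function names are hypothetical, and the random_factor of 0.5 simply encodes the "figures can halve" rule of thumb rather than any measured value.

```python
import math

# Ideal per-spindle figures for a 15,000rpm disk, as quoted in the article.
IDEAL_IOPS_15K = 200
IDEAL_MBPS_15K = 100  # upper end of the 80-100MB/s range


def array_performance(spindles, iops_per_disk=IDEAL_IOPS_15K,
                      mbps_per_disk=IDEAL_MBPS_15K, random_factor=1.0):
    """Return (total IOPS, total MB/s) for a set of spindles.

    random_factor scales the ideal figures down for seek-heavy workloads
    (assumption: 0.5 for a largely random workload).
    """
    total_iops = spindles * iops_per_disk * random_factor
    total_mbps = spindles * mbps_per_disk * random_factor
    return total_iops, total_mbps


def spindles_needed(target_iops, iops_per_disk=IDEAL_IOPS_15K,
                    random_factor=0.5):
    """Return the minimum number of spindles to reach target_iops."""
    return math.ceil(target_iops / (iops_per_disk * random_factor))


print(array_performance(14))                     # ideal, sequential case
print(array_performance(14, random_factor=0.5))  # seek-heavy case
print(spindles_needed(1000))                     # spindles for 1,000 random IOPS
```

Starting from a target IOPS figure and working back to a spindle count, rather than starting from capacity, is the point of the exercise.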
Question 2: How much capacity do I actually require?
There is less physics but more maths involved in answering this question. We'll assume for now that we know exactly how much file space is required for a system, ignoring concepts such as HSM, thin provisioning, snapshots, over-estimation, and so forth.
Firstly you need to ask yourself: binary or decimal? Most people are used to dealing with binary gigabytes; these are the ones where 1024KB equals 1MB and 1024MB equals 1GB. However most storage manufacturers use decimal gigabytes, where 1000KB equals 1MB and 1000MB equals 1GB and so forth, because quite frankly that way it sounds like their disks are capable of storing more data than they actually can. 24MB per gigabyte may not sound like much between friends, but the gap compounds at each step up in unit, so when you start to play in the terabyte realms this can have a major effect on the actual space available: a decimal terabyte is roughly 9% smaller than a binary one! The IEC introduced the prefixes "GiB" (gibibytes) and "TiB" (tebibytes) to indicate the binary rather than the decimal quantities; however, this is still not common practice and therefore cannot be relied upon.
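To see how the gap grows with each unit, here is a minimal conversion sketch. The function names are made up for illustration; the only facts used are that decimal units step by 1000 and binary units step by 1024.

```python
def decimal_to_binary(value, power):
    """Convert a decimal quantity to its binary-unit equivalent.

    power is the unit step: 3 for GB -> GiB, 4 for TB -> TiB.
    """
    return value * 1000 ** power / 1024 ** power


def shortfall_percent(power):
    """Percentage by which a decimal unit falls short of the binary unit."""
    return (1 - 1000 ** power / 1024 ** power) * 100


print(f"146GB disk = {decimal_to_binary(146, 3):.1f}GiB")
print(f"2TB        = {decimal_to_binary(2, 4):.2f}TiB")
print(f"GB vs GiB shortfall: {shortfall_percent(3):.1f}%")
print(f"TB vs TiB shortfall: {shortfall_percent(4):.1f}%")
```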
Another common mistake is not considering the difference between raw space and usable space. There are two areas where this comes into play. The one most people are familiar with is the concept of RAID arrays. Be it striping with parity (RAID5 or 6) or mirroring the data (RAID1 or 10), you're going to end up with a lot less usable space than raw disk capacity by the end of it. Remember, there are also significant read and write penalties for RAID5, so anything with performance in mind should be considered for RAID1 protection. So there's half your disks gone there, and your 2TB solution is actually only 1TB in usable capacity. The second, more hidden loss of usable capacity is at disk level. With nearly all storage subsystems the disks will have some kind of configuration storage area, a portion of the disk that is used to store information on the format of the disk, the controller configuration, array and logical drive information and so forth. For example, on the IBM DS3000-5000 hardware this area is called the "DACSTOR" area. NetApp and N Series devices also have similar areas with a whopping 10%-20% overhead on the actual capacities available to the end hosts, due to the volume management of the WAFL system they utilise. While this volume information is very useful for the migration of disks from one storage subsystem to another, it does sap away more of your usable space. So now it looks like your 146GiB disk is only capable of around 138GiB or less once it's formatted and ready to use.
Then you've got the matter of hot spares. These are empty disks just left to spin in the subsystem enclosures, ready to take over from any drive that has failed or is about to fail. While you don't need many to protect an entire storage subsystem, it can have an impact if these disks are not included in smaller solutions where a specific number of disks has been ordered. So remember to take these and the overheads mentioned above into account when specifying the number of spindles. For the most part a couple of disks will suffice; however, make sure that the disks are of suitable size to protect all the arrays. For the DS4000 range, for example, a larger capacity hot spare can temporarily replace a failed smaller capacity drive, but the reverse is not true.
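Putting hot spares, formatting overhead and RAID together, a rough usable-capacity estimate might look like the sketch below. It rests on several simplifying assumptions that are not from any vendor's sizing tool: all data disks sit in a single array, the per-disk overhead is a flat fraction (5% here, loosely matching the 146 to 138 example above), and the function name and parameters are hypothetical.

```python
def usable_capacity(total_disks, disk_size, raid_level,
                    hot_spares=0, overhead_fraction=0.0):
    """Estimate usable capacity in the same unit as disk_size.

    Assumes one array built from every non-spare disk, and a flat
    per-disk formatting overhead (both are simplifications).
    """
    data_disks = total_disks - hot_spares
    minimum = {1: 2, 10: 2, 5: 3, 6: 4}
    if raid_level not in minimum:
        raise ValueError(f"unsupported RAID level: {raid_level}")
    if data_disks < minimum[raid_level]:
        raise ValueError("not enough disks left after hot spares")

    formatted = disk_size * (1 - overhead_fraction)
    if raid_level in (1, 10):
        effective = data_disks // 2   # mirrored pairs; an odd disk is unused
    elif raid_level == 5:
        effective = data_disks - 1    # one disk's worth of parity
    else:
        effective = data_disks - 2    # RAID6: two disks' worth of parity
    return effective * formatted


# Fourteen 146GB disks, two hot spares, 5% overhead, mirrored.
print(usable_capacity(14, 146, 10, hot_spares=2, overhead_fraction=0.05))
# The same disks as a single RAID5 array.
print(usable_capacity(14, 146, 5, hot_spares=2, overhead_fraction=0.05))
```

Under these assumptions the fourteen-disk "2TB" subsystem ends up well under 1TB usable when mirrored, which is the kind of gap worth spotting before the order is placed.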
Question 3: What attachment do I need?
Once the storage subsystem is configured it needs to be attached to the host. But do you really need the latest and greatest fibre speeds at the host level? To add to the general confusion, since it's communications we're now talking gigabits and not gigabytes, and these are always in decimal. An 8Gb Fibre Channel link is nominally rated at 800MB/s once line encoding is taken into account, which is around 763MiB/s of throughput. Due to protocol overhead, transmission times and so forth, the actual data throughput will be around 10% less than this. As we've seen in one of the previous examples, we need at least fourteen spindles in a perfect and highly unlikely continuously sequential read or write to get up to 1.4GB/s. So with two 8Gb fibre connections (which we would want anyway for redundancy) we can just cope with this unlikely scenario. However, given that real world throughputs are much lower, in our example 4Gb fibre would probably be more than adequate for most hosts with the given number of spindles.
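The link-sizing reasoning can be checked with a small calculation. The sketch below assumes the nominal Fibre Channel data rates of 200, 400 and 800MB/s for 2Gb, 4Gb and 8Gb links, and applies the roughly 10% overhead mentioned above as an efficiency factor of 0.9; the function names are hypothetical.

```python
import math

# Nominal Fibre Channel data rates in MB/s per direction, by link speed in Gb.
FC_MB_PER_S = {2: 200, 4: 400, 8: 800}


def mb_to_mib(mb):
    """Convert decimal megabytes to binary mebibytes."""
    return mb * 1_000_000 / 2 ** 20


def links_required(demand_mb_s, link_gb, efficiency=0.9):
    """Return how many links of a given speed are needed for a demand.

    efficiency models protocol overhead (assumption: about 10% lost).
    """
    usable = FC_MB_PER_S[link_gb] * efficiency
    return math.ceil(demand_mb_s / usable)


print(round(mb_to_mib(800)))    # nominal 8Gb link expressed in MiB/s
print(links_required(1400, 8))  # ideal sequential case, fourteen spindles
print(links_required(700, 4))   # figures halved for a random workload
```

With the spindle throughput halved for a more realistic workload, a redundant pair of 4Gb links already covers the demand, which is the conclusion reached above.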
iSCSI is still trying to make headway, and provides a cheaper alternative to fibre based storage networks. Using 1Gb Ethernet we can expect a throughput of around 95MiB/s with a direct connection, and probably closer to 80MiB/s in a real environment. While it's not stunning, it's usually enough to do the job for small, non-critical systems. The thing to always bear in mind with iSCSI, though, is that dedicated networks should be used for iSCSI traffic. Start sending other data across the same network and it will generate a lot of unpredictable performance bottlenecks. There are other network based protocols as well, such as CIFS and NFS, which can also provide network connectivity to storage areas. However these are file rather than block based protocols, intended for file sharing rather than storage sharing applications, so they are less effective when you simply want raw performance.
New to the scene is FCoE (Fibre Channel over Ethernet). This is essentially the fibre channel protocol run over Ethernet based hardware and cabling, similar in fashion to iSCSI but with more dedicated hardware infrastructure. However FCoE integrates better with existing FC infrastructure, as it carries all the same management and control protocols as SAN infrastructure. It is mainly intended for data centres where there is a desire to consolidate the cable types used and ease the complexity of managing many hardware systems. At the moment it is unlikely to replace optical fibre outright, since it is currently easier to use higher speeds over greater distances using optical rather than copper cabling.
Finally there's SAS, or Serial Attached SCSI. Its physical connectors and cabling are compatible with SATA, and SAS controllers can also talk to SATA drives, hence the fact that SATA devices can co-exist on SAS disk backplanes. (The reverse however is not true, and drive connectors are keyed to prevent this.) Current bus speeds are 3Gbit and 6Gbit, so while slower than current fibre technology it still provides a more than ample connection speed to a single host.
With these speeds in mind, what defines the connectivity used is the scale of the solution. If you want only two or three hosts sharing a storage device, then go down the route of SAS. If you're looking to start a storage network of five or six systems but have no real plans to scale out further, iSCSI provides a cheap entry point. However if you want a multi-host, multi-subsystem storage network, then fibre channel networks are still the way to go.
Question 4: How much Cache do I need?
Most storage subsystems contain a level of cache memory within them. The term “cache” (created by Lyle R. Johnson of IBM back in 1967 if Wikipedia is to be believed) refers to an area of memory within a device where the cost, in terms of time, of fetching desired data is less than that of fetching the data from its ultimate location. In our case, this is usually some form of memory DIMM rather than the disk spindle itself. Note there is "cache" and "non-volatile cache", and in some storage subsystems there is a difference between the two. "Cache" is usually used only to store data that has already been committed to disk, and is used to fulfil subsequent read requests for the same data. "Non-volatile cache", however, can also store incoming writes, since in the event of a failure the data can remain within this memory area without being committed to disk. How this is achieved varies by subsystem: some simply have internal batteries to retain data within the memory chip itself, while others de-stage that data to a dedicated disk or USB device before the device is shut down completely. It's this non-volatile state that makes non-volatile cache expensive, and explains the small cache sizes compared to host memory. However, when it comes to cache, it's usually this non-volatile cache that has the greatest performance benefit. This is because when writing data, it can be written to the cache and an acknowledgement sent back to the host immediately; the data is then committed to disk at a time more suitable for the subsystem controller. This helps to mask the significant write delays on RAID5 and RAID6 arrays caused by parity calculations. Likewise any data committed via this cache can hang around in case it's needed again. But because so much data can pass through in a short space of time, data cannot hang around in cache too long before it is replaced.
So the bigger the size of the cache, the better it is? Not always. While too little cache can be a detriment, having too much will not hurt performance, but it will inflate the cost of the storage subsystem. As to when a cache is too big, this is normally down to applications that are considered "cache-unfriendly", in that they never access the same data twice, or they are very write intensive. Cache is useful for burst writes and coping with peak loads; however, when writes are sustained the cache will eventually fill up with uncommitted writes, and then data can only be committed as fast as it can be written to the underlying disk. At that point cache provides very little benefit to the host. To tell what cache size will be of benefit, and how much is an optimal trade-off between cost and capacity, the application and the way it accesses the disk need to be well understood. Again, other than a finger-in-the-air approach, the only way to determine an accurate cache size is through either application testing or using existing application workload data to simulate response times based around different cache sizes. Failing that, simply go for what you can afford!
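The point about sustained writes can be illustrated with a very crude model: if data arrives faster than the disks can drain it, the cache fills at the difference between the two rates. The sketch below assumes the whole cache is available for uncommitted writes and that both rates are constant, neither of which holds on a real controller, and the function name is hypothetical.

```python
def seconds_until_cache_full(cache_mb, incoming_mb_s, drain_mb_s):
    """Return seconds until the write cache fills, or None if it never does.

    Assumes constant rates and that all of cache_mb can hold pending writes.
    """
    if incoming_mb_s <= drain_mb_s:
        return None  # the disks keep up, so the cache never fills
    return cache_mb / (incoming_mb_s - drain_mb_s)


# An 8GB (8,000MB) cache in front of disks draining 700MB/s.
print(seconds_until_cache_full(8000, 900, 700))  # sustained 900MB/s of writes
print(seconds_until_cache_full(8000, 500, 700))  # load the disks can absorb
```

Doubling the cache in this model only doubles the time before the host is back at disk speed, which is why sustained write loads are better served by more spindles than by more cache.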
Summary
So when ordering a storage subsystem, ask yourself the following questions...
How many spindles do I need?
What are my RAID requirements?
What capacity disk do I need to select?
How much cache do I require?
And finally...
How much capacity do I need?










