Solid State of the Art? (Part 2)
Posted by: Steven Calvert in Infrastructure on Dec 7, 2009
In our second part of "Solid State of the Art?" we're looking at how to use Solid State Drives, what applications we can best use them for, and what the future holds for SSD technology.
Tips on using SSDs
An obvious tip is to never "defrag" an SSD. Unlike spinning disks, all memory cells are accessible at the same speed at any time, so relocating data will not speed up the drive. In fact, it will only age the drive faster, because you're writing the same file data twice.
Due to the speed of SSDs, adapter cache is less effective. If you're using a SAS RAID adapter and cost is an issue, consider forgoing the larger cache variants with battery-backed write cache. Likewise, if you're using SSDs in something like a DS5000, consider disabling the cache for both fast writes and read pre-fetch on any logical drives based on SSD arrays, leaving more of the cache available for standard HDDs, where it's needed.
The recommendation is that SSDs should still use RAID. While SSD failure is more predictable in terms of cell wear and there are no moving parts, an SSD is still a single unit holding a single copy of the data. However, because reads are quicker than writes and seek times are significantly quicker, RAID5 is more cost-effective than RAID1 in terms of cost versus performance. The DS5000 also offers dynamic array expansion and RAID level changes, so you can always change from one level to another at a later point.
Be very wary of the subsystem connectivity with SSDs. If you have an array of four SSDs, the combined read throughput is nearly 1GByte/s before considering protocol overhead. The latest SAS connectivity has a limit of 6Gbit/s (600MB/s) per device, or 24Gbit/s (2.4GB/s) in total across four buses, while optical Fibre Channel runs at 8Gbit/s (800MB/s). It's therefore quite easy to saturate attachment interfaces, especially when using a SAN infrastructure.
When using SSDs with POWER systems, the limitations are not well documented by IBM. Worse still, just because the disks work doesn't mean that IBM will support that particular configuration. Therefore bear the following in mind:
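The capacity side of that cost argument can be illustrated with a little arithmetic. This is a minimal sketch, not a sizing tool: the drive price is a made-up placeholder, the function names are hypothetical, and it only compares usable capacity (mirrored pairs keep half the raw space, RAID5 gives up one drive's worth to parity) without modelling performance.

```python
def usable_fraction(level: str, drives: int) -> float:
    """Fraction of raw capacity left for data at a given RAID level."""
    if level == "RAID1":
        if drives < 2 or drives % 2:
            raise ValueError("RAID1 needs an even number of drives (mirrored pairs)")
        return 0.5
    if level == "RAID5":
        if drives < 3:
            raise ValueError("RAID5 needs at least three drives")
        return (drives - 1) / drives
    raise ValueError(f"unsupported level: {level}")


def cost_per_usable_gb(level: str, drives: int, drive_gb: float, drive_cost: float) -> float:
    """Total drive cost divided by the capacity actually available for data."""
    usable_gb = drives * drive_gb * usable_fraction(level, drives)
    return (drives * drive_cost) / usable_gb


# Assumption: four 69GB SSDs at a placeholder price of 1000 each.
for level in ("RAID1", "RAID5"):
    print(level, round(cost_per_usable_gb(level, 4, 69, 1000), 2))
```

With the same four drives, RAID5 leaves three drives' worth of space for data against two for mirroring, which is where the lower cost per usable GB comes from.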
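To make the saturation point concrete, the figures above can be put into a short calculation. This is only a rough sketch: the link limits are the ones quoted in the text, while the 250MB/s per-SSD read rate is an assumption inferred from "four SSDs at nearly 1GByte/s", and protocol overhead is ignored.

```python
import math

# Link limits quoted in the text, in MB/s.
LINKS_MB_S = {
    "SAS, four buses (24Gbit/s)": 2400,
    "Fibre Channel (8Gbit/s)": 800,
}

# Assumption: roughly 250 MB/s of reads per SSD (four SSDs ~ 1 GByte/s).
ASSUMED_SSD_READ_MB_S = 250


def array_read_mb_s(drives: int, per_drive_mb_s: float = ASSUMED_SSD_READ_MB_S) -> float:
    """Combined read throughput an array could ask for, before protocol overhead."""
    return drives * per_drive_mb_s


def drives_to_saturate(link_mb_s: float, per_drive_mb_s: float = ASSUMED_SSD_READ_MB_S) -> int:
    """Smallest number of drives whose combined throughput reaches the link limit."""
    return math.ceil(link_mb_s / per_drive_mb_s)


demand = array_read_mb_s(4)
for name, limit in LINKS_MB_S.items():
    print(f"{name}: 4 SSDs ask for {demand:.0f} of {limit} MB/s "
          f"({demand / limit:.0%}); link is full at {drives_to_saturate(limit)} SSDs")
```

Under these assumptions a four-drive array already asks for more than a single 8Gbit/s Fibre Channel link can carry.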
- You cannot mix hard disks and SSDs in the same bays or subsystem enclosure.
- You can only place up to 8 SSDs in the EXP12S SAS Expansion Drawer.
- You must format the disks to 69GB using 528-byte sectors (not 512-byte sectors, which would give 74GB).
- You can't mix SSD and hard disks in a mirror (i.e. one technology or the other)
Further information on the limitations of using SSDs with POWER systems can be found here:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/iphal/iphalssdconfig.htm
Where to use SSDs
The greatest benefit of SSDs is for databases, especially heavily used transactional databases where there is a heavy random I/O workload but the overall amount of data is fairly small. SSDs also help to negate disk configuration issues with databases. With HDDs, how the database structure is laid out, and over how many spindles, is critical. For many SSD-based systems, however, it may not matter where you lay out your data or whether you use column- or row-oriented storage, because all the data space has the same performance.
SSDs should suit frequently read tables and indexes due to their superior random read performance. External sorting is another operation that can benefit from the low latency of SSDs, because its read pattern is quite random, during the merge phase in particular. Surprisingly, using a low-write-latency SSD as a dedicated storage device for transaction logs can also reduce the commit time delay considerably.
If your database has a lot of historical data within it, your budget may not stretch to enough SSDs to host your production databases. If this is the case then consider stripping that historical data out of the database into a new database location, creating a much leaner production database. Tools such as IBM Optim can help with this process.
Unfortunately, in more general file-based environments it's much harder to identify what data is currently active and what will benefit from SSDs. The capacity required to replace everything with SSDs usually outstrips any available budget. Current approaches therefore try to use SSDs as some kind of intermediate cache between memory cache and HDD spindles, or to isolate active data on the SSDs. However, the means of doing this are numerous, to say the least.
One concept is to have a more layered storage subsystem. For example, IBM's SVC now comes with the option to install up to 4 SSDs within the new SVC "CF8" node. These currently appear like any other "mdisk" within the SVC environment, and you use the RAID1 mirror feature to mirror these SSDs between nodes for resilience. Once in place, "vdisks" (volumes presented to hosts as data volumes) can be moved between HDD- and SSD-based storage on the fly. This is somewhat limited at the moment, in that identifying which volumes need SSD is a manual process, and only entire volumes can be moved rather than just active segments of data. Tools like Tivoli Storage Productivity Center could be used to identify hot data, and can be scripted to automate some of this migration process if required. Plans are in place to allow the SVC nodes to identify data hot-spots and use the SSDs as a temporary staging area for just these hot sections of data, so it will be interesting to see how this technology develops.
This is by no means unique, though: Avere have the concept of "dynamic tiering" with their FXT NAS boxes. Pillar also make an interesting point about "QoS" (Quality of Service): the key to using SSDs effectively is gathering information, identifying current and future workloads, and working out how best to use a limited resource such as SSDs.
Subsystem memory cache is usually based on recent past data, not current or future workloads, so it can be difficult to achieve effective caching in more random I/O based environments. DB2 offers a "cache hint" feature with the DS8000 to give stored data some value, but even this seems a very primitive method of identifying high-use, high-value data to a storage subsystem. There therefore needs to be some architectural re-thinking if SSDs are to be used at a purely subsystem level.
Another method is to control the use of SSD volumes at the OS level, where application integration is a bit easier. The most obvious location is the logical volume manager (LVM) level, and various tools already exist. Solaris has been dabbling with this for a while: using SSD as L2ARC for the Zettabyte File System (ZFS) has shown significant improvements in query execution and significantly reduced service times, as the file system automatically manages the SSD as a cache. Unfortunately ZFS is an immature and problematic file system at best, so it's not really suitable for enterprise use as yet, but it proves such concepts can be implemented at a logical volume manager level.
HSM products can also be used at the file level, and can be integrated as part of a backup solution when using Tivoli Storage Manager. It's a somewhat clunky solution, however, and anything not on the primary tier will suffer in terms of access times as data is restored back to the primary tier. A more effective approach seems to be Veritas Storage Foundation, which has the concept of Dynamic Storage Tiering within VxFS. This is similar to HSM but takes the concept a step further, with several volumes covered by a single address space. Files can then be moved (rather than "removed", as in the case of HSM) from one volume to another without changing the address space, based on policies around name, modification, or even I/O rate (referred to as "I/O Temperature"). Of course, such environments require adequate knowledge of the stored data to set up and configure correctly, so there'll be a certain element of trial and error with this approach, but once in place it will be fairly self-policing.
(Yet!) Another approach is the hybrid disk, a combination of SSD and HDD in one physical package. While this creates a simple plug-in replacement for normal HDDs without any complex configuration work, this option is unlikely to prove popular as it loses most of the advantages of SSDs in terms of size, power and cooling requirements. It also suffers from the same cache limitation, in that you can't easily identify at this level what should be on SSD and what should be on HDD.
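The idea of staging only the hot sections of a volume on SSD, rather than moving whole volumes, can be sketched in a few lines. This is purely illustrative and is not how SVC or any other product works: the extent size, the trace format and the function name are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

EXTENT_SIZE = 16 * 1024 * 1024  # hypothetical extent size in bytes


def hottest_extents(accesses: Iterable[Tuple[str, int]], ssd_extents: int) -> List[Tuple[str, int]]:
    """Pick the most frequently accessed extents that fit in the SSD tier.

    accesses: (volume name, byte offset) pairs from an I/O trace.
    Returns up to ssd_extents (volume, extent index) pairs, hottest first.
    """
    heat = Counter((volume, offset // EXTENT_SIZE) for volume, offset in accesses)
    return [extent for extent, _count in heat.most_common(ssd_extents)]


# A tiny made-up trace: three hits on one extent, two on another, one elsewhere.
trace = [
    ("vdisk0", 0), ("vdisk0", 10), ("vdisk1", 5 * EXTENT_SIZE),
    ("vdisk0", EXTENT_SIZE + 1), ("vdisk1", 5 * EXTENT_SIZE + 3), ("vdisk0", 100),
]
print(hottest_extents(trace, ssd_extents=2))
```

Note that this counts past accesses only, so it shares the weakness of any cache driven by recent history rather than by knowledge of future workloads.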
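A policy of the kind described, based on name, modification and I/O rate, might look something like the following. This is a hedged sketch only: it is not VxFS policy syntax, and the thresholds, the `*.idx` pattern, the `FileStats` fields and the tier names are all invented for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Tuple


@dataclass
class FileStats:
    name: str
    days_since_modified: float
    io_per_hour: float  # stand-in for "I/O temperature"


def choose_tier(
    f: FileStats,
    pinned_patterns: Tuple[str, ...] = ("*.idx",),  # hypothetical: always keep indexes on SSD
    max_age_days: float = 30.0,                     # hypothetical staleness cut-off
    hot_io_per_hour: float = 100.0,                 # hypothetical "hot" threshold
) -> str:
    """Return "ssd" or "hdd" for a file, applying the rules in order."""
    if any(fnmatchcase(f.name, pattern) for pattern in pinned_patterns):
        return "ssd"   # name-based rule wins outright
    if f.days_since_modified > max_age_days:
        return "hdd"   # stale data goes to the cheaper tier
    if f.io_per_hour >= hot_io_per_hour:
        return "ssd"   # recently modified and busy
    return "hdd"


for stats in (
    FileStats("orders.idx", 400, 0),
    FileStats("log.dat", 1, 500),
    FileStats("old.dat", 90, 500),
):
    print(stats.name, "->", choose_tier(stats))
```

The trial and error mentioned above lives almost entirely in those thresholds: set them too loosely and the SSD tier fills with lukewarm data, too tightly and it sits half empty.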
Solid Gains?
So, given all these possible options, what are the majority of people currently doing with SSDs? As far as we can see, as little as possible at the moment. Where budget allows and people are desperate for performance, they're trying SSDs on a per-server basis. But for the most part people seem more content to watch and see what standard emerges before risking large investments.
The concept of a spinning disk of rust is somewhat antiquated, and the interest in SSDs proves that people are looking for faster alternatives. However, there are more tempting technologies on the horizon which may offer a simpler and more direct replacement for HDDs. I believe DRAM-based SSDs will continue as a niche product, as cost and capacity will continue to limit the use of DRAM to system memory. However, improvements in MLC technology in terms of reliability, capacity and cost will increase the appeal of SSDs as we currently know them. Alternative technologies such as phase change materials and resistive memory are also waiting in the wings, and may go beyond what SSDs can achieve in terms of cost and performance.
The other prospect is that SSDs and their replacements remain costly compared to HDDs, and that capacity requirements continue to outpace SSD capacity increases. If this continues, then improvements will have to be made in data management to make better use of these SSDs, bringing a tiered storage architecture into an area that was traditionally just a flat file system. Developing software (application or middleware) to take advantage of SSDs, however, will require a much longer time frame than simply improving on an existing product. It will also require a certain amount of discipline to manage a more graduated architecture with a better level of overall management.
Only time will show us for certain the outcome of SSDs in the marketplace, so we'll see what happens over the course of the next two years. (That's 1,766,016 million I/O operations for an SSD, but only 12,614.4 million for an HDD!)
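For the curious, those closing totals are consistent with a sustained rate multiplied by the number of seconds in two years. The per-second rates below are not stated in this part of the article: they are inferred by dividing the quoted totals by two 365-day years, so treat them as assumptions.

```python
SECONDS_IN_TWO_YEARS = 2 * 365 * 24 * 60 * 60

# Assumed sustained rates, back-calculated from the totals quoted above.
ASSUMED_SSD_IOPS = 28_000
ASSUMED_HDD_IOPS = 200


def total_ios_millions(iops: int, seconds: int = SECONDS_IN_TWO_YEARS) -> float:
    """Total I/O operations, in millions, at a constant rate over the period."""
    return iops * seconds / 1_000_000


print(f"SSD: {total_ios_millions(ASSUMED_SSD_IOPS):,.1f} million I/O operations")
print(f"HDD: {total_ios_millions(ASSUMED_HDD_IOPS):,.1f} million I/O operations")
```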













