IBM Storage Manager copyback not started automatically - using "Replace Drives"
Posted by: Emily Malbon in Infrastructure on Jun 2, 2010
Recently I have seen some strange behaviour with later levels of firmware (around version 7.10 upwards), whereby a failed drive that has reconstructed to the hot spare does not automatically copy back when a new drive is inserted. The new disk appears in Storage Manager as Optimal and Unassigned, and the logs indicate that it has been picked up without error.
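If you look after several subsystems, it can help to flag which ones are at the firmware levels where I have seen this. The sketch below is a minimal, hypothetical helper (not an IBM tool) that compares dotted version strings numerically, so that 7.9 correctly sorts below 7.10. It assumes versions are dot-separated numbers such as 07.10.22.00 and simply stops at the first non-numeric component; the inventory values are illustrative only.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version such as '07.10.22.00' into integers.

    Parsing stops at the first non-numeric component (e.g. a build tag),
    which is an assumption about the version format.
    """
    parts = []
    for piece in version.strip().split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def at_least(version: str, threshold: str) -> bool:
    """Return True if version >= threshold, compared component by component.

    Both tuples are padded with zeros so '7.10' and '07.10.00.00' compare equal.
    """
    a, b = parse_version(version), parse_version(threshold)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Hypothetical inventory values, not taken from a real subsystem.
controller_firmware = "07.15.07.00"
client_version = "10.10"

print(at_least(controller_firmware, "7.10"))     # True: in the range discussed in this post
print(at_least(client_version, "10.10"))         # True: "Replace Drives" should be offered
print("7.9" >= "7.10", at_least("7.9", "7.10"))  # True False: string vs numeric comparison
```

The last line shows why a plain string comparison is not safe for firmware levels.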
In later versions of IBM Storage Manager (version 10.10 and up), there is a new function, "Replace Drives", that lets the user choose whether to replace a failed drive with any unassigned drive in the subsystem or with the hot spare drive. It should be possible to use this function to initiate the copyback. In some cases, however, I have seen that when "Replace Drives" is selected, the message "no drives available for replacement" is displayed, even though unassigned drives are clearly available. In my investigations I have confirmed that the new drive was inserted correctly (leaving 30 seconds between pulling the old drive and inserting the new one), that the new disk is of exactly the same type and brand and is running the same firmware level, and that there do not appear to be any obvious problems in the event log.
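The checks I ran on the replacement drive can be written down as a simple pre-flight list. This is a minimal sketch under stated assumptions: the `Drive` record and its field names are hypothetical stand-ins for the attributes shown in the subsystem profile, and passing these checks does not guarantee that "Replace Drives" will offer the drive, as described above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Drive:
    """Hypothetical record of drive attributes read from the subsystem profile."""
    slot: int
    interface: str     # e.g. "FC" or "SATA"
    vendor: str
    product_id: str
    capacity_gb: int
    firmware: str


def replacement_issues(failed: Drive, new: Drive, seconds_between_pull_and_insert: float) -> list[str]:
    """Return reasons the new drive may not match the failed one (empty list = all checks pass)."""
    issues = []
    # Type, brand/model and firmware level should be identical.
    for field in ("interface", "vendor", "product_id", "firmware"):
        old_value, new_value = getattr(failed, field), getattr(new, field)
        if old_value != new_value:
            issues.append(f"{field} differs: {old_value!r} vs {new_value!r}")
    if new.capacity_gb < failed.capacity_gb:
        issues.append("replacement is smaller than the failed drive")
    # Leave at least 30 seconds between pulling the old drive and inserting the new one.
    if seconds_between_pull_and_insert < 30:
        issues.append("less than 30 seconds between pulling the old drive and inserting the new one")
    return issues


# Illustrative values only.
failed = Drive(slot=5, interface="FC", vendor="VENDOR-A", product_id="MODEL-X", capacity_gb=300, firmware="FW01")
new = Drive(slot=5, interface="FC", vendor="VENDOR-A", product_id="MODEL-X", capacity_gb=300, firmware="FW02")
for issue in replacement_issues(failed, new, seconds_between_pull_and_insert=10):
    print(issue)
# firmware differs: 'FW01' vs 'FW02'
# less than 30 seconds between pulling the old drive and inserting the new one
```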
The resolution so far has simply been to wait (sometimes up to 2 days); eventually, the "Replace Drives" option recognises the available drive(s). So far, the only indication of what happens in the meantime is that the regular media scan (scrub) has started. Note that if a Storage Subsystem password is set, it is required to authorise the copyback.
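Since the workaround is to wait, a polling loop with a generous timeout saves checking by hand. The sketch below is hypothetical: `list_candidates` stands in for whatever query you use to see which drives "Replace Drives" would offer, and the 15-minute interval is an arbitrary choice. The sleep and clock functions are injectable so the loop can be exercised without actually waiting.

```python
from __future__ import annotations

import time
from typing import Callable, Optional, Sequence


def wait_for_candidates(
    list_candidates: Callable[[], Sequence[str]],
    timeout_s: float = 2 * 24 * 3600,  # waits of up to 2 days have been seen
    interval_s: float = 15 * 60,       # arbitrary polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[Sequence[str]]:
    """Poll until list_candidates() returns something, or give up at the timeout."""
    deadline = clock() + timeout_s
    while True:
        candidates = list_candidates()
        if candidates:
            return candidates
        if clock() >= deadline:
            return None
        sleep(interval_s)


# Simulated query: nothing is offered for three polls, then slot 12 appears.
responses = iter([[], [], [], ["slot 12"]])
fake_now = [0.0]


def fake_sleep(seconds: float) -> None:
    fake_now[0] += seconds


result = wait_for_candidates(lambda: next(responses), sleep=fake_sleep, clock=lambda: fake_now[0])
print(result, fake_now[0] / 3600)  # ['slot 12'] 0.75
```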
The "Replace Drives" function does not appear to be clearly documented in the new Storage Manager client Redbooks, so I will blog about what I know of this function later.

If you wish to use the hot spare or a different disk to rebuild the array permanently, the "Replace Drives" option should be selected before the failed drive is removed or replaced. Once the failed drive has been physically replaced, copyback *should* commence automatically and rebuild onto the replacement drive, unless "Replace Drives" has already been selected.
The documentation warns against using other drives in the same DS unit to replace the failed drive, i.e. pulling a drive from slot 3 to replace a failed drive in slot 5. No reason is given, but I suspect the DACstore information could confuse the controllers, which may in turn stop automatic copyback from occurring.
Another factor may be that the controllers are set to enable drive migration rather than disable it (this setting is changed with scripts), and the replacement drives are being installed with existing DACstore information. Most field spares these days seem to have come from other units and have not been fully cleared down, so again there is potential for confusing DACstore information that may inhibit copyback.
Also, copyback won't commence until the hot spare has finished rebuilding the array. On a slow RAID5 array this could take a considerable time. However, the logical volumes normally show a visual indication that this is still in progress, so I only include it here for completeness.
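The conditions suggested in this comment can be collected into a rough checklist. This is only a sketch of the theory put forward here, not documented controller behaviour: the `CopybackState` fields are hypothetical flags you would fill in from your own observations, and treating stale DACstore as a problem only when drive migration is enabled reflects the combination described above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CopybackState:
    """Hypothetical snapshot of the conditions discussed above (field names are illustrative)."""
    spare_reconstruction_complete: bool
    replace_drives_already_used: bool
    replacement_from_same_unit: bool
    replacement_has_foreign_dacstore: bool
    drive_migration_enabled: bool


def copyback_blockers(state: CopybackState) -> list[str]:
    """List the suspected reasons copyback might not start; an empty list means none apply."""
    blockers = []
    if not state.spare_reconstruction_complete:
        blockers.append("hot spare is still reconstructing; copyback waits for it to finish")
    if state.replace_drives_already_used:
        blockers.append("'Replace Drives' was used, so the chosen drive is now the permanent member")
    if state.replacement_from_same_unit:
        blockers.append("replacement was taken from another slot in the same unit")
    if state.replacement_has_foreign_dacstore and state.drive_migration_enabled:
        blockers.append("stale DACstore on the replacement with drive migration enabled")
    return blockers


# Example: a field spare that was not cleared down, fitted with drive migration enabled.
state = CopybackState(
    spare_reconstruction_complete=True,
    replace_drives_already_used=False,
    replacement_from_same_unit=False,
    replacement_has_foreign_dacstore=True,
    drive_migration_enabled=True,
)
for reason in copyback_blockers(state):
    print(reason)
```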
Interesting. From what you describe, I think a more fundamental issue with the copyback feature may be to blame, rather than the "Replace Drives" option. I can't find any known faults other than one from as far back as 6.12, which has since been fixed.
At least the above lends some credence to my DACstore theory, in that it intimates that copyback will not commence should drive initialisation problems occur or should the controllers disagree. Let me know if you have this problem again; I have a couple of "undocumented" commands at my disposal that might provide a bit more insight.