Centiq Blog

Jun 02, 2010

IBM Storage Manager copyback not started automatically - using "Replace Drives"

Posted by: Emily Malbon in Infrastructure

Tagged in: Support, Storage, IBM

Recently I have seen some strange behaviour with later firmware levels (around version 7.10 and upwards): a failed drive whose data has been reconstructed onto the hot spare does not automatically copy back when a new drive is inserted. The new disk appears in Storage Manager as Optimal and Unassigned, and the logs indicate that it has been picked up without error.

Later versions of IBM Storage Manager (version 10.10 and up) include a new function, "Replace Drives", which lets the user choose whether to replace a failed drive with any unassigned drive in the subsystem or with the hot spare drive. It should be possible to use this function to initiate the copyback.

In some cases, however, I have seen that selecting "Replace Drives" displays the message "no drives available for replacement", even though unassigned drives are clearly available. In my investigations I checked that the new drive was inserted correctly (leaving 30 seconds between pulling the old drive and inserting the new one). I also confirmed that the new disk was of exactly the same type and brand, was running the same firmware level, and that there did not appear to be any obvious problems in the event log.

The only workaround so far has been simply to wait (in some cases up to two days), after which the "Replace Drives" option eventually recognises the available drive(s). The only event I have observed in the meantime is that the regular media scan (scrub) has started. Note that if a Storage Subsystem password is set, it must be entered to authorise the copyback.


The "Replace Drives" function does not appear to be clearly documented in the new Storage Manager client Redbooks, so I will write a later post covering what I know about it.


Comments (1)
written by Steven Calvert, 15:14 June 17, 2010
I've yet to use the "Replace Drives" feature in anger; however, my impression was that it does not replace automatic copyback.

If you wish to use the hot spare or a different disk to rebuild the array permanently, the option should be selected before the failed drive is removed or replaced. Once the failed drive is physically replaced, copyback *should* commence automatically and rebuild onto the replacement drive, unless "Replace Drives" was selected beforehand.

The documentation warns against using another drive from the same DS unit to replace the failed drive, for example pulling the drive from slot 3 to replace a failed drive in slot 5. No reason is given, but I suspect the DACstore information on the moved drive could confuse the controllers, which may in turn stop automatic copyback from occurring.

Another factor may be that drive migration is enabled on the controllers rather than disabled (this setting is changed with scripts) while the replacement drives are being installed with existing DACstore information. Most field spares these days seem to have come from other units without being fully cleared down, so again there is potential for confusing DACstore information that may inhibit copyback.

Also, copyback won't commence until the hot spare has finished rebuilding the array, which on a slow RAID 5 array could take a considerable time. However, the logical volumes normally give a visual indication that this is still in progress, so I include it here only for completeness.

Interesting. From what you describe, I think a more fundamental issue with the copyback feature, rather than the "Replace Drives" option, may be to blame. I can't find any known faults other than one from way back in version 6.12, which has since been fixed.

"98679 (95893) Drive state doesn't agree between controllers, causes reconstruction and copyback problems.
Fix: If spin up of a drive fails (cfgPrepareDrive), send a message to the other controller to fail the drive in the event that controller successfully accessed the drive."


At least the above lends some credence to my DACstore theory, in that it implies that if drive initialisation problems occur, or the controllers disagree about a drive's state, copyback will not commence. Let me know if you have this problem again; I have a couple of "undocumented" commands at my disposal that might provide a bit more insight.
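The comment above names several conditions that may stop copyback from starting. As a rough troubleshooting aid, they can be gathered into a checklist. The sketch below is a hypothetical Python model, not IBM Storage Manager behaviour or any IBM API: the `ArrayState` fields and the `copyback_blockers` function are illustrative names for facts an administrator would have to check by hand in the Storage Manager client. The function only reports which of the factors discussed in the post and comment apply. An empty result does not guarantee that copyback will start; in the case described in the post, no obvious cause was found and the drive was recognised only after waiting.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArrayState:
    # Hypothetical model: these fields are facts you would check by hand in the
    # Storage Manager client; none of them are IBM Storage Manager identifiers.
    hot_spare_rebuild_complete: bool        # reconstruction onto the hot spare has finished
    replace_drives_used_beforehand: bool    # "Replace Drives" chosen before the physical swap
    replacement_optimal_unassigned: bool    # new disk shows as Optimal and Unassigned
    replacement_has_foreign_dacstore: bool  # disk came from another slot or another unit
    controllers_agree_on_drive_state: bool  # both controllers report the same drive state


def copyback_blockers(state: ArrayState) -> List[str]:
    """List the factors, as discussed in the post and comment, that may stop copyback."""
    reasons = []
    if state.replace_drives_used_beforehand:
        reasons.append("Replace Drives was selected first, so the chosen drive is kept permanently")
    if not state.hot_spare_rebuild_complete:
        reasons.append("hot spare is still rebuilding the array")
    if not state.replacement_optimal_unassigned:
        reasons.append("replacement drive is not Optimal and Unassigned")
    if state.replacement_has_foreign_dacstore:
        reasons.append("replacement drive may carry stale DACstore information")
    if not state.controllers_agree_on_drive_state:
        reasons.append("controllers disagree about the drive state")
    return reasons


if __name__ == "__main__":
    # Example: a field spare taken from another unit and not cleared down.
    spare = ArrayState(
        hot_spare_rebuild_complete=True,
        replace_drives_used_beforehand=False,
        replacement_optimal_unassigned=True,
        replacement_has_foreign_dacstore=True,
        controllers_agree_on_drive_state=True,
    )
    for reason in copyback_blockers(spare):
        print(reason)
```

Each condition is kept as a separate boolean because the comment treats them as independent possibilities, so several can apply at once.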
