System Failures
From: Charley Shanks
To: All
Just thought I'd take a little informal survey. Our shop is running a B60 model using 9332 DASD units for storage. In the last two weeks we have had six (count 'em, six!) separate and distinct DASD failures. Some have been related to what IBM calls the HDA (the disk itself). Some have been related to servo cards and some have been related to the servo track on the disks.
We have had to do two complete system restores from backups. And it looks as though we're about to do our third. The other times we were able to replace cards and pump the drives. We already have one of the Rochester big boys on-site, and IBM is flying in another one. This has been a disaster, in no minor sense! My past experience has always been that Big Blue is second to none in quality, and I have to admit that IBM's now doing everything humanly possible to rectify the situation.
My survey question is very simple. Has anyone ever heard of IBM hardware taking this kind of a hit, or worse?
From: John Lewis
To: Charley Shanks
When we were upgrading our B35 to an E35, which was done by an IBM CE, we lost all of our 9332s. Something went wrong when the CE tried to tell the system about the new internal DASD, which wiped out our existing DASD. We had about six high-level CEs out here trying to recover the drives (four 9332-600s), but without any luck. We ended up doing a full disaster reload and came out of the whole process OK.
From: Michael Brock
To: Charley Shanks
I have never heard of anything even close to that figure! Must be horrible!
From: Charlie McLean
To: Charley Shanks
When we started with a B35 we had four 9335 B01s and in six weeks had three HDA failures. We now run on EMC 935EXPs and 936s and have had one controller card replaced in two and a half years, with no other problems. Downtime in nine months with 9335s was six days. Downtime on EMCs in two and a half years has been one and a half hours.
From: Charley Shanks
To: Michael Brock
It gets worse. Since my last message, we had another 9332 failure. After the last one, IBM replaced all servo cards and then mirrored all of the 9332s with 9336s. We got everything back up...and then we had a system failure at the bus level. IBM replaced our power supply and a few other miscellaneous things. Then we replaced all 9332s with 9336s. Then we did our fourth complete system restore. This morning it went down again....
From: Kris Ball
To: Charley Shanks
I feel for you. We've been through a total system reload before, when a 3370 attached to our S/38 wouldn't pump. Had to restore everything from backup. Thought we were almost done, so I left for vacation to Dallas, and as soon as I hit town I had to turn around and drive eight hours home.
Our backup saved off all the source, but no objects, so we spent the next two days compiling programs and recreating DFUs, queries, etc. Couldn't really complain, because I was the new kid on the block who thought she knew what she was doing when she rewrote the SAVEALL!
Moral of the story: Always, always check and double-check your recovery procedures!
From: Charley Shanks
To: Kris Ball
Luckily, our backups have been flawless. The only good thing to come out of our repeated failures is a system upgrade from a B60 to an E50, all new mirrored DASD, and an OS/400 upgrade to V2R2, all compliments of IBM. This may sound great, but considering that our company lost an estimated $1 million in sales because of system downtime for most of the month of November...we're still sucking wind!