Last month I wrote about fixing a network nightmare. Additional tools are now in place to better monitor that system. This month, a second story: about education, tools and resources for servers and server-based functions.
A server operating system is not a workstation OS like Windows XP or Windows 7. Windows 2000 Server and the Windows 2000 desktop edition used the same files and structure, and a simple registry switch could change one into the other. Windows Server 2003 and 2008 are entirely server-based and provide server functionality such as Active Directory, user management, and file and web services, along with features such as RAID.
Now, I'm a software guy. My customer needed RAID drive controllers for redundant data. No problem. IBM server hardware comes with a setup CD that makes this a breeze, and I installed Windows 2000 Server on the machine.
Later, the decision was made to use Windows XP for some terminal services. But XP is not a server OS, and certain drivers are designed only for server use. Some RAID drivers are among them.
The customer wanted redundant server hardware so that, in case of a failure, the RAID drives could be moved from one server to another. It's not as easy as you might think.
The first thing I learned was that setting up RAID in the SCSI BIOS is not the same as using a software-based interface. In the BIOS you can initialize drives, configure arrays and verify them. Allocating a drive as a hot spare? I have to tell the system I've put a drive in? And that it's a spare? I tried that, and it wiped out the original array and its data. Oops.
More reading. When a drive is configured as a simple volume (like a workstation disk), no RAID information is written to it. So when a drive is part of an array, there's configuration information written to the disk? Apparently, and it is tied to the specific machine and drive controller, so you can't simply move a drive to another machine.
So I talked to a few IT guys and searched online for anyone replicating server hardware. None of them had done it, and the forums weren't much help. Trial and error time.
I was supposed to have the backup server in place two days after I took the hardware. I wasn't working on it constantly, but I was now two weeks in. I configured the array from scratch and put in a second drive, which needed verifying. Three hours later, verification was done and I rebooted. "Recovering errors," it said. Cool, until it finished and a "boot sector error" message popped up. More reading.
Again starting from scratch with both drives, and using the IBM ServerGuide install disk, I created the array and installed Windows 2000 Server. I discovered that IBM also has a tool called ServeRAID. Where have you been all my server life? Now things were coming into focus. With this interface you can readily see what the drives are and what they're doing, and when you put in a second drive to replace a broken one and it isn't configured, it tells you.
So I installed it on XP. Oops, again. It's only for server platforms. I found some drivers for XP, but I couldn't get them to work. After three and a half weeks, I had put way too much time into this, but I learned a lot.
Sometimes we know just enough to be dangerous. The backup server was running different software for a SQL database application, which I needed to virtualize. I pulled one of the drives to take home and extract the image, and put in the spare drive. It wasn't configured! The drive lights were flashing in a very strange way. I took out the newly inserted drive and put the original back. Then I took the whole server home from my customer's site to get the image. I had screwed up both drives: neither would boot. After more reading, I found Bart's PE bootable recovery CD. The registry was corrupted on both drives, and I had to fix it. That took three days on its own.
There is a fine line between tenacity and stupidity. I crossed that line many times, but I learned. It remains to be seen whether it was worth it.
Moral of the story: Stick to your knitting. And laugh about it when you don't.