Wednesday, 18 June 2008

Hardware failures...

The Bad:

When I got to work today, I found our server (a white box running Debian etch, responsible for networking, files, printers, etc.) powered off - most likely the UPS battery didn't survive a power outage during the night. When I turned the server on, I was greeted by all those nice lines telling me I had a hard disk problem.

The Ugly:

Instead of marking the disk with the read errors as bad, the RAID stack (device mapper?) somehow concluded that the "good" disk of the RAID 1 array was out of sync and kicked it out...

The Good:

The bad sectors took only some unimportant collectd status files with them. After some poking with dd to force the HD to remap the bad sectors, the read errors vanished, and according to smartctl the Reallocated Sector Count didn't increase, which seems like a good sign.
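
For anyone who wants to keep an eye on that counter, here is a minimal Python sketch of the smartctl check. It assumes the usual attribute table printed by `smartctl -A`, where the attribute name is the second column and the raw value is the tenth. The function names and the device path are made up for illustration and are not what I actually ran.

```python
import subprocess
from typing import Optional


def parse_reallocated(report: str) -> Optional[int]:
    """Return the raw Reallocated_Sector_Ct value from `smartctl -A` output.

    Assumes the standard attribute table layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    Returns None if the attribute is not listed.
    """
    for line in report.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Reallocated_Sector_Ct":
            return int(fields[9])
    return None


def read_reallocated(device: str) -> Optional[int]:
    """Run `smartctl -A` on a device (hypothetical path, needs root)."""
    # smartctl encodes disk status in its exit code, so a non-zero
    # return is not treated as a failure here.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_reallocated(result.stdout)
```

Calling `read_reallocated("/dev/sda")` before and after the dd run and comparing the two numbers would show whether the drive remapped anything; `/dev/sda` is only a placeholder for the affected disk.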

