I wrote about all the little issues that the server behind kahunaburger.com had in the last few weeks. I finally found the real reason for the problems: one of the RAM modules in the system is dying and the memory on the module is corrupted randomly. This makes FreeBSD really upset (no wonder) and leads to sporadic reboots followed by lengthy filesystem checks (fsck), because the disks were not unmounted cleanly. Wash, rinse and repeat …
Let’s just order a replacement memory module then. I checked the RAM specs for the system on dell.com, then headed over to newegg.com to find a replacement. The system is a bit on the old side and happens to be one of the few computers that used Rambus RDRAMs. Searching on newegg for a replacement module yields zero results. Not being sold any more. Not available. I found a few places which still sell RDRAMs, but they are either out of stock or charge ridiculous prices.
The next option was to just yank out the faulty module (at least the BIOS tells me which one it is) and restart without it. Well, easier said than done: the Dell’s configuration requires that all four memory banks are occupied, and it does not even attempt to boot if one or more of them are empty.
Great, just great! No replacement module and I can’t boot without the faulty one. I looked over all the BIOS settings to see if there was a way to disable an individual module, but that was not available either.
I finally remembered that you can tell FreeBSD during the boot process to limit the amount of physical memory it uses. After some searching I found the correct option and started to experiment with it. The loader setting “hw.physmem=xxx” tells the operating system how much physical memory to use. There’s 1GB of memory in the system and each module is 256MB, so I started reducing the memory in chunks of 256MB until I did not see any more random reboots. Per Murphy’s law, if there are four possible configurations that could work, only the last one you test will be the one that works. I had to reduce the memory to 256MB before the system was stable again. Hmmm - I think my wrist-watch has more memory …
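For anyone in the same spot, here is a minimal sketch of what the setting can look like. It assumes the tunable is placed in /boot/loader.conf (the usual home for loader settings, though where exactly it was set on this box is not stated above) and uses the 256MB value that turned out to be stable here.

```
# /boot/loader.conf  (assumed location for this example)
# Cap the physical memory FreeBSD will use at 256 MB.
# The size suffix M means megabytes; a bare number is taken as bytes.
hw.physmem="256M"
```

To try a value for a single boot without editing the file, the same variable can be set by hand at the loader prompt with `set hw.physmem="256M"` followed by `boot`. Keep in mind that this only caps how much memory is used, it does not let you pick which module is skipped, so it helps only if the faulty module ends up above the limit.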
A replacement system has been ordered already, but it is taking its good old time to make it here to Santa Fe.
That’s just another lesson in how quickly hardware that is only a few years old becomes obsolete …
May 10th 2008