MESSAGE
DATE | 2017-01-19 |
FROM | Rick Moen
|
SUBJECT | Subject: [Hangout-NYLXS] RAM and RAM-testing
|
Forwarding at Ruben's suggestion. Note bits about RAM-testing, about which I'll also forward some separate comments.
----- Forwarded message from Rick Moen -----
Date: Wed, 18 Jan 2017 22:24:20 -0800
From: Rick Moen
To: conspire-at-linuxmafia.com
Subject: [conspire] Old hardware, ridiculously old hardware: free RAM for you
Organization: If you lived here, you'd be $HOME already.
Dana was kind enough to come back over last night to help me work on a Supermicro 2U server he'd given me. Some months ago, Daniel had tried to install onto it, and reported a problem; something about no video display or something. (I failed to take exact notes, intending to just circle back and check it out myself. January rolled around; it was time to investigate.)
This is a really nice machine, that was new circa 2010. It's very quiet for a 2U server, well built, and Dana says it draws only about 40W at idle, which is not bad at all. And with ample RAM and disk, you can do... a great deal.
o 2U case with quiet yet effective fans
o Supermicro X8SIE motherboard based on Intel 3420 chipset ('Ibex Peak')
o Intel Xeon X3430 -at- 2.40GHz 'Lynnfield' quad-core CPU
o Motherboard has 6 x SATA 3Gb/s headers, AHCI interface
o Hotswap backplane that can hold up to 8 SATA drives
o 3Ware SATA hardware RAID controller
o 2 x Intel 82574L gigabit ethernet
o Capacity for 32GB DDR-1333 ECC registered (or 16GB unregistered) dual-channel (interleaved) SDRAM, in six RAM sockets
o PCI-E 2.0 x16 slot + PCI-E x8 slot + PCI 32-bit slot
o Matrox G200eW w/16MB RAM video
o 6 x USB, plus two more as headers
o 2 x PS/2
o 2 x RS232C serial
o Separate LAN interface for IPMI 2.0 (Realtek RTL8201N) [1]
After Saturday's CABAL meeting, I'd been kicking the machine around and running it through memtest86. Overnight, the machine hard-froze in that RAM-checker, so hard that even the keyboard's NumLock key didn't toggle the LED. Hard-booting caused it to enter a state where the fans were stuck full-blast and there was no video. I pessimistically thought 'Likely a wonky motherboard; that's probably why it was pulled.'[2]
But wait; not so fast. After some more alternately leaving it unplugged and poking it, system came back up, and I had the excellent idea of looking around in the BIOS. Something in the back of my mind was probably trying to tell me 'Look in the event log!' Many motherboards since the 1990s have had built-in hardware event logging, with significant detail about system hardware problems but _no_ effort to get the admin's attention: You have to remember to go into the BIOS and look. Sure enough, I found a large number of single-bit RAM errors, at least roughly corresponding to the times of system hangs, and always citing the same RAM stick as where the problem happened.
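On a running Linux system there is a related place to look that doesn't require a trip into the BIOS: the kernel's EDAC subsystem keeps per-memory-controller counters of corrected and uncorrected ECC errors in sysfs. This is not the BIOS event log described above, just a similar view from the OS side. What follows is a minimal sketch, assuming an EDAC driver for the chipset is loaded; the function name edac_totals is made up for illustration.

```shell
#!/bin/sh
# edac_totals (hypothetical helper): sum the corrected (ce_count) and
# uncorrected (ue_count) ECC error counters across all memory controllers.
# Assumption: an EDAC driver is loaded. If it is not, no mc* directories
# exist under the root and both totals are reported as 0.
edac_totals() {
    root=${1:-/sys/devices/system/edac/mc}
    ce=0
    ue=0
    for mc in "$root"/mc*; do
        # Skip the unexpanded glob and any controller without counters.
        [ -r "$mc/ce_count" ] || continue
        ce=$((ce + $(cat "$mc/ce_count")))
        if [ -r "$mc/ue_count" ]; then
            ue=$((ue + $(cat "$mc/ue_count")))
        fi
    done
    echo "corrected=$ce uncorrected=$ue"
}
```

Calling edac_totals with no argument reads the live counters. A corrected count that keeps climbing is the same kind of hint the event log gave here: ECC is quietly papering over a stick that is going bad.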
This machine had a pair of these 4GB Registered ECC sticks: http://www.ubbcentral.com/store/item/Lot-of-two-(2)-Actica-ACT4GHR72R8G1333M-4GB-RAM-dimms---8GB-upgrade_152277298609.html
Well, shoot: One of them is dodgy. This datum so far has held out as a candidate root cause for the freezing. Replacing both sticks with different RAM has made the system so far quite stable.
Longtime CABAL people may find the above story hauntingly familiar, because the same thing happened to me in December 2006, when a pair of bad PC100 512MB ECC SDRAM sticks on a VA Linux Systems 2230 motherboard caused mysterious problems:
http://linuxmafia.com/pipermail/conspire/2006-December/002668.html
http://linuxmafia.com/pipermail/conspire/2007-January/002743.html
Even though, just like the Supermicro's 4GB stick, these were _ECC_ (error checking and correcting) server-grade RAM, these bad sticks _still_ caused instability and the system gave zero indication to the admin. So, no, ECC doesn't automatically protect you.
I like to say, the best memory-checker is actually Linux. In the linked posts, I illustrated how to -really- test RAM -- iterative parallel kernel compiles configured to use up all RAM (the 'parallel' part), using 'make -j':
   # cd /usr/src/linux-source-2.6.16
   # while : ; do make clean && make -j N ; done
...where you adjust N upwards until you just barely start to see swap activity in the output of the 'free' command, or as shown by vmstat.
Back then in 2006, with my 2 x 256MB sticks in the motherboard (system total 512MB) and the dodgy 2 x 512MB sticks removed, I gradually increased N from 4 to 256 before RAM was being fully exercised. With _only_ good RAM, I could keep that while loop running indefinitely. If I put either of the bad RAM sticks in, I'd get freezes or spontaneous reboots within a few hours (with N high enough to exercise all RAM).
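The 'raise N until swap just starts to be used' step can be sketched in shell. This is only an illustration: the helper names swap_used_kb and next_jobs are invented here, doubling N is an arbitrary choice, and a machine that already has pages swapped out before the test starts would need a baseline subtracted first.

```shell
#!/bin/sh
# swap_used_kb (hypothetical helper): print swap in use, in kB, computed
# from the SwapTotal and SwapFree lines of /proc/meminfo-format text.
# The optional file argument allows trying the parser on a saved copy.
swap_used_kb() {
    awk '$1 == "SwapTotal:" { total = $2 }
         $1 == "SwapFree:"  { free = $2 }
         END { print total - free }' "${1:-/proc/meminfo}"
}

# next_jobs (hypothetical helper): suggest the next value for make -j.
# While no swap is in use, double N; once swap is touched, hold N steady.
next_jobs() {
    n=$1
    used=$2
    if [ "$used" -eq 0 ]; then
        echo $((n * 2))
    else
        echo "$n"
    fi
}

# Possible use, run from the kernel source directory (left commented out
# because it loops forever by design):
#   N=4
#   while : ; do
#       make clean && make -j "$N"
#       N=$(next_jobs "$N" "$(swap_used_kb)")
#   done
```

Sampling swap only between builds is crude; watching vmstat in a second terminal while the loop runs, as described above, gives a better picture of when RAM is actually full.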
It's important to note that _memtest86 didn't find this problem_. I'd run it at least 24 hours just before, and no errors had shown.
So: ECC is not a cure-all. memtest86 won't always find bad RAM. Iterative, parallel kernel compiles _do_ always find bad RAM.
I must say, having a well-designed 2010 rackmount server at my disposal (now with all of my spare SATA drives in it) has reminded me once again that tempus fugit, and that a lot of the old hardware sitting around in my cabinets is way past its sell-by date. As I said here the other day, PATA (old IDE), for example, was never very good in the first place, and has quietly left the retail market -- gone. And, y'know, that's good, because SATA (and its SCSI cousin SAS) is worlds better.
While Dana was here and we were taking care of other tasks, I sat down and researched all of Dana's old spare RAM, and labelled it. 'Eh?', you say.
Right, old RAM: Over time, you accumulate old RAM sticks that you leave on a shelf, preferably in antistatic bags -- but you never make use of it because it's real work figuring out which of your old RAM sticks would work in what machines.
To avert that outcome, you have to put sticky paper labels on each set of identical RAM and write what they are, what they're good for. I did this for all of Dana's spare RAM. Then, I did likewise for mine.
Except for a spare pair of sticks for the Supermicro, and some still inside spare rackmount servers I have, mine's pretty damned old -- see below -- and will probably get junked soon. _However_, if you want any of this, speak up, and it's yours.
Laptop SDRAM (200-pin SO-DIMMs):
   PC2-5300 DDR2: Vintage 2005 or so. 2 x 512MB Hynix brand, 2 x 512MB Micron.
   PC-2100 DDR: Vintage 2002 or so. 1 x 512MB Micron brand.
Workstation/server SDRAM (DIMMs):
   PC133: Vintage 1999 or so. 8 x 256 MB Corsair brand, 168-pin
   PC3200: Vintage 2001 or so. DDR. 1 x 256 MB Samsung brand, 184-pin
   PC2-4200: Vintage 2005 or so. DDR2. 1 x 512 MB Samsung brand, 240-pin
If you want these, come get them! Limited-time offer!
'DDR' in this context means Double Data-Rate (relative to first-generation SDRAM like PC-100 and PC-133 sticks). The generations, DDR (circa 2001-2002), DDR2, and DDR3, are backwards-incompatible (and sometimes even need different voltage from prior generations), and accordingly their DIMMs have different notch positions and pin densities so you cannot accidentally use the wrong type. Without getting into detail (see Wikipedia), each generation is simply faster than the last.
Almost all _current_ SDRAM is DDR3: regular DIMM sticks for everything but laptops, smaller SO-DIMM sticks for laptops & similar tiny machines. Motherboards using DDR4 (such as those using Intel Haswell-E CPUs) have also been entering the market.
SDRAM took over in the late 1990s from EDO DRAM, which in turn replaced plain ol' DRAM (dynamic random access memory), properly termed FPM = fast page mode DRAM, in the middle '90s (though few ever called it that).
Quantity of RAM is usually the biggest limiting factor in perceived machine performance, in my experience, so it's wise to stuff in as much as possible, and use the highest-RAM-density sticks you can. If perchance your old machines can use the above sticks, _you want them_, if only as spares (because sometimes your sticks will develop faults).
[1] Which has its problems, so I'm glad I can elect to have it not active at all, especially given that Supermicro makes it accessible only with its proprietary Java application IPMIView. Which in turn led to http://www.kb.cert.org/vuls/id/648646 . Note the warning that IPMI should be carefully traffic-restricted.
[2] IT businesses, in characteristic fits of optimism, tend to pull computers out of service whenever those machines show signs of unreliable operation, intending to figure out their problems, but then reality intrudes: If reloading the OS doesn't make the problem vanish, typically little or no further diagnosis is even attempted, as either they don't know how or cannot spare the time. This is probably what happened with the Supermicro 2U server: Nobody thought to check it for bad RAM. Similarly the free RAM I was trying to use in 2006 had probably been yanked as suspect but never tested.
--
Cheers,              "The crows seemed to be calling his name, thought Caw."
Rick Moen                                   -- Deep Thoughts by Jack Handey
rick-at-linuxmafia.com
McQ! (4x80)
_______________________________________________ conspire mailing list conspire-at-linuxmafia.com http://linuxmafia.com/mailman/listinfo/conspire
----- End forwarded message -----
_______________________________________________ svlug mailing list svlug-at-lists.svlug.org http://lists.svlug.org/lists/listinfo/svlug
----- End forwarded message ----- _______________________________________________ hangout mailing list hangout-at-nylxs.com http://www.nylxs.com/