Memtest86 and Ramcheck - ESX RAM Test Options
I was involved with a customer support issue today where multiple ESX hosts were experiencing random restarts. Although I did not personally get to troubleshoot the servers, the customer was confident that the RAM, all ordered at the same time, was the cause and probably came from a bad batch. This scenario is extremely difficult to troubleshoot, and extremely costly when multiple guests are hosted on a production ESX server. Of course, best practice when building new ESX hosts is to thoroughly test for bad memory before hosting VMs, but there is also a utility installed with ESX that tests RAM in the background. This post covers both options. Memtest86 should be used before an ESX host goes into production, while Ramcheck can be used if a problem develops after the host is already running virtual machines.
Use the Memtest86 Live CD to test RAM before deploying in production
There are several versions of Memtest86 downloadable from www.memtest86.com, but the easiest form to use in any scenario is the .ISO file which, once burned to a CD-R, becomes a bootable Live CD that can be used even on bare metal systems. It is recommended to run Memtest86 for at least 48 and up to 72 hours.
Memtest86+ is a fork of the original Memtest86, and has been updated as of Feb ’08. It can be found at www.memtest.org.
Use Ramcheck after VMs are running
updated 8.1.08
As of the release of ESX 3.5, Ramcheck was removed from the installation files and is not supported in the latest versions. The tool is therefore only an option for ESX 3.0.x versions. Thanks to VM /ETC reader MichaelK for bringing this to my attention with his comment. This means the only option for ESX 3.5 hosts already in production is to evacuate the guests with VMotion so that the server can be booted from the Memtest86 CD.
Information about the built-in Ramcheck utility can be found at xtravirt.com - ESX3: Ramcheck. The following was taken from that link.
ESX3: Ramcheck
Description: A built in alternative to Memtest86 for ESX
Here’s a little gem that popped up at VMworld.
Instead of Memtest86 you also have the option of running ‘ramcheck’ which is a background memory tester built into ESX3. To start it, log into the ESX Service Console as root (or su / sudo). Type:
# service ramcheck start
This starts a background check of the server’s RAM and writes out log files to /var/log/vmware/ramcheck.log and ramcheck-err.log. It runs as a world in VMkernel space.
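As a rough illustration of keeping an eye on those logs, the sketch below checks whether the error log has any content. The function name ramcheck_err_status and the RAMCHECK_LOG_DIR variable are hypothetical, and it assumes that ramcheck-err.log sits in the same /var/log/vmware directory as ramcheck.log. It makes no assumption about the format of the log entries; it only reports whether the file is non-empty.

```shell
#!/bin/sh
# Log directory as given in the text above. That ramcheck-err.log lives in
# the same directory is an assumption. RAMCHECK_LOG_DIR is a hypothetical
# override, handy for trying the function against a scratch directory.
RAMCHECK_LOG_DIR="${RAMCHECK_LOG_DIR:-/var/log/vmware}"

# Hypothetical helper: prints a one-line summary and returns
#   0 if the error log is missing or empty,
#   1 if the error log contains anything at all.
ramcheck_err_status() {
    err_log="$RAMCHECK_LOG_DIR/ramcheck-err.log"
    if [ -s "$err_log" ]; then
        echo "ramcheck-err.log has entries: review $err_log"
        return 1
    fi
    echo "ramcheck-err.log is empty or not present"
    return 0
}
```

Calling ramcheck_err_status from the Service Console while the test runs gives a quick yes/no answer. For the detail, read the log files themselves, for example with tail -f /var/log/vmware/ramcheck.log.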
It is non-disruptive and no reboot is required. It also consumes only nominal CPU resources, but the trade-off is the time it takes to complete. It’s the sort of thing that would run in the background, consuming less than a few percent of CPU, over a couple of weeks.
Run esxtop and you’ll see it show up as ramcheck.<id>
You can type “service ramcheck stop” to cancel the memory test at any time. Once complete the service will stop automatically.
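The start and stop commands above could be wrapped in a small helper so that only those two actions are ever passed through. This is a minimal sketch, not part of ESX: ramcheck_ctl and the SERVICE_BIN override are hypothetical names, and only the start and stop actions mentioned in the text are assumed to exist.

```shell
#!/bin/sh
# SERVICE_BIN is a hypothetical override for dry runs (for example
# SERVICE_BIN=echo); by default the real `service` command is used.
SERVICE_BIN="${SERVICE_BIN:-service}"

# Hypothetical wrapper: accepts only the two actions described in the text
# and refuses anything else with exit status 2.
ramcheck_ctl() {
    case "$1" in
        start|stop)
            "$SERVICE_BIN" ramcheck "$1"
            ;;
        *)
            echo "usage: ramcheck_ctl start|stop" >&2
            return 2
            ;;
    esac
}
```

Run as root on the Service Console, ramcheck_ctl start and ramcheck_ctl stop behave like the plain service commands. Setting SERVICE_BIN=echo first prints the command instead of running it.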
You may still want to use Memtest86 when deploying a new server, as you can boot it from bare metal rather than having to install ESX first, and it’s quicker and more intensive. However, ramcheck can be very useful as a non-intrusive maintenance check during the lifecycle of a server.









