Memtest86 and Ramcheck - ESX RAM Test Options

I was involved with a customer support issue today in which multiple ESX hosts were experiencing random restarts. Although I did not personally get to troubleshoot the servers, the customer was confident that the RAM, all ordered at the same time, was the issue and probably came from a bad batch. This scenario is extremely difficult to troubleshoot, and extremely costly when multiple guests are hosted on a production ESX server. Of course, best practice when building new ESX hosts is to thoroughly test for bad memory before hosting VMs, but there is also a utility installed with ESX that tests RAM in the background. This post covers both options. Memtest86 should be used before an ESX host goes into production, while Ramcheck can be used if a problem develops once the host is running virtual machines.

Use the Memtest86 Live CD to test RAM before deploying in production

There are several versions of Memtest86 available for download from www.memtest86.com, but the easiest form to use in any scenario is the .ISO file which, once burned to a CD-R, is a bootable Live CD that can even be used on bare metal systems. It is recommended to run Memtest86 for at least 48 and up to 72 hours.

Memtest86+ is the latest port of the original Memtest86 and was updated as of Feb ’08. This port is found at www.memtest.org.

Use Ramcheck after VMs are running

updated 8.1.08

As of the release of ESX 3.5, Ramcheck was removed from the installation files and is not supported in the latest versions. The tool is therefore only an option for ESX 3.0.x versions. Thanks to VM /ETC reader MichaelK for bringing this to my attention with his comment. As a result, the only option for ESX 3.5 hosts in production is to use VMotion to evacuate the guests so that the server can be booted with the Memtest86 CD.

Information about the built-in Ramcheck utility can be found at xtravirt.com - ESX3: Ramcheck. The following was taken from that link.

ESX3: Ramcheck

Description: A built in alternative to Memtest86 for ESX

Here’s a little gem that popped up at VMworld.

Instead of Memtest86 you also have the option of running ‘ramcheck’ which is a background memory tester built into ESX3. To start it, log into the ESX Service Console as root (or su / sudo). Type:

#service ramcheck start

This starts a background ram check of the server’s RAM and writes out a log file to /var/log/vmware/ramcheck.log and ramcheck-err.log. It runs as a world in VMkernel space.

It is non-disruptive and no reboot is required. It also consumes only nominal CPU resources but the trade off is the time to complete. It’s the sort of thing that would run in the background consuming less than a few percent of CPU over a couple of weeks.

Run esxtop and you’ll see it show up as ramcheck.<id>

You can type “service ramcheck stop” to cancel the memory test at any time. Once complete the service will stop automatically.

You may still want to use Memtest86 when deploying a new server as you can boot it from bare metal rather than having to install ESX first, and it’s quicker and more intensive. However, ramcheck can be very useful as a non-intrusive maintenance check during the lifecycle of a server.
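
For anyone still on ESX 3.0.x, the two log files named in the quoted article can be checked with a small Service Console helper. This is only a rough sketch: the log paths come from the article above, but the function name ramcheck_log_state, the LOG_DIR variable, and the rule that a non-empty ramcheck-err.log deserves a closer look are my own assumptions. The format of the logs is not described in the source, so the helper looks only at whether the files exist and whether the error log is empty.

```shell
#!/bin/sh
# Sketch only. Paths are taken from the quoted article; treating a non-empty
# error log as "needs attention" is an assumption, not documented behavior.

# Default log directory (hypothetical variable, overridable for testing).
LOG_DIR="${LOG_DIR:-/var/log/vmware}"

# Print a one-word summary of the ramcheck logs found in the directory $1
# (falls back to $LOG_DIR when no argument is given).
ramcheck_log_state() {
    dir="${1:-$LOG_DIR}"
    if [ ! -e "$dir/ramcheck.log" ] && [ ! -e "$dir/ramcheck-err.log" ]; then
        echo "no-logs"          # neither log exists in this directory
    elif [ -s "$dir/ramcheck-err.log" ]; then
        echo "check-err-log"    # error log exists and is non-empty
    else
        echo "quiet"            # nothing has been written to the error log
    fi
}
```

After running "service ramcheck start" as root, calling ramcheck_log_state every so often gives a quick hint about whether ramcheck-err.log is worth opening. It does not interpret the log contents in any way.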

  • MichaelK
    Looks like the useful ramcheck utility has been removed from ESX 3.5

    "Ramchecker Service Fails to Start on the ESX Server 3.5 Versions
    The Ramchecker utility is not supported, and does not work on ESX Server 3.5 versions. Starting with this release, to prevent the installation of the Ramchecker utility, the Ramchecker service is removed from the VMware-esx-lnxcfg RPM, and the Ramchecker binary is removed from the VMware-esx-apps RPM.
    On ESX Server 3.5 and ESX Server 3.5 Update1 systems, the Ramchecker service fails to start, with an error message similar to the following:
    Starting ramchecker /usr/lib/vmware/bin/ramcheck: error while loading shared libraries: libc.so.6: failed to map segment from shared object: Error 28"

    http://www.vmware.com/support/vi3/doc/vi3_esx35...
  • MichaelK,

    Thanks for pointing this out. That's disappointing, and I guess that means the only option for testing RAM on ESX 3.5 hosts in production is to VMotion evacuate the guests and use Memtest86.
  • Ramcheck wasn't all that great anyway. The main problem with it is that it would only ever test free memory on an ESX server. If you had a busy server with a lot of RAM in use, then it could not test that RAM. Memtest86 is the way to go.
  • Eric,

    Thanks for the input. I thought Ramcheck eventually tests all memory, but it takes much longer because it can only work with free memory, as you point out. Assuming the VMs do not have reserved memory, eventually all memory should cycle through when not in use.
  • It was a handy unobtrusive utility but testing beforehand with memtest is the preferred way to do it. Unfortunately most people don't do this so Ramcheck was a nice way to do it afterwards without having to take your host down for a whole day while it runs. I wrote a blog article on the topic a few months back also.

    http://servervirtualization.blogs.techtarget.co...