ESX snapshots are like a loaded gun
I’ve posted about dealing with ESX snapshots before, but it seems to always be a topic that resurfaces. It’s almost as if there is a “virtualization school of hard knocks” and dealing with open ESX snapshots is a required course. Maybe the misunderstanding is so widespread because a lot of administrators first use VMware Workstation before implementing production servers hosted on ESX?
Anyway, there is already a lot of information on the web about snapshots and the trouble they can get you into, but this week Jason Boche posted Know thy open snapshots, which I found very informative and helpful. The title of my post comes from an eye-opening quote from Jason:
“Unfortunately in the current builds, VMware doesn’t give us real good (or automated) visibility of open snapshots. I liken it to handing a loaded gun to a child – it’s only a matter of time before an accident happens. That analogy is quite extreme but it gets my point across …”
Read Jason’s post in its entirety for much more information, but I particularly like his explanation of what happens when you create a snapshot on ESX.
“When a snapshot is created, a delta file is created on the VMFS volume and in the folder where the VM resides. The initial size of the delta file is 16MB. The purpose of the delta file is to maintain the delta changes to virtual disks since the snapshot was taken. This would be any disk write I/O activity inside the guest VM OS.
Disk write I/O inside a guest VM may be seldom or it may be very active. It depends on the role of the VM and more specifically the software and features installed inside the VM. When the initial 16MB delta file fills to capacity with the delta changes it maintains, it dynamically increases its size by another 16MB. Once again, if and when the delta file fills to capacity with delta changes, it grows by another 16MB. For those who excel in math, our delta file is now 48MB in size. Do you see the pattern? The delta file will continue to grow in 16MB increments to a maximum size of the parent file (and in some cases very rapidly!) unless one of a few conditions is met:
- Someone closes the snapshot
- Someone creates an additional child snapshot (perpetuating a potential problem)
- The snapshot file somehow becomes corrupted before or during closing of the snapshot (bad news)
- The VMFS volume where the VM and delta file are stored runs out of available storage space (update your resume. All other VMs on the same VMFS volume, snapshotted or not, as well as VMKernel swap and VM logs are now also out of write space)”
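To make the growth pattern in Jason's explanation concrete, here is a minimal Python sketch of the arithmetic. It is a simplified model based only on the quoted description (a 16MB starting size, 16MB growth steps, a ceiling at the parent disk size); the function name `delta_file_size_mb` and its parameters are made up for illustration and are not part of any VMware tool.

```python
INCREMENT_MB = 16  # starting size and growth step, as quoted in Jason's post


def delta_file_size_mb(changed_mb: float, parent_size_mb: int) -> int:
    """Estimate the delta file size once `changed_mb` of changes are stored.

    Assumption: the file starts at one 16MB increment and gains another
    increment each time it fills, never exceeding the parent disk size.
    """
    if changed_mb < 0:
        raise ValueError("changed_mb must be non-negative")
    # One initial increment, plus one more for every increment already filled.
    increments = int(changed_mb // INCREMENT_MB) + 1
    return min(increments * INCREMENT_MB, parent_size_mb)


if __name__ == "__main__":
    parent = 20 * 1024  # hypothetical 20GB parent disk, in MB
    for changed in (0, 16, 32, 5000, 10**6):
        size = delta_file_size_mb(changed, parent)
        print(f"{changed} MB changed: delta file is about {size} MB")
```

With two fills the model gives 48MB, matching the figure in the quote, and a busy VM simply keeps climbing until it hits the parent size or the VMFS volume runs out of space.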
Jason also lists some tools to help you find open snapshots, now that you know they can be a problem.
“So how do we gain better visibility of snapshots that’s not going to tie up a bunch of our valuable time? Fortunately there are some good 3rd party solutions available for free to help us out. A few that I like are Xtravirt’s Snaphunter, RVTools, and hyper9.”
Since I just posted about it, I’ll add another tool to the list – Trilead VM Explorer.
One can also use the VI-Toolkit for Windows to get this information, and do some cool things with it.
First get the toolkit from here: http://vmware.com/go/powershell/
Then:
List all open snaps:
Connect-VIServer -Server (esx or vc server)
Get-VM | Get-Snapshot
All snaps over 7 days:
Get-VM | Get-Snapshot | where { $_.Created -le (Get-Date).AddDays(-7)}
Like magic.
-Cody
http://professionalvmware.com
I wrote a Nagios plugin to help find any snapshots that have been forgotten about and are growing out of control… http://tinyurl.com/akrrsz
VMware really needs to build something in that allows scheduled creation and expiration of snaps…
CL,
Thanks for letting us know about the Nagios plug-in for ESX snapshots!