VMware Site Recovery Manager Overview

Posted on May 8th, 2008 in dr, fail over, srm by Rich

One of the hands-on labs I attended at VMware Partner Exchange was the Site Recovery Manager (SRM) lab. In the lab I got a good understanding of how the yet-to-be-released product is configured. The lab then walked us through the fail over process and workflow. This post is a high-level summary of what I learned. It is not intended to be a detailed how-to, just a logical overview of what it will take to set up SRM.

XenServer integrates everRun VM for HA features

Compared to VMware ESX Enterprise Edition, business continuity and high availability features are lacking when deploying Citrix XenServer “out of the box.” Specifically, XenServer does not have a built-in equivalent to VI3’s HA feature. Also missing is a solution similar to VMware’s soon-to-be-released Site Recovery Manager (SRM). However, Marathon Technologies and XenSource (now a division of Citrix) have worked together to develop everRun VM as an enterprise-class answer to fault-tolerant availability for Windows virtual machines hosted on Citrix XenServer. According to Marathon’s Director of Products, Michael Bilancieri, speaking at a recent Atlanta “Virtualization for the Real World” event, the integrated solution will be generally available sometime in the 2008 Q2/Q3 time frame.

Quoting from the Best of VMworld (more on this award later in this post) white paper downloadable from the everRun link above:

Designing ESX Resource Pools

Posted on March 4th, 2008 in cluster, drs, esx, esx 3i, esx3.5, fail over, how to, services, vc2, vc2.5, vi3, vmetc.com by Rich

How do you design resource pools in an ESX Cluster? In my experience, two strategies are the most popular. The first creates resource pools based on CPU and memory shares to manage contention for host resources. The second uses reservations and limits to guarantee physical resources and ensure VM containment. This post uses a three-host ESX example to explain both strategies. Please feel free to comment on the pros and cons of each, or on why you think one is better than the other.

In the example scenario, three ESX hosts each have 16 GB of RAM and two dual-core 3.0 GHz CPUs. All three hosts are members of the same ESX cluster.
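To put rough numbers on that scenario, here is a minimal Python sketch that adds up the raw cluster resources and shows how a share-based split would divide CPU between pools when the hosts are fully contended. The host specs come from the example above. The pool names and share values are hypothetical, picked only to illustrate the proportional math, and the totals ignore hypervisor and service console overhead.

```python
# Host specs from the example scenario.
HOSTS = 3
RAM_GB_PER_HOST = 16
SOCKETS_PER_HOST = 2
CORES_PER_SOCKET = 2
GHZ_PER_CORE = 3.0


def cluster_totals():
    """Return raw (RAM in GB, CPU in GHz) summed across all hosts."""
    ram_gb = HOSTS * RAM_GB_PER_HOST
    cpu_ghz = HOSTS * SOCKETS_PER_HOST * CORES_PER_SOCKET * GHZ_PER_CORE
    return ram_gb, cpu_ghz


def split_by_shares(total, shares):
    """Divide a contended resource in proportion to each pool's shares."""
    total_shares = sum(shares.values())
    return {name: total * value / total_shares for name, value in shares.items()}


# Hypothetical pools and share values, for illustration only.
pools = {"production": 8000, "test": 4000, "development": 4000}

ram_gb, cpu_ghz = cluster_totals()
print(f"Cluster totals: {ram_gb} GB RAM, {cpu_ghz} GHz CPU")
for name, ghz in split_by_shares(cpu_ghz, pools).items():
    print(f"{name}: {ghz} GHz under full contention")
```

Keep in mind that shares only matter when resources are actually contended. When the hosts have spare capacity, a pool can use more than its proportional slice unless a limit is set.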

Planning ESX host capacity

Posted on January 12th, 2008 in availability, capacity analysis, fail over, vi3, vmotion by Rich

How many VMs should run on each ESX host? The answer is determined mostly by the physical resources of the host’s platform (storage, RAM, CPU, etc.). Before VI3 introduced ESX Clusters with DRS and HA, squeezing as many VMs onto each ESX host as possible was acceptable. Today it’s not just ESX host capacity that matters; ESX Clusters need to be taken into consideration as well. Planning Cluster capacity means ensuring availability of VMs while maintaining acceptable host performance in a fail over scenario.

First, what is a fail over scenario? The first thing that comes to mind is a problem. One or more of your ESX hosts unexpectedly crashed. This is considered unplanned downtime. Another fail over scenario to consider is planned downtime such as rebooting after applying ESX patches. For both of these types of scenarios you want to make sure your VMs stay online.

VMware’s solution for planned downtime is VMotion. The solution for unplanned downtime is the HA feature of ESX Clusters. When determining your ESX capacity be sure to allow room to leverage these features.
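As a rough illustration of what “allow room” means, the sketch below works out how much of a cluster can be committed to VMs if the remaining hosts must absorb the load when one or more hosts go down. It assumes identical hosts and treats capacity as a single number (RAM in GB here). The function names and the four-host, 32 GB figures are hypothetical and are not taken from any VMware tool.

```python
def usable_capacity(hosts, capacity_per_host, failures_tolerated=1):
    """Capacity still available after the tolerated number of hosts is lost."""
    if not 0 <= failures_tolerated < hosts:
        raise ValueError("failures_tolerated must be between 0 and hosts - 1")
    return (hosts - failures_tolerated) * capacity_per_host


def max_host_utilization(hosts, failures_tolerated=1):
    """Highest average per-host load that still leaves room for fail over."""
    if not 0 <= failures_tolerated < hosts:
        raise ValueError("failures_tolerated must be between 0 and hosts - 1")
    return (hosts - failures_tolerated) / hosts


# Hypothetical cluster: four identical hosts with 32 GB of RAM each.
hosts, ram_gb = 4, 32
print(f"Plan VMs against {usable_capacity(hosts, ram_gb)} GB, "
      f"not {hosts * ram_gb} GB")
print(f"Keep average host load at or below "
      f"{max_host_utilization(hosts):.0%}")
```

The same arithmetic covers planned downtime: evacuating one host for patching is equivalent to tolerating one failure, so a cluster sized this way can absorb either event, though not both at once.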

VMotion migrates a VM to a different ESX host without users losing connectivity. Evacuating an ESX server by VMotion enables you

vRanger Pro P2V-DR Module

Posted on October 29th, 2007 in dr, esx, fail over, vizioncore by Rich

Vizioncore’s new P2V-DR module adds the ability to create backups of running physical servers on centralized Windows storage.

“The P2V-DR Module in vRanger Pro leverages the robust conversion engine of Vizioncore’s vConverter software. The cloning method employed by vConverter is executed at the “block-level” as opposed to “file-level” which results in extremely fast & reliable conversions with superior completion rates and no data loss.”

Unlike Ghost or other products that allow you to capture an image of a server for bare metal restores, Vizioncore’s new module captures the server image while the server is live, and those images are converted so they can be restored to a VM. This sounds similar to PlateSpin’s P2I (physical to image) conversions.

ESX NIC Teaming and VLANs

Posted on October 15th, 2007 in availability, cisco, esx, fail over, vmware by Rich

Every time I have to work with a customer’s networking engineers, or even my own Cisco consultants, I get funny looks when I tell them that there is not much to the NIC teaming configuration on an ESX server.

Once a vSwitch is created it’s just a matter of assigning multiple physical NICs, creating port groups with the assigned VLANs, and setting the right policy. To the disbelief of the network guys, that can be done without adding any driver utilities or third party management software. After that ESX will load balance traffic headed out of the ESX host to the physical switch and provide redundancy for NIC fail over. Up to this point no changes to the switch are even needed.

The physical switch side does require more involved setup if you want inbound load balancing, which means configuring an EtherChannel. There are many guides already available on how to do this. Here are a few for reference:

ESX Server, NIC Teaming, and VLAN Trunking - blog.scottlowe.org

VMware ESX Server 3 802.1Q VLAN Solutions

To scale up or scale out?

Posted on October 14th, 2007 in availability, cluster, esx, fail over, vmware by Rich

When designing VI would you rather scale vertically or horizontally? That is, would you rather increase the number of VMs per ESX host, or increase the total number of ESX hosts in your environment?

A couple of years ago with ESX 2.X it was always about the consolidation ratio.

“How many VMs can I fit on a server that has 32 GB of RAM?”

“What’s my ROI on a 16 CPU server?”

Even today a healthy percentage of clients maintain this strategy. Usually for the following reasons:

  • Rack space may be limited
  • VM application connectivity or performance may be maximized
  • VMs with large amounts of RAM and multiple CPUs are needed
  • Switch ports are limited

Now with the features of VI3 it’s more feasible, and sometimes more cost effective, to have many smaller servers as your ESX hosts.

“Should I use a Bladecenter?”

“How many servers will it take to consolidate my datacenter?”

Clients who scale horizontally usually:

  • Have a dynamic environment with constant growth
  • Have a more restrictive annual budget
  • Administer application “farms” spread across hosts (Citrix, Exchange, clustered or load balanced applications)
  • Have multiple network segments to put VMs on (DMZ, Development, Internet, contractor)

In my opinion, VI3 facilitates a horizontal scale-out strategy, and that strategy makes more sense. Recent enhancements by hardware manufacturers focus on performance and availability for multiple sessions hosted on virtual servers without emulation. Dual core, quad core, Intel VT, AMD-V, and other emerging features make smaller servers more efficient and capable of hosting larger numbers of virtual machines. Assuming the VI design avoids VMotion boundaries, scaling horizontally also helps ensure host fail over and availability, so hardware problems or software updates can be handled without taking guest VMs offline.

Which strategy do you agree with or recommend, and why?

Considerations for Implementing Fail Over VI at a Secondary Site

Posted on September 18th, 2007 in availability, dr, esx, fail over, services, treesum, vcb by Rich

These are the notes I used to prepare for a discussion with a client about implementing a secondary site for DR fail over. The client has already virtualized their production data center and wants to leverage VI for DR. The point of my discussion is that VI is too often viewed as a “silver bullet” for tough projects like backup and fail over. Yes, there are some specific areas that are easier to implement with VI, but careful consideration and planning are required if the overall DR plan is to be successful.

Goals and Objectives - the customer must make important decisions first!

 

  • Recovery Time Objectives: acceptable time to start up systems and allow user access (requires server by server analysis)
  • Recovery Point Objectives: acceptable point in time recovery or start up at secondary site (requires application by application analysis)
  • Mission Critical Services: which applications & services must be available first