XenServer integrates everRun VM for HA features

Compared to VMware ESX Enterprise Edition, Citrix XenServer lacks business continuity and high availability features “out of the box.” Specifically, XenServer does not have a built-in equivalent to VI3’s HA feature. Also missing is a solution similar to VMware’s soon-to-be-released Site Recovery Manager (SRM). However, Marathon Technologies and XenSource (now a division of Citrix) have worked together to develop everRun VM as an enterprise-class answer to fault tolerant availability for Windows virtual machines hosted on Citrix XenServer. According to Marathon’s Director of Products, Michael Bilancieri, speaking at a recent Atlanta “Virtualization for the Real World” event, the integrated solution will be generally available sometime in the 2008 Q2/Q3 time frame.

Quoting from the Best of VMworld (more on this award later in this post) white paper downloadable from the everRun link above:

Planning ESX host capacity

Posted on January 12th, 2008 in availability, capacity analysis, fail over, vi3, vmotion by Rich

How many VMs should run on each ESX host? The answer is determined mostly by the physical resources of the host’s platform (storage, RAM, CPU, etc.). Before VI3 introduced ESX Clusters with DRS and HA, squeezing as many VMs onto each ESX host as possible was acceptable. Today it’s not just ESX host capacity; ESX Clusters also need to be taken into consideration. Planning Cluster capacity means ensuring availability of VMs while maintaining acceptable host performance in a fail over scenario.

First, what is a fail over scenario? The first thing that comes to mind is a problem: one or more of your ESX hosts has unexpectedly crashed. This is considered unplanned downtime. Another fail over scenario to consider is planned downtime, such as rebooting after applying ESX patches. For both of these types of scenarios you want to make sure your VMs stay online.

VMware’s solution for planned downtime is VMotion. The solution for unplanned downtime is the HA feature of ESX Clusters. When determining your ESX capacity be sure to allow room to leverage these features.
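To make "allow room" concrete, here is a rough sketch of the arithmetic. It is not a VMware formula; it assumes RAM is the only constraint, that every host is identical, and that every VM is the same size. The function name and parameters are made up for illustration.

```python
import math


def max_vms_per_host(host_ram_gb, avg_vm_ram_gb, hosts, host_failures_tolerated=1):
    """Estimate VMs per host that still leaves room to absorb failed hosts.

    Hypothetical helper: treats RAM as the only limiting resource and
    assumes identical hosts and identically sized VMs.
    """
    if hosts <= host_failures_tolerated:
        raise ValueError("cluster needs more hosts than the failures it must tolerate")

    # VMs a single host could hold if packed completely full.
    vms_when_full = math.floor(host_ram_gb / avg_vm_ram_gb)

    # After the failures, every VM in the cluster must fit on the survivors.
    surviving_hosts = hosts - host_failures_tolerated
    cluster_vm_limit = vms_when_full * surviving_hosts

    # Spread that cluster-wide limit evenly across all hosts.
    return cluster_vm_limit // hosts


# Four hosts with 32 GB each, 2 GB per VM, tolerate one host down.
print(max_vms_per_host(32, 2, 4))
```

In this example each host could hold 16 VMs when full, but planning for one host failure brings the target down to 12 per host. Real sizing would also need to account for CPU, storage, and hypervisor overhead.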

VMotion migrates a VM to a different ESX host without users losing connectivity. Evacuating an ESX server by VMotion enables you

ESX NIC Teaming and VLANs

Posted on October 15th, 2007 in availability, cisco, esx, fail over, vmware by Rich

Every time I have to work with a customer’s networking engineers, or even my own Cisco consultants, I get funny looks when I tell them that there is not much to the NIC teaming configuration on an ESX server.

Once a vSwitch is created, it’s just a matter of assigning multiple physical NICs, creating port groups with the assigned VLANs, and setting the right policy. To the disbelief of the network guys, that can be done without adding any driver utilities or third party management software. After that, ESX will load balance traffic headed out of the ESX host to the physical switch and provide redundancy for NIC fail over. Up to this point no changes to the switch are even needed.

On the physical switch side, providing inbound load balancing does require a more involved setup, including configuring an EtherChannel. There are many guides already available on how to do this. Here are a few for reference:

ESX Server, NIC Teaming, and VLAN Trunking - blog.scottlowe.org

VMware ESX Server 3 802.1Q VLAN Solutions

To scale up or scale out?

Posted on October 14th, 2007 in availability, cluster, esx, fail over, vmware by Rich

When designing VI would you rather scale vertically or horizontally? That is, would you rather increase the number of VMs per ESX host, or increase the total number of ESX hosts in your environment?

A couple of years ago with ESX 2.X it was always about the consolidation ratio.

“How many VMs can I fit on a server that has 32 GB of RAM?”

“What’s my ROI on a 16 CPU server?”

Even today a healthy percentage of clients maintain this strategy, usually for the following reasons:

  • Rack space may be limited
  • VM application connectivity or performance may be maximized
  • VMs with large amounts of RAM and multiple CPUs are needed
  • Switch ports are limited

Now with the features of VI3 it’s more feasible, and sometimes more cost effective, to have many smaller servers as your ESX hosts.

“Should I use a Bladecenter?”

“How many servers will it take to consolidate my datacenter?”

Clients who scale horizontally usually:

  • Have a dynamic environment with constant growth
  • Have a more restrictive annual budget
  • Administer application “farms” spread across hosts (Citrix, Exchange, clustered or load balanced applications)
  • Have multiple network segments to put VMs on (DMZ, Development, Internet, contractor)

In my opinion the horizontal scale out strategy that VI3 facilitates makes more sense. Recent enhancements by hardware manufacturers focus on performance and availability for multiple sessions hosted on virtual servers without emulation. Dual core, quad core, Intel VT, AMD-V, and other emerging features make smaller servers more efficient and capable of hosting larger numbers of virtual machines. Assuming the VI design does not create a VMotion boundary, scaling horizontally also helps ensure host fail over and availability, so hardware problems or software updates can be handled without taking guest VMs offline.
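One way to see the availability argument is to look at how much of the cluster is still standing after a host goes down. This is only a back-of-the-envelope sketch that assumes identical hosts; the function name is made up for illustration.

```python
def usable_fraction(hosts, failures=1):
    """Fraction of total cluster capacity left after some hosts fail.

    Hypothetical helper: assumes every host contributes equal capacity.
    """
    if hosts < 1 or failures < 0 or failures > hosts:
        raise ValueError("need at least one host and 0 <= failures <= hosts")
    return (hosts - failures) / hosts


# Scale up: two large hosts. Scale out: eight small hosts.
for hosts in (2, 8):
    print(hosts, "hosts, one down:", usable_fraction(hosts))
```

With two large hosts, a single failure takes out half of the capacity. With eight smaller hosts, the same failure takes out one eighth, so far less headroom has to be held in reserve on each host.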

Which strategy do you agree with or recommend, and why?

Considerations for Implementing Fail Over VI at a Secondary Site

Posted on September 18th, 2007 in availability, dr, esx, fail over, services, treesum, vcb by Rich

These are the notes I used to prepare for a discussion with a client about implementing a secondary site for DR fail over. The client has already virtualized their production data center and wants to leverage VI for DR. The point of my discussion is that VI is too often viewed as a “silver bullet” for tough projects like back up and fail over. Yes, there are some specific areas that are easier to implement with VI, but careful consideration and planning are required if the overall DR plan is to be successful.

Goals and Objectives - the customer must make important decisions first!

  • Recovery Time Objectives – acceptable time to start up systems and allow user access (requires server by server analysis)
  • Recovery Point Objectives – acceptable point in time recovery or start up at secondary site (requires application by application analysis)
  • Mission Critical Services – which applications & services must be available first

Thursday 9.13.07 Keynote - what I missed :(

Posted on September 15th, 2007 in appliance, availability, gen session, stor vmotion, vmware, vmworld by Rich

Unfortunately I slept late Thursday morning. Waking up at 7:30 am in Hayward, CA meant that there was no way short of a helicopter I was going to make it to San Francisco before 9. I’m pretty sure my company would not let me expense a helicopter so I decided to catch up on some email from the hotel until traffic burned off. I also had “Smash Head” from the party Weds night!

blog.scottlowe.org has some great notes on this session. Here are my thoughts on what I missed.