Design a clustered VM application that can fully leverage VMotion, DRS, and HA?

This post is more of an idea than a report. If you’ve experimented with a design similar to my thoughts below, please post a comment and let me know!

Have you tried to configure VMs in a MS cluster across separate ESX hosts? How about clustering a physical server with a VM? VMware’s guide can be found here. Referencing this guide I am specifically talking about “Clustering Virtual Machines Across Physical Hosts (Cluster Across Boxes)” and “Clustering Physical Machines and Virtual Machines (Standby Host)”.

Read the guide and you’ll find there are several prerequisites and restrictions. The most important ones are:

  • you must use RDMs in physical mode for shared storage
  • you must dedicate at least 2 physical NICs to the VMs
  • you cannot use multipathing software
  • you must use the LSILogic virtual SCSI adapter in your VMs
  • you can only use 32 bit VMs. You cannot cluster with 64 bit VMs
  • iSCSI disks are not supported. NAS disks are not supported.
  • you can only use 2 node clustering
  • the boot disks for the VMs must be on local storage
  • clustered VMs cannot participate in an ESX cluster and use VMotion, DRS and HA

So how do we design a clustered VM application that can fully leverage VMotion, DRS, and HA?


What if a virtual appliance was used to present storage as iSCSI LUNs to the nodes, and what if the iSCSI LUNs were really inside a .vmdk file on a VMFS volume? If all of the VMs also had their boot OS drives on a VMFS volume on shared storage, then all the nodes and the virtual appliance could freely VMotion between ESX hosts without issue.

In case you need to visualize what I am talking about, look at the following diagram:


As illustrated above, any one of the three VMs could VMotion to any of the three ESX hosts and still have the necessary connectivity to the VMFS volume with the .vmdks, as well as to the private, public, and iSCSI networks.
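To make that connectivity requirement concrete, here is a rough Python sketch. It only models the idea and does not talk to VirtualCenter or ESX. The host names, network labels, and the "shared-vmfs" datastore label are all hypothetical, made up for illustration. A host qualifies as a VMotion target only if it can see every network and the shared VMFS volume the VMs depend on.

```python
# Hypothetical inventory for illustration only; none of these names
# come from a real environment or from any VMware API.
REQUIRED = frozenset({"public", "private", "iscsi", "shared-vmfs"})

hosts = {
    "esx1": {"public", "private", "iscsi", "shared-vmfs"},
    "esx2": {"public", "private", "iscsi", "shared-vmfs"},
    "esx3": {"public", "private", "shared-vmfs"},  # iSCSI port group missing
}


def vmotion_targets(hosts, required=REQUIRED):
    """Return the names of hosts that expose every required resource."""
    return sorted(
        name for name, resources in hosts.items() if required <= resources
    )


print(vmotion_targets(hosts))
```

In this made-up inventory esx3 is excluded until the iSCSI network is added to it, which is exactly the kind of gap that would break the design.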

Microsoft has already announced support for iSCSI in MSCS.

There are numerous server applications that can be installed in a VM to provide virtual iSCSI storage to Windows OSes:

Those are just a few I could think of off the top of my head.

The trick to this design is that the iSCSI storage is actually inside a .vmdk file. Add some common sense and create DRS and HA rules that prevent the VM nodes from being on the same ESX host, and be sure to prevent each node from restarting if an ESX host is isolated.
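Here is a rough sketch of the "keep the nodes apart" logic in plain Python. It illustrates what such a rule has to guarantee; it is not how DRS implements it and it uses no VMware API. The VM and host names are hypothetical, and the HA isolation setting is not modeled at all.

```python
def violates_anti_affinity(placement, group):
    """True if two VMs from `group` share a host.

    placement: dict mapping VM name -> host name (hypothetical names)
    group:     VMs that must be kept on separate hosts
    """
    used = [placement[vm] for vm in group if vm in placement]
    return len(used) != len(set(used))


def pick_host(vm, placement, group, hosts):
    """Return the first host not running another member of `group`, or None."""
    taken = {
        placement[other]
        for other in group
        if other != vm and other in placement
    }
    for host in hosts:
        if host not in taken:
            return host
    return None


cluster_nodes = ["nodeA", "nodeB"]
esx_hosts = ["esx1", "esx2", "esx3"]

# Both nodes on one host defeats the purpose of the cluster.
print(violates_anti_affinity({"nodeA": "esx1", "nodeB": "esx1"}, cluster_nodes))

# With nodeA on esx1, nodeB has to land somewhere else.
print(pick_host("nodeB", {"nodeA": "esx1"}, cluster_nodes, esx_hosts))
```

If `pick_host` returns `None`, there are more nodes than hosts left to separate them on, so the rule cannot be satisfied.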

Here’s an example of using an MSCS iSCSI cluster in VMware Workstation from baeke.info. The concepts and configuration are similar.

Am I smoking crack here? Let me know!

  • http://www.equallogic.com Wade

    This is something I commonly see, and it is widely misunderstood. You have the answer partly there. Since MS does support iSCSI as a connection method to shared resources in an MS Cluster, you can use iSCSI to accomplish this easily. However, you don’t have to virtualize your storage in a VMDK to do this. Just think outside the box.

    Just put a VM Network port group on the same vSwitch where you have the VMkernel for iSCSI. Then add a virtual NIC attached to your iSCSI vSwitch. Load the MS iSCSI 2.0x initiator in the guest VM, map your iSCSI volumes correctly, and you can cluster away. At this point VMware doesn’t know anything about a cluster at all; ESX just sees TCP/IP traffic. And networking, as we all know, is something they are really good at.

    Now you can have a multi-Node cluster, a VM to VM cluster on different machines, or a Physical to Virtual Cluster.

    I have several customers already doing this. Works great!

    Just come see me on Friday morning, since I’m going to be in your office with Kevin in the lab.

    Wade

  • http://www.rtfm-ed.co.uk Mike Laverick

    There are a number of requirements, from both a support and a physical perspective, that make MS clustering and VMware clustering effectively incompatible. To be supported, the VM’s boot disk must be on local storage – this kills VMotion, DRS and HA. Additionally, the second SCSI adapter has to be set to either “Virtual” bus sharing or “Physical” bus sharing. VMware appear to have a condition statement in DRS/VMotion – to the effect that IF=(Bus Sharing is enabled, Stop VMotion, Allow VMotion).

    I’ve done some wacky configurations. For example, the quorum and shared volumes should be RDMs for a cluster-across-boxes – and I’ve breached that and used virtual disks. Additionally, I’ve put the nodeA and nodeB boot disks on shared storage, when to meet the support requirement they should be local. What did I find? Well, VMotion/DRS still didn’t work because of the condition statement – about the only part of DRS that did work was initial placement (keep apart) during power on. Interestingly, HA+MS clustering in this configuration did “work”, in the sense that I had nodeA on ESX1 and nodeB on ESX2 – and when I pulled the power cord, failover did occur. However, as far as VMware and MS are concerned, this breaches the support requirement of local storage. There are some good reasons for this requirement, not least that MS don’t support boot-from-SAN clusters in the physical world, never mind the virtual world.

    The way forward, I feel, is increased and improved clustering support from VMware. To tell you the truth, I loathe and despise MS clustering with a passion. But if VMware can release “continuous” HA, this opens the door to clustering applications that can’t be clustered with conventional MS clustering software. They have already announced the ability to power-cycle a VM that BSODs. So we are also beginning to see better VM-awareness of the GOS from a VMware clustering perspective…

  • Jim

    We’ve left the VM side out of the cluster equation and settled on an L4 device to do the load balancing for us. It gives us the ability to have the load go to the “other” node if one goes offline, while also load balancing traffic when both (or more) nodes are available. By taking the cluster side out of the virtual world and handing it to the network guys, as long as we have traffic flowing to the VMs, we’re in a highly available configuration from a service standpoint, and our VM configurations are still capable of using all the great toys found in the Virtual Infrastructure. If we lose an ESX server, then we suffer a boot time event for one of the nodes. By using the L4 device, we can take the service degradation long enough for HA to kick in on the failed node. As a side note, this configuration also gives us the ability to do much more than a dual node cluster on some SOA levels. Case in point: I’ve got 6 IIS servers providing balanced traffic to 20 application servers. The L4 lets us direct the traffic to the least busy IIS nodes while providing the insurance of service protection.

    The one thing we don’t do effectively with this configuration is answer the database layer question. Still working on that one.

    Some things work great in the virtual world; some things still work easier outside of it. This, in my opinion, is one of them.

  • http://treesum.homeip.net Rich

    After discussing this with some contacts at VMware, the main issue here is whether or not a config like this can be or will be supported under a client’s SNS agreement. The bottom line, though, is that if a VM boots from a LUN on a storage device that is currently on VMware’s HCL, then that’s all VMware requires. It should also be mentioned that even in a non-supported scenario VMware is usually good about providing best-effort support. I would be surprised if the helpdesk did not try to help.

  • http://vmetc.com rbrambley

    fixed link to image in post.

  • http://www.peachtreedata.com Richard

    This post has been around for a while so I thought I would ask if these limitations are still present with the current version of ESX 3.5 Update 2?

    I’m running ESX with an EqualLogic SAN. In fact I do believe that Wade in the first post was one of the presenters at our office last year :-) It’s a small world!

    Anyway, I’m looking for a solution that will allow me to set up VMs in a cluster to support our SQL Server databases.

    So far I have not found the “eureka” solution, but I’m still optimistic that it’s out there!

  • http://vmetc.com rbrambley

    Richard,

    MSCS support is basically the same as it has been since ESX version 3.0.1.

    Duncan over at yellow-bricks.com posted about v 3.5 changes at http://www.yellow-bricks.com/2008/04/02/support-for-microsoft-cluster-server-mscs-in-35-update/

    I believe the latest VMware .pdf on this topic is still http://www.vmware.com/pdf/vi3_35/esx_3/vi3_35_25_u1_mscs.pdf

  • Anonymous

    I have always loved this kind of IT stuff, and now I don’t know what to think about the career I chose. I’m a freelance content writer, and I have to admit that for a while this satisfied me. But now I’m planning to learn something else and start all over again.
