One of the guys from the original XenSource team was down in South Africa this week, and this post follows on from an interesting conversation about how VMware and Citrix have each implemented High Availability (HA). Interesting, you say? Yes, especially if you have a beard and leather sandals.
VMware
VMware's HA depends heavily on DNS, or alternatively on hosts-file entries, being in place. The VMware implementation is based on Legato's Automated Availability Manager (AAM). Some of us will recall that it used to write its logs to /opt/LGTOaam512/logs/ (since 3.5 they have moved to /opt/vmware/aam).
VMware's HA uses the network to establish a heartbeat between all the participating ESX hosts. Best practice is therefore to provide redundancy for that heartbeat, either at the NIC level or by configuring multiple Service Console ports, since HA communicates over the Service Console port.
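To illustrate why redundant heartbeat paths matter, here is a minimal sketch of heartbeat tracking, assuming a peer counts as alive if a heartbeat has arrived on any of its paths within a timeout. The class name, the path names ("sc0", "sc1") and the 15-second timeout are hypothetical and are not taken from VMware's AAM code or defaults.

```python
from dataclasses import dataclass, field

# Hypothetical timeout for illustration only; not VMware's actual default.
HEARTBEAT_TIMEOUT_S = 15.0


@dataclass
class PeerState:
    # Maps a network path (e.g. a Service Console port) to the time the
    # last heartbeat from this peer arrived on that path.
    last_seen: dict[str, float] = field(default_factory=dict)

    def record(self, path: str, now: float) -> None:
        """Note that a heartbeat arrived on `path` at time `now`."""
        self.last_seen[path] = now

    def alive(self, now: float, timeout: float = HEARTBEAT_TIMEOUT_S) -> bool:
        # The peer is treated as alive if ANY path delivered a recent
        # heartbeat, so losing a single NIC or port is not fatal.
        return any(now - t <= timeout for t in self.last_seen.values())


peer = PeerState()
peer.record("sc0", 0.0)
peer.record("sc1", 10.0)
print(peer.alive(20.0))  # sc1 was heard 10 s ago, within the timeout
print(peer.alive(30.0))  # both paths silent for longer than the timeout
```

With only one path, a single cable or switch-port failure would make a healthy peer look dead; the second path is what prevents that.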
So in practical terms, what does this mean for the poor bloke who has to support the servers? If your network has a bit of a flap (personally, I always blame the network guys), your hosts will trigger an "isolation response". By default, an isolated host powers off its virtual machines to release their locks on the shared storage, so that they can be restarted on another host. That may well not be desirable if a server is busy doing something: it is not a clean shutdown, so you may cause corruption or other problems in the application or database. The response is configurable, so you can leave the machines powered on instead, but this isn't recommended with NAS or iSCSI storage (which is also network-dependent), because you may end up with a split-brain situation.
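The isolation-response decision can be sketched as follows. This is a simplified illustration, not VMware's logic: it assumes a host considers itself isolated when it hears no peers and also cannot reach some external check address (the `isolation_address_reachable` flag is an assumption added for illustration), and the enum values are hypothetical names for the two behaviours described above.

```python
from enum import Enum


class IsolationResponse(Enum):
    # Abrupt power-off: releases storage locks so the VMs can restart
    # elsewhere, but it is not a clean shutdown.
    POWER_OFF = "power_off"
    # Keep running: avoids the abrupt stop, but risks split brain on
    # network-dependent storage such as NAS or iSCSI.
    LEAVE_POWERED_ON = "leave_powered_on"


def is_isolated(peers_heard: int, isolation_address_reachable: bool) -> bool:
    # Assumption: "isolated" means no peer heartbeats AND no route to an
    # external check address, i.e. the problem is probably on our side.
    return peers_heard == 0 and not isolation_address_reachable


def vms_to_power_off(
    vms: list[str],
    peers_heard: int,
    isolation_address_reachable: bool,
    response: IsolationResponse,
) -> list[str]:
    """Return the VMs this host would power off under the given response."""
    if not is_isolated(peers_heard, isolation_address_reachable):
        return []
    if response is IsolationResponse.POWER_OFF:
        return list(vms)
    return []


print(vms_to_power_off(["db01", "web01"], 0, False, IsolationResponse.POWER_OFF))
```

The point of the sketch is that a purely network-based check cannot tell "my peers died" from "my network flapped", which is exactly the situation in which the default response power-cycles busy machines.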
There is now also experimental support for component-level HA: if a virtual machine itself fails, VMware will try to restart it.
Citrix
The Citrix HA mechanism uses a quorum on the shared storage, known as the Heartbeat SR (storage repository). It also uses the network, so there are two heartbeat routes. Citrix has licensed Marathon Technologies' HA mechanism, much as VMware integrated third-party software.
Your virtual machines can reside on NFS, iSCSI or FC, but the Heartbeat SR does need to be on iSCSI or FC. When you enable HA, it checks network connectivity between the XenServer hosts and to the Heartbeat SR. The quorum is 356MB in size, at least in the version I am using, made up of 4MB of heartbeat and the remaining 352MB of metadata. For my first test I used StarWind, a software iSCSI target, but then the guys down the road at NetApp kindly loaned us some NetApp storage. Now if only I could fabricate an excuse to hang on to it...
The network connectivity check seems a lot less "fussy" about DNS and machine names than VMware's implementation. I distinctly remember fiddling around with short names on ESX servers where clients had mixed up the case-sensitive names (Windows guys, what do you expect?).
When setting up a virtual machine to be highly available, both physical hosts need to see the same networks and the same storage, and the virtual machine must not have a local DVD drive attached. Nothing more complicated than that.
As with the VMware scenario, what does this mean in practice for the people who support the servers? Having two heartbeat mechanisms means you don't get false positives: the members of an HA cluster won't conclude that a physical server has failed based on network connectivity alone.
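The dual-heartbeat rule can be expressed as a small decision function. This is a minimal sketch under the simplifying assumption that a peer is declared failed only when both its network heartbeat and its Heartbeat SR (storage) heartbeat have gone silent; the names are hypothetical, and the real XenServer logic (including how a host fences itself) is more involved than this.

```python
from enum import Enum


class PeerStatus(Enum):
    HEALTHY = "both heartbeats seen"
    NETWORK_ONLY_LOST = "network heartbeat lost, storage heartbeat still seen"
    STORAGE_ONLY_LOST = "storage heartbeat lost, network heartbeat still seen"
    FAILED = "both heartbeats lost"


def classify_peer(network_ok: bool, storage_ok: bool) -> PeerStatus:
    """Combine the two independent heartbeat routes into one verdict."""
    if network_ok and storage_ok:
        return PeerStatus.HEALTHY
    if storage_ok:
        # A network flap: the peer is still writing to the Heartbeat SR,
        # so it is evidently alive.
        return PeerStatus.NETWORK_ONLY_LOST
    if network_ok:
        return PeerStatus.STORAGE_ONLY_LOST
    return PeerStatus.FAILED


def should_restart_vms_elsewhere(network_ok: bool, storage_ok: bool) -> bool:
    # Only a peer that is silent on BOTH routes is treated as dead.
    return classify_peer(network_ok, storage_ok) is PeerStatus.FAILED


for net in (True, False):
    for sr in (True, False):
        print(net, sr, should_restart_vms_elsewhere(net, sr))
```

Compared with the network-only model, a network flap now lands in `NETWORK_ONLY_LOST` rather than triggering a restart, which is the "no false positives" property described above.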
The Marathon guys have done some really fantastic work on component-level resiliency. A component inserts itself between the hypervisor layer and the virtual machine, intercepting all the calls and replicating them to a partner. This is not standard High Availability (HA) but Continuous Availability (CA): two instances of the virtual machine are actually executing simultaneously, so if one member of the pair suffers a hardware failure, the machine carries on executing on the other.
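The idea behind continuous availability can be illustrated with a toy model of active replication: every input is intercepted and applied to both replicas in the same order, so a deterministic workload leaves them in identical states and either one can carry on alone. This is a conceptual sketch only; the classes, host names and the arithmetic "operation" are hypothetical and say nothing about how Marathon's product actually works.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    state: int = 0
    alive: bool = True

    def apply(self, op: int) -> None:
        # Deterministic step: same inputs in the same order -> same state.
        self.state = self.state * 31 + op


@dataclass
class ReplicatedVM:
    replicas: list[Replica] = field(
        default_factory=lambda: [Replica("host-a"), Replica("host-b")]
    )

    def submit(self, op: int) -> None:
        # "Intercept" each input and forward it to every live replica.
        live = [r for r in self.replicas if r.alive]
        if not live:
            raise RuntimeError("no live replicas left")
        for r in live:
            r.apply(op)

    def fail(self, name: str) -> None:
        # Simulate a hardware failure on one member of the pair.
        for r in self.replicas:
            if r.name == name:
                r.alive = False

    def current_state(self) -> int:
        # Any survivor will do, since all live replicas are identical.
        return next(r.state for r in self.replicas if r.alive)


vm = ReplicatedVM()
vm.submit(1)
vm.submit(2)
vm.fail("host-a")
vm.submit(3)  # execution carries on, uninterrupted, on host-b
print(vm.current_state())
```

Unlike the HA sketches, nothing is restarted here: the surviving replica already holds the current state, which is the difference between HA and CA.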
Conclusions
In my clearly biased view (I do work for Citrix, after all), the HA mechanism Citrix uses is arguably better than VMware's. Both mechanisms obviously require your storage to be up, but in the Citrix case the heartbeat over the storage makes HA more robust, because a network flap on its own is not enough for the cluster to declare a host dead.
Disclaimer:
These are my personal thoughts and are in no way, shape or form representative of any organisation. This is based on documentation and software in the public domain.