Guest Articles: Redundant VPN Access to Virtual Machines
By Mike Scherbakov, Mirantis
One of the most popular platforms for building private and public clouds is an open source effort called OpenStack. The OpenStack initiative incorporates a number of projects (compute, storage, image service, etc.). In this article, we’ll focus on OpenStack elastic compute, also known as Nova. One thing that sets OpenStack Nova apart from its competitors is its architecture, which has been designed to support massive scale deployments. In fact, Nova was originally built by NASA as a replacement for Eucalyptus because of the latter's scale limitations.
Nova’s scalability is what ultimately makes it a popular choice for running not just private clouds but also public web clouds. Consequently, OpenStack has been gaining traction in the enterprise and among hosting service providers. Companies like Internap, DreamHost, Rackspace and, more recently, HP have all opted to base their public clouds on OpenStack.
While the Nova reference architecture is ideal for public clouds, the project itself is still fairly new, and various hacks are frequently required to make it run in highly available mode while supporting hundreds of nodes. In this discussion, we focus on one fairly small but extremely important hack, which has to be addressed by virtually anyone looking to deploy Nova at scale.
For service providers (and in some situations with private clouds), clouds must support various network topologies. In addition to being able to provision VMs with public IP addresses, one must provide an option to spin up private networks hosted in the cloud (also known as virtual private clouds). Such private networks can then be accessed via a VPN. While OpenStack supports this, some manual hacking is necessary to support private networks and VPN access in a highly available configuration.
Regardless of the network topology used in the deployment (Nova currently supports three different topologies), in order to access VMs in a hosted private network, a VPN server must be provisioned in the same VLAN as the rest of the machines in the private network. Nova uses OpenVPN as its VPN server implementation. VPN server instances spun up by Nova are referred to as Cloudpipe instances. A Cloudpipe instance is basically a VPN server that enables the client to access virtual machines inside the client’s hosted private network. If the Cloudpipe instance crashes or dies, the client completely loses access to the virtual private cloud and all applications that run on it. Current versions of OpenStack do not support highly available configurations for Cloudpipe, which makes the Cloudpipe instance a critical single point of failure.
The approach we took to solve this challenge consists of two steps. The first step involved bringing up a monitoring service that monitors and, when necessary, brings up new Cloudpipe instances. The second step was to introduce redundant Cloudpipe instances and configure the OpenVPN client to access these multiple instances.
Let’s look at how we implemented the monitoring service and the time it took us to re-instantiate Cloudpipe instances.
To monitor VPN connectivity, we used the periodic tasks method in the nova network service. We configured this method to run at the periodic interval set in the configuration file. To check VPN connectivity, we used the vpn_ping method from utils.py. vpn_ping sends VPN negotiation packets to the server and returns True or False, depending on the outcome of the connection attempt. We found that checks should be performed no more often than every 10 seconds. To avoid shutting down instances that merely hung for a second during a check, three attempts are performed, with a 10-second delay between attempts. If the VPN is not accessible on all three attempts, the Cloudpipe instance is terminated and a new one is bootstrapped. Bootstrapping the Cloudpipe instance and getting the VPN working takes, on average, 2½ to 3 minutes.
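The retry logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the code that runs inside Nova: the function names vpn_is_down and check_cloudpipe are hypothetical, and the probe, terminate, and launch callables stand in for vpn_ping and for whatever calls actually terminate and bootstrap a Cloudpipe instance. The defaults of three attempts and a 10-second delay are the values quoted in the text.

```python
import time
from typing import Callable


def vpn_is_down(
    probe: Callable[[], bool],
    attempts: int = 3,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if every one of `attempts` probes fails.

    `probe` is a hypothetical stand-in for a check such as vpn_ping:
    it returns True when the VPN answered and False otherwise.
    """
    for attempt in range(attempts):
        if probe():
            return False  # one successful answer means the VPN is alive
        if attempt < attempts - 1:
            sleep(delay_seconds)  # wait before retrying, not after the last try
    return True


def check_cloudpipe(
    probe: Callable[[], bool],
    terminate: Callable[[], None],
    launch: Callable[[], None],
    attempts: int = 3,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Replace the Cloudpipe instance if the VPN failed all probes.

    `terminate` and `launch` are hypothetical callbacks. Returns True
    when a replacement was triggered, False when the VPN was healthy.
    """
    if vpn_is_down(probe, attempts, delay_seconds, sleep):
        terminate()
        launch()
        return True
    return False
```

A periodic task would call check_cloudpipe once per interval for each monitored VPN endpoint. Passing sleep as a parameter keeps the sketch testable without real delays; in a service built on a cooperative threading library, the appropriate non-blocking sleep would be passed instead.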
To minimize access downtime associated with Cloudpipe instance crashes, we also introduced redundant Cloudpipe instances running on a separate physical server. When configuring OpenVPN on the client end, it is possible to define multiple connection endpoints (such as connection ports). At the same time, with some hacking of the Nova networking engine, it is possible to deploy multiple Cloudpipe instances. In this scenario, when one of the Cloudpipe instances fails, the client automatically switches to the remaining running instance. With this setup, the client experiences only minimal access downtime (on the order of 10 seconds).
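On the client side, OpenVPN accepts several remote directives and moves on to the next one when a connection cannot be established or is lost. The sketch below generates such a client configuration in Python. It is an illustration under assumptions rather than the configuration we deployed: the function name render_client_config is hypothetical, the address and ports in the usage note are placeholders, the dev, proto, ping, and ping-restart values are guesses that must match the actual server setup, and certificate and key directives are omitted.

```python
from typing import Iterable, Tuple


def render_client_config(
    endpoints: Iterable[Tuple[str, int]],
    dev: str = "tun",          # assumption: must match the server's device type
    proto: str = "udp",        # assumption: must match the server's protocol
    ping_seconds: int = 5,     # assumption: how often to send keepalive pings
    restart_seconds: int = 10, # assumption: silence that triggers a reconnect
) -> str:
    """Render an OpenVPN client config with several `remote` endpoints."""
    endpoints = list(endpoints)
    if len(endpoints) < 2:
        raise ValueError("redundant access needs at least two endpoints")

    lines = ["client", f"dev {dev}", f"proto {proto}"]
    # OpenVPN tries the `remote` entries in the order they are listed.
    lines += [f"remote {host} {port}" for host, port in endpoints]
    lines += [
        "resolv-retry infinite",           # keep retrying name resolution
        "nobind",                          # no fixed local port on the client
        f"ping {ping_seconds}",            # send a ping after this much silence
        f"ping-restart {restart_seconds}", # restart if nothing is received
    ]
    return "\n".join(lines) + "\n"
```

For example, render_client_config([("203.0.113.10", 1000), ("203.0.113.10", 1001)]) yields a configuration with two remote lines on the same placeholder address and different ports. How quickly the client fails over depends largely on the ping-restart value, so it should be tuned against the downtime that is acceptable.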
Mirantis plans to merge the results of this work into the OpenStack trunk in the near term. If you have questions about this approach, please feel free to ping Mike on Twitter @mihgen.
About the Author: Mike Scherbakov, Principal Cloud Strategist, Mirantis
Mike is a lead strategist for the cloud innovation team at Mirantis, an engineering services company focused on open source application infrastructure. As team leader, he manages the company’s OpenStack group and plans research in open source technologies.
His development expertise comes from dozens of projects related to technology innovation, continuous delivery, cloud automation, private clouds, grid computing, high performance systems, gLite, Globus Toolkit, OpenStack, EC2, Ruby, Python, Cucumber, Hadoop, Cassandra, and more.
Mike previously engineered advanced cloud automation programs at Grid Dynamics, and high performance computing solutions at Saratov State University. He began his career as a systems administrator and has a degree in physics from Saratov State University.