What the Business Really Wants
One day in the not too distant future, your CFO will come to you and say, "We have selected our new travel and expense reimbursement system. It is currently running as a pilot with 100 users. We would like to scale that to 5000 users. Can you do this in two weeks, and tell me what it will cost to provision and operate this system?" You reply "coolly" with four questions:
- What service level in terms of end user experience are you willing to pay for?
- Do you agree to pay for the quoted cost required to grow the infrastructure to support this new application and its workload?
- Do you agree with the timeline with which we propose to put this new application into production?
- Do you agree with the ongoing monthly charge that you will bear in order for us to support this new application in production?
In other words, what your CFO will really want is for you to quickly bring up a new application on your infrastructure and to manage this application as a Business Service. This takes the existing concept of Business Service Management (BSM) to a whole new level, since you need to respond with a level of agility that was never possible with physical infrastructure, and you also need to be able to manage this application as a Business Service in your dynamic virtual infrastructure. Put differently, you are being asked to pull off Dynamic Business Service Management when most IT organizations cannot even pull off BSM for applications deployed on static, physical infrastructure.
Why Existing "Static" BSM Is Hard
BSM is hard (ever wonder what the BS in BSM really stands for?) because even on today's production physical infrastructure, with the best tools that IBM, CA, HP, and BMC have to offer, it is not really achievable in terms that IT and the business can agree upon. The reason is that IT and the business have radically different views of what a Business Service really is. In general, IT views services as combinations of hardware, networks and software that it can deliver to the business. The business, by contrast, might view an Order Entry service in the following manner:
- I would like my internal and external users to be able to enter orders and have the items that are ordered always ship within 48 hours (this is clearly beyond the scope of just IT, since it brings supply chain management into the picture, but it is illustrative of how a business person might look at things).
- I would like the order entry process to work so well that users do not experience any errors during the order entry process.
- I would like the process to work so well that users do not make mistakes in the order in which they do things within the process.
- I would like each step in the process to take no more than ½ of one second, and the total process should not take more than 30 seconds, excluding user idle time in between steps.
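The last two kinds of expectation (no errors, and the timing limits) can be written down as a compliance check. The following is a minimal sketch in Python, not a description of any existing BSM tool: the `Step` record and the `check_order_entry` function are hypothetical names, and it assumes that some instrumentation already supplies per-step response times with user idle time recorded separately. Only the half-second and 30-second limits come from the list above.

```python
from dataclasses import dataclass

STEP_LIMIT_S = 0.5    # from the list above: each step takes at most half a second
TOTAL_LIMIT_S = 30.0  # from the list above: whole process at most 30 seconds

@dataclass
class Step:
    """One step of the order entry process (hypothetical record layout)."""
    name: str
    response_s: float     # time the system took to respond to the user
    idle_s: float = 0.0   # user idle time after the step; excluded from totals
    error: bool = False   # True if the user saw an error on this step

def check_order_entry(steps):
    """Return a list of violations; an empty list means the session complied."""
    violations = []
    for s in steps:
        if s.error:
            violations.append(f"{s.name}: user experienced an error")
        if s.response_s > STEP_LIMIT_S:
            violations.append(
                f"{s.name}: {s.response_s:.2f}s exceeds the {STEP_LIMIT_S}s step limit"
            )
    # Idle time is deliberately left out of the total, as the business asked.
    total = sum(s.response_s for s in steps)
    if total > TOTAL_LIMIT_S:
        violations.append(
            f"total: {total:.2f}s exceeds the {TOTAL_LIMIT_S}s process limit"
        )
    return violations

if __name__ == "__main__":
    session = [
        Step("open order form", 0.3, idle_s=12.0),
        Step("add line item", 0.7),
        Step("submit order", 0.4, error=True),
    ]
    for v in check_order_entry(session):
        print(v)
```

Writing the rule is the easy part. The hard part is obtaining trustworthy values for `response_s`, `idle_s` and `error` for every user and every application.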
So, BSM is hard because, even on today's deployed production infrastructure, and even if IT and the business could agree on the definition of the business services and how compliance would be measured, there are very few tools capable of measuring the items listed above. Measuring how well users are doing at their end of the business process requires detailed instrumentation of end user actions - going far beyond response time. It requires understanding in a holistic manner what the user is doing as they step through an application that is a component of a business service, and how the user's entire environment (hardware, network, systems software, utilities, and applications) is interacting with the user. This is basically impossible to do today on the physical infrastructure without deploying agents on end user machines that have a deep understanding of each critical business application. Such agents are only available for a small portion of the total number of business critical applications in the world, so we have a really long way to go on this front.
Once we move into the realm of "external users" who might be accessing these applications over the Internet, this kind of instrumentation basically becomes impossible. The reason is not a technical barrier, as the agent based systems that can do this kind of work do work for browser based applications; it is that organizations basically cannot deploy agents onto computers that belong to individuals and companies outside of their control.
Virtualized Servers, Desktop and Dynamic Business Services
So now let's launch ourselves into the future and look at how virtualizing servers, applications, desktops and storage makes the concept of Dynamic Business Services both easier and more difficult to achieve (you did not think this was going to be easy, did you?). Let's first go through how virtualization makes Dynamic Business Services possible, and even a little easier:
- 1. The most obvious benefit of virtualization in this respect is the ease with which application systems can be provisioned. Any application system that requires multiple instances of things like application servers or web servers can be scaled much more easily when you are just copying files in order to create a new server instead of building a server upon bare metal.
- 2. The ease with which compute environments are replicated also substantially impacts Dynamic Business Service Management once you virtualize desktops. Granted, you need to figure out how to dynamically provision the OS and stream in the applications (Citrix has this today, and VMware is getting close), but once you have this figured out you really have, for the first time, centralized provisioning and management of user desktops.
- 3. Once you virtualize all of the servers and all of the desktops for a set of users and applications, management of performance and capacity will be dramatically enhanced by the fact that you will have, for the first time, some common points of measurement for all servers, applications, desktops and users. Physical diversity is one of the things that made performance management very hard for distributed applications and users, and the centralization that comes with virtualization really simplifies things.
- 4. With the proper provisioning tools, many of the tasks that IT administrators have to perform today can be offloaded to users themselves via self-service provisioning that is based upon agreed-upon rules.
New challenges that virtualization creates when it comes to delivering Dynamic Business Services:
- Virtualization breaks the most common methods of measuring application performance. The most common method is to use an agent running on the servers that measures the resources used by each application or process on the server. Because a virtualized OS has no control over what portion of time is allocated to it by the hypervisor, all time based measurements taken within a virtualized guest OS are corrupted by an amount tied to how the hypervisor is scheduling work at that moment in time (which of course varies over time).
- The fact that guests move from one host to another breaks any notion of normal that is based upon what percentage of the resources an application uses at any moment in time.
- Moving guests around also creates dynamic load patterns upon back end resources like SANs. This, combined with virtualized storage, creates a situation that requires constant re-discovery of exactly which path users are taking through the layers of an application system back to the SAN.
- The fact that virtualization breaks the "resource based" method of application performance management creates a situation where response time becomes the metric of choice to measure end user experience and transaction efficiency.
- However, many of the common methods of measuring response time (those that rely upon agents in end user operating systems, or on web servers) get broken as well by the clock corruption issues discussed in the first item above.
- Since you are unlikely to end up with just one virtualization platform, and you are likely to end up with applications that span multiple platforms, you will need tools from vendors that address all of the platforms you are using. This will include the need to have tools that address commingling platforms from multiple virtualization vendors within one application system.
New Information that you Need:
- The ability to measure service levels in a manner that the business agrees with (they are not going to care about availability or resource utilization) - which means you need the ability to measure response time for every application and every user. Furthermore you need this ability to be something that works within your virtual infrastructure.
- You do not just need the response time from the perspective of the end user of the application; you also need it between all of the layers of the application system. This in turn requires continuous discovery and rediscovery of where the pieces of the application are running as they get moved around.
- You need to know how the components of the application system are mapped into the back end infrastructure that supports each application. This means a relatively continuous discovery of how each application is mapped to each supporting database table or file, and how those tables and files are mapped to the back end LUNs and spindles in the SAN.
- Once you understand all of the mapping, you need to determine how much of each key resource under your control this new application needs per user. Specifically, how much CPU, Memory, Network Bandwidth, End-to-End Network Response Time, Storage Capacity and I/O activity is generated per user and per application?
- Does adding this workload cause you to need to expand capacity for any of the key resources listed above? If so, how much does it cost to add a unit of CPU, Memory, Network Performance, Storage Capacity, or I/O Rate Capacity?
- As the committed Service Level is changed (from a response time of two seconds to one second), exactly where do you have to spend money in order to create the capacity required to make this happen?
- For each of the key resources listed above, what does the workload look like over the course of a day, a week, a month, and a business period? Do the peaks match up with other peaks (contributing to higher peak loads) or is the load uncorrelated with what you already have?
- Does this new application require any hardware or software infrastructure that you do not have? If so, what is it and what does it cost to acquire it?
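Several of the questions above reduce to simple arithmetic once the per-user measurements exist. The sketch below, in Python, is illustrative only: the function names, resource names, per-user figures, headroom values and unit costs are all made-up assumptions, and only the target of 5000 users comes from the CFO's request at the start of this article. It shows how per-user demand scales into total demand, how a shortfall against spare capacity turns into an expansion cost, and why it matters whether the new workload's peaks line up with the peaks you already have.

```python
def added_demand(per_user, users):
    """Scale measured per-user resource consumption to a target user count."""
    return {resource: amount * users for resource, amount in per_user.items()}

def expansion_cost(demand, headroom, unit_cost):
    """Cost of the capacity that must be bought for each resource.

    demand:    total new demand per resource
    headroom:  spare capacity already available per resource
    unit_cost: cost of adding one unit of each resource
    """
    cost = {}
    for resource, need in demand.items():
        shortfall = max(0.0, need - headroom.get(resource, 0.0))
        cost[resource] = shortfall * unit_cost[resource]
    return cost

def combined_peak(existing, new):
    """Peak of the combined load, given two series sampled at the same times."""
    return max(e + n for e, n in zip(existing, new))

if __name__ == "__main__":
    # Hypothetical per-user figures, as if measured during the 100-user pilot.
    per_user = {"cpu_ghz": 0.25, "memory_gb": 0.5}
    demand = added_demand(per_user, users=5000)

    # Hypothetical spare capacity and unit prices.
    headroom = {"cpu_ghz": 1000.0, "memory_gb": 3000.0}
    unit_cost = {"cpu_ghz": 10.0, "memory_gb": 4.0}
    print(expansion_cost(demand, headroom, unit_cost))

    # Peaks at different times add up to less than the sum of the two peaks.
    existing_load = [10, 50, 20]
    new_load = [5, 5, 40]
    print(combined_peak(existing_load, new_load))
```

In this made-up case only CPU needs to be expanded, and because the two workloads peak at different times the combined peak is 60 rather than the 90 you would get by adding the two individual peaks. None of this works without the continuous measurement and discovery described above, since the inputs change every time a guest moves.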
Summary and Prediction
I think that virtualization will make Dynamic Business Services possible. I think that the payoff to the enterprise in terms of agility from Dynamic Business Services will be so high that IT organizations will be pressed to deliver that payoff even without proper management tools (just as IT organizations were pressed to deliver the hard dollar ROIs associated with virtualization/consolidation projects without having had proper tools to manage the resulting environment). However, the attempt to respond with the agility promised in Dynamic Business Service initiatives will create unintended (or warned about and ignored) consequences. These will include unforeseen impacts upon existing systems when new ones are slammed into the existing production environment, and a poor ability to make good choices about where to spend money on additional capacity.
As Dynamic Business Services become something that more than just a few leading edge organizations try, tools that meet the requirements for dynamic provisioning and performance/capacity management discussed above will become the products that in fact enable Dynamic Business Services. This will radically change the entire process by which these tools are evaluated and purchased, as they will move from being an after-the-fact solution to a point of pain to being a critical enabler of a significant business benefit.
Ironically, it will not be the large systems management vendors that currently do most of the talking about (and marketing of) Business Services that will lead the charge in the creation of these new tool sets. These large vendors are trapped in the emerging realization that their existing products, based upon agents that run on servers, are worse than useless in virtualized environments (they consume resources and add no value), and are also being deluded by the virtualization platform vendors into believing that VMware, Microsoft and Citrix themselves will address this opportunity in a meaningful way. Rather, the solution is most likely to "come out of left field" from a set of startups that are not really viewed as BSM vendors at all today.
Bernd Harzog
CEO
Application Performance Management Experts
bernd.harzog@apmexperts.com
http://www.apmexperts.com/