Is ‘VM Stall’ the Next Big Virtualization Challenge?

Is ‘VM Stall’ A Stop Sign for Virtualization?

There appears to be a challenger to ‘VM sprawl’ as the scourge of virtualization success – a problem I call ‘VM stall’.

We know about VM sprawl – because new virtual machines are so easy to deploy, organizations can end up with more VMs than they can handle, or even use. This has the potential to cause severe problems for availability, performance, compliance, costs, security, and more.

However, I am seeing more and more evidence of this new phenomenon I think of as VM stall – the tendency for virtualization deployments to stall once the low-hanging fruit has been converted (typically around 20-30% of servers).

I think it happens more or less like this…

In general, organizations start virtualization deployments by converting relatively low-risk, low-impact systems – dev/test servers, Web servers, file servers, internal applications, etc. – to virtualization. With a big impact, great results, and reasonably fast and easy implementation, it is a great hit with IT and business owners. This may even spawn a ‘virtual first’ initiative, where all new server requests are deployed as virtual servers by default.

However, when faced with the next step, converting the remaining existing servers – including tier 1 business services, customer-facing environments, enterprise-wide systems, 3rd-party applications, multi-platform services, and composite applications – virtualization projects often stall.

I was interested to see the notion of VM stall confirmed again last week (courtesy of eWeek via @JSchroed) in some new research into virtualization (PDF) coming out of Prism Microsystems, a software vendor in the SIEM market.*

One of the most interesting outcomes in this research was again the low penetration of server virtualization within each organization. As the chart below shows, most organizations have still virtualized less than a third of their production servers.

[Chart: Percentage of VM Deployments]

Source: Prism Microsystems, ‘2010 State of Virtualization Security Survey’, April 2010

What’s more, fully 15% have not even started to virtualize their production servers at all!

It might seem that this is really at odds with the common wisdom that sees virtualization as mature, ubiquitous, commoditized, and even passé. We hear so much about virtualization: how it has been a top priority for years, and how everyone is deploying it. For example:

  • The IBM Global CIO Study 2009 in September showed that 76% of 2,500 global CIOs were undertaking or planning virtualization projects.
  • The Gartner 2010 CIO Survey in January reported that virtualization is the top priority for over 1,500 global CIOs (up from number 3 the previous year).
  • In January, CDW’s Server Virtualization Life Cycle Report (registration required) found that 90% of respondents have implemented server virtualization at some level.
  • As far back as 2008, EMA research showed that 75% of enterprises were using virtualization for production use cases.
  • The Prism Microsystems report (the source of the chart above) states that 85% of its sample has adopted virtualization to some degree.

I am even starting to hear that virtualization is set to be irrelevant, becoming nothing more than just a stepping stone to cloud.

“Despite the widespread adoption of virtualization, it is still very low as a percentage of servers”

However, despite the widespread adoption of virtualization as a percentage of organizations, penetration is still consistently very low as a percentage of production servers.

Indeed, this is not the only recent (and not so recent) research study to highlight this issue. Over time, CIOs have reported a persistent difficulty in expanding their virtualization deployments beyond the initial 20-30% of servers. For example:

  • Around 6 months ago, Gartner reported that “only 16 percent of workloads are running in virtual machines today.”
  • Research from EMA has found that the average organization has virtualized only around 25% of servers (and retired just 17%).
  • The CDW Server Virtualization Life Cycle Report cited above showed that just 34% of the average organization’s total server infrastructure consists of virtualized servers.
  • A CIO and HP survey in October 2009 reported that, on average, just 38% of mission-critical business services have been virtualized by companies with virtualization projects.
  • Forrester Research from May this year (conducted for CA) shows that the average enterprise has virtualized only around 30% of their servers.

At a time when so many organizations are experiencing VM sprawl, it seems hard to believe that VM stall is such an issue. Yet time and again we see that organizations find it difficult to get over the hump of the initial 20-30% of servers, and difficult to move from low-risk/low-impact servers to high-risk/high-impact services.

“VM stall appears to be holding many deployments at around 20-30% of servers”

If this were just a point-in-time observation, then VM stall might not exist. The low penetration rate may just be a point in the deployment cycle. However, VM stall appears to be a longitudinal effect, as it has been holding many deployments at around 20-30% of servers for several years. IIRC, something resembling VM stall was cited as an issue in EMA research as far back as 2008, and again in 2009. The CDW virtualization lifecycle research also reinforces the potential for long-term VM stall. In it, even organizations that self-report as “fully deployed” for server virtualization have only virtualized 37% of their servers. So while many organizations see VM stall as a short-term delay to virtualization rollout, many others are seeing VM stall as a permanent situation.

I see many possible causes for VM stall. For example:

  • Risk aversion – high-risk, high-impact services have more stakeholders, more politics, larger and more distributed infrastructures, greater cost of failure and downtime, reduced or non-existent 3rd-party support, and maximum management attention, among many other risk factors. The risk of failure may be too great, and the newest technology is always blamed for any new problems. Without new ways to address continuity, availability, performance, cost allocation, and other business requirements, conversion risk may be enough to stall virtualization deployment.
  • Resourcing – with around 20-30% of servers converted, virtualization staffing starts to become a real challenge. As I talked about recently with my great mate, David Marshall, staff and skills shortages put a real throttle on virtualization deployments, especially as virtualization starts to scale. Not only is demand for virtualization skills still high, but supply continues to lag. Plus, the problem is getting worse, not better. Without the resources and skills to go forward, there is often little alternative to VM stall.
  • Scalability – with one (typically small) team trying to manage a quarter of the entire server workload, staff from the virtualization project team simply cannot handle further virtualization deployment. In some cases, the virtualization technology itself does not scale well either; and in others, the management tools do not scale. Throwing more bodies at the problem is rarely the answer – after all, nine women cannot make a baby in one month. So organizations end up with VM stall almost by default, as they find that they need to fundamentally change their processes and technologies to enable further virtualization growth.
  • Manageability – new IT management issues come up as the scale and risk of virtualization deployment increases. Enterprise virtualization needs new approaches to performance assurance, process automation, VM mobility, continuity planning, security and audit, software compliance, OEM support, configuration compliance, and more. The importance of manageability is greatly magnified for high-risk/high-impact services, but few (if any) organizations seem to have the virtualization-aware management tools to scale to handle enterprise-class virtualization deployments. Again, VM stall happens almost by default, as IT tries to figure out enterprise-class manageability.

“There is little doubt in my mind that VM stall exists, and it is a significant problem”

There may be more or different causes, but whatever the reasons, there is little doubt in my mind that VM stall exists. It is not universal – indeed, every study shows that a decent percentage of organizations are able to power through it – but for the majority of organizations, it appears to be very real. I have personally seen many enterprises going through it. More and more research continues to support it. For affected organizations, it is a significant problem, too, because stalled virtualization deployment means the highly desirable outcomes of virtualization – OpEx reduction, improved continuity, greater IT and business agility, energy cost reduction, ROI, etc. – either stall as well, or even start to backslide.

Whether VM stall represents as big a problem as VM sprawl, time will tell; but it is certainly a significant and growing challenge to the success of virtualization – and a fundamental driver for better virtualization management.

(EDIT: This article has been picked up and published on CIO.com! Join in the discussion there, or here.)