Unisys NZ provides in-depth look into the Kiwi data centre market
- 22 May, 2015 02:43
Vendors announcing new data centres in New Zealand usually make a big thing of stressing their Tier 3 rating. But is it important?
Unisys country manager Steve Griffin maintains that design is only a small part of today's overall reliability equation.
In today’s world of 'location-less' data, businesses need a different approach to selecting data centre solutions that takes into account more than just the 'four walls'.
Data centre tiers were historically viewed as industry benchmarks related to facility design and theoretical reliability - it was a consistent measure to help define the capability of a data centre building.
Unfortunately, conceptual design outcomes do not always mirror reality, and in many cases, a better approach is to look at the history of what the data centre has delivered - as they say, history predicts.
This is because the way a site is actually managed, maintained, and operated has a greater impact on the availability and reliability it achieves than the paper-based theoretical design parameters suggest.
There are a number of notable challenges associated with following or mandating the theoretical design approach, two of which are:
1) The market often ends up with excess capacity and capability as a number of vendors build capacity to meet potential market demand.
In the New Zealand context, that demand is relatively small. The side effect is that the cost of over-supply is ultimately passed on to the customer through inflated pricing, which hinders the expected uptake of capacity.
A subsequent premium price point is maintained, and the cycle continues.
2) Immature operation and management of new sites, as well as early-life failures of technology and process, can and do occur, which creates a service delivery risk to customers.
This in turn drives a slower uptake of the available floor-space as the perception of reliability is tarnished - customers typically do not want to be early adopters of technology / capability to deliver and support mission critical systems.
It is understandable why the focus on the data centre buildings and associated plant has historically been relevant, particularly at a time when network bandwidth was expensive, slow, and not fit for the purpose.
Infrastructure technology was very expensive and lacked the capability to deliver highly available systems within a data centre, let alone across diverse ones.
These features of the time more or less mandated a need to build data centres with exceptionally high availability metrics.
The problem is these thought processes still permeate specifications and procurement decisions today when for the most part, they are no longer relevant.
If you consider the delivery of services from data centres today, the paradigm has shifted considerably for two fundamental reasons:
1 - Both data centre providers and the Uptime Institute (UTI) have realised that delivering data centre availability is more about the management and operation of the facility:
a. having robust [preventative] maintenance procedures, including the plant lifecycle
b. maintaining good engineering and implementation practices
c. having good change, problem, and capacity management systems and processes in place
d. providing a healthy work environment for the staff.
Running data centres with a focus on these areas allows the delivery of outcomes not dissimilar in terms of availability and reliability to those possible from certified Tier 3 sites (noting of course that UTI have removed the expected uptime percentage of a given tier classification).
2 - Exploitation of Moore's law:
a. reliable low latency, high bandwidth networks are ubiquitous
b. storage and database technologies are easily capable of delivering reliable data replication
c. network technologies to load balance inside and outside of the data centres are standard features in network designs
d. virtualisation provides levels of transportability through abstraction of compute, storage and network resources.
These capabilities all combine to make virtual data centres a reality which is being delivered across geographically diverse locations.
Customers are starting to exploit such capabilities to create systems which are nominally agnostic to the location of the data i.e. they are “location-less.”
By way of example, the traditional perspective of production being in site A, and DR/ITSCM being delivered from site B is being broken down - today, customers are running systems which are able to 'flip-flop' without interruption, or change in operational processes, or design.
Because of this the industry now considers a more pragmatic approach to data centre service reliability and availability: a data centre design which is fit-for-purpose, coupled with well-designed technology, and wrapped with operations and management processes to deliver the outcomes for the customer.
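To put rough numbers on the 'location-less' argument, the following is a minimal sketch of the standard parallel-availability calculation. It assumes site failures are statistically independent and that failover between sites is instantaneous and never fails itself; both are simplifications, and the 99% figure is purely hypothetical rather than a rating for any real facility or tier.

```python
def combined_availability(site_availabilities):
    """Availability of a service that is up while at least one site is up.

    Assumes independent site failures and perfect, instantaneous failover
    (illustrative assumptions only).
    """
    unavailability = 1.0
    for a in site_availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        # The service is down only when every site is down at once.
        unavailability *= (1.0 - a)
    return 1.0 - unavailability


def annual_downtime_hours(availability, hours_per_year=8760.0):
    """Expected hours of downtime per (non-leap) year."""
    return (1.0 - availability) * hours_per_year


if __name__ == "__main__":
    single = 0.99  # hypothetical availability of one modest site
    pair = combined_availability([single, single])
    print(f"one site:  {single:.4%}  (~{annual_downtime_hours(single):.1f} h/yr)")
    print(f"two sites: {pair:.4%}  (~{annual_downtime_hours(pair):.2f} h/yr)")
```

Under those assumptions, two modest sites running the same service can in principle outperform one very highly specified site. In practice, correlated failures and human error during failover erode the gain, which is why operations and process still matter.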
If you put aside what some may consider to be a religious war and instead look at where to spend your investment money for delivering services to your customers, consider this…
According to Gartner, unplanned outages account for only approximately 5% of all system outages. The generally accepted leading cause of outages, depending on whose research you read, is human error, which accounts for between 60% and 80%.
This is followed by hardware failures, software failure, security error and, finally, environmental causes in low single digits; yet customers have insisted on data centre providers investing significant sums to mitigate low single digit percentages of outages.
Wouldn’t a better investment of services spend be on building smarter systems to minimise both planned and unplanned outages, including designing resilient infrastructure which leverages 'location-less' concepts?
Given that one of the most common causes of data centre related outages is human error, which is something that cannot easily be accounted for in a data centre design, isn't the focus on a data centre tier rating misplaced?
There is no question that data centre design plays a role in the availability and reliability discussion, but in reality it is a small part of the overall reliability equation.
Operations, process, maintenance, and quality and risk management strategies are by far the bigger part. This would seem to be borne out by UTI themselves introducing the Management and Operations stamp of approval in 2010.
The 'location-less' model is prevalent in 'cloud' systems, such as those of Google, eBay, or Netflix, which are built upon systems that are agnostic of location and delivered from data centres which are not necessarily high up in the tier classifications. Rather, they use the technology to deliver the required availability instead of relying on the ‘container’ they reside in. Amazon refer to this approach as Availability Zones.
Today in New Zealand, many organisations are demanding services from data centres based on a UTI tier rating.
This is hindered by a lack of facilities formally being certified against the UTI Tier Certification of Design Documents (TCDD), or Tier Certification of Constructed Facility (TCCF) at any tier level, despite some claims to the contrary - it is further confused by claims of certification to non-existent UTI standards, including Tier 3+.
Unless planning on solely purchasing square metres, shouldn’t customers consider buying a service availability outcome for the service they are consuming, rather than being specific about how that outcome is delivered from within the data centre?
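As an illustration of what buying an availability outcome could look like in practice, here is a minimal sketch that measures delivered availability from an outage log and compares it with a contracted target. The `Outage` record, the function names, the 30-day period, and all figures are hypothetical. Counting planned as well as unplanned outages against the target is an assumption here, not a rule drawn from any particular contract.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float   # duration of the service interruption
    planned: bool    # True for scheduled maintenance windows


def achieved_availability(outages, period_minutes):
    """Fraction of the period for which the service was actually available."""
    downtime = sum(o.minutes for o in outages)
    if downtime > period_minutes:
        raise ValueError("total downtime exceeds the measurement period")
    return 1.0 - downtime / period_minutes


def meets_target(outages, period_minutes, target):
    """True if delivered availability is at least the contracted target."""
    return achieved_availability(outages, period_minutes) >= target


if __name__ == "__main__":
    month = 30 * 24 * 60  # a 30-day month, in minutes
    log = [Outage(minutes=20, planned=True), Outage(minutes=10, planned=False)]
    print(f"achieved: {achieved_availability(log, month):.4%}")
    print("meets 99.9%: ", meets_target(log, month, 0.999))
    print("meets 99.95%:", meets_target(log, month, 0.9995))
```

The point of the sketch is that the measure is the service the customer experiences, regardless of which building, or how many buildings, delivered it.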
At the end of the day what is more important in both a 'cloud' and heritage system is implementing infrastructure to exploit the technologies available.
Through careful infrastructure design, mainframe systems are running in New Zealand today that meet non-functional requirements for service availability and reliability at levels indistinguishable from 'cloud'-based systems.
In summary, the age-old adage about not judging a book by its cover is equally applicable to data centre services - the discussion really should not be about the 'four walls' which services are delivered from; it is more about the design and execution of what is on the inside - the people, the processes, and the technology.
These three elements contribute more every day to the delivery of reliable and highly available services than the four walls ever will.