During a recent inquiry, a client asked Elias Khnaser, research director at Gartner, how they could purchase “guaranteed capacity” at AWS in the event of a disaster.
“Frankly, I had never even considered such a scenario,” Khnaser admits.
“After asking the client for clarification, I discovered that they were concerned about AWS’ ability to guarantee capacity when/if a large number of organisations tried to simultaneously provision or power-on instances.
“This is assuming, of course, that the disaster affected a large geographic area and, consequently, a large number of organisations.
“That immediately reminded me of the old way of doing Disaster Recovery, where organisations would pay to have physical servers reserved and, in the event of a disaster, would be guaranteed access to those servers to rebuild their environment.”
So Khnaser began to answer the question…
“Well, with AWS, you could purchase Reserved Instances which would guarantee that your instances would power on, but there are design best practices that are intended to avoid those types of situations.
“For starters, you would need to deploy your instances into two Availability Zones within the same Region in order to adhere to AWS’ compute SLA.”
Of course, the client is thinking about DR and, as such, may not be willing to deploy instances to two Availability Zones.
Nonetheless, Khnaser maintains that this is the correct deployment method to avoid such an outage. But the client then asked, “What if the disaster affected the entire Region?”
“I then explained that AWS Regions have at least two AZs, and some have more,” Khnaser recalls.
“Furthermore, the likelihood of a Region running out of capacity is extremely low.
“And of course, you can always architect your environment to work across multiple Regions; you sacrifice synchronous replication capabilities and some other things, but it is doable.”
Khnaser then took the client through the importance of a Business Impact Analysis (BIA), which would not only consider the types of disasters the business needs to protect against, but would also identify the recovery time objectives (RTOs) and recovery point objectives (RPOs) needed to design an effective DR plan.
Khnaser also explained to the client that at some level, they must consider the social aspects of a disaster, not just the technical aspects.
“If the disaster is that big, the last thing on anyone’s mind is how to bring back services,” he adds. “You are in survival mode at that point.
“If you are trying to protect against more than a hurricane, a tornado, an earthquake of a certain magnitude, or even against terrorist attacks of a specific calibre, you face many more challenges than whether or not your instances will power on.”