Microsoft's datacentre plan questioned

Container-based project queried by experts, MS responds

Microsoft's plan to fill its mammoth Chicago datacentre with servers housed in shipping containers (reported in Computerworld, April 28) has experts wondering whether the strategy will succeed. Under the plan, each container in the still-under-construction datacentre will be filled with several thousand servers.

Computerworld US queried several outside experts — including the president of a datacentre construction firm, a datacentre engineer-turned-CIO, an operations executive for a datacentre operator and a "green" datacentre consultant — to get their assessments of the strategy. While they were individually impressed with some parts of Microsoft's plan, they also expressed scepticism that the idea will work in the long term.

Here are some of their objections, along with the responses of Mike Manos, Microsoft's senior director of datacentre services.

1. Russian-doll-like nesting (servers, on racks, inside shipping containers) may deliver less of the Lego-style modularity its proponents claim and more mere ... moreness.

Server-filled containers are "nothing more than a bucket of power with a certain amount of CPU capacity," quipped Manos.

His point is that setting up several thousand servers inside a container at an off-site factory makes them nearly plug-and-play once the container arrives at the datacentre. Because the set-up work shifts to the server vendor or system integrator, and the result arrives wrapped in a 40-foot metal box, containers are far easier and faster to deploy than individual server racks, which have to be moved in one at a time.

But people like Peter Baker, vice president for information systems and IT at Emcor Facilities Services, argue that in other ways, containers still "add complexity."

"This is simply building infrastructure on top of infrastructure," he says.

One example, says Baker — who worked for many years as an electrical engineer building power systems for datacentres before shifting over to IT management — is power management. Each container, he says, will need to come with some sort of UPS (uninterruptible power supply) that does three things: 1) converts the incoming high-voltage feed into the lower voltages the servers can use; 2) cleans up the power to prevent spikes from damaging the servers; and 3) provides backup power in case of an outage.

The problem is that each UPS, in the process of "conditioning" the power, also creates "harmonics" that bounce back up the supply line and can "crap up power for everyone else," Baker says.

Harmonic distortion is a well-known issue that has been managed in other contexts, so Baker isn't saying the problem is unsolvable. But, he argues, the extra infrastructure needed to suppress the harmonics generated by 220 UPSs — the number of containers Microsoft thinks it can fit inside the Chicago datacentre — could easily negate the potential ROI of using containers.

Manos' rebuttal: "The harmonics challenges have long been solved [by Microsoft's] very smart electrical and mechanical folks." However, he declined to go into specifics. Manos added that he also "challenged the assumption" that Microsoft's solutions are bulky and not cost-effective: "You can be certain that we have explored ROI and costs on this size of investment." He also chided critics for speculation that leans too heavily on the "traditional way of thinking about datacentres", again without going into detail.

2. Containers are not as plug-and-play as they seem.

Servers normally get shipped from factory to customer in big cardboard boxes, protected by copious Styrofoam. Setting them up on vibration-prone racks before they travel cross-country by truck is a recipe for broken servers, argues Mark Svenkeson, president of Hypertect, a builder of datacentres. At the very least, "verifying the functionality of these systems when they arrive is going to be a huge issue", he says.

But damaged servers haven't been a problem, claims Manos, since Microsoft began deploying containers at its datacentres a year ago.

"Out of tens of deployments, the most servers we've had come dead-on-arrival is two," he says. He also downplays the labour of testing and verifying each server. "We can know pretty quick if the boxes are up and running with a minimum of people," he says.

He also says that Microsoft plans to make its suppliers liable for any transit-related damage.

So let's say Microsoft really has solved this issue of transporting server-filled containers. But part of what makes the containers so plug-and-play is that they will, more or less, sport a single plug from the container to the "wall" for power, cooling, networking and so forth.

But, Svenkeson points out, that also means that an accident such as a kicked cord or severed cable would result in the failure of several thousand servers, not several dozen.

"If you're plugging all of the communications and power into a container at one point, then you've just identified two single points of failure in the system," Svenkeson says.

While Manos concedes the general point, he also argues that a lot "depends on how you architect the infrastructure inside the container".

Outside the container, Microsoft is locating services worldwide — similar to Google's infrastructure — in order to make them redundant in case of failure. In other words, users accessing a hosted Microsoft application, including Hotmail, Dynamics CRM or Windows Live, may connect to any of the company's datacentres worldwide.

That means that "even if I lose a whole datacentre, I've still got nine others," Manos says. "So I'll just be at 90% serving capacity, not down hard."
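Manos' arithmetic is simple: with the same services spread across ten sites, losing one leaves nine-tenths of the serving capacity. A minimal sketch of that back-of-the-envelope calculation, assuming equal-sized datacentres and evenly distributed load (illustrative figures, not Microsoft's actual topology):

```python
# Back-of-the-envelope serving capacity after losing one site.
# Assumes ten equal-sized datacentres sharing load evenly; an illustration
# of Manos' "nine others" point, not Microsoft's real topology.
total_sites = 10
failed_sites = 1

remaining_capacity = (total_sites - failed_sites) / total_sites
print(f"Remaining serving capacity: {remaining_capacity:.0%}")  # 90%
```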

Microsoft is so confident its plan will work that it's installing diesel generators in Chicago to provide enough electricity to back up only some, not all, of its servers.

Few datacentres dare to make that choice, says Jeff Biggs, senior vice president of operations and engineering for datacentre operator Peak 10.

"That works out to be about 17 seconds a day," says Biggs, who oversees 12 datacenters. "The problem is that you don't get to pick those 17 seconds."

3. Containers leave you less, not more, agile.

Once containers are up and running, Microsoft's system administrators may never go inside them again, even to do a simple hardware fix. Microsoft's research shows that 20%-50% of system outages are caused by human error. So rather than attempt to fix malfunctioning servers, it's better to let them die off.

To keep sysadmins from being tempted to tinker with dying servers, Microsoft plans to keep its Chicago IT staff to a total of 35. With multiple shifts, that works out to fewer than 10 techies on-site at any given time. That's despite the 440,000 or more servers Microsoft envisions scattering across the equivalent of 12 acres of floor space.
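The staffing figures imply a striking ratio of servers to people. A quick sketch using the numbers in the article, with the shift count as an assumption made purely for illustration:

```python
# Rough servers-per-admin ratio from the figures quoted in the article.
# The shift count is an assumption; the article says only that "multiple
# shifts" work out to fewer than 10 techies on-site at any given time.
total_staff = 35
shifts = 4          # assumed: enough to cover 24x7 with days off
servers = 440_000

on_site = total_staff // shifts
print(f"Techies on-site per shift: ~{on_site}")                # ~8
print(f"Servers per on-site techie: ~{servers // on_site:,}")  # ~55,000
```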

But where Manos sees lean and mean, others envision potential disaster.

"It seems pretty thin to me," says Svenkeson, who has been building datacentres for 20 years. "These are complex systems to operate. To watch them remotely and do a good job of it is not cheap."

As more and more servers go bad inside the container, Microsoft plans to simply ship the entire container back to the supplier for a replacement.

It becomes a problem, then, of defining the tipping point. As more servers die, the opportunity cost of not replacing the container grows bigger and bigger.

"Say 25% of the servers have failed inside a container after a year. You may say you don't need that compute capacity — fine," says Dave Ohara, a datacentre consultant and blogger. "But what's potentially expensive is that 25% of the power committed to that container is doing nothing. Ideally, you want to use that power for something else.

"Electrical power is my scarce resource, not processing power," Ohara says.

Biggs agrees.

"Intel is trying to get more and more power-efficient with their chips," he says. "And we'll be switching to solid-state drives for servers in a couple of years. That's going to change the power paradigm altogether."

But replacing a container after a year or two when a fraction of the servers are actually broken "doesn't seem to be a real green approach, when diesel costs US$3.70 a gallon," Svenkeson says.

Manos acknowledges that power is somewhat "hard-wired" within the datacentre, making it difficult to redistribute. But he asserts that if a datacentre is "architected smartly on the backside, you can get around those challenges by optimising your power components and your overall design." He declined to elaborate.

If containers need to be swapped out earlier than expected, that cost will be borne by the container vendor, not Microsoft, says Manos.

But he hinted that Microsoft is willing to tolerate a fairly large opportunity cost — that is, hold onto containers even if a large percentage of the servers have failed and are taking up valuable power and real estate as a result. "I don't know too many people who are depreciating server gear over 18 months. Rather, I see pressure to move out to a five-to-six-year cycle," he says.

4. Containers are a temporary, not long-term, solution.

To meet its late summer opening date for the Chicago datacentre, Microsoft has already put the containers out to bid. Manos declined to comment on which vendors are in the running, but he confirmed that Microsoft hopes to award contracts to multiple vendors.

Microsoft is in the midst of a huge datacentre expansion to accommodate its growing Windows Live and Office Live online services. Against that backdrop, containers provide an "excellent opportunity to increase the scale unit, from server, to rack, to mini datacentre," Manos says.

But what happens when expansion inevitably slows?

"I think this is a very short-lived, ephemeral model that may work right now," says Biggs, who adds that most datacentre operators, such as Peak10, have no interest in containers because the scale is simply too large for them and their customers.

"The only thing interesting to me about containers is the predictability of how much power you need and how much heat you'll produce," he says. "Otherwise, they're kind of a novelty."

That's why some observers, such as Ohara, say the market is actually in smaller units. A former supply chain engineer for both Hewlett-Packard and Apple, Ohara has been developing his own prototypes for a "server cube" that would weigh about 1,000 pounds and measure one metre in each dimension — hence the name of his blog, GreenM3.

"It's taking what's in a server rack but putting it into a cube to make it more efficient to roll out," he says. "That potentially could apply to many more people."

Manos agrees that containers aren't the whole answer for datacentres, including Microsoft's own. He points out that the second floor of the Chicago datacentre will still be filled entirely with conventional free-standing server racks.

"For us, it is about right-sizing the scale with the 'needs and speeds' of deployments," he says. "As it stands today, containers deliver on this goal."

"If trends continue as anticipated, containers will continue to be an important piece to the puzzle, but not the only piece," Manos says. But he also acknowledges "The only true constant in technology is that technology will change. Whether that means the server compute form factor changes, I can only guess."

5. Containers don't make a datacentre greener.

Microsoft has not-so-subtly tried to portray its new datacentres as exemplars of green computing. In San Antonio, the site of an upcoming datacentre, construction workers built around an old live oak tree, even putting up concrete barriers to help protect it, according to the local newspaper. Microsoft also plans to use recycled grey water in the datacentre and to install the most efficient hardware, power and cooling systems.

Apart from preserving old-growth oak trees, Microsoft is doing many of the same things at its Chicago datacentre. Chicago also has the advantage of being considered the most energy-efficient US city in which to locate a datacentre.

Indeed, Microsoft said late last year that being in Chicago will enable it to use "all sorts of cold-air cooling options in the winter months," a process known as airside economisation.

An airside economiser, says Svenkeson, is a fancy term for "cutting a hole in the wall and putting in a big fan to suck in the cold air". Ninety percent more efficient than air conditioning, airside economisers sound like a miracle of Mother Nature, right?

Except that they aren't. For one, they don't work — or work well, anyway — during the winter, when air temperature is below freezing. Letting that cold, dry air simply blow in would immediately lead to a huge buildup of static electricity, which is lethal to servers, Svenkeson says.

To keep the humidity at the 30% minimum of most datacentres, water would need to be added to the air as it blows in. But that requires exorbitant amounts of energy and can create a huge condensation problem if done wrongly.

"You'll quickly have an ice-side economiser," Svenkeson says.

Airside economisers actually work better in warmer climates, where temperatures drop quickly (but not below zero) at night, Svenkeson says. Or they can work in office environments, where maintaining a minimum humidity is easier because of the workers inside and also less vital.

A less risky solution is a closed-loop air conditioning system that can be switched over during the winter. This essentially involves exposing coolant-bearing pipes to the hot air inside the datacentre. The coolant absorbs the heat and expands, rushing through the pipes to the outside of the building. There, it cools, shrinks and flows back inside, where it repeats the process.

While closed-loop systems are "wickedly efficient", according to Biggs, they still take a lot of energy to work. "There's no free lunch. The laws of physics haven't been repealed."

Even with cutting-edge cooling systems, it still takes a watt of electricity to cool a server for every watt spent to power it, estimates Svenkeson.

"It's quite astonishing the amount of energy you need," Svenkeson says.

Or as Emcor's Baker put it, "With every 19-inch rack, you're running something like 40,000 watts. How hot is that? Go and turn your oven on."
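Taken together, Baker's rack figure and Svenkeson's watt-for-watt cooling estimate imply that roughly half the electricity entering the building goes to cooling rather than computing, broadly a power usage effectiveness (PUE) of about 2. A quick sketch of that arithmetic, using only the figures quoted above:

```python
# Rough cooling overhead from the figures quoted above: Baker's ~40 kW per
# 19-inch rack and Svenkeson's estimate of one watt of cooling for every
# watt of IT load (broadly a PUE of about 2, before other overheads).
rack_it_load_kw = 40
cooling_watts_per_it_watt = 1.0   # Svenkeson's estimate

cooling_kw = rack_it_load_kw * cooling_watts_per_it_watt
total_kw = rack_it_load_kw + cooling_kw
print(f"Per rack: {rack_it_load_kw} kW of IT load + {cooling_kw:.0f} kW of "
      f"cooling = {total_kw:.0f} kW total")
```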

Manos acknowledges that Microsoft's initial plan to rely solely on airside economisers, especially during the winter, was overly optimistic. As a result, the Chicago datacentre will use both air and liquid cooling. "We're optimising for both extremes," he says.

Manos wouldn't go into details, except to say "an entire organisation of research and engineering people" is working on cooling and power issues. "I'm not sure if we're doing anything more revolutionary in this space, but a lot of the problems have been solved."

And he emphasises that with the cost of power making up the vast majority of the ongoing cost of its datacentre operations, Microsoft has every incentive to make sure its datacentres are as energy-efficient as possible.

But with Microsoft building three electrical substations on-site with a combined capacity of 198 megawatts, or enough to power almost 200,000 homes, green becomes a relative term, others say.
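The "almost 200,000 homes" comparison implies an average draw of roughly one kilowatt per household, a common rule of thumb for US homes. A quick check of the arithmetic, with that per-home figure as an assumption:

```python
# Sanity check of the "198 MW is almost 200,000 homes" comparison.
# Assumes a rule-of-thumb average draw of about 1 kW per US home.
substation_capacity_mw = 198
avg_home_draw_kw = 1.0       # assumed rule-of-thumb figure

homes = substation_capacity_mw * 1_000 / avg_home_draw_kw
print(f"Homes powered at {avg_home_draw_kw:.0f} kW each: {homes:,.0f}")  # 198,000
```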

"People talk about making datacentres green. There's nothing green about them. They drink electricity and belch heat," Biggs says. "Doing this in pods is not going to turn this into a miracle."

6. Containers are a programmer's approach to a mechanical engineer's problem.

Some say there are good reasons why geeks have given Microsoft a free pass so far on its containers plan. First, containers seem to offer a long-overdue paradigm shift in power and cooling, the kind of shift that routinely happens in software and other areas of IT but that hasn't yet really arrived for power and cooling.

"I think IT guys look at how much faster we can move data and think this can also happen in the real world of electromechanics," Baker says.

Another reason is that techies, unfamiliar with and perhaps even a little afraid of electricity and cooling issues, want something that makes those factors easier to control or, if possible, a non-problem. Containers seem to offer that.

"These guys understand computing, of course, as well as communications," Svenkeson says. "But they just don't seem to be able to maintain a staff that is competent in electrical and mechanical infrastructure. They don't know how that stuff works."

Svenkeson tells the story of a datacentre manager whose UPS systems kept overloading even though he had each of them set at only 80% load. It turns out the redundant pair was together carrying 160% of the load a single unit could handle, so whenever one unit dropped out, the survivor was overloaded, which is why they kept failing.
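The trap in Svenkeson's anecdote is the failover case: two units each carrying 80% of their rating together carry 160% of what a single unit can handle, so the survivor is overloaded the moment its partner drops out. A minimal sketch of that check, assuming a simple redundant pair with an illustrative capacity figure:

```python
# Failover check for a redundant UPS pair, illustrating Svenkeson's anecdote.
# The unit capacity is an illustrative figure; the ratio is what matters.
unit_capacity_kw = 100
load_per_unit_kw = 0.80 * unit_capacity_kw   # each unit set to 80% load

total_load_kw = 2 * load_per_unit_kw         # 160 kW across the pair
survivor_loading = total_load_kw / unit_capacity_kw

print(f"Total load carried by the pair: {total_load_kw:.0f} kW")
print(f"Load on the survivor if one unit fails: {survivor_loading:.0%}")  # 160%
```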

Attempting to eliminate these variables through plug-and-play containers "is a fairly natural response," Svenkeson says, though he believes it's the wrong one. He argues that containers will ultimately be seen as a "fast-food approach".

"It might be a viable market, but only for a limited time," he says. "As soon as the first containers arrive with a bunch of broken processors inside, that will be the end of it."

Manos is unfazed. Much of the criticism, he implied, is knee-jerk.

"Datacentres are very conservative," he says. "You go into one built a year ago or one built 10 years ago and they'll look very similar."

Microsoft had been testing containers for almost a year before it started talking about them publicly, Manos says. What Microsoft has revealed so far is just the tip of the iceberg. When critics learn more, he says, they'll be convinced.

"Half of the people say this is the greatest thing they'd ever heard. The other half say this will never work inside a datacentre," Manos says. "But the fact of the matter is that this does work."
