Inside Xero’s mammoth cloud migration
- 03 February, 2017 07:00
When Xero announced in November that it had completed its shift to a public cloud platform, the accounting software-as-a-service provider claimed that it was one of the largest migrations of its kind to be attempted in Australia and New Zealand.
All up, the process involved moving some 1.4 petabytes of data, 59 billion records, 3000 apps and 120 databases onto Amazon’s public cloud platform.
In total, some 862,000 Xero subscribers were shifted from infrastructure managed by Rackspace to AWS.
But according to Xero’s GM of platform architecture and delivery, Mark Rees, that hasn’t been the end of it: Now that Xero is all-in with cloud, the company is pursuing new opportunities to innovate more quickly and cut costs further.
Xero was hosted with Rackspace for the first seven or so years of the software vendor’s life, Rees said. “Around two-and-a-half years ago we started looking at how that platform was performing and what we wanted to do in the future,” he said.
“Although [Rackspace] were providing a really high quality of service, we started to observe that the processes behind a managed service — where you have to create a ticket to commission a piece of work — was creating early indications of slowing down our ability to ship software,” Rees said.
At the same time, the software vendor was keenly aware that public cloud offerings from the likes of Amazon, Google and Microsoft were maturing rapidly.
“We thought those platforms were a real opportunity to accelerate the way we deploy and host software,” Rees said. Xero decided that Amazon Web Services (AWS) was the right fit for the direction the company wanted to go in.
“The project started, I think in earnest, about two years ago,” Rees said. The first year involved Xero making changes to its application to allow it to run on a cloud platform and to build out the infrastructure for the migration. The last nine months or so of the process saw the migration itself.
“Xero’s got to the size where you can’t just shut everything down on a Friday, move it all across and then be ready for Monday morning,” Rees said. “It’s not feasible to do a big bang migration any more.”
Instead, Xero designed a mechanism to move batches of customers between the two platforms over consecutive weekends. “I think we did approximately two-and-a-half months of weekends where we moved somewhere around 50,000 live customers a weekend,” Rees said.
“Their experience would be for about an hour, possibly two, their organisation would be offline. Then they would come back online and they would be operating as per normal [but] on AWS. It’s relatively low impact for a project like this.”
There were three underlying drivers for the shift to cloud, Rees said.
The first was agility, he said. Previously it would take Xero around six weeks to commission a new database server — “today we can completely automatically stand that up in just under 30 minutes or so — so quite a bit faster.”
A second was cost, with Xero’s dynamic load profile making it well suited to cloud infrastructure. “The busiest time of day at Xero is Tuesday about 2pm,” Rees explained.
“There’s a predictable pattern about when users are using the site: It goes up and down quite considerably. In the past we had to have all the infrastructure running all the time to handle the peak time, with about 30 per cent headroom.
“With AWS, and cloud platforms in general, we can match our capacity to the demand — so scale up ahead of the times we have peak, and save quite a lot of money because of that.”
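The saving Rees describes can be illustrated with a little arithmetic. The sketch below is not Xero's capacity model: the hourly load figures and the per-server capacity are hypothetical, and only the 30 per cent headroom comes from the interview. It compares running a peak-sized fleet around the clock with sizing the fleet hour by hour.

```python
HEADROOM_PERCENT = 30  # the "about 30 per cent headroom" Rees mentions


def servers_needed(load, per_server_capacity, headroom_percent=HEADROOM_PERCENT):
    """Servers required to carry `load` plus headroom, rounded up."""
    required = load * (100 + headroom_percent)
    per_server = per_server_capacity * 100
    return -(-required // per_server)  # integer ceiling division


def fixed_server_hours(hourly_load, per_server_capacity):
    """Always-on model: size for the peak hour and run that fleet every hour."""
    peak_fleet = servers_needed(max(hourly_load), per_server_capacity)
    return peak_fleet * len(hourly_load)


def scaled_server_hours(hourly_load, per_server_capacity):
    """Demand-matched model: size the fleet separately for each hour."""
    return sum(servers_needed(load, per_server_capacity) for load in hourly_load)


# Hypothetical requests-per-second figures for six hours with one clear peak.
hourly_load = [100, 200, 400, 1000, 400, 200]
# Hypothetical assumption: each server handles 100 requests per second.
fixed = fixed_server_hours(hourly_load, per_server_capacity=100)
scaled = scaled_server_hours(hourly_load, per_server_capacity=100)
print(fixed, scaled)
```

With these made-up numbers the always-on fleet consumes more than twice the server-hours of the demand-matched one. The real ratio depends entirely on how peaked the actual load profile is.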
The final driver, and what Rees said was the key factor in the decision, was the opportunity to innovate that the move opened up for Xero.
“There’s so much investment being made by these providers into commoditising things like machine learning. The combination of our data, these tools and the really low cost compute opens the door to a whole bunch of really exciting product opportunities.”
Although from a user’s perspective there was no noticeable difference before and after the move to AWS, behind the scenes Xero had to implement significant architectural changes.
On cloud platforms, individual servers fail more frequently than is the case with managed service offerings, Rees said. “Each server on its own is less available but collectively the platform is more available,” he said.
“To make your application work well in that world, there are some changes you need to make to make it more resilient to, say, a server disappearing. That would very rarely happen in a managed services environment.”
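One common way to tolerate a server disappearing is to fail over to another one. The sketch below shows that general pattern only; the article does not say which techniques Xero used, and `ServerGone`, `send` and `call_with_failover` are hypothetical names invented for the illustration.

```python
class ServerGone(Exception):
    """Raised by the hypothetical `send` callable when a server has vanished."""


def call_with_failover(servers, send, request):
    """Try each server in turn and return the first successful response."""
    last_error = None
    for server in servers:
        try:
            return send(server, request)
        except ServerGone as error:
            last_error = error  # this server is gone, so move on to the next
    raise RuntimeError("no healthy server available") from last_error


def fake_send(server, request):
    """Stand-in transport for the demo: server 'a' has disappeared."""
    if server == "a":
        raise ServerGone(server)
    return f"{server} handled {request}"


result = call_with_failover(["a", "b"], fake_send, "invoice-1")
print(result)
```

The caller never sees the failure of server "a" as long as one other server can answer, which is the sense in which the platform as a whole can be more available than any single machine.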
The other key change was the scaling model used by Xero. “We didn't have to do that but it was a good time in our growth to make that change,” he explained.
“We’ve basically moved to something we call the cell architecture,” Rees explained. “The best way to think about it is, we’ve changed the problem from having to build a platform to serve, say, a million organisations on one platform, to building a platform that serves 100,000 organisations and doing that 10 times over.
“It just so happens that breaking up into independent cells of 100,000 users is much easier than building a platform for a million users. We have fully partitioned the app into these discrete cells of capacity, which is a model that is used by the likes of Salesforce and Evernote.”
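The cell idea can be sketched as a directory that places each organisation in exactly one cell and opens a new cell when the current one is full. This is an illustration under assumptions, not Xero's design: the `CellDirectory` class and its fill-in-order placement policy are hypothetical, and only the figure of 100,000 organisations per cell comes from Rees's description.

```python
CELL_CAPACITY = 100_000  # organisations per cell, per the figure Rees quotes


class CellDirectory:
    """Maps each organisation to one independent cell of fixed capacity."""

    def __init__(self, capacity=CELL_CAPACITY):
        self.capacity = capacity
        self.cell_sizes = []   # number of organisations placed in each cell
        self.assignments = {}  # organisation id -> cell index

    def assign(self, org_id):
        """Place a new organisation, opening a fresh cell when the last is full."""
        if org_id in self.assignments:
            return self.assignments[org_id]
        if not self.cell_sizes or self.cell_sizes[-1] >= self.capacity:
            self.cell_sizes.append(0)
        cell = len(self.cell_sizes) - 1
        self.cell_sizes[cell] += 1
        self.assignments[org_id] = cell
        return cell

    def cell_for(self, org_id):
        """Look up the cell that serves a given organisation."""
        return self.assignments[org_id]


# Small capacity so the demo stays readable.
directory = CellDirectory(capacity=2)
placed = [directory.assign(org) for org in ["org-1", "org-2", "org-3"]]
print(placed, directory.cell_for("org-3"))
```

Because every request for an organisation is routed to one cell, each cell only has to be engineered for its own fixed capacity, and growth means adding cells rather than enlarging a single platform.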
Rees said that the opportunities that cloud presents to organisations are real, but that businesses need to be aware of the degree of change involved with a wholesale migration.
“I think overall these projects are harder to do than you think at the start,” Rees said.
“There’s a lot of talk about the public cloud and opportunity. Mostly we’re finding that’s real, but the degree of the change is quite significant — in a sense it’s because the application becomes quite closely connected with the infrastructure it runs on.
“It's like, if you build a house on a steep block, the shape of the house and the foundations all become quite dependent on the shape of the land. It’s like that with an application... There’s a lot of assumptions that you have to revisit and re-engineer.”
Beyond the changes to the software itself, Rees said that the move to cloud is having a significant organisational impact on Xero.
“We have to rethink a whole bunch of things,” he said. “You have developers using an API to provision capacity in the public cloud. They’re effectively spending money on parts of Xero, by setting up a new server or something like that.”
“Before that was controlled centrally through our platform services team,” Rees said. “That model, where the decision-making of stuff is spread out more, it’s good in terms of innovation — but you have to implement a lot of monitoring and control to ensure that you continue to manage costs well.”
With the migration behind it, Xero is now looking at further opportunities to optimise its infrastructure and take advantage of cloud, Rees said.
“I think there's a lot more to come,” Rees said. “We're finding now when teams are building applications, they're building them in quite a different way to when they did before.
“They're using the tools that Amazon provides to accelerate things,” he said, offering a number of examples including Amazon’s Aurora database, the cloud service’s queuing infrastructure, and AWS Lambda.
“As time goes by, the way we build applications will change quite considerably,” Rees said. “We still have a core of our application running on Windows and SQL Server with .NET, but I think the new functionality we build will be quite different. It will take advantage of the new services quite a lot.”
There’s also “huge opportunity” to boost efficiency of the platform, he added; for example by using Spot instances and fine-tuning Xero’s scaling model.