Computerworld

AWS on board as University of Canterbury leads cloud use for big data analytics

The University of Canterbury is the first university in Australasia to develop training through unique access to cloud infrastructure, designed to solve big-data analysis problems for staff and students.

Dr Raazesh Sainudiin, a senior lecturer in UC's School of Mathematics and Statistics, secured grants from the Databricks Academic Partners Program and Amazon Web Services Educate, which give all UC faculty, staff and students free, ongoing access to cloud computing infrastructure for academic teaching and research.

“This provides UC with huge potential to emerge as a leader in big data analytics in this region of the globe,” Dr Sainudiin says.

“In today's digital world, data about every conceivable aspect of life is being collected and amassed at an unprecedented scale.

“To give you some idea of how much data we are talking about, IBM estimated that a whopping 2.5 exabytes (2,500,000,000,000,000,000 bytes) of data was generated every single day, and that was back in 2012.

“This massive data could potentially hold answers for many critical questions and problems facing our world today. But to be able to get at these important answers, the first step is to be able to explore and analyse this gargantuan volume of data in a meaningful way.

“Cloud computing allows you to instantly scale up access to over 10,000 off-site computers, as required by the scale of the real-world big data problem at hand, and complete the data analyses in the least amount of time needed - usually a matter of hours.

“What if all past and present recorded and real-time data of earthquakes on the planet could be analysed simultaneously? Or consider the live analysis of every tweet on Earth.

“There are on average 60 tweets per second. Such volumes of data can't be stored, let alone analysed, by one computer, or even 100 computers, in any reasonable timeframe.”

Dr Sainudiin says UC has already established a research cluster with thousands of computer nodes running Apache Spark, a lightning-fast cluster computing engine for large-scale data processing.
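To illustrate the kind of computation Spark parallelises, the following is a minimal PySpark sketch that counts word frequencies across a large text file, with the work split across a cluster's worker nodes. The input path and application name are illustrative examples, not details of UC's actual setup.

    # Minimal PySpark sketch: count word frequencies in a large text file,
    # distributing the work across the cluster's worker nodes.
    # The input path and application name are illustrative only.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("WordCountSketch").getOrCreate()

    # Spark splits the file into partitions and processes them in parallel.
    lines = spark.read.text("hdfs:///data/sample_corpus.txt")

    counts = (lines.rdd
              .flatMap(lambda row: row.value.split())   # words from each line
              .map(lambda word: (word, 1))
              .reduceByKey(lambda a, b: a + b))         # aggregate per word

    # Bring only the 20 most frequent words back to the driver.
    for word, n in counts.takeOrdered(20, key=lambda pair: -pair[1]):
        print(word, n)

    spark.stop()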

As Dr Sainudiin explains, this locally established resource taps into the infrastructure provided by these grants and is being used by UC students in a new course, STAT478: Special Topics in Scalable Data Science, including several students who are full-time employees in the local tech industry.

“We hope that such industry-academia collaborations will continue to be a dynamic training ground for future employees in our growing data industry,” adds Roger Jarquin, Chief Technical Officer, Wynyard Group.