INSIGHT: Big Data is pregnant with analytics

We are at an interesting point: the time of big data is over. It is now the time of big data analytics.

Many organisations have figured out how to get data into Hadoop (or other big data stores), but not how to get the data out and derive value from it.

These companies are becoming increasingly nervous under the pressure of rapidly growing amounts of unprocessed data (a.k.a. WRITE ONLY data that nobody will ever read).

The convergence of cloud, mobility and social computing that started several years ago has now culminated in the first truly widespread big data analytics use case - the Internet of Things (IoT).

Companies in very different industries - insurance, oil and gas, healthcare, transportation and agriculture among many others - are deploying sensors and collecting the data they generate.

The rise of data lakes reflects the nature of the current point in time. Data lakes signify uncertainty: organisations want to store more and more data, generated at enormous speed, in the hope of making sense of it someday.

Companies need a conventional name for a data store that allows them to keep their options open for future analysis - this is a data lake.

The more data there is in the lake, the harder it is to separate the signal from the noise. The signal is there, among a myriad of other signals. Go fish.

Analytics is the way out of the data lakes: it will help to find value in big data stores. However, analytics is now different: it is not just a clever tool for analysis, but also the whole architecture needed to put data into analytics-ready form.

And remember, it is big data analytics - different solutions and algorithms are required at scale.

Moore’s law is still hard at work: for example, server memory is now measured in terabytes, compared to gigabytes two years ago, so clusters of servers can keep tens of terabytes in memory - this paves the way for fast in-memory analytics.

Apache Spark - a fast, in-memory processing and analytical framework - came into focus at the right time, in the right place: it is leading the shift from big data storage to big data analysis.

In early 2014, I claimed that “the rocket ship of big data analytics is launched and on its way to orbit.”