Classic enterprise data warehouse architecture is evolving under the influence of new technologies, new requirements, and changing economics.
As the market heads into 2016, one of the most significant changes to enterprise data warehouse technology is the introduction of so-called ‘data lakes’, large storage repositories and processing engines that are transforming the way data is handled by enterprises.
“Data lakes let enterprise data warehouses store massive amounts of data, offer enormous processing power, and let organisations handle a virtually limitless number of tasks at the same time,” says Martin Hooper, head of business development, CenturyLink.
In a classic enterprise data warehouse, source systems feed a staging area, from which data is loaded into the warehouse and consumed by analytic applications.
In this model, the access layer of the data warehouse, known as the data mart, is often part of the data warehouse fabric, and applications are responsible for knowing which databases to query.
In modern enterprise data warehouses, data lake facilities based on the Apache Hadoop open source software framework replace the staging area that sits at the centre of traditional data warehouse models.
While data lakes provide all of the capabilities offered by the staging area, they also have several other important benefits.
“A data lake can hold raw data forever, rather than being restricted to storing it temporarily, as the classic staging area is,” Hooper adds.
“Data lakes also have compute power and other tools, so they can be used to analyse raw data to identify trends and anomalies. Furthermore, data lakes can store semi-structured and unstructured data, along with big data.”
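The difference Hooper describes can be illustrated with a minimal Python sketch. It is a toy model rather than a Hadoop implementation: the class names `StagingArea` and `DataLake`, the `find_anomalies` method, and the "more than twice the mean" rule are all assumptions made for illustration. The staging area discards raw records once they are loaded, while the lake keeps them and can analyse them in place.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Any

Record = dict[str, Any]


@dataclass
class StagingArea:
    """Toy model of a classic staging area: raw data is held only temporarily."""
    buffer: list[Record] = field(default_factory=list)

    def ingest(self, record: Record) -> None:
        self.buffer.append(record)

    def load(self, warehouse: list[Record]) -> None:
        # Hand everything to the warehouse, then discard the raw copies.
        warehouse.extend(self.buffer)
        self.buffer.clear()


@dataclass
class DataLake:
    """Toy model of a data lake: raw data is retained and can be analysed."""
    raw: list[Record] = field(default_factory=list)
    _loaded: int = 0  # how many raw records have already been loaded

    def ingest(self, record: Record) -> None:
        self.raw.append(record)

    def load(self, warehouse: list[Record]) -> None:
        # Load only records not yet loaded; the raw copies stay in the lake.
        warehouse.extend(self.raw[self._loaded:])
        self._loaded = len(self.raw)

    def find_anomalies(self, key: str, factor: float = 2.0) -> list[Record]:
        # Hypothetical rule: flag records whose value exceeds factor * mean.
        values = [r[key] for r in self.raw if key in r]
        if not values:
            return []
        threshold = factor * mean(values)
        return [r for r in self.raw if key in r and r[key] > threshold]
```

After `load` is called, the `StagingArea` buffer is empty, whereas the `DataLake` still holds every raw record and can be queried again later, for example with `lake.find_anomalies("reading")`.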
Hooper notes, however, that using Hadoop as an enterprise data warehouse staging area is not a new concept.
“A data lake based on Hadoop not only provides far more flexible storage and compute power, but it is also an economically different model that can save businesses money,” he claims.
“A Hadoop staging approach begins to solve a number of the problems with traditional enterprise data warehouse architecture, while full-blown data lakes have created an entirely new data warehouse model that is more agile, more cost-effective, and provides companies with a greater ability to leverage successful experiments across the enterprise, resulting in a greater return on data investment.”