Computerworld

Inside the new Cloudera Data Platform: vision, migration and roadmap

The vendor has laid out its plans for a new combined data platform, which promises to bring together the best features of CDH and HDP

Since merging with fellow open source big data vendor Hortonworks, Cloudera has laid out its plans for a new combined data platform, which promises to bring together the best of the formerly rival Hortonworks Data Platform (HDP) and Cloudera Distribution Including Apache Hadoop (CDH) options.

Here, we go into more detail regarding how the two companies are coming together to create a unified product, as well as how staff and support functions have been blended into one organisation and what customers can expect from the new Cloudera Data Platform (CDP).

The marriage so far

Former rivals Cloudera and Hortonworks announced their merger last year, the end of what Cloudera CMO Mick Hollison called a "long dating process", and the deal was finalised earlier this year.

The merger can be thought of as something of a marriage of convenience as "many of our engineers had been in effect working together for the past six, seven, eight years in the open source community," as Hollison put it while speaking to the media this week.

"About 70 per cent of our code is overlapping between the two companies," he added. "So the bulk of what we were working on, we were working on together in many respects already. So there was a natural affinity that was already there, certainly, on the engineering side."

That being said, the go-to-market functions weren't as quick to get along. "It was a lot more competitive as you might imagine," Hollison said, "but that also is coming together really, really well."

The company just completed its sales kickoff in Seattle, where everyone got to "meet one another and once a bunch of sales people start drinking together, everything's fantastic, right? So we all enjoyed ourselves, and got to know and trust one another quite a bit more."

Speaking on Cloudera's Q4 earnings call earlier this month, where the vendor announced combined revenues of US$145 million, Cloudera CEO Tom Reilly went into a bit more detail, saying: "Our account representatives received their account assignments last month and we have completed our sales kickoff with extensive training on the new roadmap and product offerings added in the merger."

This salesforce will be solely focused on the biggest enterprise customers.

Cloudera is also bringing together its training arm, Cloudera University, with the professional services function and customer support in a single department with common systems, to deliver streamlined training and support for customers.

Finally, on partnerships, Hollison talked about how the vendor has now "merged two very large and successful partner ecosystems. On the Cloudera side, I can tell you, we are super excited about improving upon our already great partnerships with IBM and with Microsoft.

"Hortonworks recently announced a new, improved relationship with the Google Cloud Platform as well. We're going to be digging into that."

Cloudera Data Platform

That product roadmap is starting to take shape now with the announcement of the Cloudera Data Platform, which promises to bring together the best features of both companies' enterprise big data solutions.

Speaking to the press and analysts this week during the DataWorks Summit in Barcelona, Hollison explained that CDP has to have four primary attributes to give customers what they need.

- It has to support multi-function analytics: from streaming and big data ingest to IoT and machine learning.

- It has to support every conceivable cloud delivery mechanism: private cloud, public cloud, multi-cloud, hybrid, on-prem and containerised deployments, all with a common metadata catalog and schema.

- It has to have a common security and governance model across all analytic functions and cloud delivery modes.

- And it has to be a 100 per cent open platform.

When it comes to 'multi-function analytics', Hollison is referring to the 'Edge to AI' pitch Cloudera likes to make: a platform for streaming and ingesting data, which can then be stored and analysed at velocity, and to which advanced machine learning and AI techniques can be applied.

On security and governance, Hollison explained that the vendor has "taken the best of what we were developing with Cloudera's Altus and what Hortonworks was developing with their DataPlane Service. We brought those together to give our customers, and administrators in particular, a common way of addressing identity, orchestration, management, and overall operations of the environment."

Then, on being fully open source, Hollison admitted that Cloudera had been seen as 'open core' in the past, while Hortonworks was seen as 100 per cent open source. "Now we have announced that the Cloudera Data Platform will be 100 per cent open source," he clarified.

Fred Koopmans, VP of product management at Cloudera, also talked about some of the low-hanging fruit that could be integrated between the two platforms straight away, without what he called "major surgery".

This includes support for Apache Phoenix; a remote cluster management solution for Cloudera that Hortonworks customers have long had access to; and the extension of the Cloudera Data Science Workbench (CDSW) to HDP customers, who "historically were a bit more limited in terms of their ability to deploy machine learning applications. With CDSW their data scientists can be much more productive in an integrated manner."

Roadmap

The first release of CDP is planned for summer of this year and will be available only as a public cloud product on Microsoft Azure and AWS.

Later in the year, or perhaps towards the start of 2020, Cloudera will deliver what Hollison calls "the second release, which will support private cloud implementations, containerisation, Kubernetes etc."

"So that's basically the sequencing that you can think about," he said. "The first release will support data engineering, data warehousing and machine learning. Then we will bring the rest of those multi-function analytics to bear in that second release."

Moving forward, public cloud customers will get regular push updates to the platform and those running on-prem will get access to new features a couple of times a year.

As Koopmans explained: "We will build in CDP public cloud first. For that, we're going to push as frequently as we can, ideally twice a month. So if you're a customer on that service, you're constantly seeing the service get better and better, as you would expect for any other cloud service.

"For our customers that want to run in the data centre, they can't consume things twice a month by any stretch, they want something like twice a year.

"So what we'll do is roughly six months of development and then we'll cut an installable version of that same software, and then give that to them in the data centre. Or if they happen to want to run it as installed software on their cloud of choice, they can do that as well."

Migration

Reilly has already reassured existing customers of HDP 3 and CDH 5 and 6 that they will be supported until at least January 2022, as part of a post-merger commitment to three years of support.

Now, Koopmans says that the vendor will provide customers with 'direct upgrade paths' to the new platform, even if they are on some older versions of HDP and CDH.

"We're going to support an upgrade path from our current versions to the CDP that covers both CDH 5 and 6, and HDP 2 and 3," he added. "The reason that is important is that both Cloudera and Hortonworks had just last summer released a major new version.

"That was the first time in five years that either company had introduced a major new version. As of this moment, as of March of 2019, the vast majority of customers are still running a prior version. So we're going to give them an upgrade path to go directly to CDP."

Then there is the path for "a customer running CDH or HDP that wants to upgrade your cluster from the current version, on the current hardware, with current data and the current applications, all to a CDP cluster," Koopmans said.

"So while CDP will do many new things, one important thing is it can do what customers have already been doing with it, which is: 'I have a bare metal cluster of 200 nodes in my data centre and all I want to run is the new version', that is a supported use case and a very common path we expect our customers to take."

Customer perspective

One Hortonworks customer that is currently assessing its migration path to CDP is Zurich Insurance.

Abhishek Sakhuja is the Senior Technology Leader at technology consultancy Everis, which supports Zurich in its use of Hortonworks for the Benelux region.

He admitted to Computerworld UK that he didn't expect the merger when it was announced, but that "when we heard about CDP that was amazing because we now are thinking that instead of one set of services with Hortonworks, we get the best of both worlds, so using [Apache] Ranger or [Apache] Sentry, that will add some value.

"CDP also provides a good upgrade from HDP 2 that we are on now," he said, but in the end "that is a business decision to upgrade or continue with HDP.

"The only concern that we have right now is that we want to know which services are being merged, so we are comfortable using [Apache] Knox or [Apache] Atlas, how will these be migrated into CDP?

"For example we expose services to developers using [Apache] Ambari, when this shifts what will that look like? That will factor into our decision."