The Future of Big Data and Machine Learning Is Clear: It’s All on the Cloud

October 8, 2018 (updated April 8, 2024)

Last week’s announcement that Cloudera and Hortonworks will merge to form a single entity speaks volumes about the state of the big data and machine learning (ML) market: the cloud is the future, and the world of on-premises data centers is becoming a thing of the past. Companies that build their software products for that legacy world have all stopped growing, a notable contrast to the rapid expansion of cloud-native software businesses. On a personal note, this validates what my co-founder Joy and I predicted when we decided to build Qubole as a cloud-only platform.

Many studies, including a recent one from Forrester, assert that big data in the cloud is rising quickly and is expected to grow nearly 7.5 times faster than the on-premises market. The announced merger supports this analysis and shows how difficult it has become for vendors to remain competitive in a dwindling on-premises market while the cloud market for data platforms grows rapidly. In fact, I would argue that vendors with on-premises offerings, even a Cloudera-Hortonworks hybrid, are quickly being overpowered by platforms built in the cloud and natively designed for complex data processing and machine learning as a service.

Why Did This Happen, and Who Wins — Cloud or On-Premises?

I’ve seen the symptoms of a challenging on-premises market environment for a few years now. Despite big promises, on-premises Hadoop vendors have not delivered the value they claimed. As a result, companies that chose the on-premises route have found their projects difficult to implement, overly complex to manage, and impossible to scale without vast resource and budget commitments. And as the volume, variety, and velocity of data continue to expand, organizations are increasingly discovering that on-premises infrastructure simply cannot offer the agility and scalability required for complex big data and ML projects.

Now, customers of Cloudera and Hortonworks must assess the long-term viability of their investments in on-premises big data technology versus cloud-native options. The merger will lead to months of confusion and chaos as the combined company sorts out product offerings and technological changes. In addition, the requisite financial and operational restructuring may prompt employees and customers to evaluate other options, encouraging them to see why the grass is greener on the other side.

The Broad Appeal of Cloud-Native Big Data Platforms

The growth of ML and AI has created demand for more agile, scalable platforms than traditional on-premises vendors can deliver. Even hybrid solutions pose problems for businesses that need to combine structured, semi-structured, and unstructured data in open source big data frameworks. According to the 2018 Big Data Trends and Challenges Survey Report, data professionals named analyzing extremely large data sets (40 percent of respondents) and integrating new data into existing pipelines (38 percent) as two of the top obstacles to implementing machine learning.

For Cloudera and Hortonworks Customers, the Time to Act Is Now

Cloud-native data platforms like Qubole offer scalability and adaptability that their on-premises predecessors cannot. If a business needs additional compute power, the infrastructure can easily expand to meet that need. Likewise, a company that wants to begin using new open-source technology can do so easily with cloud-native platforms, which are more pliable and can be modified to meet changing needs.

For AI and machine learning, a cloud-native data platform enables organizations to automate complex data processing tasks, resulting in faster time to value and lower infrastructure costs. Unlike legacy on-premises platforms, Qubole provides sophisticated workload-aware autoscaling, automatic cluster start/stop, and heterogeneous cluster configurations that optimize infrastructure use and reduce its cost by 50 percent or more.

To learn more about Qubole’s cloud machine learning capabilities and our migration program, check out our on-premises to cloud webinar.
