Avoiding Big Data Failure – Let the Data Genie out of the Bottle!

June 29, 2017. Updated April 16th, 2024
Opening Keynote at Data Platforms 2017 with Qubole Co-Founder/CEO Ashish Thusoo

All businesses are now in the data business. Whether ingesting clickstream and transactional data from customers or IoT data from various machines monitoring one another (servers, thermostats, and so forth), this real-time data needs to be enriched with enterprise information housed somewhere in the data lake. Data teams need to quickly locate, access, and combine this enterprise information with real-time data, gleaning insights that can lead to action. Whether the outcome is serving up relevant ads, detecting fraudulent behavior, or recommending relevant next-best actions for consumers, all of these use cases can be based on a modern data platform.

Industry Data Trends

Ashish Thusoo opened the keynote by summarizing three seismic changes that companies are grappling with on their journey to becoming data-driven enterprises:

  1. Exploding data. According to IDC, by 2020, there will be 40 zettabytes of data, or about 5,200 GB for every man, woman, and child on Earth. Ashish equated that to 90 years’ worth of HD videos being created every day.
  2. Speed of industry innovation both on computing platforms (GPU computing, serverless cloud offerings, in-memory database computing) and on software platforms for processing data (Apache Spark and Flink, Heron, etc.).
  3. Companies moving to the cloud to adapt to constant business changes, becoming agile businesses that can scale up or down based on business volumes. Per Forrester, big data growth is the number one reason why companies are moving to the public cloud.
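
As a quick sanity check on the first item, the two IDC figures can be compared directly: dividing the 40 zettabyte total by the 5,200 GB per-person share gives the world population that the estimate implies. The sketch below assumes decimal units (1 ZB = 10^21 bytes, 1 GB = 10^9 bytes); the variable names are illustrative and do not come from the keynote.

```python
# Assumption: decimal (SI) units, 1 ZB = 10**21 bytes and 1 GB = 10**9 bytes.
ZETTABYTE = 10**21
GIGABYTE = 10**9

total_bytes = 40 * ZETTABYTE          # IDC's projected total for 2020
per_person_bytes = 5_200 * GIGABYTE   # IDC's per-person figure

# Population implied by taking the two figures together.
implied_population = total_bytes / per_person_bytes
print(f"Implied population: {implied_population / 1e9:.2f} billion")
```

The result is about 7.7 billion people, roughly the world population expected around 2020, so the two figures are consistent with each other.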

Watch Ashish’s Presentation at Data Platforms 2017

Solve Complexity or Suffer Big Data Failure

Ashish continued by laying out some of the major obstacles to success in a big data project. He explained that there are more than 100 open-source projects, and customers don’t know which engine to use for which use case. Additionally, the speed of innovation causes havoc: Spark alone has had seven releases in two years. Ashish also recounted that at Facebook, they had new releases every month that needed to be tested and put into production.

Another major factor complicating the big data platform landscape is the global labor shortage for people with data skills. According to an IDC report, by 2018, there will be a shortage of 900,000 people with data management and interpretation skills.

And finally, Thusoo cited a Gartner study predicting that by 2020, over 60 percent of big data projects will not move past the proof-of-concept stage and will be abandoned.

Transforming to a Data-Driven Enterprise

After artfully laying out the current big data landscape and business imperatives, Thusoo provided a compelling case study of the transformation to a self-service data-driven enterprise that took place at Facebook and how those lessons have been baked into the Qubole Data Platform solution. The solution, he offered, is a two-pillar approach: moving to a DataOps framework and implementing artificial intelligence for autonomous data management.

All departments within an organization need access to data. Traditionally, data access ran through an intermediary data team functioning as a service organization, and this caused bottlenecks galore! The data team individually handled every data request coming in from end users such as business analytics and marketing. Clearly, this was an unsustainable arrangement as the company and user base scaled exponentially.

Ashish relayed that transforming the data team from a service organization into a platform team that supports user self-service, enabling users to pull data based on their needs (or “democratizing” data), is one of the main components of being a DataOps organization. Other DataOps principles include:

  • Publish all data. If you collect data, make it actionable; never put it in offline storage.
  • Make data self-service
  • Break down data silos
  • Engender a data culture where managers use data to verify hypotheses

The data team continues to be the focal point for ETL, building pipelines, and other important functions for database maintenance and access. However, by providing a combination of tailored yet standardized methods for accessing data internally, the team reduces the latency between real-time data and actionable insight.

Indeed, Ashish describes how Facebook grappled with becoming a DataOps enterprise in a book he co-authored with fellow Qubole Co-Founder Joydeep Sen Sarma (both are former Facebook employees). “Creating a Data-Driven Enterprise with DataOps” is available for free download from the Qubole website. The book explains not only how to create a data-driven culture, but also how the data team can help transform a business, both extremely valuable insights for decision-makers in today’s data-rich enterprises.
