How an Advertising Software Company Processes Over 10 Billion Daily Events with Presto on Qubole

April 4, 2019

A company’s big data infrastructure is key to its business success. For one leading advertising software platform, this infrastructure collects and processes billions of advertising events daily. Ensuring this happens smoothly falls to the company’s data engineering team, which was put into place to address the need for a single, core data operation.

The data engineering team manages all of the company’s data usage systems, which collect data at scale and make it available to clients, partners, and the rest of the business. The team also delivers custom reports to clients looking to optimize advertising campaigns, as well as detailed reports to publisher partners and internal business users.

Around two years ago, the company recognized that its existing data infrastructure could not cope with its growing storage and processing needs. In response, executives turned to Qubole and the Presto big data engine.

Increasing Productivity with Presto on Qubole

The company first used Presto to produce partner reports. The data engineering team saw immediate and dramatic improvements: with Presto on Qubole, it took less than an hour to generate reports that the previous infrastructure needed five to six hours to produce. Within a few months, the data engineering team moved all of its partner reporting to Presto on Qubole. This win was quickly recognized internally, which led to another Presto migration project for financial data operations.

With significant help from Qubole’s workload-aware autoscaling and other automation features, the company has increased the data volume it collects tenfold, from one billion ad events per day to around 10 billion at peak. This capacity has enabled the company to seamlessly introduce new information sources and advertising channels like audio and social media.
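To put the daily totals in perspective, the short Python sketch below converts them into average per-second rates. It assumes events are spread evenly across the day, which real ad traffic is not, so actual peaks would be higher. The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_events_per_second(events_per_day: int) -> float:
    """Average ingest rate, assuming events arrive evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


before = average_events_per_second(1_000_000_000)   # one billion events per day
after = average_events_per_second(10_000_000_000)   # around 10 billion at peak

print(f"{before:,.0f} events/s -> {after:,.0f} events/s")
# prints "11,574 events/s -> 115,741 events/s"
```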

Qubole’s managed autoscaling has been a huge differentiator for the data engineering team, which can quickly scale EC2 nodes up and down and can now rapidly grow a computing cluster from 10 to 200 nodes. This enables the team to work as efficiently and productively as possible, in a way that was previously unimaginable due to doubts over infrastructure cost and reliability.
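The idea of sizing a cluster to its workload within fixed bounds can be illustrated with a toy calculation. This is only a sketch and not Qubole’s actual autoscaling algorithm, which the article does not describe. The 10- and 200-node bounds come from the text above; the function name and the notions of pending tasks and tasks per node are assumptions made for illustration.

```python
import math

MIN_NODES = 10    # lower cluster bound mentioned in the article
MAX_NODES = 200   # upper cluster bound mentioned in the article


def target_cluster_size(pending_tasks: int, tasks_per_node: int) -> int:
    """Hypothetical sizing rule: enough nodes for the pending work, clamped to the bounds."""
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    needed = math.ceil(pending_tasks / tasks_per_node)
    return max(MIN_NODES, min(MAX_NODES, needed))


print(target_cluster_size(0, 8))       # idle cluster stays at the minimum
print(target_cluster_size(800, 8))     # moderate load sits between the bounds
print(target_cluster_size(10_000, 8))  # heavy load is capped at the maximum
```

A real autoscaler would also weigh factors such as node startup time and cost, which this sketch ignores.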

Delivering Greater Value to Customers

Using Presto on Qubole has also improved the company’s customer-facing work. Presto is an open-source distributed SQL query engine that supports ANSI SQL and is built to handle both large queries and many concurrent ones.

Two years ago, the advertising platform was delivering around 1,000 reports to clients each day. It now delivers 3,000 to 4,000 reports daily by running the same custom reporting solution on Presto. Clients can analyze the data using 200 different dimensions and metrics, generating reports that range in size from a few megabytes to several gigabytes.
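A custom report built from chosen dimensions and metrics usually maps to a grouped SQL query. The Python sketch below shows one way such a query could be assembled. It is not the company’s reporting solution: the table name, column names, and metric definitions are all hypothetical, and only three dimensions and three metrics are listed for brevity.

```python
from typing import Sequence

# Hypothetical whitelist of dimensions (grouping columns).
ALLOWED_DIMENSIONS = {"campaign_id", "channel", "event_date"}

# Hypothetical metrics, each mapped to the SQL aggregate that computes it.
ALLOWED_METRICS = {
    "impressions": "count(*)",
    "clicks": "sum(is_click)",
    "spend": "sum(cost)",
}


def build_report_query(
    dimensions: Sequence[str],
    metrics: Sequence[str],
    table: str = "ad_events",  # hypothetical table name
) -> str:
    """Build a grouped SQL query from whitelisted dimensions and metrics."""
    if not metrics:
        raise ValueError("at least one metric is required")
    unknown = [d for d in dimensions if d not in ALLOWED_DIMENSIONS]
    unknown += [m for m in metrics if m not in ALLOWED_METRICS]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")

    select_parts = list(dimensions)
    select_parts += [f"{ALLOWED_METRICS[m]} AS {m}" for m in metrics]
    query = f"SELECT {', '.join(select_parts)} FROM {table}"
    if dimensions:
        query += f" GROUP BY {', '.join(dimensions)}"
    return query


print(build_report_query(["channel", "event_date"], ["impressions", "clicks"]))
```

Checking every requested field against a whitelist keeps client-supplied names out of the SQL text, which matters when many clients define their own reports.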

Using Presto on Qubole to process SQL workloads has also translated directly into better experiences for customers. The company has been able to optimize its data pipelines and deliver rich reporting that allows its clients to make smarter business decisions.

Making Better Business Decisions

Leveraging Presto on Qubole and its new data infrastructure, the advertising platform has achieved improvements with far-reaching business impacts. Among these are:

  • Seamless processing of data collected from 10 billion events per day
  • Increase in quantity of reports delivered daily (from 1,000 to up to 4,000)
  • Drastic reduction in report generation time, from five to six hours originally to under an hour now
  • Projects are no longer shelved because of doubts over data infrastructure feasibility or reliability
  • Easily accessible BI data means users can make better business decisions faster

Check out more stories from other Qubole users by visiting the Customer Stories section of the Qubole blog.
