Building a World-Class Digital Advertising Analytics Platform Using Qubole Data Service

About MediaMath

MediaMath, a 750-employee company founded in 2007 and based in New York City, is the leading global digital media-buying platform. MediaMath develops and sells tools for digital marketing managers under the TerminalOne brand. TerminalOne allows marketing managers to plan, execute, optimize, and analyze marketing programs. This is a case study written by MediaMath for Qubole.


The Analytics and Insights team at MediaMath is responsible for delivering decision-making infrastructure and advisory services to our clients. The team does this by helping clients answer complex business questions using analytics that produce actionable insights. Examples of the team’s work include, but are not limited to:

  1. Segmenting audiences based on their behavior, including user pathway and multi-dimensional recency analysis
  2. Building customer profiles (both univariate and multivariate) across thousands of first-party (i.e., client CRM files) and third-party (i.e., demographic) segments
  3. Producing simplified attribution insights that show the effects of upper-funnel prospecting on lower-funnel remarketing media strategies




  • Complexity of transforming Semi-Structured data
  • Repeatable Data Pipelines
  • Low Risk Apache Hadoop
  • Hadoop on-premise vs cloud

The Challenge

Our flagship product captures all kinds of data generated when our customers run digital marketing campaigns on TerminalOne. This amounts to a few terabytes of structured and semi-structured data per day, covering marketing plans, ad campaigns, ad impressions served, clicks, conversions, revenue, audience behavior, audience profile data, and more. At MediaMath, we are always looking to enhance our cutting-edge infrastructure, and we wanted to take our existing capabilities to the next level to handle new, innovative analytics tasks.


Processing this raw data to segment the audience, optimize campaign yield, compute revenue attribution, and so on is a non-trivial problem for the following reasons:

Complexity of transforming Semi-Structured data

Transforming session log data to construct user sessions and perform click-path analysis is a complex process. We knew that Apache Hadoop was an attractive option, but we wanted a solution that our analysts could get started with quickly and use easily, without having to worry about operational management. We wanted analysts to focus on their data and transformations without having to think about cluster sizes, Apache Hadoop versions, machine types, and other elements of cluster operations.
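
To make the sessionization step concrete, here is a minimal Python sketch of how raw log events could be grouped into user sessions and click paths. It is an illustration only, not the MediaMath implementation: the event layout (user_id, timestamp, url), the 30-minute inactivity timeout, and the function name sessionize are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity gap that closes a session; the case study gives no value.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(events, timeout=SESSION_TIMEOUT):
    """Group (user_id, timestamp, url) events into per-user click paths.

    Returns {user_id: [[url, url, ...], ...]}, one inner list per session,
    with sessions and clicks in chronological order.
    """
    hits_by_user = defaultdict(list)
    for user_id, ts, url in events:
        hits_by_user[user_id].append((ts, url))

    sessions = {}
    for user_id, hits in hits_by_user.items():
        hits.sort(key=lambda hit: hit[0])  # order each user's hits by time
        user_sessions = []
        last_ts = None
        for ts, url in hits:
            # Start a new session on the first hit or after a long gap.
            if last_ts is None or ts - last_ts > timeout:
                user_sessions.append([])
            user_sessions[-1].append(url)
            last_ts = ts
        sessions[user_id] = user_sessions
    return sessions


# Hypothetical sample events, deliberately out of order.
EXAMPLE_EVENTS = [
    ("u1", datetime(2014, 1, 1, 9, 5), "/product"),
    ("u1", datetime(2014, 1, 1, 9, 0), "/home"),
    ("u1", datetime(2014, 1, 1, 11, 0), "/home"),
    ("u2", datetime(2014, 1, 1, 9, 1), "/landing"),
]
```

At a few terabytes per day, the same grouping logic would run as a distributed job rather than in a single process; the sketch only shows the shape of the transformation.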

Repeatable Data Pipelines

We needed a service for developing data pipelines that repeat the same transformations day after day, week after week, without much intervention from my team once they are set up. Automating the execution of the data pipeline while honoring the interdependencies between pipeline activities was a crucial requirement. Prior experiments with cron had taught us that it was not the best approach for this.
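
The difference from cron is that a pipeline scheduler runs each activity only after the activities it depends on have finished, rather than at fixed clock times. A minimal Python sketch of that idea follows, using the standard-library graphlib module (Python 3.9+). The task names and the run_pipeline helper are hypothetical and do not describe Qubole's scheduler.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task after all of its prerequisites.

    tasks:        {task_name: zero-argument callable}
    dependencies: {task_name: set of task names that must run first}
    Returns the list of task names in the order they were executed.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    executed = []
    # static_order() yields every node with its prerequisites first.
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        executed.append(name)
    return executed


# Hypothetical daily pipeline: two loads feed the downstream steps.
EXAMPLE_DEPENDENCIES = {
    "load_logs": set(),
    "load_crm": set(),
    "sessionize": {"load_logs"},
    "attribute_revenue": {"sessionize", "load_crm"},
    "export_summary": {"attribute_revenue"},
}

# Placeholder task bodies; real tasks would launch queries or jobs.
EXAMPLE_TASKS = {name: (lambda: None) for name in EXAMPLE_DEPENDENCIES}
```

Calling run_pipeline(EXAMPLE_TASKS, EXAMPLE_DEPENDENCIES) always runs the export step last and never runs attribution before both of its inputs are ready, which a set of independent cron entries cannot guarantee.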

Low risk Apache Hadoop

We needed something that was reliable and easy to learn, set up, use, and put into production, without the risk and high expectations that come with committing millions of dollars in upfront investment.

I am very happy with Qubole! Our goal at MediaMath was to take our existing industry-leading infrastructure to the next level, handling new complex analytics tasks. Qubole has helped us enable this goal with minimal risk. -Renee Englehardt, VP, Analytics, MediaMath

The Solution

Big Data Analytics Solution

During our trial, we quickly created an account on Qubole, and the team helped us upload sample data. We started using the system and immediately saw its value. Within hours, we were able to re-use a number of very useful, business-critical, custom Python libraries that we had developed, matured, and stabilized. Among other things, these libraries compute revenue attribution by customer and by campaign by mashing together semi-structured and relational data.
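
As an illustration of what combining semi-structured and relational data for attribution can look like, here is a small Python sketch. It is not the MediaMath library: the last-touch attribution rule, the JSON field names, and the function name last_touch_revenue are assumptions chosen to keep the example short.

```python
import json
from collections import defaultdict


def last_touch_revenue(event_lines, conversions, campaigns):
    """Credit each conversion to the user's most recent prior ad touch.

    event_lines: JSON strings with "user_id", "ts", "campaign_id"
                 (semi-structured log data)
    conversions: (user_id, ts, revenue) rows (relational data)
    campaigns:   {campaign_id: (customer, campaign_name)} (relational data)
    Returns {(customer, campaign_name): total_revenue}.
    """
    touches = defaultdict(list)
    for line in event_lines:
        event = json.loads(line)
        touches[event["user_id"]].append((event["ts"], event["campaign_id"]))

    revenue = defaultdict(float)
    for user_id, conv_ts, amount in conversions:
        prior = [t for t in touches.get(user_id, []) if t[0] <= conv_ts]
        if not prior:
            continue  # no ad touch before the conversion: unattributed
        _, campaign_id = max(prior, key=lambda t: t[0])
        key = campaigns.get(campaign_id, ("unknown", "unknown"))
        revenue[key] += amount
    return dict(revenue)


# Hypothetical sample data.
EXAMPLE_EVENTS = [
    '{"user_id": "u1", "ts": 1, "campaign_id": "c1"}',
    '{"user_id": "u1", "ts": 5, "campaign_id": "c2"}',
    '{"user_id": "u2", "ts": 2, "campaign_id": "c1"}',
]
EXAMPLE_CONVERSIONS = [("u1", 6, 100.0), ("u2", 3, 40.0), ("u3", 4, 10.0)]
EXAMPLE_CAMPAIGNS = {
    "c1": ("Acme", "Prospecting"),
    "c2": ("Acme", "Remarketing"),
}
```

With the sample data, user u1's conversion is credited to the remarketing campaign (the later touch), u2's to prospecting, and u3's is left unattributed because no touch precedes it.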


We also noticed that the cloud-based Qubole clusters automatically added compute nodes as we ran more queries and scaled the cluster back down as the number of queries dropped. This operational efficiency was a plus, as we didn’t have to continually reach out to our partners in Engineering, who have the complex task of managing our mission-critical production systems.
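
The scaling behavior described above can be pictured as a simple rule that maps query load to a node count within fixed bounds. The Python sketch below is purely illustrative: the function name, the queries-per-node ratio, and the node limits are assumptions, and it does not represent Qubole's actual auto-scaling algorithm.

```python
import math


def target_node_count(active_queries, queries_per_node=4,
                      min_nodes=1, max_nodes=20):
    """Return a cluster size for the current load, clamped to the limits.

    All parameters are hypothetical tuning knobs, not Qubole settings.
    """
    needed = math.ceil(active_queries / queries_per_node)
    return max(min_nodes, min(max_nodes, needed))
```

Under these assumed defaults, an idle cluster stays at one node, ten concurrent queries call for three nodes, and very heavy load is capped at twenty.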

Data Pipelines

Qubole’s engineering team worked with our team to build a custom data collector from our Oracle Database to our Amazon S3 account. Using their S3 Loader and Sqoop-as-a-Service offering, they set up a pipeline that loaded the S3 data into Qubole’s Big Data Analytics Solution, did all kinds of processing, and pushed the resulting summaries into a MySQL instance that both we and our customers could query using our BI tools. We were set up and running in a few days.

Risk Free

Qubole’s interfaces, including an easy-to-use GUI that really simplifies big data and SQL support with easy ways of embedding custom libraries, made it easy to learn. Using their GUI, setting up and tearing down clusters was totally transparent: as an analyst, I did not have to take on such an operations headache. We saved the company a few million dollars of upfront investment by going with Qubole. Also, the Qubole guys are a seasoned bunch who seem to know what they are doing and have credible answers and solutions to the team’s questions. They are a Skype chat or a phone call away whenever my team needs help with issues or change requests. I don’t feel I am taking on a huge risk by going with Qubole. Over time, they have become a partner in my team’s success, one to whom I delegate my big data platform needs.

We needed something that was reliable and easy to learn, set up, use, and put into production, without the risk and high expectations that come with committing millions of dollars in upfront investment. -Renee Englehardt, VP, Analytics, MediaMath
