This is a guest blog post written by Marc Rossen, a Qubole user and advocate. Team Qubole is grateful for Marc’s contribution to our blog.
About Marc and MediaMath
My name is Marc Rossen. I run the Client Facing Analytics & Insights team at MediaMath. I have been solving technical analytics problems in the digital marketing space for over a decade. I came across Qubole less than a year ago and have been using their service for the past five months. At their request, I am happy to write this guest blog post about my experiences with Qubole.
MediaMath, a 260-employee company based out of New York City, founded in 2007, is the leading global digital media-buying platform. At MediaMath, we develop and sell tools for Digital Marketing Managers under the TerminalOne brand. TerminalOne allows Marketing Managers to plan, execute, optimize, and analyze marketing programs.
The Analytics & Insights team is responsible for delivering decision-making infrastructure and advisory services to our clients. We do this by helping them answer complex business questions using analytics that produce actionable insights. Examples of our work include but are not limited to:
- Segmenting audiences based on their behavior, including topics such as user pathway and multi-dimensional recency analysis.
- Building customer profiles (both univariate and multivariate) across thousands of first-party (i.e., client CRM files) and third-party (i.e., demographic) segments.
- Producing simplified attribution insights that show the effects of upper-funnel prospecting on lower-funnel re-marketing media strategies.
Our flagship product captures all kinds of data generated when our customers run digital marketing campaigns on TerminalOne. This data amounts to a few terabytes of structured and semi-structured data per day. It consists of information on marketing plans, ad campaigns, ad impressions served, clicks, conversions, revenue, audience behavior, audience profile data, etc. At MediaMath, we are always looking to enhance our cutting-edge infrastructure, and we wanted to take our existing capabilities to the next level to manage new, innovative analytics tasks. Processing this raw data to segment the audience, optimize campaign yield, compute revenue attribution, etc., is a non-trivial problem for the following reasons:
1. Complexity of transforming semi-structured data
Transforming session log data to construct user sessions and click paths for further analysis is a complex process. We knew that Apache Hadoop was an attractive option, but we wanted a solution that our analysts could get started with quickly and use easily, without having to worry about its operational management. We wanted analysts to focus on their data and transformations rather than on cluster sizes, Apache Hadoop versions, machine types, and other elements of cluster operations.
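To make the sessionization step concrete, here is a minimal Python sketch of one common way to turn raw log events into per-user click paths. It is an illustration only, not the transformation MediaMath actually runs: the record layout `(user_id, timestamp, url)`, the function name `sessionize`, and the 30-minute inactivity threshold are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity threshold; the post does not state one.
SESSION_GAP = timedelta(minutes=30)


def sessionize(events, gap=SESSION_GAP):
    """Group (user_id, timestamp, url) events into per-user sessions.

    A new session starts whenever the time between two consecutive events
    from the same user exceeds `gap`. Returns {user_id: [[url, ...], ...]}.
    """
    by_user = defaultdict(list)
    for user_id, ts, url in events:
        by_user[user_id].append((ts, url))

    sessions = {}
    for user_id, hits in by_user.items():
        hits.sort(key=lambda h: h[0])  # order each user's hits by time
        user_sessions = []
        last_ts = None
        for ts, url in hits:
            if last_ts is None or ts - last_ts > gap:
                user_sessions.append([])  # start a new click path
            user_sessions[-1].append(url)
            last_ts = ts
        sessions[user_id] = user_sessions
    return sessions


# Hypothetical sample log records.
events = [
    ("u1", datetime(2013, 5, 1, 9, 0), "/home"),
    ("u1", datetime(2013, 5, 1, 9, 5), "/product"),
    ("u1", datetime(2013, 5, 1, 11, 0), "/home"),
    ("u2", datetime(2013, 5, 1, 9, 1), "/landing"),
]
click_paths = sessionize(events)
```

On a single machine this fits in a few lines; the difficulty described above comes from applying the same grouping and ordering to terabytes of logs per day, which is where a Hadoop-based service becomes attractive.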
2. Data Pipelines
We needed a service for developing data pipelines that repeat the same transformations day after day, week after week, without much intervention from my team once they were set up. Automating the execution of the data pipeline while honoring the inter-dependencies between pipeline activities was a crucial requirement. Prior experiments with cron had taught us that it wasn’t the best approach.
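The phrase "honoring the inter-dependencies" can be illustrated with a small Python sketch. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to order steps so that each one runs only after everything it depends on. The step names and the `run_pipeline` helper are hypothetical, and this is not how Qubole's scheduler is implemented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each step maps to the set of steps it depends on.
pipeline = {
    "load_logs": set(),
    "load_crm": set(),
    "build_sessions": {"load_logs"},
    "attribute_revenue": {"build_sessions", "load_crm"},
    "export_summary": {"attribute_revenue"},
}


def run_pipeline(steps, run_step):
    """Run every step after all of its dependencies; return the order used.

    TopologicalSorter raises graphlib.CycleError if the dependencies
    contain a cycle, so an unrunnable pipeline fails before any step runs.
    """
    order = list(TopologicalSorter(steps).static_order())
    for name in order:
        run_step(name)
    return order


executed = []
order = run_pipeline(pipeline, executed.append)
```

A plain cron schedule has no notion of these edges: each job fires at its appointed time whether or not its upstream job has finished, which is one likely reason a cron-based setup proves fragile for this kind of work.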
3. Low-risk Apache Hadoop
We needed something that was reliable and easy to learn, set up, use, and put into production, without the risk and high expectations that come with committing millions of dollars in upfront investment.
We evaluated a few Apache Hadoop-based offerings and decided to give Qubole a try:
1. Big Data Analytics Solution
During our trial, we quickly created an account on Qubole and the team helped us upload sample data. We started using the system and immediately saw its value. Within hours, we were able to re-use a number of very useful, business-critical, custom Python libraries that we had developed, matured, and stabilized. These libraries computed revenue attribution by customer and by campaign by combining semi-structured and relational data, among other useful tricks.
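As a rough illustration of what combining semi-structured and relational data for attribution can look like, the Python sketch below joins JSON log lines against a campaign lookup table and credits each conversion to the user's most recent click (a simple last-touch rule). The post does not describe the actual libraries, so the attribution rule, the field names, and the `last_touch_revenue` function are all assumptions made for this example.

```python
import json
from collections import defaultdict

# Hypothetical relational lookup: campaign_id -> customer (advertiser) name.
campaigns = {101: "acme", 102: "acme", 201: "globex"}

# Hypothetical semi-structured log lines, one JSON object per line.
log_lines = [
    '{"user": "u1", "ts": 1, "type": "click", "campaign_id": 101}',
    '{"user": "u1", "ts": 2, "type": "click", "campaign_id": 102}',
    '{"user": "u1", "ts": 3, "type": "conversion", "revenue": 50.0}',
    '{"user": "u2", "ts": 1, "type": "click", "campaign_id": 201}',
    '{"user": "u2", "ts": 2, "type": "conversion", "revenue": 20.0}',
]


def last_touch_revenue(lines, campaign_to_customer):
    """Credit each conversion's revenue to the user's most recent click.

    Returns {(customer, campaign_id): revenue}. Conversions from users
    with no earlier click are left unattributed.
    """
    events = sorted((json.loads(line) for line in lines), key=lambda e: e["ts"])
    last_click = {}  # user -> campaign_id of that user's latest click
    revenue = defaultdict(float)
    for event in events:
        if event["type"] == "click":
            last_click[event["user"]] = event["campaign_id"]
        elif event["type"] == "conversion" and event["user"] in last_click:
            campaign_id = last_click[event["user"]]
            customer = campaign_to_customer[campaign_id]
            revenue[(customer, campaign_id)] += event["revenue"]
    return dict(revenue)


totals = last_touch_revenue(log_lines, campaigns)
```

Other attribution rules (first-touch, or splitting credit across several touches) would change only the body of the loop; the join between the logs and the campaign table stays the same.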
We also noticed that the cloud-based Qubole clusters automatically grew the number of compute nodes as we started to run more queries and scaled the cluster down as the number of queries went down. This operational efficiency was a plus, as we didn’t have to continually reach out to our partners in Engineering, who have the complex task of managing our mission-critical production systems.
2. Data Pipelines
Qubole’s engineering team worked with our team to build a custom data collector from our Oracle database to my Amazon S3 account. Using their S3 Loader and Sqoop-as-a-Service offering, they set up a pipeline that loaded the S3 data into Qubole’s Big Data Analytics Solution, processed it, and pushed the resulting summaries into a MySQL instance that both we and our customers could query using our BI tools. We were up and running in a few days.
3. Low Risk
Qubole’s interfaces, including an easy-to-use GUI that really simplifies big data and SQL support with easy ways of embedding custom libraries, made it easy to learn. Using their GUI, setting up and tearing down clusters was totally transparent; as an analyst, I did not have to take on such an operations headache. We saved the company a few million dollars of upfront investment by going with Qubole. Also, the Qubole guys are a seasoned bunch who seem to know what they are doing and have credible answers and solutions to the team’s questions. They are a Skype chat or a phone call away whenever my team needs help with issues or change requests. I don’t feel I am taking on a huge risk by going with Qubole. Over time, they have become a partner in my team’s success, one to whom I delegate my big data platform needs.
I will conclude by saying that I am generally very happy with Qubole. Our goal at MediaMath was to take our existing industry-leading infrastructure to the next level so that it could handle new, complex analytics tasks. Qubole has helped us reach this goal with minimal risk. I wish the Qubole team the best, and wish you well in your journey of discovering a big data solution!