Scaling Beyond a Data Warehouse to Meet Customer Demands

Ibotta hosted a meetup with Qubole and Looker as part of the Boulder/Denver Big Data community group. In this forum, Ibotta shared the story of how the company is transforming into a data-driven organization that can keep up with product and partner initiatives. The speakers also explained how they built a set of eCommerce products that scale to meet the needs of a rapidly growing user base.

Watch the video to hear several members of Ibotta’s data and analytics teams share their perspectives on what drove the demand for a data lake architecture. The bottom line: the existing centralized data warehouse on Redshift was not up to the task. The teams identified the following must-have capabilities for the new data lake:

  • Decoupling of storage from compute for better utilization of infrastructure resources and analytical tools.
  • Ephemeral, intelligently scaling Hadoop, Presto, and Spark clusters that enable automation of their workloads while reducing costs as they scale.
  • Development of new features that incorporate machine learning into Ibotta’s products and add to the value they’re able to provide to their customers.
  • Dynamic orchestration of data pipelines between Engineering and Data Science, using Airflow as the glue behind the data lake operations, from streaming and ingestion of data to feeding data marts.

Each of the speakers from Ibotta covers in detail how the data lake applies to their team’s use cases. They answer questions about how a data platform lets them deliver new insights for the business, and how they now use the scale and performance available with Qubole and AWS to keep pace with their rapidly growing data in a centralized S3 object store, all while controlling costs.

The presentation and fireside chat close with a lively discussion about how each team can now use the right tool for the right job, drawing on technologies like Hadoop, Airflow, and Spark. The speakers delve into how this diversity of engines and analytic capabilities lets Ibotta focus on faster time-to-market and on delivering an ROI-driven, self-service data platform that is evolving Ibotta into a powerhouse eCommerce platform.

The Speakers

  • Nathan McIntyre (Lead Data Architect at Ibotta) sets the stage in the presentation, talking about the old world and the struggles of scaling Redshift, and why the team needed to move to a data lake, using Qubole to help make it happen in a timely manner. Today McIntyre uses technologies such as Kafka, Airflow, Hive, Spark, and Presto on AWS infrastructure to scale Ibotta’s data operations.
  • Charley Frazier (“Feature Engineering” Data Scientist at Ibotta) follows on in the talk to share how their team was able to see value within a month of adopting Qubole, using data in their object store to start moving new features into production. He then goes into further detail on what they’re able to do with machine learning and enrichment on their data, and how the Data Science team is using Qubole (Hive, Spark, and Airflow) to manage and improve their own ops.
  • Heather Trujillo (Senior BI Analyst at Ibotta) talks about the depth of reports they deliver to the business and how, as their customers and the complexity of reports have grown, they have been driven to create a new model for getting data to their customers. She then shares the future of reporting for Ibotta and how they can scale when necessary with the Qubole (Presto and Hive) and Looker integration.

The Fireside Chat

  • The meetup concludes with a fireside chat moderated by Andy Sautins, community leader of the Boulder/Denver Big Data group and Team Lead at Google Search. The presentation speakers are joined in the fireside chat by Ron White (VP of Engineering at Ibotta), Ben Roubichek (Senior Solutions Architect at Qubole), and Lucas Theolosen (VP of Professional Services at Looker) to highlight the lessons learned from making these shifts, how the shifts impact the business, and industry trends among successful companies moving to this infrastructure paradigm.