Why Nextdoor Ditched a Data Warehouse for a Centralized Data Lake - Ivan Peng, Nextdoor

October 28, 2020

Presented by Ivan Peng, Software Engineer - Data Platform, Nextdoor

Qubole and Nextdoor have been partners for a while: we've leveraged Airflow and Qubole extensively to create ETL-pipelines-as-configuration, democratizing the data platform and scaling effectively with company growth. However, we quickly realized that our data warehouse, housed on AWS Redshift and powering our Tableau dashboards, could not meet the demands of rapidly changing schemas and analytics query volume. Rather than investing in a data warehouse cluster capable of meeting that volume and throughput, we doubled down on the data lake architecture, moving to Looker to power our dashboards and to Qubole as the data warehouse engine. In this talk, I'm going to argue that the processing performance conceded by decoupling storage and compute pales in comparison to the data discoverability and analytics velocity gained from an open data lake, and I'll describe our journey to get here.
