Why Nextdoor Ditched a Data Warehouse for a Centralized Data Lake – Ivan Peng, Nextdoor

Presented by Ivan Peng, Software Engineer – Data Platform, Nextdoor

Qubole and Nextdoor have been partners for a while: we’ve leveraged Airflow and Qubole extensively to create ETL-pipelines-as-configuration, democratizing the data platform and scaling effectively with company growth. However, we quickly realized that our data warehouse, which is housed on AWS Redshift and powers our Tableau dashboards, could not meet the demands of rapidly changing schemas and analytics query volume. Rather than invest in a data warehouse cluster capable of meeting that volume and throughput, we doubled down on the data lake architecture, moving to Looker to power our dashboards and to Qubole as the data warehouse engine. In this talk, I’ll walk through our journey to get here and argue that the processing performance conceded by decoupling storage and compute pales in comparison to the data discoverability and analytics velocity gained from an open data lake.