Email Text Classification: Building an End-to-End Data Product (Return Path)
Data Platforms 2018

Speaker: Sasha Mushovic, Data Scientist, Return Path

Presentation: Sasha will tell the story of building an end-to-end data product that feeds several parts of the Return Path business and helps marketers optimize their email programs. We will cover the discovery, development, and production of an email classification model that uses Apache Spark to fit classifiers such as Random Forests and Support Vector Machines, which read email text and classify its content. We will discuss the hyperparameter tuning and ensembling methods used, and describe the stages of production, from batch jobs in Qubole Scheduler and Apache Airflow to streaming with Apache Kafka. We will also reflect on what it means to be a full-stack data scientist, and on how data science teams can be empowered to own their data products.

