Auto Tuning Twitter Hadoop Jobs (Or: Don’t Touch That Analytics Dial!) | Data Platforms 2018

Speakers: Ben Pence, Software Engineer, Twitter and Anton Panasenko, Software Engineer, Twitter

Presentation: Every day at Twitter, hundreds of thousands of Hadoop jobs transform and aggregate petabytes of data in our analytics stack. Historically, we’ve asked users to guess at the values of tuning parameters that affect how their Hadoop jobs run, such as mapper and reducer counts, memory allocation, and intermediate serialization formats. However, when we looked at the values users chose for these parameters in 2017, we found that Hadoop jobs across our clusters were running sub-optimally and still not meeting users’ Service Level Agreement (SLA) targets. To address this problem, we implemented a service that automates the tuning of several of the most important Hadoop parameters, using historical per-job metrics to inform future runs.
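As a rough illustration of tuning from historical per-job metrics, the sketch below sizes mapper and reducer memory from the peak usage seen in past runs. It is not the service described in this talk: the `RunMetrics` schema, the `recommend_memory_mb` and `tune` helpers, and the headroom, step, and floor values are all assumptions made for the example. Only the property names `mapreduce.map.memory.mb` and `mapreduce.reduce.memory.mb` are standard Hadoop settings.

```python
import math
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Metrics recorded for one past run of a job (hypothetical schema)."""
    peak_map_memory_mb: int
    peak_reduce_memory_mb: int


def recommend_memory_mb(peaks, headroom=1.2, step_mb=256, floor_mb=1024):
    """Suggest a container size from past peak usage.

    Takes the highest observed peak, adds headroom, rounds up to a
    multiple of step_mb, and never goes below floor_mb. All three
    defaults are illustrative assumptions, not recommended values.
    """
    if not peaks:
        return floor_mb  # no history yet: fall back to the floor
    target = max(peaks) * headroom
    rounded = math.ceil(target / step_mb) * step_mb
    return max(floor_mb, rounded)


def tune(history):
    """Map a job's run history to suggested Hadoop memory settings."""
    return {
        "mapreduce.map.memory.mb": recommend_memory_mb(
            [run.peak_map_memory_mb for run in history]
        ),
        "mapreduce.reduce.memory.mb": recommend_memory_mb(
            [run.peak_reduce_memory_mb for run in history]
        ),
    }
```

For a job whose two previous runs peaked at 1500 MB and 1700 MB for mappers and at 3000 MB and 2800 MB for reducers, `tune` suggests 2048 MB for mappers and 3840 MB for reducers. A job with no history gets the 1024 MB floor for both.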

In this talk, we will review how the system works, some of the auto-tuning we’ve implemented so far, and what we have on our roadmap for the future.

Learn more at…
Data Platforms Conference: https://www.dataplatforms.com
Twitter: https://www.twitter.com
Qubole: https://www.qubole.com