Speakers: Ben Pence, Software Engineer, Twitter and Anton Panasenko, Software Engineer, Twitter
Presentation: Every day at Twitter, hundreds of thousands of Hadoop jobs transform and aggregate petabytes of data in our analytics stack. Historically, we’ve asked users to guess at values for the tuning parameters that affect how their Hadoop jobs run, such as mapper and reducer counts, memory allocation, and intermediate serialization formats. However, when we looked at the values that users chose for these parameters in 2017, the data revealed that Hadoop jobs across our clusters were running sub-optimally and still not meeting users’ Service Level Agreement (SLA) targets. To address this problem, we implemented a service that automates the tuning of several of the most important Hadoop parameters, using historical per-job metrics to inform future runs.
In this talk, we will review how the system works, some of the auto-tuning we’ve implemented so far, and what we have on our roadmap for the future.
Learn more at…
Data Platforms Conference: https://www.dataplatforms.com
Twitter: https://www.twitter.com
Qubole: https://www.qubole.com