Auto Tuning Twitter Hadoop Jobs (Or: Don’t Touch That Analytics Dial!)

Data Platforms 2018

October 9, 2018

Speakers:
- Ben Pence, Software Engineer, Twitter
- Anton Panasenko, Software Engineer, Twitter

Presentation: Every day at Twitter, hundreds of thousands of Hadoop jobs transform and aggregate petabytes of data in our analytics stack. Historically, we've asked users to guess at the tuning parameter values that affect how their Hadoop jobs run: mapper and reducer counts, memory allocation, and intermediate serialization formats, among others. However, when we examined the values users chose for these parameters in 2017, the data revealed that Hadoop jobs across our clusters were running sub-optimally and still not meeting users' Service Level Agreement (SLA) targets. To address this problem, we implemented a service that automates the tuning of several of the most important Hadoop parameters, using historical per-job metrics to inform future runs. In this talk, we will review how the system works, some of the auto-tuning we've implemented so far, and what is on our roadmap for the future.

Learn more at...
- Data Platforms Conference: https://www.dataplatforms.com
- Twitter: https://www.twitter.com/
- Qubole: https://www.qubole.com
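The core idea in the abstract, using historical per-job metrics to set tuning parameters for future runs, might be sketched as follows. This is a minimal illustration, not the system described in the talk: the percentile-plus-headroom policy, the parameter name, and all function names are assumptions chosen for the example.

```python
# Hypothetical sketch: derive a value for mapreduce.map.memory.mb from
# observed per-run peak memory usage. The p95 + headroom policy here is
# an assumption for illustration, not Twitter's actual tuning logic.
import math

def suggest_map_memory_mb(peak_usages_mb, headroom=1.25, round_to=512):
    """Suggest a map container memory setting (MB) from historical peaks."""
    ordered = sorted(peak_usages_mb)
    # Use the 95th-percentile peak so one-off spikes don't dominate.
    idx = min(len(ordered) - 1, int(0.95 * len(ordered)))
    p95 = ordered[idx]
    # Add headroom, then round up to the cluster's allocation increment.
    return int(math.ceil(p95 * headroom / round_to) * round_to)

# Example: ten historical runs peaking between ~1.5 GB and ~2 GB.
history = [1500, 1550, 1600, 1620, 1700, 1750, 1800, 1850, 1900, 2000]
print(suggest_map_memory_mb(history))  # 2560
```

A real service would also need guardrails, e.g. never shrinking below a floor, and backing off after an out-of-memory failure, but the feedback loop from past metrics to the next run's configuration is the essential shape.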
