Tech Talk

Operationalizing YARN-Based Hadoop Clusters in the Cloud: Lessons and Opportunities

Qubole’s Big Data Service began three years ago with a hardened Hadoop 1 stack and later added YARN-based clusters to offer next-generation technologies like Spark and Tez in addition to MapReduce. YARN is a big shift from the traditional Hadoop model, and operating it in cloud environments as ephemeral, auto-scaling clusters was a major challenge. Qubole leverages public cloud features like spot instances, EBS volumes and cloud object stores. Achieving the same level of reliability and performance as our first-generation Hadoop offering, and migrating scores of customers over to it, was equally demanding. In this talk we will cover our experience of navigating this migration. Example topics include:

1. Common Auto-Scaling Framework for YARN and multiple execution engines like MR, Spark & Tez
2. Adapting YARN/HDFS to take lossy spot instances into account
3. Issues using AWS S3 at scale and our efforts at moving to the S3A filesystem
4. Performance comparisons of Hive running against Hadoop-1, Hadoop-2 M/R and Tez in the Cloud
5. Providing a long-lived management UI over ephemeral clusters


Abhishek Modi