Running Apache Spark jobs cheaper while maximizing performance – Brad Caffey, Expedia Group

Presented by Brad Caffey, Staff Big Data Engineer, Expedia Group.

In a Covid-19 world, companies are looking for ways to reduce cloud spending as much as possible. While many Apache Spark tuning guides discuss how to get the best performance out of Spark, none of them discusses the cost of that performance. In this session, we'll cover a proven tuning technique for Apache Spark that lowers job costs on AWS while maximizing performance. Topics include:

  • The principle for how to make Apache Spark jobs cost-efficient
  • How to determine the AWS costs for your Apache Spark job
  • How to determine the most cost-efficient executor configuration for your cluster
  • How to migrate your existing jobs to the cost-efficient executor configuration
  • How to improve performance with your cost-efficient executor configuration
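The abstract does not spell out the session's cost formula, but as a rough illustration of the second topic, a back-of-the-envelope AWS cost for a Spark job can be estimated from the hourly price of the instances it occupies and its runtime. The function and prices below are hypothetical, not taken from the talk:

```python
def job_cost_usd(instance_hourly_price: float,
                 instance_count: int,
                 runtime_hours: float) -> float:
    """Rough AWS cost of a Spark job that fully occupies its instances.

    Ignores EMR surcharges, spot pricing, and partial-hour billing;
    the session presumably refines this considerably.
    """
    return instance_hourly_price * instance_count * runtime_hours

# Hypothetical example: 10 nodes at $0.504/hour running for 2 hours.
print(round(job_cost_usd(0.504, 10, 2.0), 2))  # -> 10.08
```

Comparing this figure across candidate executor configurations (cores and memory per executor, executors per node) is one way to judge which configuration is most cost-efficient for a given cluster.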