Spark optimization with Sparklens – Rohit Karlupia, Qubole

Debugging slow Spark applications by trial and error takes a lot of time. Sparklens provides insight into the scalability limits of a Spark application from a single run of that application. In this talk, we will cover what Sparklens does and the theory behind it. We will discuss how the structure of a Spark application places important constraints on its scalability, how to find these structural constraints, and how to use them as a guide when solving performance and scalability problems. Sparklens can tell whether adding resources will decrease an application's runtime, and by how much. It can also tell whether removing resources will increase efficiency, and by how much. Sparklens makes the ROI of additional executors extremely obvious for a given application, and it needs just a single run to determine how the application will behave with different executor counts. In particular, it helps managers make the right call in the tradeoff between spending developer time optimizing applications and spending money on compute bills.
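The abstract does not spell out Sparklens's model. The Python sketch below illustrates why an application's structure caps its scalability. It makes three simplifying assumptions. First, stages run one after another. Second, each task is placed on whichever core becomes free first. Third, time spent only in the driver does not shrink as executors are added. The function names and the stage data are hypothetical, and this is not Sparklens's actual code or algorithm. It only shows how per-task timings from one run can be replayed against different executor counts.

```python
import heapq
from typing import List, Sequence


def stage_time(task_durations: Sequence[float], total_cores: int) -> float:
    """Hypothetical helper: simulate one stage on `total_cores` cores.

    Each task is assigned, in order, to the core that becomes free first;
    the stage ends when its last task finishes.
    """
    if total_cores < 1:
        raise ValueError("total_cores must be >= 1")
    if not task_durations:
        return 0.0
    # Min-heap holding the time at which each core becomes free.
    free_at = [0.0] * min(total_cores, len(task_durations))
    heapq.heapify(free_at)
    for duration in task_durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


def estimate_app_time(driver_time: float, stages: List[Sequence[float]],
                      executors: int, cores_per_executor: int) -> float:
    """Hypothetical estimate: driver-only time plus simulated stage times."""
    total_cores = executors * cores_per_executor
    return driver_time + sum(stage_time(s, total_cores) for s in stages)


def critical_path_time(driver_time: float, stages: List[Sequence[float]]) -> float:
    """Lower bound with unlimited executors: driver time plus the longest task of each stage."""
    return driver_time + sum(max(s, default=0.0) for s in stages)


if __name__ == "__main__":
    # Made-up task durations (seconds) standing in for one recorded run.
    stages = [[5.0] * 8, [20.0, 2.0, 2.0, 2.0, 2.0]]  # second stage is skewed
    for n in (1, 2, 4, 8):
        print(n, "executors:", estimate_app_time(30.0, stages, n, 2), "s")
    print("critical path:", critical_path_time(30.0, stages), "s")
```

With these made-up numbers, the sketch estimates 70 s, 60 s, 55 s and 55 s for 1, 2, 4 and 8 executors of 2 cores each. The estimate bottoms out at the 55 s critical path because the driver time and the skewed 20 s task do not shrink, so executors beyond that point buy nothing.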