Running ad hoc analytics, streaming analytics, and machine learning workloads in the cloud offers unique cost, performance, and time-to-value advantages. But the unpredictability of both workload sizes and their associated costs can become an obstacle to growth and innovation if you don’t have efficient ways to monitor and manage them. Having the means to control costs and apply governance policies has therefore become even more critical for cloud data lakes.
In this third installment of the Data Lake Summit speaker blog series, we present the speakers who will take us through their organizations’ journeys of maintaining a sustainable, efficient, and cost-effective data lake platform. They will also share a few techniques to lower data lake job costs while maximizing the platform’s performance.
Brad Caffey, Expedia Group
With an impressive 20 years of experience in data engineering, Brad Caffey, Staff Big Data Engineer at Expedia Group, will speak at the Data Lake Summit on Running Apache Spark Jobs Cheaper While Maximizing Performance. In a COVID-19 world, companies are looking for ways to reduce cloud spending as much as possible. While many Apache Spark tuning guides discuss how to get the best performance from Spark, none of them discuss the monetary cost of achieving that performance. In this informative session, Brad will share a proven tuning technique for Apache Spark that lowers job costs while maximizing performance. His topics of discussion include:
- The principle for how to make Apache Spark jobs cost-efficient
- How to determine the compute costs for your Apache Spark job
- How to determine the most cost-efficient executor configuration for your cluster
- How to migrate your existing jobs to the cost-efficient executor
- How to improve performance with your cost-efficient executor
Brad particularly enjoys doing an in-depth analysis of complex data engineering problems.
Rohit Srivastava and Bitanshu Das, MiQ
Rohit Srivastava, Engineering Manager at MiQ, will be joined onstage by MiQ’s Team Lead Data Engineer Bitanshu Das. Together, they will speak on Cost Optimization and Self-Service Reporting for a Data Lake Ecosystem. They will talk about MiQ’s journey to maintain a sustainable, efficient, and cost-effective data lake solution. The session covers cost optimization initiatives for both data pipelines and infrastructure. They will also take a deep dive into recent approaches to achieving self-serve, automated cost reporting and debugging.
Both Rohit and Bitanshu have expertise in programmatic media buying for the ad-tech domain. The data pipeline builders on their teams are responsible for optimizing data stores, and both speakers have built such teams from the ground up. They are experienced in performing root cause analysis and in optimizing data pipelines at scale in a cost-effective way.
Ori Reshef, Varada
Ori Reshef, Vice President of Products at Varada, will join us for the session titled Leverage the Power of Big Data Indexing to Optimize Price and Performance. In this session, he will discuss:
- How you can effectively index at petabyte scale
- What adaptive indexing is
- What dynamic indexing is
- A reality check: what performance uplift you should expect
Ori has 15+ years of experience and deep expertise in the data space. He specializes in speech, text, and big data analytics, as well as in creating business solutions from cutting-edge technologies. As a product expert, he has helped shape various solutions that focus on improving customer lifetime value (LTV) by mining three primary types of data:
- Customer-generated data, to reveal the intent of the customer
- Brand-generated data, to improve sales and retention
- Operational data, to increase efficiency
Prior to joining Varada, Ori was the VP of Data Products and Head of Data Science for Clicktale (acquired by Contentsquare). He led the Analytics and Intelligence product for LivePerson (NASDAQ: LPSN) and held several other senior product positions. Ori transitioned to executive product roles after serving as a Solution Delivery Manager and Senior Sales Consultant for Nice Systems (NASDAQ: NICE) and holding several other technology consulting positions.
For more information about the Data Lake Summit, please visit https://bit.ly/2S6oWas
Register today for two days of inspiration and learning. It’s 100% virtual and free!
The post Data Lake TCO Optimization – The Data Lake Summit Speaker Lineup appeared first on Qubole.