Data Lake TCO Optimization | The Data Lake Summit

October 1, 2020 by Smita Sinha and Shefali Aggarwal. Updated March 20, 2024

Running ad hoc analytics, streaming analytics, and machine learning workloads in the cloud offers unique advantages in cost, performance, and time to value. But the unpredictability of both workload sizes and their associated costs can become an obstacle to growth and innovation if you don’t have efficient ways to monitor and manage them. For cloud data lakes, the ability to control costs and apply governance policies has become even more critical.

In the third installment of our Data Lake Summit speaker blog series, we introduce the speakers who will take us through their organizations’ journeys toward a sustainable, efficient, and cost-effective data lake platform. They will also share techniques for lowering data lake job costs while maximizing the platform’s performance.

Check out the full agenda of the summit

Also Read: Data Lakes for Artificial Intelligence and Machine Learning – The Data Lake Summit Speaker Lineup

Data Lakes and Data Warehouses – The Data Lake Summit Speaker Lineup

Brad Caffey, Expedia Group

Brad Caffey, Staff Big Data Engineer at Expedia Group, brings 20 years of experience in data engineering. At The Data Lake Summit, he will present “Running Apache Spark Jobs Cheaper While Maximizing Performance.” In a COVID-19 world, companies are looking for every way to reduce cloud spending. Many Apache Spark tuning guides explain how to get the best performance out of Spark, but none of them discuss the monetary cost of achieving that performance. In this session, Brad will share a proven tuning technique for Apache Spark that lowers job costs while maximizing performance. His topics of discussion include:

The principle for how to make Apache Spark jobs cost-efficient
How to determine the compute costs for your Apache Spark job
How to determine the most cost-efficient executor configuration for your cluster
How to migrate your existing jobs to the cost-efficient executor
How to improve performance with your cost-efficient executor

Brad particularly enjoys doing an in-depth analysis of complex data engineering problems.

Rohit Srivastava and Bitanshu Das, MiQ

Rohit Srivastava, Engineering Manager at MiQ, will be joined onstage by MiQ’s Team Lead Data Engineer, Bitanshu Das. Together, they will present “Cost Optimization and Self-Service Reporting for a Data Lake Ecosystem,” describing MiQ’s journey to maintain a sustainable, efficient, and cost-effective data lake solution. The talk covers cost optimization initiatives for both data pipelines and infrastructure. They will also take a deep dive into recent approaches to self-service, automated cost reporting and debugging.

Both Rohit and Bitanshu have expertise in programmatic media buying in the ad tech domain. As data pipeline builders, they are responsible for optimizing data stores and have built the teams that do this work from the ground up. They are experienced in performing root cause analysis and in optimizing data pipelines at scale with a cost-effective approach.

Ori Reshef, Varada

Ori Reshef, Vice President of Products at Varada, will present the session “Leverage the Power of Big Data Indexing to Optimize Price and Performance.” In this session, he will discuss:

How you can index effectively at petabyte scale
What adaptive indexing is
What dynamic indexing is
A reality check: what performance uplift you should expect

Ori has more than 15 years of experience in the data space. He specializes in speech, text, and big data analytics, and in building business solutions from cutting-edge technologies. As a product expert, he has helped shape various solutions that improve customer lifetime value (LTV) by mining three primary types of data:

Customer-generated data, to reveal the intent of the customer
Brand-generated data, to improve sales and retention
Operational data, to increase efficiency

Prior to joining Varada, Ori was VP of Data Products and Head of Data Science at Clicktale (acquired by Contentsquare). He also led the Analytics and Intelligence product at LivePerson (NASDAQ: LPSN) and held several other senior product positions. Before moving into executive product roles, he served as a Solution Delivery Manager and Senior Sales Consultant at Nice Systems (NASDAQ: NICE) and held several other technology consulting positions.

For more information about the Data Lake Summit, please visit https://bit.ly/2S6oWas
