Managing Transactions in Data Lakes - Joydeep Sen Sarma, CTO & Co-founder, Qubole

October 28, 2020

An enterprise data analytics and reporting platform typically runs data pipelines with long and complex jobs, spanning many services, programs, tools, and scripts interacting together. These jobs need to run on an ad-hoc basis, have dependencies on existing datasets, and have other jobs that depend on them. This quickly becomes a tangled mesh of compute- and memory-intensive processes, leading to a maintenance nightmare, instability, and poor performance, and it calls for a scalable, optimized workflow management solution. While a plethora of open source solutions exist to solve these problems, they may not fit everyone’s needs. This talk provides an under-the-hood view into the architectural patterns of such solutions, and considerations for companies that choose to build a more customizable, simple, and elegant solution without reinventing the wheel.
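As a minimal sketch of the dependency-scheduling idea the abstract alludes to (not taken from the talk itself): jobs that depend on other datasets and jobs form a directed acyclic graph, and a workflow manager must derive a valid execution order from it. The job names below are hypothetical, purely for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each job maps to the set of jobs it depends on.
jobs = {
    "ingest_raw": set(),
    "clean_events": {"ingest_raw"},
    "build_facts": {"clean_events"},
    "daily_report": {"build_facts"},
    "ml_features": {"clean_events"},
}

# A topological order is any execution sequence in which every job
# runs only after all of its dependencies have completed.
order = list(TopologicalSorter(jobs).static_order())
print(order)
```

Real workflow managers layer retries, backfills, and parallel execution of independent branches on top of this core ordering step, but the dependency graph remains the central data structure.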
