Best Practices: How To Build Scalable Data Pipelines for Machine Learning

July 17, 2020

Data engineers today serve a wider audience than they did just a few years ago. To remain relevant, companies now need to apply machine learning (ML) techniques to their data. Among the new challenges data engineers face are building and filling data lakes and reliably delivering complete, large-volume data sets so that data scientists can train more accurate models. Beyond handling larger data volumes, these pipelines must be flexible enough to accommodate the variety of data and the high processing velocity that new ML applications require. Qubole addresses these challenges by providing an auto-scaling, cloud-native platform to build and run these data pipelines.

In this webinar we will cover:
- Some of the typical challenges data engineers face when building pipelines for machine learning
- Typical uses of the various Qubole engines to address these challenges
- Real-world customer examples
