Other content in this Stream

How big data engines are used for exploring and preparing data, building pipelines, and delivering data sets to ML applications


A data lake, where data is stored in an open format and accessed through open standards-based interfaces, is defined as an Open Data Lake.


A Qubole whitepaper on how Qubole makes data easily accessible on open data lake platforms, using Amazon Web Services (AWS) for customer data with proper security measures and compliance

How to present data lake spending to your finance team.

TiVo shares best practices for ingesting and processing terabytes of streaming and batch viewership data from millions of households, and for making that data available for analysis

Tips for when to use Presto versus Apache Spark, and how to enable self-service access to your data lake

Brief introduction to Apache Airflow, its optimal use cases, and real-world examples

Real-world data science practitioners offer perspectives and advice on six common Machine Learning problems

Benefits of migrating to a cloud-native data lake and how to choose the right data architecture

Why a unified experience with native notebooks, a command workbench, and integrated Apache Airflow is a must.

A comprehensive guide to understanding effective financial governance

The benefits of a single cloud platform and centralized access to data

Best practices for data collaboration and data lake access using SQL

Best practices for building a cloud data lake operation—from people and tools to processes

Technical overview of Qubole's HiveServer2 solution that distributes memory-intensive processes and enables scalability

The best tool for every task in the data science life cycle — in a single, cloud-native platform