Apache CarbonData: Data Storage for ACID Ingest, Fast Query, and Machine Learning - Huawei

November 25, 2020

The growing volume of data raises dozens of new challenges. How can streaming, mutable data be ingested? How can indexes be built and cached for fast queries? How can data be analyzed with ML? To address these challenges, we present CarbonData, a data store that offers a SQL API to ingest, query, and analyze data. It empowers ingestion with cloud-native ACID transactions and Streaming Merge SQL. It empowers queries with indexes, materialized views, and a novel distributed index caching and pruning system that improves query performance and outperforms existing cloud platforms. It empowers analytics by integrating data with ML frameworks, and it offers a SQL API to track lineage and dependencies among data versions and models in ML pipelines. Our contributions are to identify the requirements of high-performance data storage, share our experience in applying novel technologies, present our design for integrating data with ML through SQL, and discuss future challenges.