Faster analytics on cloud with RubiX - Shubham Tagra, Technical Director, Qubole

October 28, 2020

Cloud stores are inexpensive and infinitely scalable, which has led to them becoming the de facto standard for data lakes. But when it comes to performance, local SSDs easily outperform them, especially with newer advancements like NVMe. Furthermore, because they are accessed over the network, cloud stores often struggle to provide consistent performance, and in certain cases, such as inter-region network access in AWS, that access can be expensive. Given that all cloud providers offer machines with high-speed local disks, many of the shortcomings of cloud stores can be avoided by using these disks as a cache. RubiX is Qubole's homegrown, open-source data caching framework that integrates with big data engines such as Hive, Presto, and Spark to provide a data cache over any cloud store. In this talk, we will find out how RubiX works, how it integrates with the big data engines and the cloud store, what kind of improvements can be expected, and what new features are being worked on.
