Best-of-Breed Data Processing Engines

Qubole supports best-of-breed data processing engines and frameworks for end-to-end data processing. With Qubole’s platform-based approach, new data processing engines and frameworks can be added with ease, ensuring platform longevity.


Apache Spark

Qubole runs the biggest Spark clusters in the cloud and supports a broad variety of use cases, from ETL and machine learning to analytics. Qubole’s implementation of Spark is a performance-enhanced, cloud-optimized version of the open source Apache Spark framework. These enhancements bring Qubole’s cost and performance optimization features to Spark workloads.

Qubole’s Spark implementation improves the performance of Spark workloads with enhancements such as fast storage, distributed caching, advanced indexing, and metadata caching. Other enhancements include job isolation on multi-tenant clusters and Sparklens, an open source Spark profiler that provides insights into how a Spark application performs.


Presto

Qubole integrates an enhanced, cloud-optimized version of Presto. Qubole’s Presto implementation is an enterprise-ready, secure distributed SQL query engine that allows analysts to derive business insights from data quickly.

Qubole’s cloud optimizations for Presto include dynamic cluster sizing based on workload and termination of idle clusters, which together maintain high reliability while reducing compute costs. Qubole’s Presto clusters support multi-tenancy and provide logs and metrics to track query performance.


Deep Learning

With support for a variety of deep learning libraries, users can build and train neural networks inside Qubole. Qubole’s deep learning clusters come with Anaconda environments that include popular deep learning packages such as TensorFlow.

Qubole’s TensorFlow engine has been built to run on distributed Graphics Processing Units (GPUs) on Amazon Web Services.


Apache Airflow

Qubole provides its users an enhanced, cloud-optimized version of Apache Airflow. Qubole offers single-click deployment of Airflow, automates cluster and configuration management, and includes dashboards for visualizing Airflow directed acyclic graphs (DAGs).


Apache Hadoop

Using Apache Hadoop, Qubole runs applications written in MapReduce, Cascading, Pig, Hive, Scalding, and Spark. Qubole’s implementation of Hadoop is compatible with open source Apache Hadoop and adds performance enhancements and optimizations for the cloud.


Apache Hive

Qubole provides an enhanced, cloud-optimized, self-managing, and self-optimizing implementation of Apache Hive. Qubole’s Hive implementation uses AIR (Alerts, Insights, Recommendations), allowing data teams to focus on generating business value from data rather than managing the platform. Qubole Hive integrates seamlessly with existing data sources and third-party tools while providing best-in-class security.