DATA PROCESSING ENGINE

Qubole supports best-of-breed data processing engines and frameworks for end-to-end data processing. With Qubole’s platform-based approach, new open-source big data engines and frameworks can be easily added to ensure platform longevity.

APACHE SPARK

Qubole runs the biggest Apache Spark clusters in the cloud and supports a broad variety of use cases, from ETL and machine learning to analytics.

Qubole’s implementation of Spark is a performance-enhanced and cloud-optimized version of the open-source framework Apache Spark. These enhancements bring all of the cost and performance optimization features of Qubole to Spark workloads.

Qubole’s Spark implementation improves the performance of Spark workloads with enhancements such as fast storage, distributed caching, advanced indexing, and metadata caching. Other enhancements include job isolation on multi-tenant clusters and Sparklens, an open-source Spark profiler that provides insights into Spark applications.

PRESTO

Qubole integrates an enhanced and cloud-optimized version of Presto. 

Qubole’s Presto implementation is an enterprise-ready and secure distributed SQL query engine, which allows analysts to quickly derive business insights from data.

Qubole has optimized Presto for the cloud. Qubole’s enhancements allow for dynamic cluster sizing based on workload and termination of idle clusters, ensuring high reliability while reducing compute costs. Qubole’s Presto clusters support multi-tenancy and provide logs and metrics to track the performance of queries.
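
The idea of sizing a cluster to its workload and shutting it down when idle can be illustrated with a small sketch. This is not Qubole’s autoscaling logic: the ClusterState fields, the target_nodes function, and every threshold (queries per node, node limits, idle timeout) are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of a cluster; not a Qubole data structure."""
    nodes: int              # nodes currently running
    pending_queries: int    # queries waiting or executing
    idle_minutes: float     # time since the last query finished


def target_nodes(state, min_nodes=1, max_nodes=10,
                 queries_per_node=4, idle_timeout_minutes=30.0):
    """Return the desired node count; 0 means terminate the idle cluster.

    All parameters are illustrative assumptions, not documented settings.
    """
    # Terminate a cluster that has had no work for longer than the timeout.
    if state.pending_queries == 0 and state.idle_minutes >= idle_timeout_minutes:
        return 0
    # Ceiling division: nodes needed to serve the current query load.
    needed = -(-state.pending_queries // queries_per_node)
    # Clamp the result to the configured cluster size limits.
    return max(min_nodes, min(max_nodes, needed))
```

With these assumed defaults, a cluster with 17 pending queries would be sized to 5 nodes, and a cluster that has been idle for 45 minutes would be terminated.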

TENSORFLOW

With support for a variety of deep learning libraries, users can build and train neural networks inside Qubole. Qubole’s deep learning clusters come with Anaconda environments that include popular deep learning packages such as TensorFlow.

Qubole’s TensorFlow engine has been built to run on distributed Graphics Processing Units (GPUs) on Amazon Web Services.

APACHE AIRFLOW

Qubole provides its users with an enhanced, cloud-optimized version of Apache Airflow. Qubole provides single-click deployment of Airflow, automates cluster and configuration management, and includes dashboards to visualize the Airflow Directed Acyclic Graphs (DAGs).
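
For readers new to the term, a Directed Acyclic Graph describes tasks and the dependencies between them, so that each task runs only after the tasks it depends on. The sketch below illustrates that ordering with Python’s standard graphlib module rather than the Airflow API; the task names and the run_order helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_order(dependencies):
    """Return task names in an order that respects their dependencies.

    `dependencies` maps each task to the set of tasks it depends on.
    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical pipeline: each task depends on the one before it.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

print(run_order(pipeline))
```

A graph with a cycle, such as two tasks that depend on each other, has no valid run order, which is why workflow definitions must be acyclic.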

HADOOP

Qubole runs applications written in MapReduce, Cascading, Pig, Hive, Scalding, and Spark using Apache Hadoop. Qubole’s implementation of Hadoop on the cloud is compatible with open-source Hadoop. Qubole also delivers performance enhancements to optimize the use of Hadoop for machine learning, AI, and analytics workloads.

APACHE HIVE

Qubole provides an enhanced, cloud-optimized, self-managing, and self-optimizing implementation of Apache Hive. Qubole’s implementation of Hive leverages AIR (Alerts, Insights, Recommendations) and allows data teams to focus on generating business value from data rather than managing the platform. Qubole Hive seamlessly integrates with existing data sources and third-party tools, while providing best-in-class security.