Qubole on Google Cloud Platform

Qubole is a cloud-native data platform for machine learning, AI, and big data analytics. Qubole on Google Cloud Platform (GCP) provides a first-class user experience through a unified workbench that includes notebooks, dashboards, a native interface for all commands, and built-in tools for easy, secure collaboration.

Qubole’s self-service platform builds on the performance, reliability, and scalability of GCP to make processing big data workloads on Apache Spark and Hadoop easier and more collaborative.

Qubole on GCP delivers:

Unified experience for data science and data engineering

Native workbench that includes notebooks, dashboards, and a common interface for all commands and tasks. This enables data engineers and data scientists to collaborate using familiar tools, languages, and data processing engines.

Day-1 self-service access

Fast access to Qubole through GCP Marketplace, with automatic account setup, Google Cloud authentication, and simplified user onboarding.

24x7 support for open source engines

Highly optimized versions of open source engines and frameworks with advanced caching and performance optimizations. Dedicated support and engineering teams specialized by engine.

Low cost and high reliability

Automatic upscaling, rebalancing, and aggressive downscaling of clusters, with full context of the workload, SLA, and priority of each job. Includes intelligent autonomous and policy-based management of regular compute instances or Preemptible VMs.

Enterprise-grade security

Fine-grained predefined or custom identity and access management roles to separate compute and data access. Qubole also offers role-based access controls for secure collaboration in notebooks and commands.

Easy access to many data sources

Connectors for Google Cloud Storage, Google BigQuery, Oracle, MySQL, Postgres, MongoDB, and more.

For Data Scientists

  • Notebooks for Cloud Storage and BigQuery
  • Python, Scala, SQL, and R in Notebooks
  • Spark MLlib, Scikit-learn, SparkR
  • Collaboration using ACLs and foldering
  • Schedule Notebooks natively or in Airflow
  • Visualizations and dashboards

For Data Engineers

  • Native workbench for all commands
  • Easy lookup of BigQuery and Hive tables
  • Command examples and shareable history
  • Schedule commands natively or in Airflow
  • Data import/export from multiple sources
  • Native API / SDK and UI for all commands

Fast Access and Simplified Onboarding

  • Easy purchase via GCP Marketplace
  • Try & Buy with Qubole Test Drive
  • Integrated with your GCP Bill
  • One-click Qubole account setup
  • Authentication with Google account
  • Optional customized scripted setup

Enterprise-grade Security

  • Predefined or granular custom IAM roles
  • Separate access roles for compute and storage
  • Role-based controls for users and groups
  • Secure collaboration for Notebooks and commands
  • Data access controls with Hive Authorization

Automated Cluster Lifecycle Management

Qubole lets you efficiently manage all major functions of the cluster lifecycle (configure, provision, monitor, scale, optimize, and recover) through automation. Qubole’s built-in financial governance capabilities provide immediate visibility into platform usage costs, with advanced tools for budget allocation, chargeback, and monitoring and controlling your cloud spend.

Workload-Aware Autoscaling

Qubole’s workload-aware autoscaling upscales, downscales, and rebalances clusters with full context of the workload, SLA, and priority of each job. Aggressive Cluster Downscaling uses intelligent self-learning algorithms such as Smart Victim Selection, Graceful Downscaling, and Container Packing to balance workloads across active nodes and decommission idle ones without the risk of data loss.
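
To make the idea of packing work onto fewer nodes and then retiring idle ones more concrete, here is a minimal toy sketch in Python. It is not Qubole's implementation: the Node class, place_container, idle_nodes, and the min_nodes parameter are all hypothetical names introduced for illustration, and the sketch assumes that a node running no containers is safe to remove.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical worker node with a fixed number of container slots."""
    name: str
    capacity: int
    containers: list = field(default_factory=list)  # slot sizes of running containers

    @property
    def free(self) -> int:
        return self.capacity - sum(self.containers)


def place_container(nodes, size):
    """Packing step (toy version): put new work on the fullest node that still fits it.

    Returns the chosen node's name, or None if no node has room.
    """
    fits = [n for n in nodes if n.free >= size]
    if not fits:
        return None
    target = min(fits, key=lambda n: n.free)  # least free space = most packed
    target.containers.append(size)
    return target.name


def idle_nodes(nodes, min_nodes):
    """Victim selection (toy version): idle nodes may be removed,
    as long as the cluster keeps at least min_nodes nodes."""
    idle = [n for n in nodes if not n.containers]
    removable = max(0, len(nodes) - min_nodes)
    return [n.name for n in idle[:removable]]
```

Because new containers go to already-busy nodes first, lightly used nodes tend to drain as their work finishes, and only nodes with nothing running are reported as candidates for removal.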

Intelligent Low-cost Compute Management

Qubole’s intelligent management of low-cost compute nodes allows organizations to optimize the use of Google’s Preemptible VMs, resulting in drastic cost savings. Qubole provides policy-based automation of Preemptible VM usage to balance performance, cost, and SLA compliance.
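
As an illustration of what a policy for Preemptible VM usage might look like, the following Python sketch splits a cluster between on-demand and preemptible nodes under a cap. The function split_nodes and its parameters max_preemptible_fraction and min_on_demand are assumptions made for this example; they are not Qubole settings or API names.

```python
import math


def split_nodes(total_nodes, max_preemptible_fraction, min_on_demand=1):
    """Return (on_demand, preemptible) node counts under a hypothetical policy.

    max_preemptible_fraction caps the share of preemptible nodes;
    min_on_demand keeps a stable core that cannot be preempted.
    """
    if total_nodes < 0:
        raise ValueError("total_nodes must be non-negative")
    if not 0.0 <= max_preemptible_fraction <= 1.0:
        raise ValueError("max_preemptible_fraction must be between 0 and 1")

    preemptible = math.floor(total_nodes * max_preemptible_fraction)
    on_demand = total_nodes - preemptible

    # Enforce the stable on-demand core, never exceeding the cluster size.
    if on_demand < min_on_demand:
        on_demand = min(min_on_demand, total_nodes)
        preemptible = total_nodes - on_demand
    return on_demand, preemptible
```

A higher fraction lowers cost but exposes more of the cluster to preemption, so a policy like this is one simple way to express the trade-off between cost and SLA risk.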

Heterogeneous Cluster Configuration

Qubole’s Heterogeneous Cluster Configuration for on-demand and Preemptible VMs lets you automate the choice of the most cost-effective combination for your job. You can configure heterogeneous clusters by mixing nodes of multiple instance types, which improves data processing efficiency.