Qubole Now Supports Glue Data Catalog to Run ETL, ML, and Analytics Jobs

May 9, 2019 (updated November 10th, 2020)

You can now use the AWS Glue Data Catalog with Qubole Data Platform and instantly run your ETL, ad hoc analytics, and machine learning/data science jobs on Qubole using Glue as the metastore.

When and Why to Use AWS Glue

Glue Data Catalog is a centralized metastore repository available on AWS. It presents a unified view of all your data assets on AWS and offers a drop-in replacement for the Hive metastore. It also provides additional capabilities to discover, classify, and search through your data assets on AWS.

As an AWS tool, Glue Data Catalog fits many use cases well, but it comes with some limitations and may not suit every workload. Refer to the AWS documentation to confirm that it meets your requirements.

What Are the Benefits of Using Qubole with Glue?

Qubole provides you with flexibility and choice. You can now make the most of a unified data lake platform and a unified shared metastore. You can use Glue’s data crawlers to scan and classify data, extract schema details, and build the data catalog. You can then configure Qubole with this catalog as the metastore and share it across your AWS accounts, applications, and services.

With Qubole’s multi-engine support, you can now run Hive queries, Presto queries, and Spark jobs against this catalog. Alternatively, you can continue using your existing or Qubole-hosted metastore and have it synchronized with the Glue Data Catalog.
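Once a crawler has populated the catalog, it can help to check what it found before pointing any engine at it. The following is a minimal sketch, not part of Qubole's setup: it assumes a boto3 Glue client is passed in, and the helper name `iter_table_schemas` and the database name in the usage note are hypothetical.

```python
from typing import Any, Iterator, List, Tuple


def iter_table_schemas(
    glue_client: Any, database: str
) -> Iterator[Tuple[str, List[Tuple[str, str]]]]:
    """Yield (table_name, [(column_name, column_type), ...]) for each table.

    `glue_client` is assumed to behave like boto3.client("glue"); only its
    get_paginator("get_tables") call is used here.
    """
    paginator = glue_client.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        for table in page.get("TableList", []):
            # Some tables may lack a StorageDescriptor; treat them as
            # having no columns rather than failing.
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            yield table["Name"], [(c["Name"], c.get("Type", "")) for c in columns]
```

With AWS credentials configured, usage might look like `for name, cols in iter_table_schemas(boto3.client("glue"), "my_database"): print(name, cols)`, where `my_database` stands in for a database your crawler has written to.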

What Is Supported?

  1. Glue as a metastore in Qubole: All metadata reads and writes go to Glue instead of the default Hive metastore (that is, the Hive metastore is not updated).
  2. Glue catalog sync: The Hive metastore remains the source of truth for metadata operations, but every metadata operation is also replicated to the Glue Data Catalog. In other words, Glue stays up to date for consumption (by other AWS services, if required).

For further information, refer to the Qubole product documentation.
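The difference between the two modes comes down to where metadata writes land and which store serves reads. The toy in-memory model below illustrates that routing only; it is not Qubole's implementation, and every name in it (`Mode`, `Catalogs`, `write_table`, `read_table`) is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Mode(Enum):
    GLUE_AS_METASTORE = "glue_as_metastore"  # mode 1 in the list above
    GLUE_CATALOG_SYNC = "glue_catalog_sync"  # mode 2 in the list above


@dataclass
class Catalogs:
    """Two dictionaries standing in for the Hive metastore and Glue."""
    hive: Dict[str, str] = field(default_factory=dict)
    glue: Dict[str, str] = field(default_factory=dict)


def write_table(catalogs: Catalogs, mode: Mode, name: str, schema: str) -> None:
    """Record a table definition according to the chosen mode."""
    if mode is Mode.GLUE_AS_METASTORE:
        # Writes go to Glue only; the Hive metastore is not updated.
        catalogs.glue[name] = schema
    else:
        # Hive is the source of truth; the same write is replicated to Glue.
        catalogs.hive[name] = schema
        catalogs.glue[name] = schema


def read_table(catalogs: Catalogs, mode: Mode, name: str) -> str:
    """Read a table definition from whichever store is authoritative."""
    source = catalogs.glue if mode is Mode.GLUE_AS_METASTORE else catalogs.hive
    return source[name]
```

In either mode Glue ends up holding the table definition, which is why other AWS services can consume it; the modes differ only in whether the Hive metastore is also written and which store answers reads.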

Why Choose Qubole?

Accelerate your big data journey on Qubole Data Platform now with support for Glue Data Catalog. You can run all of your big data workloads with ease on Qubole while using Glue Data Catalog as the centralized metastore on AWS.

Qubole’s goal is to deliver an enterprise-grade data platform with the greatest flexibility, fastest time to value, and lowest TCO with best-of-breed engines across different use cases. If you are already using Glue or plan to use Glue as a metastore, you can now benefit from Qubole’s support for Glue.

Configure Glue as your metastore and instantly start running your ETL, ad hoc analytics, and machine learning/data science jobs all on Qubole. You can now migrate your existing AWS workloads to Qubole with ease and with your choice of metastore. With this added flexibility, there is no better time to give Qubole a try! For details on configuring Qubole with Glue, refer to Qubole’s documentation on setting up Glue Data Catalog on Qubole.



