Qubole Now Supports Glue Data Catalog to Run ETL, ML, and Analytics Jobs

May 9, 2019. Updated April 1st, 2024

AWS Glue Data Catalog with Qubole Data Platform lets you instantly run your ETL, ad hoc analytics, and machine learning or data science jobs on Qubole using Glue as the metastore.

When And Why To Use AWS Glue?

Glue Data Catalog, a centralized metastore repository available on AWS, presents a unified view of all your data assets on AWS and offers a drop-in replacement for the Hive metastore. It also provides additional capabilities to discover, classify, and search through your AWS data assets.

Glue Data Catalog suits many use cases, but it comes with some limitations and may not fit every one. Refer to the AWS documentation to confirm that it meets your requirements.

What Are The Benefits Of Using Qubole With Glue?

Qubole provides you with flexibility and choice to make the most of a unified data lake platform and a unified shared metastore. Glue’s data crawlers can be used to scan and classify data, extract schema details, and build the data catalog. You can then configure Qubole with this catalog as the metastore and share it across your AWS accounts, applications, and services.
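As a rough illustration of the crawler step, the sketch below uses boto3's Glue client to create and start a crawler over an S3 path and then list the tables it has cataloged. The crawler name, IAM role ARN, database name, and S3 path are placeholders assumed for this example, and it expects AWS credentials and a region to be configured in the environment. It covers only the AWS side; pointing Qubole at the resulting catalog is done through Qubole's own configuration, which is not shown here.

```python
def crawler_request(name, role_arn, database, s3_path):
    """Build the keyword arguments for the Glue create_crawler call.

    All four values are placeholders chosen by the caller.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(request):
    """Create the crawler and kick off a run (the run itself is asynchronous)."""
    import boto3  # imported here so crawler_request stays usable without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


def list_table_names(database):
    """Return the names of the tables currently cataloged in the given database."""
    import boto3

    glue = boto3.client("glue")
    names = []
    # get_tables is paginated, so walk every page of results.
    for page in glue.get_paginator("get_tables").paginate(DatabaseName=database):
        names.extend(table["Name"] for table in page["TableList"])
    return names


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    request = crawler_request(
        name="example-crawler",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
        database="example_db",
        s3_path="s3://example-bucket/data/",
    )
    create_and_start_crawler(request)
    # Tables appear only after the crawler run finishes, so call
    # list_table_names("example_db") once the run has completed.
```

Because a crawler run is asynchronous, the table listing is kept as a separate step to call after the run completes.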

With Qubole’s multi-engine support, you can run Hive queries, Presto queries, and Spark jobs against this catalog. Alternatively, you can continue using your existing or Qubole-hosted metastore and have it synchronized with the Glue Data Catalog.

What Is Supported?

  1. Glue as a metastore in Qubole: All metadata reads and writes go to Glue instead of the default Hive metastore (i.e., the Hive metastore will not be updated).
  2. Glue catalog sync: The Hive metastore continues to be the source of truth for metadata, but all metadata operations are replicated to Glue Data Catalog as well. In other words, Glue stays up to date for consumption (by other AWS services, if required).
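
The difference between the two modes comes down to where metadata reads and writes are routed. The toy model below is only a conceptual sketch: it uses in-memory dictionaries in place of real metastores, and every class and method name is hypothetical rather than part of Qubole or AWS.

```python
class InMemoryCatalog:
    """Stand-in for a metastore; holds table name -> schema in a dict."""

    def __init__(self, name):
        self.name = name
        self.tables = {}

    def put_table(self, table, schema):
        self.tables[table] = schema

    def get_table(self, table):
        return self.tables.get(table)


class GlueAsMetastore:
    """Mode 1: every read and write goes to Glue; Hive is never updated."""

    def __init__(self, hive, glue):
        self.hive = hive  # kept only to show that it stays untouched
        self.glue = glue

    def put_table(self, table, schema):
        self.glue.put_table(table, schema)

    def get_table(self, table):
        return self.glue.get_table(table)


class GlueCatalogSync:
    """Mode 2: Hive is the source of truth; writes are replicated to Glue."""

    def __init__(self, hive, glue):
        self.hive = hive
        self.glue = glue

    def put_table(self, table, schema):
        self.hive.put_table(table, schema)  # authoritative write
        self.glue.put_table(table, schema)  # replica for other AWS services

    def get_table(self, table):
        return self.hive.get_table(table)


if __name__ == "__main__":
    schema = {"id": "bigint", "ts": "timestamp"}

    mode1 = GlueAsMetastore(InMemoryCatalog("hive"), InMemoryCatalog("glue"))
    mode1.put_table("events", schema)
    print(sorted(mode1.hive.tables), sorted(mode1.glue.tables))

    mode2 = GlueCatalogSync(InMemoryCatalog("hive"), InMemoryCatalog("glue"))
    mode2.put_table("events", schema)
    print(sorted(mode2.hive.tables), sorted(mode2.glue.tables))
```

In the first mode the Hive stand-in stays empty after a write, while in the second both catalogs hold the table and reads are served from Hive.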

For further information, refer to the Qubole product documentation.

Why Choose Qubole?

Accelerate your big data journey on the Qubole Data Lake Platform now with support for Glue Data Catalog. You can run all of your big data workloads with ease on Qubole while using Glue Data Catalog as the centralized metastore on AWS.

Qubole’s goal is to deliver an enterprise-grade data platform with the greatest flexibility, fastest time to value, and lowest TCO with best-of-breed engines across different use cases. If you are already using Glue or plan to use Glue as a metastore, you can now benefit from Qubole’s support for Glue. 

Configure Glue as your metastore and instantly start running your ETL, ad hoc analytics, and machine learning or data science jobs all on Qubole. You can migrate your existing AWS workloads to Qubole with ease and freedom of choice of metastore. With this added flexibility, there is no better time to give Qubole a try! For details on configuring Qubole with Glue, refer to Qubole’s documentation on setting up Glue Data Catalog on Qubole.
