APACHE HIVE

What is Apache Hive?

Hive is an Apache open-source project built for querying, summarizing, and analyzing large data sets through a SQL-like interface. It is noted for bringing the familiarity of relational technology to big data processing with its Hive Query Language (HiveQL), as well as structures and operations familiar from relational databases, such as tables, JOINs, and partitions.
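
Partitions are easiest to understand through their physical layout. By default, Hive stores each partition of a partitioned table in its own subdirectory named `column=value` under the table's location. A query that filters on a partition column therefore only needs to read the matching directories, which is called partition pruning. The Python sketch below imitates that layout and the pruning step. The bucket path, column names, and helper names are illustrative assumptions. Real Hive also escapes special characters in partition values, which this sketch skips.

```python
from typing import Dict, List

# Hypothetical helpers illustrating Hive's default partition directory layout.


def partition_path(table_root: str, spec: Dict[str, str]) -> str:
    """Build a Hive-style path such as <root>/dt=2024-01-01/country=US.

    Keys are emitted in the dict's insertion order, which should match the
    table's declared partition column order. Value escaping is omitted.
    """
    parts = [f"{key}={value}" for key, value in spec.items()]
    return "/".join([table_root.rstrip("/")] + parts)


def prune_partitions(
    partitions: List[Dict[str, str]], predicate: Dict[str, str]
) -> List[Dict[str, str]]:
    """Keep only partitions that match every equality filter in the predicate."""
    return [p for p in partitions if all(p.get(k) == v for k, v in predicate.items())]


if __name__ == "__main__":
    partitions = [
        {"dt": "2024-01-01", "country": "US"},
        {"dt": "2024-01-01", "country": "DE"},
        {"dt": "2024-01-02", "country": "US"},
    ]
    # Roughly what a filter like WHERE dt = '2024-01-01' lets the planner do:
    # only two of the three partition directories need to be scanned.
    for p in prune_partitions(partitions, {"dt": "2024-01-01"}):
        print(partition_path("s3://example-bucket/warehouse/sales", p))
```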

Apache Hive is particularly well suited to analyzing large data sets with complex JOIN conditions. Typical use cases include batch SQL processing, exploratory queries on large volumes of data, and queries that may be interrupted and need to be resumed.

HIVE IN BIG DATA

Qubole has provided a managed Hive service since 2013, with multiple Hive versions and a regular upgrade cadence. Hive on Qubole was designed with cloud optimizations from the beginning and is tailored to the needs of organizations that are either migrating to a cloud data lake or already have one deployed.

Qubole blends the latest features from the open-source community with Qubole’s proprietary solutions to boost performance, reduce costs, improve user experience, and simplify administration and management.

KEY BENEFITS OF APACHE HIVE ON QUBOLE

Fast Time to Value

Guided steps to create Hive clusters in minutes
Multiple interfaces to access data via UIs, APIs, and drivers

Cost Efficiency

Reduce overall data processing costs by up to 50% compared to self-managed infrastructure

Productivity with Improved Performance

Curated table metadata management
Performance optimization with cloud storage for faster query processing

Enterprise-Ready

Enterprise-grade security
JDBC/ODBC connectors integrated with mainstream BI tools

APACHE HIVE ON QUBOLE

Hive Autoscaling

Workload-aware autoscaling that adapts to variable and bursty workloads
Multiple HiveServer2 instances to absorb burst traffic and increase service throughput
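
The core decision of a workload-aware autoscaler can be thought of as sizing the cluster to current demand within configured bounds. The Python sketch below uses a deliberately simple, hypothetical rule: outstanding tasks divided by per-node task capacity, clamped to a minimum and maximum node count. It is not Qubole's algorithm, which this page does not describe. The names `ClusterLimits` and `target_node_count` and the capacity figures are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterLimits:
    """Hypothetical autoscaling bounds for a cluster."""

    min_nodes: int
    max_nodes: int
    tasks_per_node: int  # assumed number of concurrent tasks one worker can run


def target_node_count(pending_tasks: int, running_tasks: int, limits: ClusterLimits) -> int:
    """Return the node count needed for current demand, clamped to the limits.

    Bursts push the target up (capped at max_nodes). When the queue drains,
    the target falls back toward min_nodes, which models downscaling.
    """
    demand = pending_tasks + running_tasks
    needed = math.ceil(demand / limits.tasks_per_node)
    return max(limits.min_nodes, min(limits.max_nodes, needed))


if __name__ == "__main__":
    limits = ClusterLimits(min_nodes=2, max_nodes=10, tasks_per_node=8)
    for pending, running in [(0, 0), (30, 10), (500, 80)]:
        print(pending, running, target_node_count(pending, running, limits))
```

A production autoscaler would typically also smooth decisions over time so the cluster does not grow and shrink on every small fluctuation. This sketch leaves that out.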

Hive Performance

Direct writes eliminate slower file copy operations in cloud storage
Faster cloud storage I/O
Metadata caching
Automatic statistics collection and management for better query planning and execution
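
Metadata caching avoids repeated round trips to the metastore for the same table information. The Python sketch below shows the idea under a simple time-to-live (TTL) policy, which is an assumed policy. The loader function, TTL value, and table names are hypothetical, and Qubole's actual caching and invalidation strategy is not described on this page.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Serve cached values until they are older than ttl_seconds, then reload."""

    def __init__(
        self,
        loader: Callable[[str], V],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # e.g. a (hypothetical) metastore lookup
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, V]] = {}

    def get(self, key: str) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit: no metastore round trip
        value = self._loader(key)  # miss or stale entry: reload from the source
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry, e.g. after DDL changes the table definition."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    lookups = []

    def fetch_table_metadata(table: str) -> dict:
        # Hypothetical stand-in for a slow metastore call.
        lookups.append(table)
        return {"table": table, "location": f"s3://example-bucket/warehouse/{table}"}

    cache = TTLCache(fetch_table_metadata, ttl_seconds=300)
    cache.get("sales")
    cache.get("sales")
    print("metastore lookups:", len(lookups))
```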

Hive Cost Optimization

Automated Cluster Lifecycle Management
Heterogeneous instance types that take advantage of price differences across instance families while keeping clusters at peak efficiency
Container Packing and Aggressive Downscaling when the cluster is lightly used
Specialized support for cost-optimal scaling
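
Mixing instance families pays off when some families are cheaper per unit of compute at a given moment. The Python sketch below shows one simple way to exploit that. It is a greedy heuristic that fills a vCPU requirement from the cheapest-per-vCPU types first, limited by how many of each type can be obtained. The instance names, prices, and availability counts are hypothetical. The greedy rule can overshoot the requirement and is not guaranteed to find the cheapest mix, and it is not Qubole's implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class InstanceType:
    name: str  # hypothetical instance type label
    vcpus: int
    hourly_price: float  # hypothetical price, not real cloud pricing
    available: int  # how many of this type can be acquired right now


def plan_fleet(options: List[InstanceType], required_vcpus: int) -> Dict[str, int]:
    """Greedily cover required_vcpus using the cheapest-per-vCPU types first."""
    plan: Dict[str, int] = {}
    remaining = required_vcpus
    for inst in sorted(options, key=lambda i: i.hourly_price / i.vcpus):
        if remaining <= 0:
            break
        count = min(inst.available, math.ceil(remaining / inst.vcpus))
        if count > 0:
            plan[inst.name] = count
            remaining -= count * inst.vcpus
    if remaining > 0:
        raise ValueError("not enough capacity across the offered instance types")
    return plan


if __name__ == "__main__":
    options = [
        InstanceType("type-a", vcpus=4, hourly_price=0.20, available=2),
        InstanceType("type-b", vcpus=8, hourly_price=0.32, available=3),
        InstanceType("type-c", vcpus=16, hourly_price=0.96, available=10),
    ]
    print(plan_fleet(options, required_vcpus=40))
```

When the cheapest type runs short, the plan spills over into the next cheapest family instead of failing. That fallback is the benefit a heterogeneous fleet is meant to provide.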