Hive is an Apache open-source project built for querying, summarizing, and analyzing large data sets through a SQL-like interface. It is noted for bringing the familiarity of relational technology to big data processing through its Hive Query Language (HiveQL), along with structures and operations comparable to those of relational databases, such as tables, JOINs, and partitions.
Apache Hive is particularly well suited to analyzing large data sets with complex JOIN conditions. Typical use cases include batch SQL processing, exploratory queries on large volumes of data, and queries that could be interrupted and need to be resumed.
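As a minimal sketch of what such a batch query can look like in HiveQL, the example below defines a partitioned table and joins it against a second table. The table names, column names, and date range are hypothetical and chosen only for illustration.

```sql
-- Hypothetical tables for illustration; all names and columns are assumptions.
CREATE TABLE IF NOT EXISTS page_views (
  user_id BIGINT,
  url     STRING,
  view_ts TIMESTAMP
)
PARTITIONED BY (view_date STRING)  -- one partition per day
STORED AS ORC;

CREATE TABLE IF NOT EXISTS users (
  user_id BIGINT,
  country STRING
)
STORED AS ORC;

-- Batch summary: page views per day and country for one week.
-- Filtering on the partition column lets Hive read only the matching partitions.
SELECT v.view_date, u.country, COUNT(*) AS views
FROM page_views v
JOIN users u ON v.user_id = u.user_id
WHERE v.view_date BETWEEN '2020-01-01' AND '2020-01-07'
GROUP BY v.view_date, u.country;
```

Partitioning by date is a common choice for this kind of workload because most exploratory and batch queries restrict the time range, which keeps the amount of data scanned small.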
Qubole has provided a managed Hive service since 2013, with multiple Hive versions and a regular upgrade cadence. Hive on Qubole was designed with cloud optimizations from the beginning and is tailored to the needs of organizations that are either migrating to a cloud data lake or already have one deployed.
Qubole blends the latest features from the open-source community with Qubole’s proprietary solutions to boost performance, reduce costs, improve user experience, and simplify administration and management.
|Workload-aware autoscaling, for adapting to variability and burstiness of workloads|
|Multiple HiveServer2 instances to accommodate burst traffic and increase the throughput of the service|
|Direct writes eliminating slower file copy operations in cloud storage|
|Faster cloud storage I/O|
|Automatic statistics collection and management for better query planning and execution|
|Automated Cluster Lifecycle Management|
|Heterogeneous instances to leverage price differences across instance families, while keeping clusters at peak efficiency|
|Container Packing and Aggressive Downscaling when the cluster has only light usage|
|Specialized support for cost-optimal scaling|
|SQL standards-based Hive Authorization and Apache Ranger support|
|ACID transactions support|
|Compliance (HIPAA, SOC2, ISO-27001)|
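Several of the features listed above correspond to standard Apache Hive statements. The sketch below shows the open-source HiveQL forms for an ACID table, manual statistics collection, and SQL standards-based authorization. It does not show anything specific to Qubole: the table and role names are hypothetical, and it assumes a Hive deployment with the transaction manager and SQL standards-based authorization already enabled.

```sql
-- Hypothetical table and role names; assumes ACID and SQL standards-based
-- authorization are already configured on the Hive deployment.

-- ACID: in Apache Hive, full transactional tables are stored as ORC.
CREATE TABLE IF NOT EXISTS accounts (
  account_id BIGINT,
  balance    DECIMAL(12,2)
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Row-level updates are only permitted on transactional tables.
UPDATE accounts SET balance = balance - 10.00 WHERE account_id = 42;

-- Statistics: the manual form of what automatic collection would do.
ANALYZE TABLE accounts COMPUTE STATISTICS;
ANALYZE TABLE accounts COMPUTE STATISTICS FOR COLUMNS;

-- Authorization: creating roles requires the admin role in Hive.
CREATE ROLE analyst;
GRANT SELECT ON TABLE accounts TO ROLE analyst;
```

Depending on the Hive version, a transactional table may also need to be bucketed; check the documentation for the version in use.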