Apache Hive

What is Apache Hive? Hive is an Apache open-source project built on top of Hadoop for querying, summarizing and analyzing large data sets through a SQL-like interface. Its Hive Query Language (HiveQL) brings the familiarity of relational technology to big data processing, with structures and operations comparable to those of relational databases, such as tables, joins and partitions.

Apache Hive is used mostly for batch processing of large ETL jobs and batch SQL queries on very large data sets.
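To make the SQL-like interface concrete, here is a minimal sketch of a partitioned table and a daily batch rollup that joins two tables. The table names (events, users, daily_event_counts) and their columns are hypothetical and do not come from any real deployment. Python is used only to hold and parameterize the HiveQL text, so the sketch runs without a Hive cluster; submitting the statements to Hive is not shown.

```python
import datetime

# Hypothetical table: one partition per calendar day, keyed by the dt column.
CREATE_EVENTS = """
CREATE TABLE IF NOT EXISTS events (
  user_id BIGINT,
  event_type STRING
)
PARTITIONED BY (dt STRING)
STORED AS ORC
""".strip()

# Hypothetical batch ETL step: aggregate one day of events per country and
# event type, then overwrite the matching partition of the rollup table.
DAILY_ROLLUP_TEMPLATE = """
INSERT OVERWRITE TABLE daily_event_counts PARTITION (dt = '{dt}')
SELECT u.country, e.event_type, COUNT(*) AS event_count
FROM events e
JOIN users u ON e.user_id = u.user_id
WHERE e.dt = '{dt}'
GROUP BY u.country, e.event_type
""".strip()


def daily_rollup_query(day: datetime.date) -> str:
    """Render the rollup statement for a single date partition."""
    # Taking a date object (not free text) keeps the partition value well formed.
    return DAILY_ROLLUP_TEMPLATE.format(dt=day.isoformat())


if __name__ == "__main__":
    print(CREATE_EVENTS)
    print()
    print(daily_rollup_query(datetime.date(2017, 1, 31)))
```

Filtering on the partition column (dt) lets the query read a single day's partition rather than the whole table, which is one reason partitions suit recurring batch jobs.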

A self-managing and self-optimizing implementation of Apache Hive

Qubole offers the first Autonomous Data Platform implementation of Apache Hive, the open-source project originally created by Qubole’s founders.

Runs on your choice of popular public cloud infrastructure

Leverages the platform’s AIR (Alerts, Insights, Recommendations) capabilities to help data teams focus on outcomes instead of the platform

QDS for AWS

QDS for Azure

QDS for Oracle Cloud

Supported Versions

0.13.1, 1.2 (AWS)

1.2 (Azure, Oracle BMC)

We collect events from our various systems via a Flume pipeline that writes data out to Amazon S3. From there, we use a data processing pipeline hosted by Qubole to process and aggregate statistics to Hive tables and to an AWS Redshift-based data warehouse. For easy access to the data for the entire company, we use Tableau to navigate through our tables and produce visualizations.

Prakash Janakiraman, Co-Founder and VP Engineering at NextDoor