Apache Hive

Hive is an Apache open-source project built on top of Hadoop for querying, summarizing and analyzing large data sets through a SQL-like interface. It is noted for bringing the familiarity of relational technology to big data processing: its Hive Query Language (HiveQL) offers structures and operations, such as tables, joins and partitions, comparable to those used by relational databases.

Hive is used mostly for batch processing of large ETL jobs and batch SQL queries on very large data sets.
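
To make the idea of partitions more concrete, here is a minimal conceptual sketch in Python. It relies on the common Hive convention of storing each partition in a directory named `column=value`, and shows why a batch query that filters on a partition column only needs to read the matching directories. The helper names (`partition_path`, `prune`), the `/warehouse/events` location and the `dt` column are hypothetical and chosen for illustration only; this is not Hive's code or API.

```python
from typing import Dict, List


def partition_path(table_root: str, spec: Dict[str, str]) -> str:
    """Build a Hive-style partition directory, e.g. root/dt=2020-01-01/country=US."""
    segments = [f"{column}={value}" for column, value in spec.items()]
    return "/".join([table_root.rstrip("/")] + segments)


def prune(paths: List[str], column: str, value: str) -> List[str]:
    """Keep only the partition directories that contain the segment column=value."""
    wanted = f"{column}={value}"
    return [path for path in paths if wanted in path.split("/")]


# Hypothetical table with one partition per day.
paths = [
    partition_path("/warehouse/events", {"dt": day})
    for day in ("2020-01-01", "2020-01-02")
]

# A filter such as WHERE dt = '2020-01-02' only has to touch one directory.
selected = prune(paths, "dt", "2020-01-02")
print(selected)
```

Because the filter is resolved against directory names, data in the other partitions is never scanned, which is one reason partitioning matters for large batch jobs.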

A self-managing and self-optimizing implementation of Hive

Qubole offers the first Autonomous Data Platform implementation of Apache Hive, the open-source project originally created by Qubole’s founders.

Runs on your choice of popular public cloud infrastructure

Leverages the platform’s AIR (Alerts, Insights, Recommendations) capabilities to help data teams focus on outcomes instead of on the platform


Supported Versions

0.13.1, 1.2 (AWS)

1.2 (Azure, Oracle BMC)

“We collect events from our various systems via a Flume pipeline that writes data out to Amazon S3. From there, we use a data processing pipeline hosted by Qubole to process and aggregate statistics to Hive (computing) tables and to an AWS Redshift-based data warehouse. For easy access to the data for the entire company, we use Tableau to navigate through our tables and produce visualizations.”

Prakash Janakiraman, Co-Founder and VP Engineering at Nextdoor