Apache Hive

What is Apache Hive? Hive is an Apache open-source project built on top of Hadoop for querying, summarizing and analyzing large data sets using a SQL-like interface. It is noted for bringing the familiarity of relational technology to big data processing through its Hive Query Language (HiveQL) and through structures and operations comparable to those of relational databases, such as tables, joins and partitions.

Apache Hive is used mostly for batch processing of large ETL jobs and batch SQL queries on very large data sets.
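To make the relational flavor concrete, here is a minimal sketch in Python that assembles two HiveQL statements: one declaring a partitioned table and one batch query that joins two tables over a single partition. The table names (events, users), the columns, and the dt partition key are hypothetical and chosen only for illustration; the helper functions are not part of Hive or of any Qubole API.

```python
# Illustration only: table, column, and partition names are hypothetical.
# Plain string formatting is used for brevity; real code should validate
# identifiers and values before building a statement.

def create_partitioned_table(table, columns, partition_col):
    """Return a HiveQL statement creating a table partitioned by one STRING column."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
        f"PARTITIONED BY ({partition_col} STRING)"
    )


def daily_join_query(events, users, day):
    """Return a HiveQL batch query that joins two tables for one day's partition."""
    return (
        f"SELECT u.country, COUNT(*) AS event_count "
        f"FROM {events} e JOIN {users} u ON e.user_id = u.user_id "
        f"WHERE e.dt = '{day}' "
        f"GROUP BY u.country"
    )


ddl = create_partitioned_table(
    "events", [("user_id", "STRING"), ("action", "STRING")], "dt"
)
query = daily_join_query("events", "users", "2020-01-01")
print(ddl)
print(query)
```

Filtering on the partition column (dt in this sketch) lets Hive read only the matching partition instead of scanning the whole table, which is one reason partitions matter for large batch jobs.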

A self-managing and self-optimizing implementation of Apache Hive

Qubole offers the first Autonomous Data Platform implementation of Apache Hive, the open-source project originally created by Qubole's founders.

Runs on your choice of popular public cloud infrastructure

Leverages the platform’s AIR (Alerts, Insights, Recommendations) capabilities to help data teams focus on outcomes instead of the platform

Agent technology augments original Hive with a self-managing and self-optimizing platform:

Cloud-optimized for faster workload performance

  • Smarter object storage access for split computation, batching of writes, pre-fetching, and multiple caching layers, including SSD caching
  • Use of YARN as resource manager allows the Hive metastore to be used across engines (Spark, Presto, Hive)

Easier to integrate with existing data sources and tools

  • ODBC/JDBC drivers
  • Database connectors (MySQL, SQL Server, Oracle DB, RDS, Redshift, Kinesis and many others)
  • Comprehensive set of REST APIs for application integration

Best-in-class security

  • HDFS and SSL encryption
  • SAML Authentication
  • VPC support
  • Dual IAM roles

QDS for AWS

QDS for Azure

QDS for Oracle Cloud

Supported Versions

0.13.1, 1.2 (AWS)

1.2 (Azure, Oracle BMC)

We collect events from our various systems via a Flume pipeline that writes data out to Amazon S3. From there, we use a data processing pipeline hosted by Qubole to process and aggregate statistics to Hive (computing) tables and to an AWS Redshift based data warehouse. For easy access to the data for the entire company, we use Tableau to navigate through our tables and produce visualizations.

Prakash Janakiraman, Co-Founder and VP Engineering at NextDoor