Apache Hive as a Service
QDS optimizes Hive to run on Amazon Web Services (AWS), Google Compute Engine (GCE) and Microsoft Azure so that you have the flexibility you need to succeed. Choose the cloud that’s right for you, knowing that QDS will make it simple, fast, cost-effective and secure to process your big data.
Hive is a data warehouse infrastructure built on top of Hadoop for querying, summarizing and analyzing large data sets. It is known for bringing the familiarity of relational technology to big data processing: its Hive Query Language (HiveQL) is similar to standard SQL, and it uses structures and operations comparable to those of relational databases, such as tables, joins and partitions.
Complete Cluster Life Cycle Management
As with all big data engines supported by QDS, Hive clusters scale up and down automatically during query execution, and you don’t have to worry about starting or stopping them. QDS does all the heavy lifting for you.
Cloud Optimized with Autoscaling
Read and write optimization for cloud storage dramatically enhances query performance and the user experience and reduces processing costs. Plus, with advanced Autoscaling, you’ll pay only for resources actually used.
The Hive metastore can serve as a shared metadata reference for all QDS data engines, so users can run SQL queries that draw on the same metadata from any part of QDS.
QDS offers an extensive ODBC Connector Library so that analysts can use their favorite BI and data visualization tools with Hive.
Spot Instance Pricing
If you’re running on AWS, QDS lets you automatically incorporate Amazon spot instances, which can cost up to 90% less than on-demand instances.
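As a rough illustration of the arithmetic, the sketch below estimates the hourly cost of a cluster that mixes on-demand and spot nodes. The prices, node counts and the function name `blended_hourly_cost` are hypothetical, and the 90% figure is the upper bound quoted above, not a typical discount.

```python
def blended_hourly_cost(on_demand_price, spot_discount, on_demand_nodes, spot_nodes):
    """Return the hourly cost of a cluster mixing on-demand and spot nodes.

    spot_discount is the fraction saved relative to on-demand (0.9 = 90% less).
    """
    if not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_discount must be between 0 and 1")
    spot_price = on_demand_price * (1.0 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


# Hypothetical numbers: $1.00 per node-hour on demand, 10 nodes in total.
all_on_demand = blended_hourly_cost(1.00, 0.9, on_demand_nodes=10, spot_nodes=0)
half_spot = blended_hourly_cost(1.00, 0.9, on_demand_nodes=5, spot_nodes=5)
print(f"all on-demand: ${all_on_demand:.2f}/hour")
print(f"half spot:     ${half_spot:.2f}/hour")
```

Actual savings depend on the spot market price at the time and on how many nodes can safely run on spot capacity.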
Elastic Pricing Model
With QDS’s pay-per-use pricing model, you pay only for the compute hours you actually use.
Extensive User Interfaces
QDS gives you user interface options to match your use case. The web-based Workbench UI is suited for interactive analysis, and the SDKs and the REST API are ideal for programmatic access.
When Should I Use QDS for Hive?
Hive is used mostly for batch processing of large ETL jobs and batch SQL queries on very large data sets.
Batch Processing for Extract, Transform and Load (ETL)
One of the major benefits of Hive is the ability to extract, transform and load (ETL) large data sets in Hadoop without writing complex MapReduce programs. Technical users can easily execute batch ETL jobs to transform unstructured and semi-structured data into usable schema-based data. Hive is well suited for ETL with its mapping tools and a Hive Metastore that makes metadata for Hive tables and partitions easily accessible.
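To make the idea of turning semi-structured data into schema-based data concrete, here is a minimal sketch of the extract, transform and load steps in Python. It is not Hive or QDS code: an in-memory SQLite database stands in for the target table only so the example runs anywhere, and the sample events, field names and function names are all hypothetical.

```python
import json
import sqlite3

# Hypothetical semi-structured input: one JSON event per line, fields may be missing.
RAW_EVENTS = [
    '{"user": "alice", "action": "click", "ts": "2020-01-01T10:00:00"}',
    '{"user": "bob", "action": "view"}',
    'not valid json',
    '{"user": "alice", "action": "view", "ts": "2020-01-02T09:30:00"}',
]


def extract(lines):
    """Parse each line as JSON, skipping lines that cannot be parsed."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def transform(records):
    """Map loosely structured records onto a fixed (user, action, event_date) schema."""
    for rec in records:
        ts = rec.get("ts")
        event_date = ts[:10] if ts else None  # keep only YYYY-MM-DD
        yield (rec.get("user"), rec.get("action"), event_date)


def load(rows, conn):
    """Insert the schema-conforming rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (user TEXT, action TEXT, event_date TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a warehouse table, not Hive
load(transform(extract(RAW_EVENTS)), conn)
for row in conn.execute("SELECT user, COUNT(*) FROM events GROUP BY user ORDER BY user"):
    print(row)
```

In Hive the same shape of job would typically be expressed as SQL-like statements over tables registered in the metastore. The point here is only the flow from loosely structured input to rows that fit a declared schema.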
Batch SQL Queries
Hive is designed for batch queries on very large data sets (petabytes of data and beyond). Data analysts run SQL-like queries against data stored in Hive tables to turn the data into business insight. The Hive Metastore contains schemas and statistics, which are useful in data exploration, query optimization and query compilation.
When traditional data sources can’t handle the processing of large SQL queries, users often import the data into Hive and run their queries there.