Apache Hive as a Service
QDS optimizes Hive to run on Amazon Web Services (AWS), Google Compute Engine (GCE), and Microsoft Azure so that you can have the flexibility you need to succeed. Choose the cloud that’s right for you, knowing that QDS will make it simple, fast, cost-effective, and secure to process your big data.
Hive is a data warehouse infrastructure built on top of Hadoop for querying, summarizing, and analyzing large data sets. It brings the familiarity of relational technology to big data processing: its Hive Query Language (HiveQL) is similar to standard SQL, and its structures and operations, such as tables, joins, and partitions, are comparable to those used by relational databases.
Complete Cluster Life Cycle Management
As with all big data engines supported by QDS, Hive clusters scale up and down automatically during query execution, and you don’t have to worry about starting or stopping them. QDS does all the heavy lifting for you.
Cloud Optimized with Autoscaling
Read and write optimization for cloud storage dramatically enhances query performance and the user experience and reduces processing costs. Plus, with advanced Autoscaling, you’ll pay only for resources actually used.
The Hive metastore can be extended as a reference for all QDS Data Engines. Metadata is shared so that users can run SQL queries leveraging their metadata across all aspects of QDS.
QDS offers an extensive ODBC Connector Library so that analysts can use their favorite BI and data visualization tools with Hive.
Spot Instance Pricing
If you’re running on AWS, QDS lets you automatically incorporate Amazon spot instances, which can cost up to 90% less than on-demand instances.
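To get a feel for what that discount can mean for a cluster bill, here is a rough arithmetic sketch in Python. The node count, hourly price, and spot share are made-up illustration values, not AWS or QDS figures, and the 90% discount is the best case quoted above rather than a typical one.

```python
def blended_hourly_cost(nodes, on_demand_price, spot_fraction, spot_discount):
    """Hourly cost of a cluster that mixes on-demand and spot nodes.

    All inputs are hypothetical illustration values, not real prices.
    """
    if not 0.0 <= spot_fraction <= 1.0:
        raise ValueError("spot_fraction must be between 0 and 1")
    if not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_discount must be between 0 and 1")
    spot_nodes = nodes * spot_fraction
    on_demand_nodes = nodes - spot_nodes
    spot_price = on_demand_price * (1.0 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


# Hypothetical 10-node cluster at $1.00 per node-hour on demand.
all_on_demand = blended_hourly_cost(10, 1.00, 0.0, 0.9)
half_spot = blended_hourly_cost(10, 1.00, 0.5, 0.9)
print(f"all on-demand: ${all_on_demand:.2f}/hour")
print(f"half spot:     ${half_spot:.2f}/hour")
```

With these assumed numbers, moving half of the nodes to spot at the maximum discount cuts the hourly cost from $10.00 to about $5.50. Real spot prices fluctuate, so actual savings will vary.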
Elastic Pricing Model
With QDS’ pay-per-use pricing model, you pay only for the compute hours you actually use.
Extensive User Interfaces
QDS gives you user interface options to match your use case. The web-based Workbench UI is suited for interactive analysis, and the SDKs and the REST API are ideal for programmatic access.
When Should I Use QDS for Hive?
Hive is used mostly for batch processing of large ETL jobs and batch SQL queries on very large data sets.
Batch Processing for Extract, Transform and Load (ETL)
One of the major benefits of Hive is the ability to extract, transform and load (ETL) large datasets in Hadoop without writing complex MapReduce programs. Technical users can easily execute batch ETL jobs to transform unstructured and semi-structured data into usable schema-based data. Hive is well suited for ETL thanks to its mapping tools and the Hive Metastore, which makes metadata for Hive tables and partitions easily accessible.
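The idea of turning semi-structured data into schema-based data can be sketched in a few lines of plain Python. This is only a toy illustration of the transform step, not Hive or QDS code: the sample events, the `SCHEMA` columns, and the helper names `transform` and `load_by_partition` are all hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical raw events, one JSON document per line.
RAW_EVENTS = [
    '{"user": "a1", "action": "click", "ts": "2015-03-01T10:15:00"}',
    '{"user": "b2", "action": "view", "ts": "2015-03-01T11:00:00", "extra": {"ref": "ad"}}',
    'not valid json',
    '{"user": "a1", "action": "view", "ts": "2015-03-02T09:30:00"}',
]

SCHEMA = ("user", "action", "ts")  # hypothetical target columns


def transform(line):
    """Parse one raw line into a fixed-schema row, or None if unusable."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or any(col not in record for col in SCHEMA):
        return None
    # Keep only the schema columns; extra fields are dropped.
    return tuple(record[col] for col in SCHEMA)


def load_by_partition(lines):
    """Group clean rows by event date, similar in spirit to a date-partitioned table."""
    partitions = defaultdict(list)
    for line in lines:
        row = transform(line)
        if row is not None:
            event_date = row[2][:10]  # "YYYY-MM-DD" prefix of the timestamp
            partitions[event_date].append(row)
    return dict(partitions)


partitions = load_by_partition(RAW_EVENTS)
for day, rows in sorted(partitions.items()):
    print(day, len(rows))
```

In Hive itself this kind of job would typically be expressed as queries over tables rather than hand-written loops; the sketch only shows the shape of the work: parse, enforce a schema, and group by a partition key.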
Batch SQL Queries
Hive is designed for batch queries on very large data sets (petabytes of data and beyond). Data analysts run SQL-like queries against data stored in Hive tables to turn the data into business insight. The Hive Metastore contains schemas and statistics which are useful in data exploration, query optimization, and query compilation.
Often, when traditional data sources can’t handle the processing of large SQL queries, users can import data into Hive and then run their queries there.
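As an illustration of the kind of SQL-like query an analyst might run, the sketch below uses Python's built-in `sqlite3` module as a stand-in, since it needs no cluster. It is not Hive: the `users` and `events` tables and their contents are hypothetical, and the query sticks to constructs (`JOIN`, `WHERE`, `GROUP BY`, `ORDER BY`) that HiveQL shares with standard SQL.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a SQL engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id TEXT, country TEXT);
    CREATE TABLE events (user_id TEXT, action TEXT, event_date TEXT);
    INSERT INTO users VALUES ('a1', 'US'), ('b2', 'DE');
    INSERT INTO events VALUES
        ('a1', 'click', '2015-03-01'),
        ('b2', 'view',  '2015-03-01'),
        ('a1', 'view',  '2015-03-02');
""")

# Count events per country from a given date onward.
query = """
    SELECT u.country, COUNT(*) AS event_count
    FROM events e
    JOIN users u ON e.user_id = u.user_id
    WHERE e.event_date >= '2015-03-01'
    GROUP BY u.country
    ORDER BY u.country
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

On a Hive table partitioned by date, a filter like the one on `event_date` is the sort of condition that can limit a query to the relevant partitions, which matters at the data volumes described above.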
“We collect events from our various systems via a Flume pipeline that writes data out to Amazon S3. From there, we use a data processing pipeline hosted by Qubole to process and aggregate statistics to Hive (computing) tables and to an AWS Redshift based data warehouse. For easy access to the data for the entire company, we use Tableau to navigate through our tables and produce visualizations.”
Co-Founder and VP Engineering at NextDoor
How Does Hive Fit into the QDS Landscape?
QDS gives you the freedom to work with Hive, Hadoop MapReduce, Spark, and Presto as part of one unified interface with unified metadata. Choose the right solution for the right workload rather than being locked into any single technology. Hive and MapReduce are tried and proven for batch ETL and SQL workloads where reliability and stability are of highest importance. In contrast, Spark is great for machine learning and other use cases that benefit from in-memory data and fast response time while Presto is a proven scalable SQL engine for simple, interactive analysis at companies such as Facebook, Netflix, Airbnb, and more.
Get Started Now
Qubole offers 2 weeks of QDS usage for free to explore Hive and other data engines. Users simply need to authenticate with SSO or enter their AWS credentials to begin interacting with their data in their own cloud environment. Try Hive Today!