Today we are pleased to announce the Beta offering of Qubole’s HBase-as-a-Service. QDS can now provide HBase 1.0.0 running on Hadoop 2.6.0 as a fully managed service on the AWS Cloud.
Introduction to HBase
Apache HBase is an integral part of the Apache Hadoop ecosystem. When fast reads and writes with high concurrency and strict consistency are required, experts use Apache HBase. A number of Qubole’s customers also use HBase along with QDS. Until now, they had to set up HBase outside of the Qubole infrastructure. Adding HBase-as-a-Service gives QDS users access to the power and speed of HBase without having to worry about configuration and management.
Qubole takes all the best and most powerful Big Data tools and optimizes them to run in the Public Cloud without the typical complexities of set up or ongoing maintenance.
We spoke to customers who use HBase as well as a few companies that run large HBase clusters on the cloud. A common theme that emerged was the difficulty of operating an HBase cluster in a public cloud environment. These issues were unique to the Cloud context:
- HBase clusters must be resilient to spikes in network latencies.
- Machines must be replaced regularly for various reasons including the cloud provider retiring the VMs.
Running HBase in the public cloud also makes it possible to address some typical HBase operational pain points in ways that are unique to the Cloud:
- Compactions can be run on ephemeral machines to avoid overloading regular nodes.
- Compactions can also be avoided entirely when adding a new machine.
- Additional Cloud services like ElastiCache can be used to transparently speed up data access.
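As an illustration of the last point, a cache placed in front of HBase typically follows a read-through pattern. The sketch below is an assumption about how such a layer could look, not a description of the QDS design: a plain dictionary stands in for a caching service, and `fetch_row` is a hypothetical stand-in for an HBase read.

```python
class ReadThroughCache:
    """Serve reads from a cache and fall back to the backing store on a miss."""

    def __init__(self, fetch_row, cache=None):
        # fetch_row: hypothetical callable that reads one row from the store.
        # cache: any dict-like object; a real deployment would use a cache client.
        self.fetch_row = fetch_row
        self.cache = {} if cache is None else cache
        self.misses = 0

    def get(self, row_key):
        if row_key in self.cache:
            return self.cache[row_key]      # cache hit: the store is not touched
        self.misses += 1
        value = self.fetch_row(row_key)     # cache miss: read from the store
        self.cache[row_key] = value         # remember the value for later reads
        return value
```

Callers only see `get`, which is what makes the cache transparent to them. A real layer would also need to invalidate or update entries on writes, which this sketch leaves out.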
HBase clusters offered by Qubole are also interoperable with Qubole’s Hadoop/Hive and Spark offerings, all from a single pane of glass, which eliminates challenging integration efforts.
Based on this feedback, the theme of the first phase of our HBase service is Easy Management of HBase on the cloud. To be more specific, we are offering the following features:
- Automated Cluster Management: QDS-managed HBase clusters can be managed from the common cluster management console offered by QDS, alongside Hadoop, Spark and Presto clusters. As a result, HBase benefits from all the standard goodness offered by Qubole: one-click cluster provisioning and the ability to specify VPCs, subnets, EBS, and instance types.
QDS offers REST APIs to add, remove and replace machines in an HBase cluster. These operations have been optimized to move region data to the correct machines, so that compactions are not required to localize the region data on the correct region server.
- HBase/Hadoop2 master HA: Users have the option to run the master nodes in an HA configuration. This provides resiliency when a master node is lost or needs to be replaced.
- Incremental Backup/Restore to S3: Apache HBase supports Full Backup and Restore. We have added the ability to perform incremental backups and restores in QDS-managed HBase clusters (by importing patches from https://issues.apache.org/jira/browse/HBASE-7912). These incremental backups can be used for disaster recovery (DR) as well as for spinning up Test / Dev clusters.
- Optimal Configuration: Public clouds offer many instance types. HBase running on QDS chooses the best configuration for a given instance type. The right configuration is also important for resiliency in a cloud environment.
- Integration with Apache Zeppelin: HBase shell is a very important tool for triage on HBase clusters. We’ve developed an Apache Zeppelin plugin to easily access an HBase shell. Follow progress in this JIRA.
- Monitoring: All clusters in QDS come with basic monitoring tools. We’ve integrated https://github.com/sentric/hannibal for monitoring HBase-specific metrics.
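The node operations described under Automated Cluster Management can be pictured with a small sketch. This is an illustration only, not the QDS implementation or its REST API: the function `plan_region_moves` and the server and region names are hypothetical. It covers only the planning step, deciding where each region on a departing machine should go; the actual movement of region data is not modeled.

```python
from collections import defaultdict


def plan_region_moves(assignments, leaving, joining=None):
    """Return {region: target_server} for every region hosted on `leaving`.

    assignments: dict mapping region name -> server currently hosting it
    leaving:     server being removed or replaced
    joining:     replacement server, or None when a machine is simply removed
    """
    # Count the regions already hosted by each server that stays in the cluster.
    load = defaultdict(int)
    for server in assignments.values():
        if server != leaving:
            load[server] += 1
    if joining is not None:
        load[joining] += 0  # register the new server with zero regions

    if not load:
        raise ValueError("no server left to host the regions")

    moves = {}
    for region in sorted(r for r, s in assignments.items() if s == leaving):
        if joining is not None:
            target = joining  # replace: everything goes to the new machine
        else:
            # remove: pick the least-loaded survivor, ties broken by name
            target = min(load, key=lambda s: (load[s], s))
        moves[region] = target
        load[target] += 1
    return moves


current = {"r1": "node-a", "r2": "node-a", "r3": "node-b", "r4": "node-c"}
replace_plan = plan_region_moves(current, "node-a", joining="node-d")
remove_plan = plan_region_moves(current, "node-a")
```

Here `replace_plan` sends both regions from `node-a` to `node-d`, while `remove_plan` spreads them across the two remaining servers.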
We would like to take this opportunity to thank the active HBase community, which has built a fantastic platform for this service and has given us valuable insights into improving it!
We are ardent believers in dogfooding. We are using OpenTSDB (http://opentsdb.net/) to monitor QDS. The HBase cluster that supports OpenTSDB is run on QDS. Setting up OpenTSDB on QDS requires just a few commands. A sample command-line utility is shared on our GitHub page: https://github.com/qubole/quboletsdb
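Once OpenTSDB is running, data points can be written through its HTTP API (`/api/put`, available in OpenTSDB 2.x). The following is a minimal sketch, not part of the quboletsdb utility: the metric name, tag values and host are made-up placeholders, and the helper names are hypothetical.

```python
import json
import time
import urllib.request


def build_put_payload(metric, value, tags, timestamp=None):
    """Build one data point in the JSON shape expected by /api/put."""
    if not tags:
        raise ValueError("OpenTSDB requires at least one tag per data point")
    return {
        "metric": metric,
        "timestamp": int(time.time()) if timestamp is None else int(timestamp),
        "value": value,
        "tags": dict(tags),
    }


def encode_put_body(points):
    """Serialize a list of data points as the request body."""
    return json.dumps(list(points)).encode("utf-8")


def post_points(url, points):
    """POST data points to a TSD; `url` is e.g. http://<tsd-host>:4242/api/put."""
    request = urllib.request.Request(
        url,
        data=encode_put_body(points),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Placeholder metric and tag, for illustration only.
point = build_put_payload(
    "qds.cluster.nodes", 12, {"cluster": "hbase-demo"}, timestamp=1450000000
)
```

Calling `post_points` with the address of a running TSD sends the point; building and encoding the payload needs no network access, which keeps that part easy to test.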