The Qubole Data Service (QDS) is a Software-as-a-Service analytics platform that runs on leading cloud platforms such as AWS. It is aimed at data analysts, data scientists, and ETL engineers, and helps them start analyzing data in a matter of minutes.
How Does Qubole Compare with Other Solutions?
Many solutions and technologies are available today to analyze and process large data sets. As data continues to grow in both volume and variety, there is tremendous demand for technologies that let end users keep up with this growth. As a result, the big data industry continues to thrive, with a large number of vendors and solutions. Apache Hadoop and related technologies such as Apache Hive, Apache Pig, and Apache HBase have emerged as a strong ecosystem that addresses a wide variety of big data needs. Qubole currently provides a Hadoop/Hive service in the cloud; we will be adding a number of other services and capabilities in the future.
In terms of how these solutions are typically delivered to the end user, two categories are apparent:
- Cloud-based solutions
- Private data-center-based solutions
In the following sections, we compare and contrast Qubole with other providers with respect to these two categories.
Two prominent cloud-based big data services are Amazon EMR and Google BigQuery. Qubole is similar to these services in that it is also cloud-based. However, Qubole strives to differentiate itself by providing a wider set of tools aimed at the data analyst, data scientist, and ETL engineer.
While Google BigQuery provides a fast engine for running ad hoc queries on highly structured data, Qubole casts a wider net in terms of the data formats it can process, covering both structured and semi-structured data. Qubole also goes beyond ad hoc queries by supporting the writing of data pipelines. Additionally, through its plugin mechanisms and adherence to open source systems like Hadoop and Hive, Qubole avoids vendor lock-in by working with open data formats and taking advantage of the rich ecosystem that exists around Hadoop-related technologies.
Amazon EMR provides the same Hadoop benefits as Qubole, but it severely lacks the tools that analysts, data scientists, and ETL engineers need to use it easily. It still operates at the level of clusters, machine provisioning, and similar complexities that need not be of much interest to data users. Qubole provides higher-level abstractions and many integrated features that make it much easier to use than Amazon EMR. At the same time, given our team’s deep background in Apache Hive (our founders being creators of that project), we have been able to leverage our expertise to optimize Hadoop and Hive to run faster than comparable offerings.
A number of technologies and solutions run in an organization’s own data centers and provide data processing capabilities to the users in that organization. These solutions work quite well for data that is locked up in private data centers and difficult to move to the cloud, either because of the system where it is created or because of its sheer volume. On the other hand, these solutions must be provisioned for peak demand, even if that demand arrives for only a few weeks in a year. This "peak provisioning" significantly increases the total cost of ownership (TCO) of these systems. Pairing these technologies with a cloud-based solution such as Qubole can help reduce these costs by using the cloud-based solution to satisfy unexpected and "peak" ad hoc query workloads. For example, it is easy to envision a scenario where your private Hadoop clusters "cloudburst" their workloads to a solution like Qubole and serve end users at a lower TCO.
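The peak-provisioning argument can be made concrete with some back-of-the-envelope arithmetic. The sketch below is only an illustration: the function names, node counts, and per-week prices are all hypothetical assumptions chosen for the example, not Qubole pricing or measurements. It compares owning enough nodes for peak demand all year against owning baseline capacity and renting the difference from a cloud service only during peak weeks.

```python
def peak_provisioned_cost(peak_nodes, node_cost_per_week, weeks=52):
    """Yearly cost of owning enough nodes to cover peak demand at all times."""
    return peak_nodes * node_cost_per_week * weeks


def cloudburst_cost(base_nodes, peak_nodes, peak_weeks,
                    node_cost_per_week, cloud_node_cost_per_week, weeks=52):
    """Yearly cost of owning baseline capacity and renting the excess
    from a cloud service only during the peak weeks."""
    owned = base_nodes * node_cost_per_week * weeks
    burst = (peak_nodes - base_nodes) * cloud_node_cost_per_week * peak_weeks
    return owned + burst


# Hypothetical figures, for illustration only: demand peaks at 100 nodes
# for 4 weeks a year and needs 40 nodes the rest of the time. Cloud nodes
# are assumed to cost more per week than owned nodes.
always_peak = peak_provisioned_cost(peak_nodes=100, node_cost_per_week=50)
burst = cloudburst_cost(base_nodes=40, peak_nodes=100, peak_weeks=4,
                        node_cost_per_week=50, cloud_node_cost_per_week=120)

print(f"Provisioned for peak: {always_peak}")
print(f"Baseline plus cloudburst: {burst}")
```

Under these assumed numbers, bursting costs less even though each cloud node is priced higher, because the extra capacity is paid for in only 4 of 52 weeks. With different inputs, such as a long peak season or a small gap between baseline and peak, the comparison can go the other way.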