One Service for All of Your Data


Qubole is based on the popular open source projects Apache Hadoop and Apache Hive. Over the years, these technologies have proved equally useful for processing structured data from sources such as RDBMSs and semi-structured data from sources such as application logs, web crawl data, and JSON-encoded data. Qubole builds on these technologies to provide a single service interface for storing, correlating, and joining both structured and semi-structured data. As a result, Qubole aims to provide a platform where data from different sources can be combined to unlock insights that are not apparent when the data sets are kept in separate silos. The following capabilities in Qubole make this possible:


  • Built-In Format Readers for JSON and Delimited Data
  • Qubole includes readers for JSON and character-delimited data (including CSV and TSV). It also provides tools to create metadata for data in these formats stored in Amazon S3. As a result, these data sets can quickly be converted into table and table partition objects in Qubole and then queried and manipulated with SQL operations such as joins, WHERE clause filters, and GROUP BY aggregations.

  • Ability to Plug In User-Defined Format Readers
  • Qubole also allows users to plug in code for reading data sets in non-standard formats that have no built-in reader. These format readers are written against the Apache Hive SerDe (serializer/deserializer) interface, so many of the Hive SerDes already available as open source can be used with Qubole.

  • Code Execution in Different Languages
  • Qubole leverages Apache Hive's MapReduce support to let users transform data with algorithms and libraries that do not fit well into a SQL abstraction. With these capabilities, users can execute transformation logic written in any language on their data sets. More importantly, the results of these transformations can then be tied back into a SQL workflow. This makes Qubole a platform that can support many general types of computation over a variety of data formats.

  • Easy Mechanisms to Evolve Metadata with the Data
  • Qubole provides mechanisms to evolve the structure of tables and table partitions along with the associated data sets. Partly by supporting formats like JSON and partly through Apache Hive's mechanisms for modifying table and table partition structures, Qubole enables the end user to keep up with the changing structure of a data set. HiveQL's language mechanisms make it possible to query such evolving data sets without time-consuming data and schema conversion.
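
As an illustration of the "Code Execution in Different Languages" capability above, the following is a minimal sketch of a transformation script of the kind Apache Hive can stream rows through. It assumes Hive's usual streaming convention, in which each input row arrives on standard input as one tab-separated line and each output line becomes one result row. The script name, the column names (user_id, url, domain), and the table name page_views are hypothetical and chosen only for this example; they are not part of Qubole.

```python
"""Hypothetical streaming script: extract_domain.py

Assumed invocation from HiveQL (names are illustrative only):

    SELECT TRANSFORM(user_id, url)
    USING 'python extract_domain.py'
    AS (user_id, domain)
    FROM page_views;
"""
import sys
from urllib.parse import urlparse


def extract_domain(url):
    """Return the host part of a URL, or an empty string if there is none."""
    return urlparse(url).netloc


def transform(lines):
    """Turn tab-separated (user_id, url) rows into (user_id, domain) rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip rows that do not have exactly two columns
        user_id, url = fields
        yield f"{user_id}\t{extract_domain(url)}"


if __name__ == "__main__":
    # Read rows from standard input and write transformed rows to standard output.
    for row in transform(sys.stdin):
        print(row)
```

Because the output is again a set of tab-separated rows, it could be given column names in the AS clause and then joined, filtered, or aggregated like any other table in the SQL workflow. The script would typically need to be made available to the cluster first, for example with Hive's ADD FILE command.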