We’re excited to announce that the Qubole Data Service (QDS) is now generally available on the Oracle Cloud Infrastructure Service. QDS for Oracle Cloud Infrastructure lets customers run Spark, Hadoop MapReduce, and Hive jobs in a matter of minutes, with automated deployment of Oracle Cloud compute resources and native integration with the Oracle Object Store. This makes it easy for customers to optimize performance, cost, and elasticity by orchestrating and automating workloads on high-performance cloud infrastructure.
As data volumes continue to grow in enterprises, the ability to use data intelligently is fast becoming a core differentiator. Machine learning enables tailored recommendations that enhance the customer experience. Processing and aggregating data at petabyte scale helps deliver insights beyond the immediately obvious.
The open source community is leading the charge in big data analytics with the Apache Spark and Apache Hadoop projects. Spark has become a standard for data scientists with its breadth of tools, including machine learning libraries, graph processing, stream processing, and a simple SQL interface. Hadoop is still the standard for batch processing of big data, with mature tools such as Apache Hive providing reliable processing in data pipelines.
QDS optimizes these open source projects to take advantage of the elasticity, scale, and flexibility of the cloud. In partnering with the Oracle Cloud Infrastructure Service, we’re able to bring features such as auto-scaling, separation of compute and storage, and pay-as-you-go pricing to enterprises looking to build out their big data platform.
Specific to Oracle Cloud Infrastructure Service, we support all the compute shapes on the platform (BM.DenseIO1.36, BM.HighIO1.36, BM.Standard1.36, and all VM shapes). In particular, we’re excited about the shapes that include NVMe SSDs for disk storage, which increase performance by parallelizing queues for disk access (check out the benchmark we performed using these compute shapes).
QDS also provides native support for Oracle’s Object Store, which allows for separation of compute and storage. Using the object store as the central data lake gives compute more elasticity and flexibility, because data persists independently of any cluster. This means compute clusters can be brought down when not in use, multiple clusters can be used in parallel, and cluster size can be scaled dynamically to match the needs of the users.
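To make the idea of dynamically scaled clusters concrete, here is a minimal sketch in Python of how a cluster size could be derived from the current workload. It is an illustration only and not a description of how QDS auto-scaling is implemented: the function `target_nodes` and its parameters (`pending_tasks`, `tasks_per_node`, `min_nodes`, `max_nodes`) are hypothetical names chosen for this example.

```python
import math


def target_nodes(pending_tasks: int, tasks_per_node: int,
                 min_nodes: int, max_nodes: int) -> int:
    """Return a node count for the current backlog (hypothetical policy).

    Assumption: each node can work on `tasks_per_node` tasks at once, and
    the cluster must stay between `min_nodes` and `max_nodes`.
    """
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    if min_nodes < 0 or min_nodes > max_nodes:
        raise ValueError("require 0 <= min_nodes <= max_nodes")

    # Nodes needed to run every pending task concurrently, rounded up.
    needed = math.ceil(pending_tasks / tasks_per_node)

    # Clamp to the configured bounds; min_nodes == 0 models a cluster
    # that can be shut down entirely when there is no work.
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    for backlog in (0, 3, 100, 1000):
        print(backlog, "pending tasks ->", target_nodes(backlog, 8, 2, 20), "nodes")
```

Because the data lives in the object store rather than on the cluster, shrinking the node count in a scheme like this does not put stored data at risk; only in-flight work has to be accounted for.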
The combination of Qubole’s auto-scaling and automatic cluster life-cycle management with the efficient, non-blocking, flat network architecture of Oracle Cloud Infrastructure Service gives users a compelling platform for tackling big data problems.
Finally, here is a list of all the supported open source big data engines (and some that are coming soon!) that can be used as part of QDS on Oracle Cloud Infrastructure:
- Apache Spark 2.0, including Spark SQL, Spark Streaming, MLlib, GraphX, Scala, PySpark, and R.
- Hive 1.2 for SQL-like query processing.
- Hadoop 2.6.0, with support for MapReduce and Cascading, for batch processing.
- Sqoop for import/export of data with relational databases.
- Presto (coming soon!) for fast ad hoc queries.