Apache Spark as a Service

 

Qubole Data Service (QDS) makes Spark enterprise-ready with Spark processing on the AWS Cloud and Google Cloud Platform. QDS provides flexibility: it shortens time to deployment, makes business users self-sufficient, and accelerates time to value.

Apache Spark can process data from a variety of data repositories, including the Hadoop Distributed File System (HDFS) and Amazon Simple Storage Service (S3). Spark supports in-memory processing to boost the performance of big data analytics applications and also supports disk-based processing.

 

Complete Cluster Life Cycle Management

 

As with all big data solutions supported by QDS, Spark clusters scale up and down automatically during query execution. You don't have to worry about starting or stopping Spark clusters. QDS does all the heavy lifting for you.

 

Quick and Easy Spark Debugging

 

QDS makes it easy to debug both active and historical jobs with a Spark Application UI. Results and logs remain available even when no cluster is running.

 

Instance Selection Options

 

QDS supports a wide variety of Amazon EC2 instance types for your Spark workload, giving you the freedom to optimize instance selection for your workload requirements and AWS pricing options.

 

Spot Instance Pricing

 

QDS lets you automatically incorporate Amazon spot instances, which can cost up to 90% less than on-demand instances.
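
As a rough illustration of how a spot discount affects cluster cost, the sketch below computes a blended hourly rate for a cluster that mixes on-demand and spot nodes. The prices, node counts, and the function name `blended_hourly_cost` are hypothetical. Actual spot prices vary by instance type, region, and time, and this is not a QDS API.

```python
def blended_hourly_cost(on_demand_nodes, spot_nodes, on_demand_price, spot_discount):
    """Return the hourly cost of a cluster mixing on-demand and spot nodes.

    spot_discount is the fraction saved versus on-demand (0.9 means 90% less).
    """
    if not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_discount must be between 0 and 1")
    spot_price = on_demand_price * (1.0 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


# Hypothetical 10-node cluster at $1.00 per node-hour on demand,
# with the best-case 90% spot discount applied to half of the nodes.
all_on_demand = blended_hourly_cost(10, 0, 1.00, 0.9)
half_spot = blended_hourly_cost(5, 5, 1.00, 0.9)
savings = 1.0 - half_spot / all_on_demand
print(f"all on-demand: ${all_on_demand:.2f}/h, half spot: ${half_spot:.2f}/h, saved: {savings:.0%}")
```

Because the discount applies only to the spot share of the cluster, the overall saving depends on how many nodes you are willing to run on spot capacity.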

 

Elastic Pricing Model

 

With QDS' pay-per-use pricing model, you pay only for the compute hours you actually use.

 

Extensive User Interfaces

 

QDS gives you user interface options to match your use case. The Spark Notebook and a web-based UI are suited for interactive analysis, and the SDKs and the REST API are ideal for programmatic access.

 

Amazon Virtual Private Cloud (VPC) Support

 

QDS supports Amazon VPC, a service that extends your private network into the cloud. Amazon VPC provides fine-grained access control both to and from Amazon EC2 instances in your virtual network. Plus, you can launch dedicated instances within a VPC on single-tenant hardware.

 

When Should I Use QDS for Spark?

QDS for Spark is ideal for data scientists who combine machine learning, SQL, advanced statistical analysis, and graph processing.

QDS supports all Spark libraries, including:

MLlib (machine learning) | GraphX (graph processing) | Spark SQL | Spark Streaming | SparkR (for using Spark from R)

Machine Learning and Advanced Analytics

By allowing user programs to load data into a cluster's memory and query it repeatedly, Spark is well suited to running iterative machine learning algorithms. In addition, Spark's MLlib provides common machine learning algorithms for classification, regression, clustering, collaborative filtering, and dimensionality reduction, as well as the underlying optimization primitives. SparkR lets you apply many additional algorithms. Using QDS for Spark, you can deliver machine learning applications that turn your data into actionable predictive intelligence, including recommendation engines, sentiment analysis, fraud detection, customer segmentation, and many other applications.
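
To make the "load once, query repeatedly" pattern concrete, here is a minimal plain-Python sketch of an iterative algorithm that rescans the same in-memory dataset on every step. It is a single-machine stand-in, not Spark or MLlib code. The data, the function name `fit_slope`, and the step settings are hypothetical.

```python
def fit_slope(points, steps=200, learning_rate=0.01):
    """Fit y = w * x by gradient descent, scanning all points on every step."""
    w = 0.0
    n = len(points)
    for _ in range(steps):
        # Each iteration re-reads the full dataset, which is why keeping it
        # in memory (rather than reloading it from disk) pays off.
        gradient = sum(2.0 * (w * x - y) * x for x, y in points) / n
        w -= learning_rate * gradient
    return w


# Hypothetical data lying on the line y = 2x, loaded once and then reused.
cached_points = [(float(x), 2.0 * x) for x in range(1, 6)]
slope = fit_slope(cached_points)
print(f"fitted slope: {slope:.4f}")
```

The loop touches every data point 200 times. On a cluster, holding the dataset in memory across those passes is the kind of workload the paragraph above describes.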

 

Interactive Queries and Iterative Algorithm Development

Spark's in-memory capabilities enable faster interactive exploration. This works well with a Spark Notebook, an interactive computational environment in which you can combine code execution, rich text, mathematics, plots, and rich media.

QDS preserves the Spark Notebook even when clusters are not in use.

 

Define Your Own Project

Spark supports additional use cases. Spark Streaming is useful for real-time processing of streaming data such as log files. Spark SQL supports relational data processing. And because Spark typically caches recently read data in memory, applications that require fast SQL execution benefit from Spark's speed advantage over slower-running Hadoop MapReduce jobs.

How Does Spark Fit into the QDS Landscape?

In QDS, you work with Spark, Hadoop MapReduce, Presto, and Hive through one unified interface with unified metadata. Choose the right solution for the right workload rather than being locked into any single technology. Use Spark for machine learning and other use cases that benefit from in-memory data and fast response time. Switch to Hive and MapReduce for batch workloads. For simple, interactive analysis, use Presto, a scalable SQL engine proven at companies such as Facebook, Netflix, and Airbnb.

 

Get Started Now

To help accelerate adoption of big data tools such as Spark running on the AWS cloud, Qubole is offering a promotion for commercial AWS users. AWS will cover two weeks of AWS usage for proofs of concept (POCs), subject to eligibility.

Let Us Fund Your POC!
 