Getting started with Spark on QDS for Google Cloud Platform
- By Ashish Sachdeva
- December 17, 2015
Starting today, Qubole Data Service (QDS) users can launch Auto-scaling Spark Clusters and 1-click Persistent Notebooks to analyze data stored in Google Cloud Storage. To set up a trial account, follow the instructions in our Google Cloud Platform Quick Start Guide.
With auto-scaling, you no longer need to size the cluster by hand to achieve high compute utilization. Instead, you simply specify the minimum and maximum size for your cluster. When workloads increase, QDS scales the cluster up, removing performance bottlenecks. When workloads decrease, QDS scales the cluster down, reducing the cost of the underlying compute resources.
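To make the role of the minimum and maximum sizes concrete, here is a minimal sketch of a bounded scaling decision. It assumes a simplified, hypothetical policy in which each node can run a fixed number of tasks; the function name and its parameters are illustrative only and are not how QDS actually computes cluster size.

```python
import math


def target_cluster_size(pending_tasks: int, tasks_per_node: int,
                        min_nodes: int, max_nodes: int) -> int:
    """Return a node count for the current workload, clamped to [min_nodes, max_nodes].

    Hypothetical, simplified policy for illustration; not Qubole's algorithm.
    """
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    if min_nodes < 0 or min_nodes > max_nodes:
        raise ValueError("require 0 <= min_nodes <= max_nodes")
    # Nodes needed to run every pending task at the same time.
    needed = math.ceil(pending_tasks / tasks_per_node)
    # Scale up toward demand, but never past the user's maximum,
    # and never shrink below the user's minimum.
    return max(min_nodes, min(needed, max_nodes))


if __name__ == "__main__":
    print(target_cluster_size(0, 4, 2, 10))    # idle: stays at the minimum, 2
    print(target_cluster_size(17, 4, 2, 10))   # moderate load: 5
    print(target_cluster_size(100, 4, 2, 10))  # heavy load: capped at the maximum, 10
```

The clamp is the key design choice: the minimum keeps a baseline of capacity available, while the maximum puts an upper bound on compute spend no matter how large the workload grows.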
The Spark Notebook on QDS provides multi-language support, including Scala and Python. Getting started is easy: you simply navigate to your Spark cluster within the QDS Control Panel and select ‘Spark Notebook’.
This takes you to a list of all notebooks for that cluster. Every cluster comes with ExampleNote, a basic tutorial on using the Spark Notebook. Notebooks are saved automatically to your Google Cloud Storage account. Here’s an example Scala command run in the notebook.
Spark Notebooks offer the same elastic pricing and flexibility as Hadoop clusters on QDS. While the cluster is down, there is no charge for the underlying compute VMs. When the cluster comes back up, you can pick up your analysis where you left off, because the notebook is saved automatically. And you can do all this with a single click in your web browser.
For Google Cloud Platform users who would like to learn more about the wide variety of Spark use cases, including machine learning, stream processing, interactive querying, and large-scale data transformation, we are hosting a webinar on January 14th at 11am PT. Online registration is required to attend.