Welcome to the Qubole Education landing page. Our mission is to ensure your success with the available cloud technologies as well as the QDS product. We recommend bookmarking this page so you can quickly return to register for free, instructor-led public courses and stay up to date with the new material we are continually developing. Select a tab below to browse our services and begin your cloud education!

Self Service Education

Get started with Qubole’s Self Service Education offering by selecting the link below to browse and enroll in the available courses, which feature videos, exercises, and quizzes.

Please be aware that the Self Service Education environment is not yet tied to the Qubole product, so you will need to create a separate login. We recommend signing up with the email address you use for QDS, in preparation for when this integration occurs.

All Available Courses

“Getting Started” Courses

Qubole provides access to In Application Tutorials, which give high-level overviews of the application's functionality through a lightweight walkthrough widget.

The In Application Tutorials review the structure of the environment and how to navigate it.

These are available from the Help Center inside Qubole. We recommend beginning with the following:

“Getting Started” Walk Through

Free Public Courses

Qubole provides free, instructor-led public courses, which include access to a training environment for exercises.

Please scroll down to view the current schedule and access registration links.

Please register with the email address you use to log into the Qubole product.

Registrations are approved manually, and a confirmation email is sent upon approval.

All times are stated in Eastern Time (ET).

Course Title | Date & Registration Link | Duration
Spark for End Users / Beginners / Analysts | 6/19 @ 2pm ET | 120 minutes
Selecting a SQL Engine | 6/26 @ 2pm ET | 120 minutes
Hive for Data Engineers | 7/3 @ 2pm ET | 120 minutes
Spark for Data Engineers | 7/10 @ 2pm ET | 120 minutes
Hive for Data Ops | 7/26 @ 2pm ET | 90 minutes
Spark for Data Ops | 8/2 @ 2pm ET | 90 minutes

QDS Product Courses

The following Product Course Catalogue is part of our Version 1 course content.

This content will be updated to Version 2 by the end of Q3 2017.

Please contact education@qubole.com to schedule private training sessions of any of the courses listed below.

Please select the course number for the link to the course description.

Course | Duration | Questions Answered
User 101 | 120 min | What is Qubole, what are the features available to me as a user, and how does it interact with the cloud on my behalf?
User 201 | 120 min | What are the SQL engines available to me in the cloud, how do they differ, and what are the decision points I need to consider when selecting the right SQL engine?
Admin 101 (AWS) / Admin 101 (Azure) | 120 min | What is a Cluster, what are the features available to me as an administrator, and how do I manage the Users in my environment?
Admin 201 | 120 min | There are several Clusters available in Qubole; what are the associated use cases and how do I select the right Cluster?
Admin 202 | 120 min | (AWS ONLY) How can Task Focused Analysis help me achieve my desired parallelism and select the best Instance Type while estimating the expected cost of running my Cluster for an hour?
API 101 | 90 min | How do I leverage the REST API, what tool can I use to simplify query submission, and what types of commands are available to me in Qubole?
Airflow | 90 min | What is Airflow, how do I use the Airflow options in Qubole, and how do I trigger DAGs that exist in Airflow from within Qubole?
ODBC JDBC | 30 min | What are the ODBC and JDBC drivers and how are these used for connectivity to Qubole for reporting purposes? Note: there are no labs associated with this presentation.

Persona Engine / Cluster Courses

The following Persona Based Engine / Cluster Courses are part of our Version 2 content.

These courses will be added to the Self Service Education offering through July 2017.

Please contact education@qubole.com to schedule private training sessions of any of the courses listed below.

Course | Duration w/ Labs | Agenda
Spark for Data Analysts | 120 min | Prerequisite: knowledge of SQL and familiarity with Java, Scala, or Python. Topics: Spark Commands, Resilient Distributed Datasets, Data Frames, Scala vs Python, Spark Notebooks, Spark Tuning, Executor AutoScaling, Notebook Interpreters
Spark for Data Engineers | 120 min | Prerequisite: Spark for Data Analysts. Topics: Spark Execution Model, Actions & Transformations, Stages & Shuffle, Spark Parallelism Management, Executors, Cores & Tasks, Memory Settings, Executor AutoScaling, Job Server
Spark for Data Scientists | 120 min | Prerequisite: Spark for Data Analysts. Topics: Spark Notebooks, Spark Functionality, Qubole Features, Notebook API Execution, Notebook Dashboards, Notebook Tuning, Interpreter Configuration, Executor Management Troubleshooting
Spark for Data Ops | 90 min | Prerequisite: Spark for Data Engineers. Topics: Spark Cluster Architecture, YARN Cluster Behavior, Spark Cluster, Spark Job Submission, Spark Notebook Administration, Notebook Submission, Notebook Logs
Hive for Data Analysts | 90 min | Prerequisite: knowledge of SQL. Topics: Hive Commands, What is MapReduce?, Hive SQL, Hive SQL Syntax, By Clauses, Transitioning from Database, Hive Tuning, Query Level Settings, Map Joins, Enabling Tez
Hive for Data Engineers | 90 min | Prerequisite: Hive for Data Analysts. Topics: Hive Dynamic Partitioning, Syntax & Best Practices, Too Many Small Files, Entire System Scan, Hive Commands, Improving Performance, Advanced Join Options
Hive for Data Ops | 120 min | Prerequisite: Hive for Data Analysts. Topics: Hive Data Preparation, Columnar Optimizations, HDFS Split Size, Common Failure Scenarios, Hive Environment Management, Controlling Environment Behavior, Common Failure Scenarios
Presto for Data Analysts | 90 min | Prerequisite: knowledge of SQL. Topics: Presto Commands, Use Case, Comparison to Hive, Comparison to RDBMS, Hive Metadata, Presto Tuning, Syntax Best Practices, Job Lag, Job Failure
Presto for Data Ops | 90 min | Prerequisite: Presto for Data Analysts. Topics: Presto Data Preparation, Columnar Data Format, Ordering Data, Snappy Compression, Split Slots, Presto Execution Tuning, Memory Pools, Managing Memory

Academic Engine / Cluster Courses

The following Academic Engine / Cluster Courses are part of our Version 1 content.

These courses will be retired in July 2017. They will still be available for private delivery, but they will no longer be listed in our catalogue and will be replaced in the Self Service Education offering with the new Persona Based Engine / Cluster content.

Please contact education@qubole.com to schedule private training sessions of any of the courses listed below.

Course | Duration w/ Labs | Agenda
Spark 101 | 120 min | What is Spark and what are its associated use cases, what are RDDs, Data Frames, and Executors, and how do I use Spark and Notebooks?
Spark 201 | 90 min | Prerequisite: Spark 101. How does Spark process data, why is Spark considered lazy, and how does the structure of my code affect the Data Shuffle?
Spark 202 | 60 min | Prerequisite: Spark 201. How does the Spark Application UI support analysis of the Spark jobs as well as the behavior of the Stages, Tasks, and Data Shuffle?
Spark 301 | 90 min | Prerequisite: Spark 201. How does Spark allocate memory across and within Executors, and how can I anticipate and manage the parallelism in my Cluster?
Spark 302 | 60 min | Prerequisite: Spark 301. What additional considerations can affect Spark performance, and how do I know when I need to increase the Cluster power?
Spark MLlib | 60 min | Prerequisite: Spark 301. What is the Spark Machine Learning Library, how can I use it in Qubole, and how do I create a Notebook Recommendation Engine?
Hive 101 | 90 min | What is Hive, how does MapReduce work, and how should I think about writing SQL in Hive since it gets converted to MapReduce?
Hive 201 | 90 min | Prerequisite: Hive 101. I understand the basics of writing SQL in Hive; how can I control the MapReduce produced by Hive in response to the SQL I write?
Hive 202 | 90 min | Prerequisite: Hive 101. How can I be more efficient with my MapReduce memory usage, and what are some best practices for Dynamic Partitioning?
Tez 101 | 90 min | Prerequisite: Hive 101. What is Tez, how does it differ from traditional MapReduce, what is the use case for Tez, and how can I use it in Qubole?
Presto 101 | 90 min | What is Presto, what are the features available to me as a user, and how can I optimize my Presto performance as an analyst?
Presto 201 | 60 min | Prerequisite: Presto 101. How does Presto manage memory across queries, how can I prevent failure, and how can I optimize Presto as an administrator?