Airflow Movie Recommendation Engine Example

Learn the basics of using Airflow with each big data engine in Qubole (Spark, Presto, and Hive) by building an ETL pipeline that structures the MovieDB dataset. Then learn how to use Airflow with Spark to run a batch ML job on the cleaned data, a job that can be used to productionize the trained model.

Wikipedia Trends Pipeline with Hive & Airflow

A big data app that displays the topics trending on Wikipedia. It has two main parts: a web app in Ruby on Rails, and the Hive data pipeline that feeds it, hosted in the Qubole scheduler. A variation of the demo uses Apache Airflow.

Demo Query that can invoke Qubole Autoscaling

A SQL query used in the Qubole Autoscaling white paper. It can be used for internal tests against multiple engines (Spark, Presto, and Hive).
Get instant access to Notebook examples by selecting any of the tiles below. The examples range in difficulty from visualization to machine learning use cases, and use SQL, Python, Scala, AngularJS, and more. Download the Notebooks into Qubole Spark to run them yourself.

Financial Time Series Analysis

Advanced Analytics Retail
Customer Churn

Sentiment Analysis Modeling Using PySpark and H2O

More Notebooks Coming Soon