What is Apache Airflow?

Hey guys. Welcome back to the second episode of Airflow 101. In this episode, let’s talk about what Apache Airflow is. Airflow is an open-source platform for authoring, scheduling, and monitoring workflows. Now let me break that definition down into simpler terms.

Authoring: workflows in Airflow are written as directed acyclic graphs (DAGs) in the Python programming language, and you define your own graphs in code. Scheduling: the user can specify when a workflow should start or end, after what interval it should run again, and so on.
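As a rough sketch of the scheduling idea (plain Python, not Airflow's actual scheduler): a start date plus a repeat interval is enough to determine when each run happens. The function name and parameters here are illustrative, not Airflow API.

```python
from datetime import datetime, timedelta

def run_times(start, interval, count):
    """Simplified model of a schedule: the workflow runs at
    start, start + interval, start + 2*interval, ..."""
    return [start + i * interval for i in range(count)]

# a workflow that starts on Jan 1 and runs once a day
runs = run_times(datetime(2024, 1, 1), timedelta(days=1), 3)
print(runs)
```

In Airflow itself you would express the same idea declaratively, by giving the DAG a start date and a schedule interval, and the scheduler works out the run times for you.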

Monitoring: Airflow provides an interactive interface to monitor your workflows, with a bunch of different tools for watching them run in real time.

Let’s represent the process of baking a pizza as a workflow. I have broken down the entire process into smaller tasks:

  1. The first task can be kneading the dough. It will require some flour, oil, yeast, and water.
  2. Another task can be preparing toppings, and getting sauce and cheese.

Only when these two tasks are complete will I be able to move on to the next tasks:

  3. Putting all the items on the base.
  4. The final task: baking the pizza.

I have essentially broken down the process of baking a pizza into a workflow.
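The steps above can be sketched as a plain-Python dependency graph. The task names are illustrative; in a real Airflow DAG each task would be an operator and the dependencies would be set with Airflow's `>>` syntax, but the ordering logic is the same.

```python
# each task maps to the set of tasks it depends on
pizza_dag = {
    "knead_dough": set(),
    "prepare_toppings": set(),
    "assemble_pizza": {"knead_dough", "prepare_toppings"},
    "bake_pizza": {"assemble_pizza"},
}

def run_order(dag):
    """Return one valid execution order (a topological sort):
    a task runs only after all of its dependencies are done."""
    done, order = set(), []
    while len(order) < len(dag):
        ready = [t for t, deps in dag.items() if t not in done and deps <= done]
        if not ready:
            raise ValueError("cycle detected: not a valid DAG")
        for t in ready:
            done.add(t)
            order.append(t)
    return order

print(run_order(pizza_dag))
```

Note that kneading the dough and preparing the toppings have no dependency on each other, so Airflow could run them in parallel; assembling waits for both, and baking comes last.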

Let’s talk about the history and current state of Apache Airflow. Airflow was started in October 2014 at Airbnb, and the project joined the Apache Software Foundation’s Incubator program in March 2016.

As I speak, the project has over 15,000 stars on GitHub.

Airflow is used by many major companies such as Lyft, Cuebol, Slack, etc. In the next video, I’ll be talking about why you should use Apache Airflow. See you all in the next video.