Introducing Role-Based Access Control in Apache Airflow

February 26, 2019 by and Updated October 15th, 2019

This blog post is written by Sumit Maheshwari, an Engineering Manager at Qubole, and Aaditya Sharma, who is interning with Qubole in the ETL team.

Security has always been the topmost priority at Qubole, and we’ve approached Apache Airflow with the same mindset since we first started working with the platform (in January of 2016). From day one our Airflow clusters were tightly authenticated and authorized via Qubole’s control tier, and potential security issues like someone leaving Airflow’s DNS publicly accessible never occurred.

Since the development of role-based access control (RBAC) started in the open source community, Qubole was looking forward to including it. We evaluated the first Airflow release with RBAC (1.10.0) and found that it was not of production grade yet, so we’ve released Airflow 1.10.0 without RBAC support and called it 1.10.0 Beta.

Historically, Airflow at Qubole was bound via two roles — Administrator and User — provided they have already successfully passed through the authentications. A User role can do most things, except for some administrative tasks like managing Connections, Variables, XCom, etc.

The latest Airflow release (1.10.2) is pretty stable but still contains the non-RBAC UI code. Soon after the 1.10.2 release, the non-RBAC code was removed from the open source code. As a result, we felt it was a good time to advance our RBAC integration.

Introducing RBAC Beta

We took all commits up to February 5th of 2019 from the master branch of open source and called that the Qubole-specific 1.10.3 release. Please note that 1.10.3 has not been cut or released in open source yet. On the Qubole-specific 1.10.3 branch, we’ve made all the changes that are required to run Airflow as a service in the Qubole environment.


Various airflow versions offered by Qubole

The end result is quite beautiful, and any organization can start leveraging Airflow with RBAC without extra work required to create users or manage roles or policies. By default, any user who has access to an Airflow cluster running in their organization’s account can automatically be mapped to a default role within that Airflow cluster.

A user can have various cluster-level permissions on Qubole. Based on these permissions, the role of the user is get mapped to the Airflow web server.


Cluster-level permissions in Qubole
if can(:manage or :delete or :terminate or :clone or :update, cluster)
  role = "Admin"
elsif can(:start, cluster)
  role = "Op"
elsif can(:read, cluster)
  role = "User"
else
  role = "Viewer"
end

The simple yet powerful mechanism of tying Qubole roles to Airflow roles eases a lot of the pain of managing users. Administrators can also override these roles within Airflow, and those custom configurations will take precedence.

To override the roles, the cluster administrator can create a user on Airflow’s web server and assign the desired role or change the default existing permissions for various roles: Administrator, Op, User, Viewer, and Public. In the below example, an Administrator can assign User 2 to the Administrator role, who was by default was mapped to the Op role.


The role is overridden via webserver

Qubole’s authentication mechanism will first check whether the user exists on the web server created by the administrator. If the user exists, they will be assigned that role; otherwise, their role will be assigned based on cluster permissions. This makes authentication effortless for the administrator as well as the user.


The flow of assigning roles on the web server

We’ve also made sure that only folks with the Administrator role can run Shell Commands (from Analyze) or upload/download DAG files from DAG explorer for that cluster.

  • Blog Subscription

    Get the latest updates on all things big data.
  • Recent Posts

  • Categories

  • Events

    AWS re:Invent

    Dec. 2, 2019 | Las Vegas, NV