Airflow on Anaconda: A Match Made in Heaven, Perfected by Qubole

May 28, 2019 (updated March 1st, 2021)

Apache Airflow is a workflow management platform for authoring, scheduling, and monitoring workflows as Directed Acyclic Graphs (DAGs). This makes it easier to build data pipelines, monitor them, and perform ETL operations. Even a simple machine learning task can involve a complex data pipeline, and triggering and monitoring such pipelines by hand adds overhead and invites errors.
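To make the DAG idea concrete, here is a minimal sketch that uses only the Python standard library (`graphlib`, Python 3.9+), not Airflow's own API. The task names are hypothetical. It shows how declaring dependencies between tasks is enough to derive a valid run order, which is the kind of bookkeeping a workflow manager handles for you.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a task, each value is the set of
# tasks that must finish before it can start.
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "train": {"clean"},
    "evaluate": {"train"},
    "report": {"clean", "evaluate"},
}


def execution_order(dag):
    """Return one valid run order for the tasks in `dag`.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    i.e. if the graph is not actually acyclic.
    """
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
print(order)
```

Because the graph has no cycles, every task runs only after the tasks it depends on. A real Airflow DAG adds scheduling, retries, and monitoring on top of this ordering.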

Qubole offers Airflow running on top of the Anaconda environment to make running machine learning pipelines and data science tasks seamless. Anaconda is an open source Python distribution for data science, machine learning, and large-scale data processing, with over 1,400 packages. Users can therefore run large data pipelines with better package support for their tasks. Qubole also offers Package Management, which lets users install Anaconda packages on their clusters directly from the UI without restarting the clusters.

Running Airflow on the Anaconda environment lets users build complex data pipelines for machine learning and data science tasks. It also lets them install packages optimized for those tasks from the Anaconda environment on the go, using Qubole’s package management feature.

How to Run Airflow on Anaconda with Qubole

Step 1: Creating a cluster

  • From the cluster page, select the Airflow cluster type with the Python version set to 3.5. This automatically attaches the cluster to an Anaconda environment.
  • Qubole creates the new Airflow cluster, which is then ready to use.

Step 2: Adding packages

  • Python packages can be installed on the cluster from the Qubole Environments page without restarting the cluster. Open the page and select your cluster.
  • Add the package you require. The selected package is installed in the Anaconda environment.
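After adding packages, it can help to confirm that the interpreter actually sees them. The following is a minimal sketch using only the standard library; the package list is a hypothetical example, and the code avoids f-strings so that it also runs on Python 3.5. Note that import names can differ from package names (for example, scikit-learn is imported as `sklearn`).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level import names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list: replace with the import names of the packages
# you added from the Environments page.
wanted = ["numpy", "pandas", "sklearn"]

missing = missing_packages(wanted)
print("Missing packages: {}".format(", ".join(missing) or "none"))
```

`find_spec` only looks the module up without importing it, so the check stays cheap even for large packages.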

Step 3: Running shell commands on the cluster

  • Qubole lets you run shell commands on the cluster directly from the Analyze page.
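One thing worth checking with a shell command is which Python interpreter the cluster uses. The sketch below assumes you save it as a script (the file name `check_env.py` is hypothetical) and invoke it with `python check_env.py`. The `conda-meta` directory test is a heuristic for detecting a conda-managed environment, not a Qubole-specific API.

```python
import os
import sys


def interpreter_summary():
    """Describe the interpreter running this script."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "prefix": sys.prefix,
        # Heuristic: conda environments keep their metadata in <prefix>/conda-meta.
        "in_conda_env": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
    }


if __name__ == "__main__":
    for key, value in sorted(interpreter_summary().items()):
        print("{}: {}".format(key, value))
```

If `in_conda_env` is reported as `False`, the command is probably running under a different interpreter than the one attached to the Anaconda environment.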

The steps above show how Qubole simplifies building data pipelines. You can now build, train, and deploy machine learning and data science pipelines on top of the Anaconda environment, with package management available whenever you need additional packages.
