Airflow on Anaconda: A Match Made in Heaven, Perfected by Qubole
Apache Airflow is a workflow management platform used to author workflows as Directed Acyclic Graphs (DAGs). This makes it easier to build data pipelines, monitor…
Qubole Data Platform orchestrates thousands of clusters in the cloud for our customers on a daily basis. From years of experience, we’ve learned from cluster…
This notebook will walk you through the process of building and using a time-series analysis model to forecast future sales from historical sales data. In…
Our customers at Qubole use notebooks with Apache Spark as the back-end to build machine learning pipelines. Often, it is the data scientists who develop…
A 2018 Gartner article discussed the necessity of data lakes when it comes to implementing big data, stating “the fact remains that more than 80…
Customers often configure a small minimum size when autoscaling Presto clusters to save on costs. However, scheduling queries on a small cluster leads to query…
This post is a guest publication written by Wesley Goi, a Data Scientist at Honestbee. A version of this post first appeared on Medium’s Data…
If ever a problem and a solution were made for each other, it’s autonomous driving and Artificial Intelligence (AI). Turning the dream of driverless cars…
This post covers the use of Qubole, Zeppelin, PySpark, and H2O PySparkling to develop a sentiment analysis model capable of providing real-time alerts on customer…
Notebooks and Dashboards are the most common ways for Qubole users to play with data interactively using Apache Spark and Presto. Our notebook and dashboard…
Lyft’s recent IPO filings revealed a snapshot of the company’s staggering cloud costs. According to the filing, Lyft is contractually obligated to pay at least…
I’m very excited to announce our expanded partnership with Google Cloud Platform (GCP). We have joined forces to offer an enterprise self-service data platform powered…