Tech Blog

Cloud-native Big Data Activation Platform

  • Qviz – Qubole Visualization Framework for Jupyter-Based Notebooks

    Data visualization is a critical aspect of Exploratory Data Analysis that helps Data Analysts and Scientists visualize frequency distributions, explore causal/correlated relationships between...

  • Data Discovery Tools – Qubole Workbench

    It is common knowledge that data lakes offer the right architecture to support multiple use cases and tools, but can be operationally complex to implement…

  • Apache Airflow Tutorial – DAGs, Tasks, Operators, Sensors, Hooks & XCom

    Now that you have read about how different components of Airflow work and how to run Apache Airflow locally, it’s time to start writing our…

  • Presto on Qubole is 2.6x faster than competition!

    In the past 2-3 years, Presto has set the bar for fast analytical processing in modern cloud data lake architectures. Qubole has offered a Presto…

  • Terraforming the Open Data Lake

    The Qubole Open Data Lake Platform: Qubole is the open data lake company that provides a simple and secure data lake platform…

  • Logan: A Data-Driven Log Analyzer for Easy Navigation of Apache Spark Logs

    Running large distributed Apache Spark clusters in the public cloud that handle an exponential increase in data volumes to fuel analytics and machine learning (ML)…

  • Cost and Performance efficiency with Multi-tenant Spark Platform

    Ad-hoc analytics and data exploration require compute resources that can process incoming jobs instantaneously and keep the response time low. Apache Spark is a…

  • Columnar Format in Data Lakes For Dummies

    Columnar data formats have become the standard in data lake storage for fast analytics workloads as opposed to row formats. Columnar formats significantly reduce the…

  • Introducing Managed Spot Block Instances that provide up to 40% cost savings

    Qubole is excited to announce the general availability of Managed Spot Block instances that provide up to 40% cost savings over On-Demand EC2 instances. Managed…

  • Boosting Parallelism for ML in Python using scikit-learn, joblib & PySpark

    As a general-purpose programming language, Python is universal. It’s quick and easy, yet powerful, with plenty of capabilities. It gives you an opportunity to…

  • Introducing Qubole Release 59

    Qubole regularly releases its software for processing petabytes of data on the cloud through major releases once a quarter. This is in addition to several…

  • Rails: Why Upgrading Matters – Part 2

    This is Part 2 of a two-part blog series on this topic. You can read Part 1 here. Rollout Strategy: We have different tiers in…

  • Lower Time-To-Insight: the elusive streaming data processing goal

    What’s keeping streaming data processing investments from yielding “speedy” results? There are multiple streaming data processing solutions out there but none are well equipped to…

  • Ruby on Rails: Why Upgrading Matters – Part 1

    Ruby on Rails (or Rails) is a web development framework that gives Rails developers an optimized experience to write their (Ruby) code. Rails is one…

  • New Enhancements for Qubole Notebooks

    In an earlier blog post, we discussed the availability of Jupyter-based Notebooks for machine learning (ML) and analytics with a host of features that make…

  • How to Optimize Spark Applications for Performance using Qubole Sparklens

    This final part of the three-part Spark optimization series explains how a Spark application can be optimized for performance by using Qubole Sparklens. The…

  • Managing Apache Spark Packages on Qubole

    With 90% of all the data in the world created in just the last 2 years, data scientists who historically worked on Python or R…

  • Spark Cluster Optimization for Cost, Reliability and Performance

    This second blog in the three-part series explains how a Spark cluster…

  • Maximizing Spot Utilization by Leveraging Qubole Heterogeneous Clusters

    How Qubole Maximizes Spot Utilization and Reduces Costs: One of our customers, a large enterprise cloud content management company, runs several sophisticated machine learning (ML) predictive...

  • How to Install Apache Airflow to Run Different Executors

    Now that we know about Airflow’s different components and how they interact, let’s start with setting up Airflow on our workstation so that we can…
