Corporate Blog

Cloud-native Big Data Activation Platform

  • A Message to Our Customers & Partners from Qubole CEO Ashish Thusoo

    To our valued customers and partners, I hope all of you, your colleagues, families and friends are safe and healthy and practicing social distancing in…
  • Data Lake Essentials, Part 3 – Data Catalog and Data Mining

    Data Lake Data Catalog, Metadata and Search. In this multi-part series, we will take you through the architecture of…
  • Cloud Data Lakes – Best Practices

    This is an abridged version of the article that appears on NewStack. BI tools have been the go-to for data analysts who help business track…
  • Apache Airflow Tutorial – ETL/ELT Workflow Orchestration Made Easy

    Apache Airflow is one of the most powerful platforms used by Data Engineers for orchestrating workflows. Airflow was already gaining momentum in 2018, and at…
  • Data Lake Essentials, Part 2 – File Formats, Compression and Security

    In this multi-part series, we will take you through the architecture of a Data…
  • Data Lake Essentials – Part 1 – Storage and Data Processing

    In this multi-part series, we will take you through the architecture of a Data Lake.…
  • Apache Spark Benchmark for Autoscaling: Qubole versus competition

    This blog covers new benchmark tests to better understand Autoscaling behaviour of concurrent Apache Spark applications. We believe that this will help in advancing research…
  • Streamlining Operations of Machine Learning Models

    Guest authors: Jerry Xu, Co-founder and CEO, Datatron; Lekhni Randive, Product Manager, Datatron. Qubole author: Jorge Villamariona, Sr. Product Marketing Manager, Qubole. In today’s world,…
  • Apache Sqoop 1.4.7 – 9 reasons why you need it

    The sixth release of Apache Sqoop, 1.4.7, is out! This is one of the most significant updates to the Sqoop platform. We give you…
  • Analytics and ML simplified with Jupyter Notebooks and Apache Spark

    Data scientists use Notebooks for data exploration, interactive data analytics, machine learning, and collaboration. Once set up, a Notebook provides a convenient way to save,…
  • Per-Bucket Configuration Support in Presto

    Presto can access S3 buckets using one of the following options: IAM roles provided in the configuration; access key/secret key provided in the configuration; credentials fetched…
  • Optimized Upscaling for Managing Workloads in Cloud

    Qubole provides powerful automation that optimizes underlying cloud compute management for data lakes. Qubole cluster management continuously optimizes both performance and cost by...
  • Qubole: The Super Powers of Support

    Introducing Qubole Support. Qubole processes over 250 petabytes of data a month, and the diversity of data we process, cloud platforms we run on,…
  • Addressing Regulatory GDPR and CCPA frameworks with Qubole ACID and Apache Ranger

    Data lakes are at the heart of digital transformation in enterprises. As more organizations run analytics, machine learning, and ETL workloads on the data…
  • Practical Guide to Financial Governance of Data Lake Initiatives

    Enterprises today are becoming more data-driven, as their data is the fuel for their innovation engine to build new products, outmaneuver the competition and…
  • Introducing Qubole Release 57

    Release 57 (R57) brings many new capabilities and enhancements that help simplify and improve the efficiency and performance of your data processing projects.
  • Calculating 30 billion speed estimates a week with Apache Spark on Qubole

    This post is a guest publication written by Saba El-Hilo, a Senior Data Engineer at Mapbox. A version of this post first appeared as a…
  • Hive on Qubole runs 4x faster than Hive on Alternative Platforms

    ETL workloads form a major component of big data processing at any data-driven organization – from SMBs to enterprises, and ETL data pipelines at…
  • Scaling Tez Application using Application Timeline Server v1.5

    In an earlier blog post, we presented a secure, multi-tenant, reliable, and scalable service that provides access to logs and history for MRv2 applications.…
  • Qubole Open-Sources Multi-Engine Support for Updates and Deletes in Data Lakes

    Qubole now supports efficient updates and deletes for data stored in Cloud data lakes. Users can make inserts, updates and deletes on transactional Hive Tables, defined…