Blogs

  • Data Lake Essentials – Part 1 – Storage and Data Processing

    In this multi-part series, we will take you through the architecture of a Data Lake.…

  • Apache Spark Benchmark for Autoscaling: Qubole versus competition

    This blog covers new benchmark tests designed to better understand the autoscaling behaviour of concurrent Apache Spark applications. We believe that this will help in advancing research…

  • Streamlining Operations of Machine Learning Models

    Guest authors: Jerry Xu, Co-founder and CEO, Datatron; Lekhni Randive, Product Manager, Datatron. Qubole author: Jorge Villamariona, Sr. Product Marketing Manager, Qubole. In today’s world,…

  • Apache Sqoop 1.4.7 – 9 reasons why you need it

    The sixth release of Apache Sqoop, version 1.4.7, is out! This is one of the most significant updates to the Sqoop platform. We give you…

  • Analytics and ML simplified with Jupyter Notebooks and Apache Spark

    Data scientists use Notebooks for data exploration, interactive data analytics, machine learning, and collaboration. Once set up, a Notebook provides a convenient way to save,…

  • Per-Bucket Configuration Support in Presto

    Presto can access S3 buckets using one of the following options: IAM roles provided in the configuration; an access key and secret key provided in the configuration; credentials fetched…

  • Optimized Upscaling for Managing Workloads in Cloud

    Qubole provides powerful automation that optimizes underlying cloud compute management for data lakes. Qubole cluster management continuously optimizes both performance and cost by…

  • Qubole: The Super Powers of Support

    Qubole processes over 250 petabytes of data a month, and the diversity of data we process, cloud platforms we run on,…

  • Addressing Regulatory GDPR and CCPA frameworks with Qubole ACID and Apache Ranger

    Data lakes are at the heart of digital transformation in enterprises. As more organizations run analytics, machine learning, and ETL workloads on the data…

  • Practical Guide to Financial Governance of Data Lake Initiatives

    Enterprises today are becoming more data-driven, as data fuels the innovation engine they use to build new products, outmaneuver the competition, and…

  • Introducing Qubole Release 57

    Release 57 (R57) brings many new capabilities and enhancements that help simplify and improve the efficiency and performance of your data processing projects.

  • Calculating 30 billion speed estimates a week with Apache Spark on Qubole

    This post is a guest publication written by Saba El-Hilo, a Senior Data Engineer at Mapbox. A version of this post first appeared as a…

  • Hive on Qubole runs 4x faster than Hive on Alternative Platforms

    ETL workloads form a major component of big data processing at any data-driven organization – from SMBs to enterprises, and ETL data pipelines at…

  • Scaling Tez Application using Application Timeline Server v1.5

    In an earlier blog post, we presented a secure, multi-tenant, reliable, and scalable service that provides access to logs and history for MRv2 applications.…

  • Qubole Open-Sources Multi-Engine Support for Updates and Deletes in Data Lakes

    Qubole now supports efficient updates and deletes for data stored in cloud data lakes. Users can make inserts, updates, and deletes on transactional Hive tables, defined…

  • Announcing General Availability of Qubole on Google Cloud

    We at Qubole are excited to announce the General Availability of the Qubole data platform on Google Cloud – a self-service, collaborative, enterprise platform for data…

  • Introducing Hive 3.1.1 in Qubole

    Qubole is the first and only vendor to deliver Hive 3.1.1 in the cloud

  • Building a Data Lake the Right Way

    Key considerations for building a scalable transactional data lake. Data-driven companies are driving rapid business transformation with cloud data lakes. Cloud data lakes are enabling…

  • Announcing Presto Summit India on September 05, 2019

    We are super excited to announce the first-ever Presto Summit in India on September 05, 2019, with Presto co-founders Martin, David, and Dain!…

  • Data Governance for SparkSQL

    Introducing the new Apache Spark Data Access Control Framework on the Qubole platform
