Tech Blog

Cloud-native Big Data Activation Platform

  • Part 3: Transactions on the Data Lake

    Data Lakes are becoming increasingly central to the analytical operations of organizations. This introduces many more ‘transactional’ requirements on the pipeline architecture and the…
  • Part 2: Tuning the Data Ingestion process

    In Part 1 of this series, we briefly touched upon the various design considerations to be made when architecting the Data Lake. We saw how…
  • Enhanced Network Security with AWS PrivateLink on Qubole

    Increase data security and simplify the infrastructure with Qubole. About Qubole Open Data Lake Platform: Qubole is an open and secure data lake platform for…
  • Part 1: Ingestion into the Data Lake

    Data Lakes are a core pillar in an organization’s data strategy. Data lakes make organizational data from different sources accessible to various end-users like business…
  • Qubole University Launches Badge Program

    For decades, our desks were covered in trophies, certificates, and medals demonstrating our accomplishments, achievements, and competencies. Over time, these methods of recognition have…
  • Enabling Spark SQL MERGE via optimized ACID Data Source v0.6.0

    We are pleased to announce the 0.6.0 release of the ACID Data Source for Apache Spark. This release should further empower data lake users in enterprises…
  • Introducing Apache Spark 3.0 on Qubole

    We are pleased to announce the availability of Apache Spark 3.0 in the Qubole environment. The Spark 3.0 release comes with many exciting new…
  • Apache Airflow Concepts – DAG Scheduling and Variables

    In our last blog, we covered all the basic concepts of Apache Airflow. In this blog, we will cover some of the advanced concepts and…
  • Introducing Capacity Reservation for Application Master to increase Workload Reliability despite Spot Interruptions

    AWS Spot instances reduce cloud costs by up to 90% but can be interrupted by AWS at any time, causing running workloads to fail. …
  • Qviz – Qubole Visualization Framework for Jupyter-Based Notebooks

    Data visualization is a critical aspect of Exploratory Data Analysis that helps Data Analysts and Scientists visualize frequency distributions, explore causal/correlated relationships between...
  • Data Discovery Tools – Qubole Workbench

    It is common knowledge that data lakes offer the right architecture to support multiple use cases and tools, but can be operationally complex to implement…
  • Apache Airflow Tutorial – DAGs, Tasks, Operators, Sensors, Hooks & XCom

    Now that you have read about how different components of Airflow work and how to run Apache Airflow locally, it’s time to start writing our…
  • Presto on Qubole is 2.6x faster than competition!

    In the past 2-3 years, Presto has set the bar for fast analytical processing in modern cloud data lake architectures. Qubole has offered a Presto…
  • Terraforming the Open Data Lake

    The Qubole Open Data Lake Platform: Qubole is the open data lake company that provides a simple and secure data lake platform… (Image credits: https://science.howstuffworks.com/terraforming.htm)
  • Logan: A Data-Driven Log Analyzer for Easy Navigation of Apache Spark Logs

    Running large distributed Apache Spark clusters in the public cloud that handle exponential increases in data volumes to fuel analytics and machine learning (ML)…
  • Cost and Performance efficiency with Multi-tenant Spark Platform

    Ad-hoc analytics and data exploration require compute resources that can process incoming jobs instantaneously and keep response times low. Apache Spark is a…
  • Columnar Format in Data Lakes For Dummies

    Columnar data formats, as opposed to row formats, have become the standard in data lake storage for fast analytics workloads. Columnar formats significantly reduce the…
  • Introducing Managed Spot Block Instances that provide up to 40% cost savings

    Qubole is excited to announce the general availability of Managed Spot Block instances, which provide up to 40% cost savings over On-Demand EC2 instances. Managed…
  • Boosting Parallelism for ML in Python using scikit-learn, joblib & PySpark

    As a general-purpose programming language, Python is universal. It is quick and easy, yet powerful, with plenty of capabilities. It gives you an opportunity to…
  • Introducing Qubole Release 59

    Qubole ships major releases of its software for processing petabytes of data in the cloud once a quarter. This is in addition to several…