Apache Spark
-
Running Apache Spark at Scale in the Cloud
Deep dive into the use cases for Apache Spark on Qubole, including ETL and machine learning
-
Accelerating Time to Value of Big Data of Apache Spark
This ebook takes a deep dive into Apache Spark optimizations that improve performance, reduce costs and deliver unmatched scale
-
Apache Spark Benchmark for Autoscaling: Qubole versus competition
This blog post covers new benchmark tests that examine the autoscaling behavior of concurrent Apache Spark applications. We believe that this will help in advancing research…
-
Apache Sqoop 1.4.7 – 9 reasons why you need it
The sixth release of Apache Sqoop, 1.4.7, is out! This is one of the most significant updates to the Sqoop platform. We give you…
-
Accelerate The Time To Value Of Apache Spark Applications With Qubole
Qubole improves the performance of Spark workloads with enhancements such as fast storage, distributed caching, advanced indexing, metadata caching, and job isolation on multi-tenant clusters.
-
Ensighten: Building a world-class digital advertising analytics platform using Qubole
Ensighten was able to decouple compute from storage and handle user-level management and permissions across Spark, Hadoop and Presto with Qubole
-
AgilOne: Machine Learning at Enterprise Scale
AgilOne runs a variety of workloads for querying data, running ML models, orchestrating ML workflows, and more on Qubole
-
Nauto Improves its Data Scientist Productivity, Accelerates Product Development
-
TrafficGuard Halts Digital Ad Fraud with Qubole
TrafficGuard relies on big data processing to detect and prevent ad fraud, which requires a robust infrastructure.
-
Apache Spark Getting Started Guide
Self-paced guide to the Apache Spark analytics engine using Qubole
-
Big Data Activation Report
The data on big data: which engines are used most, for what, and which are the rising stars.
-
Improve Apache Spark Performance by 2.9x with Amazon S3 Select Integration
Automatically use the S3 Select service whenever applicable to speed up queries
-
Using Qubole Notebooks to Predict Future Sales with PySpark
Build and use a time-series analysis model to forecast future sales from historical sales data
-
Improving Recover Partitions Performance with Spark on Qubole
Significantly improve the overall performance of running Hadoop-based engines on the cloud object store
-
Sentiment Analysis with H2O, PySpark and Word2Vec on Qubole
Using Qubole Notebooks to analyze Amazon product reviews with Word2Vec, PySpark, and H2O Sparkling Water, developed and productionized on Qubole Notebooks.
-
Qubole Enhances Spark Performance with Dynamic Filtering, a SQL Join Optimization
How Dynamic Filtering in Spark dramatically improves the performance of join queries
-
Increase Apache Spark Performance by Up to 4x with RubiX Distributed Cache
How RubiX differs from Spark’s internal cache, and how it improves performance for Spark workloads
-
Using Direct Writes to Significantly Increase the Performance of Spark Workloads
Direct Writes delivers performance improvements of up to 40x for write-heavy Spark workloads
-
Sparklens Report: A Free Community Service from Qubole
Introducing sparklens.qubole.com, a reporting service built on top of Sparklens to lower the pain of sharing Sparklens output
-
How to Increase Your Big Data Value with Apache Spark on Qubole
Run large Apache Spark clusters on the cloud without fear of job loss or out-of-control cloud costs
-