Build streaming data pipelines to capture the benefits of real-time data for machine learning and ad-hoc analytics.

Qubole Pipelines Service is a stream processing service that addresses real-time ingestion, decision-making, machine learning, and reporting use cases.

Challenges | Qubole Pipelines Service Solution
Stream processing pipelines are complex to build and take significant time. | Built-in accelerated development cycle.
Observability and data quality are hard to achieve at scale. | Pluggable storage, checkpointing, robust memory management, and alerts.
Achieving high performance at low cost is challenging. | Comprehensive operational management and deep insights to keep costs in check.
Data inconsistencies require constant clean-up of small files and result in file management overhead. | Small files are compacted using Qubole ACID capabilities without blocking read/write operations.

Accelerate Development

Develop pipelines within minutes without writing a single line of code, and deploy them instantly

  • Create streaming data pipelines without writing software via the code generation wizard.
  • Click, select, and connect to the most popular streaming data sources and targets, such as Kafka, Kinesis, S3, S3-SQS, GCS, Hive, Hive ACID, Snowflake, BigQuery, Elasticsearch, MongoDB, and Druid.
  • Test run and debug new pipelines to check connectivity and business logic with a built-in test framework.
  • Experience near-zero downtime and no data loss with seamless upgrades.

Scale Streaming Applications

Built on Apache Spark Structured Streaming, a proven technology and framework.

  • Scale your application to billions of records without failures by leveraging pluggable state storage backed by RocksDB.
  • Maintain consistency during checkpointing with direct writes and S3Guard support.
  • Achieve greater fault tolerance through more robust memory management. Prevent out-of-disk errors with log rolling and aggregation on the file system.
  • Get alerted via your preferred mechanism (Slack, email, PagerDuty, etc.).

Manage Streaming Applications

Holistically manage the lifecycle of streaming applications

  • Lower the cost of your streaming pipelines by leveraging intelligent spot/preemptible node management.
  • Track key metrics such as micro-batch latency, processing rate, and state store size through built-in integration with Prometheus/Grafana.
  • Stay informed through a 360-degree insights event pane.
  • Control access to pipeline artifacts with fine-grained access controls on CRUD operations.
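
As a rough illustration of the metrics mentioned above, the sketch below summarizes micro-batch latency and processing rate from a list of progress records. It assumes each record is a dict shaped like the progress JSON that open-source Spark Structured Streaming reports for a query (`numInputRows` and `durationMs.triggerExecution`). The function name `summarize_progress` and the sample values are hypothetical, and this is not the service's own Prometheus/Grafana integration.

```python
from statistics import mean


def summarize_progress(progress_records):
    """Summarize micro-batch latency and throughput from progress dicts.

    Assumption: each dict carries `numInputRows` and
    `durationMs["triggerExecution"]` (milliseconds). Batches that report
    a non-positive duration are skipped.
    """
    latencies_ms = []
    total_rows = 0
    total_ms = 0
    for record in progress_records:
        duration_ms = record["durationMs"]["triggerExecution"]
        if duration_ms <= 0:
            continue
        latencies_ms.append(duration_ms)
        total_rows += record["numInputRows"]
        total_ms += duration_ms

    if not latencies_ms:
        # Nothing usable to summarize.
        return {
            "batches": 0,
            "avg_latency_ms": None,
            "max_latency_ms": None,
            "rows_per_second": None,
        }

    return {
        "batches": len(latencies_ms),
        "avg_latency_ms": mean(latencies_ms),
        "max_latency_ms": max(latencies_ms),
        # Overall processing rate across all summarized batches.
        "rows_per_second": total_rows / (total_ms / 1000.0),
    }


# Hypothetical sample values, for illustration only.
sample = [
    {"numInputRows": 1000, "durationMs": {"triggerExecution": 500}},
    {"numInputRows": 3000, "durationMs": {"triggerExecution": 1500}},
]
summary = summarize_progress(sample)
```

A summary like this could feed a dashboard panel or an alert threshold, for example by flagging a pipeline whose maximum micro-batch latency exceeds its trigger interval.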

Data Lake Management

Simplified data lake operations and better data consistency

  • Simplified data lake management with periodic auto-compaction of small files into larger files. Deep integration with Qubole ACID tables allows this compaction without blocking concurrent reads and writes.
  • Detect invalid records and schema mismatches. Set alerts and prevent data loss by storing these records in a configurable cloud storage location, where they can be cleansed and reprocessed.
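
The invalid-record handling described above can be sketched roughly as follows. This is a minimal, standalone illustration rather than the Pipelines Service implementation: the expected schema, the field names, and the `split_records` helper are all hypothetical. Records that fail to parse or do not match the schema are set aside with a reason, so that they could be written to a separate storage location for cleansing and reprocessing.

```python
import json

# Hypothetical expected schema: field name -> required Python type.
EXPECTED_SCHEMA = {"user_id": str, "event": str, "ts": int}


def split_records(raw_lines, schema=EXPECTED_SCHEMA):
    """Split raw JSON lines into (valid_records, rejected_entries).

    A rejected entry keeps the raw input and the reason it was rejected,
    so nothing is dropped silently.
    """
    valid, rejected = [], []
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            rejected.append({"raw": line, "reason": "malformed JSON"})
            continue

        if not isinstance(record, dict):
            rejected.append({"raw": line, "reason": "not a JSON object"})
            continue

        # Collect fields that are missing or have the wrong type.
        problems = [
            name
            for name, expected_type in schema.items()
            if not isinstance(record.get(name), expected_type)
        ]
        if problems:
            reason = "schema mismatch: " + ", ".join(problems)
            rejected.append({"raw": line, "reason": reason})
        else:
            valid.append(record)
    return valid, rejected


# Hypothetical input lines, for illustration only.
lines = [
    '{"user_id": "u1", "event": "click", "ts": 1700000000}',
    '{"user_id": "u2", "event": "view", "ts": "not-a-number"}',
    '{"user_id": "u3", "event": ',
]
valid, rejected = split_records(lines)
```

Keeping the raw input alongside the rejection reason is a deliberate choice in this sketch: it lets rejected records be corrected and replayed later instead of being lost.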

Qubole Use Cases

  • Complement an open data lake as an ingestion service
    • Ingest real-time events from clickstreams, IoT devices, logs, etc.
    • Capture change data (CDC) from cloud object stores and messaging buses
  • Real-time reporting on a data warehouse
    • Ingest streaming data easily into a variety of data warehouse (DW) sinks
  • Complex event processing and applications based on it
    • Real-time model scoring and recommendations
    • Identify opportunities and threats in real time


MiQ has been developing a series of what they refer to as “Predictive Retargeting Solutions.” These products are designed to gather data from target users’ ongoing events, generate insights, and enable users to make decisions and take action, all in real time. In short, they are on a quest for real-time consumer retargeting, leveraging Qubole Pipelines Service to build and manage these streaming data pipelines.

MiQ is a leading programmatic media partner for brands and agencies. Headquartered in London, MiQ has offices across North America, Europe, and Asia Pacific. MiQ works with the world’s leading brands and media agencies, such as Marriott, Dell, Mercedes, Microsoft, GroupM, Dentsu Aegis, and IPG.

“Qubole Streaming Pipelines Service is a great addition at Angel Broking for launching and maintaining our data pipelines with ease. We have deployed more than 20 pipelines in production to support Real Time Analytics @ Angel Broking. The support provided by Qubole, especially Ashish Kumar, was essential for our success and much appreciated. We are looking forward to exploring more functionalities and onboarding more production pipelines in the coming days.”

— Harsh Gupta, Head – Analytics