Streaming Analytics

Build streaming data pipelines to capture the benefits of real-time data for machine learning and ad-hoc analytics.

Qubole Pipelines Service is a Stream Processing Service that addresses real-time ingestion, decision, machine learning, and reporting use-cases.

Challenges | Qubole Pipelines Service Solution
Stream processing pipelines are complex to build and take significant time. | Built-in accelerated development cycle.
Observability and data quality are hard to achieve at scale. | Pluggable storage, checkpointing, robust memory management, and alerts.
Achieving high performance at low cost is challenging. | Comprehensive operational management and deep insights to keep costs in check.
Data inconsistencies require constant clean-up of small files and result in file management overhead. | Small files are compacted using Qubole ACID capabilities without blocking read/write operations.

Accelerated Development Cycle

Develop pipelines within minutes without writing a single line of code, and deploy them instantly.

  • Create streaming data pipelines without writing software, using a code generation wizard.
  • Click, select and connect to the most popular streaming data sources and targets such as Kafka, Kinesis, S3, S3-SQS, GCS, HIVE, HIVE-ACID, Snowflake, BigQuery, ElasticSearch, MongoDB & Druid.
  • Test run and debug new pipelines to check connectivity and business logic with a built-in test framework.
  • Experience near-zero downtime and no data loss with seamless upgrades.

Deliver reliable long-running streaming applications

Built upon Apache Spark Structured Streaming, a proven technology and framework.

  • Scale your application to billions of records without failures by leveraging pluggable state storage backed by RocksDB.
  • Maintain consistency during checkpointing with Direct Writes and S3Guard support.
  • Achieve greater fault tolerance through more robust memory management. Prevent out-of-disk errors with log rolling and aggregation on the file system.
  • Get alerted via your preferred mechanism (Slack, email, PagerDuty, etc.).
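
The checkpointing idea in the list above can be illustrated with a minimal sketch. It is not the Qubole or Spark implementation: it is a plain-Python stand-in that uses a local JSON file as the checkpoint, and the names `load_offset`, `commit_offset`, and `run_micro_batches` are hypothetical. It shows how committing an offset after each micro-batch lets a restarted job resume where it stopped instead of starting over.

```python
import json
import os
import tempfile


def load_offset(checkpoint_path):
    """Return the last committed offset, or 0 if no checkpoint exists yet."""
    if not os.path.exists(checkpoint_path):
        return 0
    with open(checkpoint_path) as f:
        return json.load(f)["offset"]


def commit_offset(checkpoint_path, offset):
    """Write the offset to a temp file, then swap it into place in one step.

    os.replace is atomic on a local file system; object stores behave
    differently, so treat this as an illustration only.
    """
    directory = os.path.dirname(checkpoint_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"offset": offset}, f)
    os.replace(tmp_path, checkpoint_path)


def run_micro_batches(records, checkpoint_path, batch_size, sink, max_batches=None):
    """Process records in micro-batches, committing the offset after each one.

    The batch is written to the sink before the offset is committed, so a
    crash between the two steps replays that batch (at-least-once delivery).
    """
    offset = load_offset(checkpoint_path)
    batches = 0
    while offset < len(records):
        if max_batches is not None and batches >= max_batches:
            break  # simulate the job stopping early
        batch = records[offset:offset + batch_size]
        sink.extend(batch)
        offset += len(batch)
        commit_offset(checkpoint_path, offset)
        batches += 1
    return offset


if __name__ == "__main__":
    checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    events = list(range(10))
    out = []
    run_micro_batches(events, checkpoint, 3, out, max_batches=2)  # stops early
    run_micro_batches(events, checkpoint, 3, out)  # resumes from the checkpoint
    print(out)
```

The second call picks up from the committed offset, so no record is written twice in this run. A production system would also need an idempotent or transactional sink to avoid duplicates after a crash.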

Comprehensive operational management and continuous insights

Holistically manage the lifecycle of streaming applications

  • Lower the cost of your streaming pipelines by leveraging intelligent spot/preemptible node management.
  • Track key metrics such as micro-batch latency, processing rate, and state store size through a built-in integration with Prometheus/Grafana.
  • Stay informed through a 360-degree insights event pane.
  • Control access to pipeline artifacts with fine-grained access controls on CRUD operations.
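
To make the metrics in the list above concrete, here is a small sketch of how processing rate and micro-batch latency could be derived from per-batch statistics. It does not use the Prometheus/Grafana integration; the `BatchStats` type, its fields, and the threshold are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BatchStats:
    """Hypothetical per-batch record; field names are illustrative."""
    batch_id: int
    num_rows: int
    duration_ms: float  # wall-clock time spent processing the batch


def processing_rate(stats):
    """Rows processed per second of processing time, across all batches."""
    total_rows = sum(s.num_rows for s in stats)
    total_seconds = sum(s.duration_ms for s in stats) / 1000.0
    return total_rows / total_seconds if total_seconds > 0 else 0.0


def slow_batches(stats, latency_threshold_ms):
    """IDs of batches whose latency exceeded the given threshold."""
    return [s.batch_id for s in stats if s.duration_ms > latency_threshold_ms]


if __name__ == "__main__":
    history = [
        BatchStats(0, 1000, 500.0),
        BatchStats(1, 2000, 1500.0),
        BatchStats(2, 1000, 2000.0),
    ]
    print(processing_rate(history))
    print(slow_batches(history, latency_threshold_ms=1000.0))
```

A threshold check like `slow_batches` is the kind of condition an alert could be built on, for example when batch latency starts to exceed the trigger interval.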

Data Management and Consistency

Simplified data lake operations and better data consistency

  • Simplify data lake management with periodic auto-compaction of small files into larger files. Deep integration with Qubole ACID tables allows this compaction without blocking concurrent reads and writes.
  • Detect invalid records and schema mismatches. Set alerts, and prevent data loss by storing these records in a configurable cloud storage location for cleansing and reprocessing.
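
The invalid-record handling described above can be sketched as follows. This is an illustration under stated assumptions, not the service's implementation: the schema, the field names, and the in-memory `quarantined` list (standing in for a configurable cloud storage location) are all hypothetical.

```python
import json

# Hypothetical schema: field name -> expected Python type after JSON parsing.
# Note that a JSON integer such as 1 parses as int and would fail a float check.
EXPECTED_SCHEMA = {"user_id": int, "event": str, "ts": float}


def validate(raw_line, schema):
    """Return (record, None) if the line matches the schema, else (None, reason)."""
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return None, "malformed JSON"
    if not isinstance(record, dict):
        return None, "not a JSON object"
    for field, expected_type in schema.items():
        if field not in record:
            return None, f"missing field: {field}"
        if not isinstance(record[field], expected_type):
            return None, f"type mismatch: {field}"
    return record, None


def split_records(raw_lines, schema):
    """Separate valid records from invalid ones, keeping the raw input and reason.

    Invalid lines are kept rather than dropped so they can be fixed and
    reprocessed later.
    """
    valid, quarantined = [], []
    for line in raw_lines:
        record, reason = validate(line, schema)
        if reason is None:
            valid.append(record)
        else:
            quarantined.append({"raw": line, "reason": reason})
    return valid, quarantined


if __name__ == "__main__":
    lines = [
        '{"user_id": 1, "event": "click", "ts": 1.5}',
        '{"user_id": "x", "event": "click", "ts": 2.5}',
        "not json",
    ]
    good, bad = split_records(lines, EXPECTED_SCHEMA)
    print(len(good), [b["reason"] for b in bad])
```

Keeping the raw line alongside the failure reason is the design choice that matters here: the quarantined data stays recoverable, and the count of quarantined records is a natural signal to alert on.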

Use-Cases

  • Complement an open data lake as an ingestion service
    • Ingestion of real-time events from clickstreams, IoT devices, logs, etc.
    • CDC from cloud object stores and message buses
  • Real-time reporting on a data warehouse
    • An easy way to ingest streaming data into a variety of data warehouse sinks
  • Complex event processing and applications based on it
    • Real-time model scoring and recommendations
    • Identifying opportunities and threats in real time

MiQ relies on Qubole Pipelines Service for their Predictive Retargeting Solutions

MiQ has been developing a series of what they refer to as “Predictive Retargeting Solutions.” These products are designed to gather data on target users’ ongoing events, generate insights, and enable users to make decisions and take action, all in real time. In short, they are on a quest for real-time consumer retargeting, leveraging Qubole Pipelines Service to build and manage these streaming data pipelines.

MiQ is a leading programmatic media partner for brands and agencies. Headquartered in London, MiQ has offices across North America, Europe and Asia Pacific. MiQ works with the world’s leading brands and media agencies such as Marriott, Dell, Mercedes, Microsoft, GroupM, Dentsu Aegis and IPG.

“Qubole Streaming Pipelines Service is a great addition at Angel Broking for launching and maintaining our data pipelines with ease. We have deployed more than 20 pipelines in production to support Real Time Analytics @ Angel Broking. The support provided by Qubole, especially Ashish Kumar, was essential for our success and very appreciated. We are looking forward to exploring more functionalities and onboarding more production pipelines in the coming days.” - Harsh Gupta, Head – Analytics