Stream Analytics with Hive ACID Workshop

Thursday, Dec 9, 2021

11AM – 1PM IST

About the workshop

Capturing data and making it available across an organization quickly is a differentiator for companies in the modern data era. For instance, a customer interacting with a bank’s website may run into an issue while applying for a mortgage and will expect an immediate response.

In this workshop, we’ll talk about the importance of streaming, the challenges it poses, and ways to resolve them. We’ll also cover handling immutable and mutable data via streaming.

Who should attend

Data Architects

Data Engineers

Advanced Analytics teams

Product Security teams

Key Takeaways

  • Easy, reliable deployment of streaming pipelines at scale in production environments
  • Monitoring/Alerting/Autoscaling support for the pipelines
  • Able to do continuous data ingestion (CDC) from sources such as cloud storage (S3, GCS, blob, etc.), Kafka, and Kinesis
  • Able to use Hive ACID tables and other data stores as sinks
  • Able to create GDPR/CCPA-compliant pipeline solutions

Topics Covered

  • Creating a Spark Streaming cluster
  • Creating a Presto cluster with auto-scaling features
  • Creating a Kafka setup for demonstration
    • Kafka cluster via Notebook
    • Event ingestion
  • Hive ACID table creation
    • Transactional and partitioned table
  • Creating pipelines in assisted mode
    • Kafka data source selection
    • Hive ACID table selection as the sink
  • Executing Pipelines
    • End-to-end execution
    • Test run, monitoring, alerting
    • Aggregation operator via UI
  • Analytics on the sourced data
    • Operations on Hive ACID table
    • Presto queries for ad-hoc analytics