Declarative Pipelines & Intelligent Orchestration – Data’s Missing Link – Sean Knapp

The last decade has brought significant advancements and innovations across the data management landscape, spanning large-scale processing engines, cloud-based data warehouses, next-generation data lakes, and unprecedented machine learning tools. So why are these powerful systems still being controlled and orchestrated by the most rudimentary technologies? Data movement and transformation are often dictated by manual, hard-coded triggers and rules, written in slow development cycles and resulting in brittle pipelines. As the number of pipelines and dependencies grows, data engineering teams become bogged down by constant maintenance: combing through code and logs to find hotspots and continually tuning infrastructure just to keep things running. This “state of the art” points to a gap in the modern data ecosystem, opening the door to more intelligent forms of data orchestration. As with Kubernetes, Terraform, and even React, we can apply a declarative approach to the domain of data pipelines, radically advancing the level of automation that can be achieved. In this session, Sean Knapp, CEO and founder, discusses this journey, including:

  • Imperative vs Declarative systems
  • Evolution of orchestration tools
  • Tradeoffs and design decisions in moving from task-centric to data-centric architectures
  • Challenges in running a multi-cloud, massive-scale, elastic infrastructure built on Kubernetes and Apache Spark
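
To make the first bullet concrete, here is a minimal sketch of the imperative vs. declarative contrast applied to a pipeline. All names are illustrative, not from any particular orchestration tool: the imperative version hard-codes the execution order, while the declarative version only declares datasets and their dependencies, letting a generic engine (here, a topological sort) derive the order.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# --- Imperative style: the author hard-codes when and how each step runs. ---
def extract():
    return [1, 2, 3]

def transform(rows):
    return [x * 10 for x in rows]

def run_imperative():
    raw = extract()
    clean = transform(raw)
    return sum(clean)  # ordering, retries, and caching are the author's problem

# --- Declarative style: each dataset declares only what it depends on. ---
# A hypothetical orchestrator derives the execution order from the graph
# and could, in principle, skip steps whose inputs are unchanged.
PIPELINE = {
    "raw_events":   {"deps": [],               "build": lambda deps: [1, 2, 3]},
    "clean_events": {"deps": ["raw_events"],   "build": lambda deps: [x * 10 for x in deps["raw_events"]]},
    "report":       {"deps": ["clean_events"], "build": lambda deps: sum(deps["clean_events"])},
}

def materialize(pipeline):
    """Build every dataset in dependency order."""
    graph = {name: spec["deps"] for name, spec in pipeline.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        spec = pipeline[name]
        results[name] = spec["build"]({d: results[d] for d in spec["deps"]})
    return results

print(materialize(PIPELINE)["report"])  # same result as run_imperative(): 60
```

The payoff of the declarative form is that the engine, not the author, owns scheduling: adding a dataset means declaring its dependencies, and the system can reason globally about ordering, parallelism, and incremental recomputation.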