1. Overview

  • Used to build data pipelines for both batch and streaming processing.
  • Well suited to workloads where data arrives in real time.
  • Dataflow pipelines can, for example, read data from a BigQuery table, transform it, and write the output to Cloud Storage.
  • Dataflow pipeline transforms can be map operations or reduce operations.
  • Cloud Dataflow can be used to build expressive pipelines.
  • Each step in the Cloud Dataflow pipeline can be elastically scaled.
  • With Cloud Dataflow, there is no need to launch and manage a cluster.
  • Cloud Dataflow provides all compute resources needed on demand.
  • Cloud Dataflow has automated, optimized work partitioning built in, and can dynamically rebalance lagging work, reducing the need to worry about hot keys.
  • A hot key is a key to which a disproportionately large chunk of the input gets mapped, concentrating that work on a single worker.
  • There is no need to size instances; Cloud Dataflow fully automates the management of the processing resources required.
  • Cloud Dataflow frees users from most manual performance tuning.
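The map and reduce transforms mentioned above follow the classic word-count pattern. As a conceptual sketch only (plain Python with no Beam/Dataflow dependency; in a real pipeline these steps would be Apache Beam transforms such as `Map` and `CombinePerKey`):

```python
from collections import defaultdict

def run_wordcount(lines):
    """Minimal map/reduce word count mirroring a Dataflow pipeline:
    a map step emits (word, 1) pairs; a reduce step sums counts per key."""
    # Map: transform each input line into (word, 1) key-value pairs.
    pairs = [(word, 1) for line in lines for word in line.split()]
    # Reduce: group the pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

print(run_wordcount(["to be or not to be"]))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In an actual Dataflow job, each of these steps would be scaled elastically and partitioned across workers by the service, as described above.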