- Can be used to configure pipelines for batch and streaming data processing.
- Suitable for workloads where data arrives in real time.
- Used to build data pipelines.
- Dataflow pipelines can read data from a BigQuery table.
- Dataflow pipelines can transform data and write the output to Cloud Storage.
- Dataflow pipeline transforms can be map operations or reduce operations.
- Cloud Dataflow can be used to build expressive pipelines.
- Each step in the Cloud Dataflow pipeline can be elastically scaled.
- With Cloud Dataflow, there is no need to launch and manage a cluster.
- Cloud Dataflow provides all compute resources needed on demand.
- Cloud Dataflow has automated, optimized work partitioning built in, which can dynamically rebalance lagging work.
- This rebalancing reduces the need to worry about hot keys.
- Hot keys are situations where disproportionately large chunks of input get mapped to the same key, and therefore to the same worker.
- With Cloud Dataflow, there is no need to spin up a cluster or to size instances.
- Cloud Dataflow fully automates the management of processing resources required.
- Cloud Dataflow frees users from operational tasks such as resource management and manual performance tuning.
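
The map and reduce transforms and the hot key problem noted above can be illustrated with a small sketch. This is plain Python using only the standard library, not the Dataflow SDK; the function names `map_step`, `reduce_step`, and `hot_keys`, the sample input, and the 50% threshold are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def map_step(lines):
    """Map: emit one (word, 1) pair per word in each input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reduce_step(pairs):
    """Reduce: group pairs by key and sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def hot_keys(pairs, threshold=0.5):
    """Return keys that receive more than `threshold` of all records.

    The threshold is an arbitrary illustrative value, not a Dataflow setting.
    """
    counts = Counter(key for key, _ in pairs)
    total = sum(counts.values())
    return [key for key, n in counts.items() if total and n / total > threshold]


# Hypothetical input: the key "a" dominates the data set.
lines = ["a a a a b", "a c a"]
pairs = list(map_step(lines))
word_counts = reduce_step(pairs)
skewed = hot_keys(pairs)

print(word_counts)
print(skewed)
```

In this toy input, most records share a single key, so a naive grouping would send most of the work to whichever worker owns that key. This is the kind of skew that the dynamic rebalancing described above is meant to reduce.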