Cloud Run

1. Autoscaling

  • Each Cloud Run revision is automatically scaled to the number of container instances needed to handle all incoming requests.
  • The number of instances scheduled is impacted by:
    • The amount of CPU needed to process a request
    • The concurrency setting
    • The maximum number of container instances setting
  • It may be necessary to limit the total number of container instances that can be started, either for cost control or for better compatibility with other resources used by the service.
  • For example, a Cloud Run service might interact with a database that can only handle a certain number of concurrent open connections.
  • The maximum container instances setting can be used to limit the total number of instances that can be started in parallel.
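The relationship between concurrency, the maximum instances setting, and a downstream connection limit can be sketched with some simple arithmetic. This is a rough model only, not Cloud Run's actual scheduling algorithm: it ignores CPU usage, and the function names and the per-instance connection pool size are assumptions made for illustration.

```python
import math


def estimate_instances(concurrent_requests: int, concurrency: int, max_instances: int) -> int:
    """Rough estimate of instances needed to hold all in-flight requests.

    Simplification: ignores CPU usage, which also affects real scaling.
    """
    if concurrency < 1 or max_instances < 1:
        raise ValueError("concurrency and max_instances must be at least 1")
    needed = math.ceil(concurrent_requests / concurrency)
    # The maximum instances setting caps the result.
    return min(needed, max_instances)


def max_instances_for_database(db_connection_limit: int, connections_per_instance: int) -> int:
    """Largest instance count that keeps total open connections within the database limit.

    connections_per_instance is a hypothetical pool size chosen by the service author.
    """
    if connections_per_instance < 1:
        raise ValueError("connections_per_instance must be at least 1")
    return db_connection_limit // connections_per_instance


if __name__ == "__main__":
    print(estimate_instances(250, 80, 10))      # 250 in-flight requests, 80 per instance
    print(max_instances_for_database(100, 5))   # database allows 100 connections
```

With a database limit of 100 connections and 5 connections per instance, this model suggests a maximum instances value of at most 20.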

Maximum instances and traffic spikes

  • A revision scales up by creating new instances to handle incoming traffic load. 
  • When a maximum instances limit is set, there may be insufficient instances to meet traffic load. 
  • In that case, incoming requests queue for up to 60 seconds.
  • During the 60-second window, if an instance finishes processing requests, it becomes available to process queued requests.
  • If no instances become available during the 60-second window, the request fails with a 429 error code on Cloud Run (fully managed).
  • The maximum instances limit is an upper limit. 
  • Setting a high limit does not mean that a revision will scale up to the specified number of container instances. 
  • Setting a high limit only means that the number of container instances at any point in time should not exceed the limit.
  • During rapid traffic surges, Cloud Run may, for a short period of time, create slightly more container instances than the specified max instances value. 
  • If a service cannot tolerate a temporary increase in instances beyond the max instances value, a safety margin needs to be factored in and a lower max instances value set.
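The queueing behavior described above can be illustrated with a toy simulation. This is a simplified model under stated assumptions, not how Cloud Run is implemented: every instance is assumed to be already running, each instance offers a fixed number of request slots equal to its concurrency, and requests are served strictly in arrival order. The function name `simulate_queue` and its parameters are hypothetical.

```python
import heapq

QUEUE_TIMEOUT_SECONDS = 60  # queue window described in the notes above


def simulate_queue(requests, max_instances, concurrency, timeout=QUEUE_TIMEOUT_SECONDS):
    """Return one status code (200 or 429) per request, in arrival order.

    requests: iterable of (arrival_time, duration) pairs, in seconds.
    """
    # One entry per request slot, holding the time at which that slot is free.
    slots = [0.0] * (max_instances * concurrency)
    heapq.heapify(slots)

    statuses = []
    for arrival, duration in sorted(requests):
        earliest_free = slots[0]
        start = max(arrival, earliest_free)
        if start - arrival > timeout:
            # No slot became available within the queue window.
            statuses.append(429)
        else:
            # Occupy the earliest-free slot until the request finishes.
            heapq.heapreplace(slots, start + duration)
            statuses.append(200)
    return statuses


if __name__ == "__main__":
    burst = [(0, 50), (0, 50), (0, 50)]  # three 50-second requests arriving together
    print(simulate_queue(burst, max_instances=1, concurrency=1))
    print(simulate_queue(burst, max_instances=2, concurrency=1))
```

In this model, with one single-slot instance the third request would wait 100 seconds and is rejected, while raising the limit to two instances lets all three complete.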

Idle instances and minimizing cold starts

  • Users are only billed when an instance is handling a request.
  • Cloud Run does not always immediately shut down instances once they have handled all requests. 
  • To minimize the impact of cold starts, Cloud Run may keep some instances idle. 
  • Idle instances are ready to handle requests in case of a sudden traffic spike.
  • An idle container instance may persist resources, such as open database connections. 
  • On Cloud Run (fully managed), the CPU is not available to an idle instance.
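One common way a container instance ends up holding resources while idle is a lazily created, module-level connection that is reused across requests. The following is a minimal sketch of that pattern, assuming a single-threaded handler and using an in-memory SQLite database purely as a stand-in for a real database; `get_connection` and `handle_request` are hypothetical names.

```python
import sqlite3

# Module-level state lives as long as the process does, so it survives
# between requests handled by the same container instance.
_connection = None


def get_connection():
    """Open the connection on first use and reuse it afterwards."""
    global _connection
    if _connection is None:
        # Stand-in for a real database connection (assumption for this sketch).
        _connection = sqlite3.connect(":memory:")
    return _connection


def handle_request():
    """Hypothetical request handler that reuses the shared connection."""
    conn = get_connection()
    return conn.execute("SELECT 1").fetchone()[0]


if __name__ == "__main__":
    handle_request()
    # The second call reuses the connection opened by the first.
    print(get_connection() is get_connection())
```

Because each instance keeps its connection open while idle, the number of open connections follows the number of live instances rather than the number of active requests.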

Deployments

  • When a new revision is deployed, Cloud Run gradually migrates traffic from the old revision to the new one. 
  • The maximum instances limits set for each revision may be temporarily exceeded during the period after deployment.