- Each Cloud Run revision is automatically scaled to the number of container instances needed to handle all incoming requests.
- The number of instances scheduled is impacted by:
  - The amount of CPU needed to process a request
  - The concurrency setting
  - The maximum number of container instances setting
- It may be necessary to limit the total number of container instances that can be started, either to control costs or to stay compatible with other resources used by the service.
- For example, a Cloud Run service might interact with a database that can only handle a certain number of concurrent open connections.
- The maximum container instances setting can be used to limit the total number of instances that can be started in parallel.
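
The database example above comes down to simple arithmetic: the combined connection pools of all instances must fit within the database's connection limit. The following is a minimal sketch of that calculation. The function name, the per-instance pool size, and the idea of reserving some connections for other clients are all illustrative assumptions, not part of Cloud Run.

```python
def max_instances_for_db(
    db_connection_limit: int,
    connections_per_instance: int,
    reserved_connections: int = 0,
) -> int:
    """Return the largest instance count whose pools fit the database limit.

    All parameters are hypothetical inputs chosen for illustration:
    - db_connection_limit: concurrent connections the database accepts
    - connections_per_instance: pool size each container instance may open
    - reserved_connections: connections kept free for admin tools, jobs, etc.
    """
    if connections_per_instance <= 0:
        raise ValueError("connections_per_instance must be positive")
    if reserved_connections < 0:
        raise ValueError("reserved_connections must not be negative")

    usable = db_connection_limit - reserved_connections
    if usable < connections_per_instance:
        raise ValueError("the database cannot serve even one instance")

    # Integer division rounds down, so the total never exceeds the budget.
    return usable // connections_per_instance


if __name__ == "__main__":
    # 100 connections, 10 held back, 5 per instance: (100 - 10) // 5
    print(max_instances_for_db(100, 5, reserved_connections=10))  # 18
```

The result is a candidate value for the maximum container instances setting; it does not account for the temporary overshoot that can occur during traffic surges.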
Maximum instances and traffic spikes
- A revision scales up by creating new instances to handle incoming traffic load.
- When a maximum instances limit is set, there may be insufficient instances to meet traffic load.
- In that case, incoming requests queue for up to 60 seconds.
- During the 60-second window, if an instance finishes processing requests, it becomes available to process queued requests.
- On Cloud Run (fully managed), if no instance becomes available during the 60-second window, the request fails with a 429 error code.
- The maximum instances limit is an upper limit.
- Setting a high limit does not mean that a revision will scale up to the specified number of container instances.
- Setting a high limit only means that the number of container instances at any point in time should not exceed the limit.
- During rapid traffic surges, Cloud Run may, for a short period of time, create slightly more container instances than the specified max instances value.
- If a service cannot tolerate a temporary increase in instances beyond the max instance value, a safety margin needs to be factored in and a lower max instances value set.
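
Because a capped revision can answer with a 429 once the queue window expires, callers may want to retry such requests. The sketch below shows one common client-side approach, exponential backoff with jitter. It is an assumption about how a caller might react, not something Cloud Run prescribes. The `send` callable, the function name, and the default delays are hypothetical; `send` stands in for whatever HTTP client the caller uses and returns the response status code.

```python
import random
import time
from typing import Callable


def call_with_retry(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns something other than 429.

    Returns the last status code seen. `send` is a hypothetical callable
    that performs one request and returns its HTTP status code.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    status = 429
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Double the delay ceiling on each attempt, capped at max_delay,
            # then pick a random wait below it so clients do not retry in sync.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    return status


if __name__ == "__main__":
    # Simulated service: two rejections, then success.
    responses = iter([429, 429, 200])
    print(call_with_retry(lambda: next(responses), base_delay=0.01))
```

Retries add load to a service that is already at its limit, so the attempt count and delays are worth keeping modest.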
Idle instances and minimizing cold starts
- Users are only billed when an instance is handling a request.
- Cloud Run does not always immediately shut down instances once they have handled all requests.
- To minimize the impact of cold starts, Cloud Run may keep some instances idle.
- Idle instances are ready to handle requests in case of a sudden traffic spike.
- An idle container instance may persist resources, such as open database connections.
- On Cloud Run (fully managed), the CPU is not available to an idle instance.
Deployments
- When a new revision is deployed, Cloud Run gradually migrates traffic from the old revision to the new one.
- Because the maximum instances limit is set per revision, the total number of instances across the old and new revisions may temporarily exceed the limit during the period after deployment.
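
The notes mention two situations in which the instance count can go above the configured value: short overshoots during rapid traffic surges, and rollouts in which an old and a new revision each have their own limit. The sketch below shows one way to derive a lower max instances value that leaves a safety margin for both. The overshoot fraction is purely an assumption to be chosen by the operator, since no specific figure is given in these notes, and the function and parameter names are hypothetical.

```python
import math


def max_instances_with_margin(
    hard_limit: int,
    overshoot_fraction: float,
    concurrent_revisions: int = 1,
) -> int:
    """Return a per-revision max instances value that respects hard_limit.

    - hard_limit: instance count the service can never tolerate exceeding
    - overshoot_fraction: ASSUMED temporary overshoot, e.g. 0.25 for 25%
    - concurrent_revisions: revisions serving traffic at once (2 in a rollout)
    """
    if hard_limit < 1:
        raise ValueError("hard_limit must be at least 1")
    if overshoot_fraction < 0:
        raise ValueError("overshoot_fraction must not be negative")
    if concurrent_revisions < 1:
        raise ValueError("concurrent_revisions must be at least 1")

    # Split the budget across revisions, then shrink it so that the value
    # plus the assumed overshoot still fits. Rounding down keeps it safe.
    per_revision_budget = hard_limit / concurrent_revisions
    value = math.floor(per_revision_budget / (1 + overshoot_fraction))
    if value < 1:
        raise ValueError("hard_limit is too small for this margin")
    return value


if __name__ == "__main__":
    print(max_instances_with_margin(100, 0.25))                         # 80
    print(max_instances_with_margin(18, 0.2, concurrent_revisions=2))   # 7
```

In the second call, each revision may reach 7 instances plus a 20% overshoot, so the two revisions together stay under the limit of 18.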