KEP-2021: HPA supports scaling to/from zero pods for object/external metrics
- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Release Signoff Checklist
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- “Implementation History” section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website , for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Summary
Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in any resource which supports the scale subresource based on observed CPU or memory utilization (or, with custom metrics support, on some other application-provided metrics) from one to many replicas. This proposal adds support for scaling from zero to many replicas and back to zero for object and external metrics.
Scaling to zero is particularly effective for cost reduction when individual pods demand substantial resource requests, such as dedicated CPUs or GPUs. Since CPU and memory utilization can only be measured on running pods, scaling to zero will be limited to object and external metrics.
Motivation
With the addition of scaling based on object and external metrics, it became possible to automatically adjust the number of running replicas based on an application-provided metric. A typical use case for this is scaling the number of queue consumers based on the length of the consumed queue.
In cases of a frequently idle queue or a less latency-sensitive workload, there is no need to keep one replica running at all times. Instead, you can dynamically scale to zero replicas, especially for workloads with high resource demands, such as those requiring GPUs. This approach not only reduces costs but also has significant energy-saving potential, particularly as GPU workloads become more prevalent. When replicas are scaled to zero, the HPA must also be capable of scaling back up as soon as messages become available.
Goals
- Provide scaling to zero replicas for object and external metrics
- Provide scaling from zero replicas for object and external metrics
Non-Goals
- Provide scaling to/from zero replicas for resource metrics
- Provide request buffering at the Kubernetes Service level
Proposal
Allow the HPA to scale from and to zero using minReplicas: 0 and an HPA status condition.
User Stories (Optional)
Story 1: Scale a heavy queue consumer on-demand
As the operator of a video processing pipeline, I would like to reduce costs. While video processing is CPU intensive, it is not a latency sensitive workload. Therefore I want my video processing workers to only be created if there is actually a video to be processed and terminated afterwards.
Notes/Constraints/Caveats (Optional)
Currently, the HPA can be disabled by manually setting the scaled resource to replicas: 0. This works because the HPA could never reach this state on its own.
Because replicas: 0 becomes a possible state when using minReplicas: 0, it can no longer be used to distinguish a manually disabled workload from one that was automatically scaled to zero.
The replicas: 0 state is also problematic because updating an HPA object's minReplicas from 0 to 1 behaves differently depending on it: if replicas was 0 during the update, the HPA is disabled for the resource; if it was > 0, the HPA continues with the new minReplicas value.
To resolve these issues, this KEP introduces an explicit ScaledToZero condition inside the HorizontalPodAutoscalerStatus. When the HPA scales a workload from 1 ~> 0, it records the ScaledToZero=True condition in the status. When ScaledToZero=True has been recorded, the HPA will scale the workload up from 0 ~> 1 and remove the condition. If the condition is not found, the HPA maintains the current behavior of making no change.
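The condition rules in this section can be sketched as a small decision function. This is an illustrative sketch only, not the controller's code: the `decide` function, the `action` type, and its values are hypothetical names, and `desired` stands for whatever replica count the metrics currently call for.

```go
package main

import "fmt"

// action is a hypothetical label for the controller's decision.
type action string

const (
	leavePaused action = "leave at 0 (treated as manually paused)"
	stayAtZero  action = "stay at 0 (ScaledToZero already recorded)"
	scaleUp     action = "scale up from 0 and remove ScaledToZero"
	scaleToZero action = "scale to 0 and record ScaledToZero=True"
	regular     action = "regular scaling"
)

// decide sketches the condition rules from this section.
// current is the workload's replica count, desired is the count the metrics
// call for, and scaledToZero reports whether ScaledToZero=True is recorded.
func decide(current, desired int32, scaledToZero bool) action {
	switch {
	case current == 0 && !scaledToZero:
		// No condition recorded: keep the existing behavior and change nothing.
		return leavePaused
	case current == 0 && desired == 0:
		return stayAtZero
	case current == 0:
		return scaleUp
	case desired == 0:
		return scaleToZero
	default:
		return regular
	}
}

func main() {
	fmt.Println(decide(0, 2, false)) // manually paused workload
	fmt.Println(decide(0, 2, true))  // previously scaled to zero by the HPA
	fmt.Println(decide(1, 0, false)) // metrics call for zero replicas
}
```

The first case is the one that preserves manual pausing: without the recorded condition, a workload at replicas: 0 is never touched.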
Risks and Mitigations
Because ScaledToZero is not an explicit property, applying a new Deployment with replicas: 0 together with an HPA with minReplicas: 0 can be confusing, as the Deployment will never be scaled.
This needs to be documented, and it is detectable by looking at the existing ScalingActive condition.
In the future, pausing the HPA can become an explicit feature, and the implicit pausing via replicas: 0 can be deprecated to remove this confusion.
Design Details
Add ScaledToZero as HPA HorizontalPodAutoscalerConditionType
const (
// ScaledToZero indicates that the HPA controller scaled the workload to zero.
ScaledToZero HorizontalPodAutoscalerConditionType = "ScaledToZero"
)
Test Plan
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
Most logic related to this KEP is contained in the HPA controller, so testing the various combinations of minReplicas, replicas and ScaledToZero should be achievable with unit tests.
Additionally, integration tests should be added: one for enabling scale to zero by setting ScaledToZero: true, setting minReplicas: 0 and waiting for replicas to become 0, and another for increasing minReplicas to 1, observing that replicas becomes 1 again and confirming that ScaledToZero: true has been removed.
Prerequisite testing updates
Unit tests
/pkg/controller/podautoscaler:2025-02-06-96.4
Integration tests
HPA integration tests are being introduced via https://github.com/kubernetes/kubernetes/pull/138464, which includes a scale-to-zero and back scenario for the HPAScaleToZero feature gate. As a follow-up for beta we plan to add a negative test case asserting that scale-to-zero does not happen when an HPA is configured with a CPU (resource) metric, since scaling to zero is intentionally limited to object/external metrics.
e2e tests
E2E tests under https://github.com/kubernetes/kubernetes/tree/master/test/e2e/autoscaling cover scaling down to 0 and back up from 0 based on an external metric with the HPAScaleToZero feature gate enabled.
[sig-autoscaling] [Feature:HPA] [Feature:HPAScaleToZero] Horizontal pod autoscaling (scale to zero) should scale down to zero and back up based on external metric value: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/autoscaling/horizontal_pod_autosclaing_external_metrics.go
Graduation Criteria
Alpha
- Implement the ScaledToZero condition recording
- Ensure that all minReplicas state transitions from 0 to 1 are working as expected
Beta
- Condition-based scale from/to zero implementation merged (https://github.com/kubernetes/kubernetes/pull/135118)
- Unit tests cover behavior with the HPAScaleToZero feature gate both enabled and disabled
- E2E test for scale-to-zero and scale-from-zero based on an external metric under test/e2e/autoscaling gated on HPAScaleToZero
- Integration tests cover scale-to-zero and back (https://github.com/kubernetes/kubernetes/pull/138464) and a negative case ensuring HPAs configured with resource (CPU) metrics are not scaled to zero
- Production readiness review approved for beta
- User-facing documentation updated in kubernetes/website
GA
- HPAScaleToZero feature gate has been enabled by default in beta for at least one release without blocking bugs
- Feedback from beta users has been gathered and addressed
- E2E test(s) have been running consistently without flakes
Upgrade / Downgrade Strategy
As this KEP changes the allowed values for minReplicas, special care is required for the downgrade case to not prevent any kind of updates for HPA objects using minReplicas: 0. API validation has accepted minReplicas: 0 with the HPAScaleToZero feature gate enabled since Kubernetes 1.16, so downgrades to any version >= 1.16 will not reject existing HPA objects.
Before downgrading to a version without the condition-based implementation, all HPAs using minReplicas: 0 should be set to minReplicas: 1 and their workloads scaled to at least one replica, otherwise workloads currently scaled to replicas: 0 may remain stuck at replicas: 0 (the old controller cannot distinguish “manually paused” from “HPA scaled to zero” without the ScaledToZero condition).
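As a rough sketch of the preparation step described above, the following selects the HPAs that still use minReplicas: 0 and prints a merge patch command for each. The `hpaSummary` type, the helper names, and the sample data are hypothetical; the sketch only prints commands for review and does not talk to a cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hpaSummary is a hypothetical, trimmed-down view of an HPA object.
type hpaSummary struct {
	Namespace   string
	Name        string
	MinReplicas int32
}

// needsPatch returns the HPAs that still use minReplicas: 0.
func needsPatch(hpas []hpaSummary) []hpaSummary {
	var out []hpaSummary
	for _, h := range hpas {
		if h.MinReplicas == 0 {
			out = append(out, h)
		}
	}
	return out
}

// minReplicasPatch builds a JSON merge patch that sets spec.minReplicas to 1.
func minReplicasPatch() (string, error) {
	patch := map[string]map[string]int32{"spec": {"minReplicas": 1}}
	b, err := json.Marshal(patch)
	return string(b), err
}

func main() {
	// Sample data for illustration only.
	hpas := []hpaSummary{
		{Namespace: "video", Name: "worker", MinReplicas: 0},
		{Namespace: "web", Name: "frontend", MinReplicas: 2},
	}
	patch, err := minReplicasPatch()
	if err != nil {
		panic(err)
	}
	for _, h := range needsPatch(hpas) {
		// Printed for review only; nothing is applied to a cluster here.
		fmt.Printf("kubectl patch hpa/%s --namespace=%s --type=merge -p '%s'\n", h.Name, h.Namespace, patch)
	}
}
```

Patching the HPA is only half of the step: workloads that are already at replicas: 0 still need to be scaled to at least one replica before the downgrade.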
Version Skew Strategy
This feature only affects control-plane components (kube-apiserver and
kube-controller-manager); there is no interaction with the kubelet, CRI, CNI,
or CSI, so node version skew is not relevant.
The relevant skew is between kube-apiserver and kube-controller-manager:
- kube-apiserver upgraded first, kube-controller-manager still on the previous version: users may create or update HPAs with minReplicas: 0, but the older controller does not understand the ScaledToZero condition. It will treat replicas: 0 as “manually paused” and will not scale the workload back up from zero. Operators should either avoid minReplicas: 0 until both components are on the new version, or manually scale affected workloads back to replicas >= 1 after the controller is upgraded.
- kube-controller-manager upgraded first, kube-apiserver still on the previous version: this is not a supported skew direction in Kubernetes, but has no adverse effect in practice, since the older API server will continue to reject minReplicas: 0 and the controller simply never observes HPAs that require the new behavior.
The HPAScaleToZero feature gate lives in kube-apiserver (for validation)
and kube-controller-manager (for the condition-based scaling behavior).
Enabling the gate only in one of the two components is effectively a subset of
the skew cases above.
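The skew cases above can be summarized in a small truth table. This is a sketch with hypothetical names (`skewOutcome`, `outcome`, `mayStrandWorkloads`); each flag stands for "this component runs with HPAScaleToZero enabled".

```go
package main

import "fmt"

// outcome is a hypothetical summary of what a given skew combination allows.
type outcome struct {
	AcceptsMinReplicasZero bool // kube-apiserver validation accepts minReplicas: 0
	ScalesFromZero         bool // kube-controller-manager honors ScaledToZero
}

// skewOutcome maps the two component states to the resulting behavior.
func skewOutcome(apiserverEnabled, controllerEnabled bool) outcome {
	return outcome{
		AcceptsMinReplicasZero: apiserverEnabled,
		ScalesFromZero:         controllerEnabled,
	}
}

// mayStrandWorkloads is true for the one problematic case: minReplicas: 0 is
// accepted, but the controller treats replicas: 0 as manually paused.
func (o outcome) mayStrandWorkloads() bool {
	return o.AcceptsMinReplicasZero && !o.ScalesFromZero
}

func main() {
	for _, api := range []bool{false, true} {
		for _, ctrl := range []bool{false, true} {
			o := skewOutcome(api, ctrl)
			fmt.Printf("apiserver=%v controller=%v strand=%v\n", api, ctrl, o.mayStrandWorkloads())
		}
	}
}
```

Only the combination with the gate enabled in kube-apiserver and not in kube-controller-manager needs operator attention.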
Production Readiness Review Questionnaire
Feature Enablement and Rollback
How can this feature be enabled / disabled in a live cluster?
- Feature gate (also fill in values in kep.yaml)
  - Feature gate name: HPAScaleToZero
  - Components depending on the feature gate: kube-apiserver (accepting minReplicas: 0 during validation) and kube-controller-manager (the condition-based scale-from/to-zero behavior)
- Other
Describe the mechanism:
When the HPAScaleToZero feature gate is enabled, the HPA supports scaling to zero pods based on object or external metrics. The HPA remains active as long as at least one metric value is available.
Will enabling / disabling the feature require downtime of the control plane?
No
Will enabling / disabling the feature require downtime or reprovisioning of a node?
No
Does enabling the feature change any default behavior?
HPA creation/update with minReplicas: 0 is no longer rejected.
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Yes. To downgrade the cluster to a version that does not support the scale-to-zero feature, or to disable the feature gate:
1. Make sure there are no HPA objects with minReplicas=0 and maxReplicas=0. Here is a one-liner to update them to 1:
$ kubectl get hpa --all-namespaces --no-headers=true | awk '{if($6==0) printf "kubectl patch hpa/%s --namespace=%s -p \"{\\\"spec\\\":{\\\"minReplicas\\\":1,\\\"maxReplicas\\\":1}}\"\n", $2, $1 }' | sh
2. Disable the HPAScaleToZero feature gate.
If step 1 has been omitted, workloads might be stuck with replicas: 0 and need to be manually scaled up to replicas: 1 to re-enable autoscaling.
What happens if we reenable the feature if it was previously rolled back?
Nothing. The feature can be re-enabled without problems, and workloads with replicas: 0 targeted by an HPA will be scaled again.
Are there any tests for feature enablement/disablement?
Yes. Unit tests in pkg/controller/podautoscaler/horizontal_test.go exercise
the HPA controller with the HPAScaleToZero feature gate both enabled and
disabled, covering:
- HPA creation with minReplicas: 0 being rejected by API validation when the gate is off and accepted when the gate is on.
- Scaling from zero only occurring when the ScaledToZero=True condition is present (i.e. recorded by the HPA itself) and the gate is enabled.
- The controller conservatively leaving a workload at replicas: 0 when the gate is off, so manually paused workloads are not disturbed.
An e2e test exists at
test/e2e/autoscaling/horizontal_pod_autosclaing_external_metrics.go
gated on the HPAScaleToZero feature gate via framework.WithFeatureGate.
Integration tests covering feature-gate on/off paths are being added in https://github.com/kubernetes/kubernetes/pull/138464.
Rollout, Upgrade and Rollback Planning
As this is a new field, every usage is opt-in. If the Kubernetes version is downgraded, workloads currently scaled to 0 might need to be manually scaled to 1, as the controller would otherwise treat them as paused.
If a rollback is planned, the following steps should be performed before downgrading the kubernetes version:
1. Make sure there are no HPA objects with minReplicas=0 and maxReplicas=0. Here is a one-liner to update them to 1:
$ kubectl get hpa --all-namespaces --no-headers=true | awk '{if($6==0) printf "kubectl patch hpa/%s --namespace=%s -p \"{\\\"spec\\\":{\\\"minReplicas\\\":1,\\\"maxReplicas\\\":1}}\"\n", $2, $1 }' | sh
2. Disable the HPAScaleToZero feature gate.
3. Downgrade the Kubernetes version.
How can a rollout or rollback fail? Can it impact already running workloads?
There are no expected side effects when the rollout fails, as the new ScaledToZero condition should only be enabled once the version upgrade has completed.
If the kube-apiserver has been upgraded before the kube-controller-manager, an HPA object has been updated to minReplicas: 0 and the workload is already scaled down to 0 replicas, you must manually scale the workload to at least one replica.
You can detect this situation in one of two ways:
- Manually, by checking the HPA status and verifying that all entries show ScalingActive set to true and do not mention ScalingDisabled, or
- Automatically, by using the kube_horizontalpodautoscaler_status_condition metric provided by kube-state-metrics to ensure the ScalingActive condition is true.
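The manual check can be sketched as a predicate over an HPA's status conditions. The `condition` struct below is a hypothetical stand-in that mirrors only the fields used here, and `scalingDisabled` is an illustrative helper name; the sample conditions are made up.

```go
package main

import "fmt"

// condition mirrors only the fields of an HPA status condition used here.
type condition struct {
	Type   string
	Status string
	Reason string
}

// scalingDisabled reports whether ScalingActive is False with the reason
// ScalingDisabled, which is the state that requires a manual scale-up.
func scalingDisabled(conds []condition) bool {
	for _, c := range conds {
		if c.Type == "ScalingActive" && c.Status == "False" && c.Reason == "ScalingDisabled" {
			return true
		}
	}
	return false
}

func main() {
	stuck := []condition{{Type: "ScalingActive", Status: "False", Reason: "ScalingDisabled"}}
	healthy := []condition{{Type: "ScalingActive", Status: "True", Reason: "ValidMetricFound"}}
	fmt.Println(scalingDisabled(stuck), scalingDisabled(healthy)) // true false
}
```

Matching on the reason as well as the status avoids flagging HPAs whose ScalingActive is false for unrelated causes, such as a failed metric fetch.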
If a rollback is attempted, all HPAs should be updated to minReplicas: 1; otherwise the HPA for deployments with zero replicas will be disabled until replicas has been raised explicitly to at least 1.
What specific metrics should inform a rollback?
If an unexpected number of HPA objects have the status condition ScalingActive set to false with the reason ScalingDisabled, the feature isn’t working as desired; all HPA objects should be updated to minReplicas > 0 again and their managed workloads should be scaled to at least 1.
This condition can also be detected using the kube_horizontalpodautoscaler_status_condition metric provided by kube-state-metrics, but the reason should be manually confirmed for flagged HPA objects.
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
The condition-based implementation merged for 1.36 (https://github.com/kubernetes/kubernetes/pull/135118), so the upgrade and rollback paths can now be exercised:
- Upgrade (gate off → on): existing HPAs are unaffected. minReplicas: 0 only starts being accepted once the gate is enabled on kube-apiserver, and scale-to-zero only occurs once the gate is enabled on kube-controller-manager. No ScaledToZero condition exists on objects created before the upgrade, so the controller takes no new action on them.
- Downgrade / disablement (gate on → off): the controller stops scaling workloads to zero. Workloads already at replicas: 0 with a recorded ScaledToZero condition must be scaled back to replicas: 1 before the downgrade (see the rollback steps above), because the disabled/older controller treats replicas: 0 as manually paused and will not scale it back up.
- Upgrade → downgrade → upgrade: re-enabling the gate resumes scale-from-zero for HPAs still configured with minReplicas: 0; no manual recovery is required for objects that were left untouched.
This behavior is covered by the integration tests added in https://github.com/kubernetes/kubernetes/pull/138464, which exercise the feature-gate on/off paths.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No.
Monitoring Requirements
How can an operator determine if the feature is in use by workloads?
The new ScaledToZero status condition will be visible in the kube_horizontalpodautoscaler_status_condition metric provided by kube-state-metrics, and the minReplicas: 0 setting is reflected in kube_horizontalpodautoscaler_spec_min_replicas.
How can someone using this feature know that it is working for their instance?
When this feature is enabled for a workload scaled based on an object or external metric, the workload should be scaled to 0 replicas when the metric is 0.
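To illustrate why a metric value of 0 leads to 0 replicas, here is a simplified sketch of a replica calculation. It assumes a per-replica (average value) target on an external metric and ignores tolerance, readiness, and stabilization; `desiredReplicas` and its parameters are hypothetical names, not the controller's API, and the target is assumed to be greater than zero.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas is a simplified sketch: the total metric value divided by
// the per-replica target, rounded up, then clamped to [minReplicas, maxReplicas].
// targetPerReplica is assumed to be greater than zero.
func desiredReplicas(metricTotal, targetPerReplica float64, minReplicas, maxReplicas int32) int32 {
	n := int32(math.Ceil(metricTotal / targetPerReplica))
	if n < minReplicas {
		n = minReplicas
	}
	if n > maxReplicas {
		n = maxReplicas
	}
	return n
}

func main() {
	fmt.Println(desiredReplicas(0, 10, 0, 5))  // empty queue, minReplicas: 0 -> 0
	fmt.Println(desiredReplicas(25, 10, 0, 5)) // 25 messages, 10 per replica -> 3
	fmt.Println(desiredReplicas(0, 10, 1, 5))  // empty queue, minReplicas: 1 -> 1
}
```

The third call shows the pre-existing behavior: with minReplicas: 1 the clamp keeps one replica running even when the metric is 0.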
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
No changes to the autoscaling SLOs.
Scaling from 0 ~> 1 is a new path, but it reuses the regular HPA reconcile
loop and is therefore bounded by the existing sync period
(--horizontal-pod-autoscaler-sync-period, default 15s): once the object or
external metric crosses the threshold, the workload is scaled up on the next
reconcile, plus whatever freshness the metrics pipeline adds. The downscale
stabilization window does not apply to scale-up, so no additional delay is
introduced beyond a single sync period.
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
No changes to the autoscaling SLIs.
Are there any missing metrics that would be useful to have to improve observability of this feature?
No, in regards to this KEP.
Dependencies
Does this feature depend on any specific services running in the cluster?
The addition has the same dependencies as the current autoscaling controller.
Scalability
Will enabling / using this feature result in any new API calls?
No, the amount of autoscaling related API calls will remain unchanged. No other components are affected.
Will enabling / using this feature result in introducing new API types?
No, this only modifies the existing API types.
Will enabling / using this feature result in any new calls to the cloud provider?
No, the amount of autoscaling related cloud provider calls will remain unchanged. No other components are affected.
Will enabling / using this feature result in increasing size or count of the existing API objects?
Yes, one additional ScaledToZero condition inside the status of HorizontalPodAutoscaler resources that have been scaled to zero.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No, there are no visible latency changes expected for existing autoscaling operations.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, …) in any components?
No, there are no visible changes expected for existing autoscaling operations.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No.
Troubleshooting
How does this feature react if the API server and/or etcd is unavailable?
Autoscaling will not occur; this is the same as the current behaviour.
What are other known failure modes?
- Failed to fetch the relevant object or external metrics.
  - Detection: ScalingActive: false condition with FailedGetExternalMetric or FailedGetObjectMetric reason.
  - Mitigations: manually scale the resource.
  - Diagnostics: Related errors should be printed as the messages of ScalingActive: false.
  - Testing: https://github.com/kubernetes/kubernetes/blob/0e3818e02760afa8ed0bea74c6973f605ca4683c/pkg/controller/podautoscaler/replica_calculator_test.go#L451
What steps should be taken if SLOs are not being met to determine the problem?
Check metric_computation_duration_seconds to see which metric encountered the latency issue.
If the latency problem is caused by metrics used for scaling to zero, you can remove those metrics again from your HPA(s).
Implementation History
- (2019/02/25) Original design doc: https://github.com/kubernetes/kubernetes/issues/69687#issuecomment-467082733
- (2019/07/16) Alpha implementation (https://github.com/kubernetes/kubernetes/pull/74526 ) merged for Kubernetes 1.16
- (2026/03/18) Alpha re-implementation (https://github.com/kubernetes/kubernetes/pull/135118 ) merged for Kubernetes 1.36
- (2026/04/21) Targeted for Beta graduation in Kubernetes 1.37
Drawbacks
Alternatives
Third-party solutions like KEDA already support scaling to zero for various resources (e.g. RabbitMQ queues). However, these solutions often introduce additional paradigms and complexity. Since Horizontal Pod Autoscaling is already a core feature of Kubernetes and supports scaling down to one replica, adding native support for scaling to zero would be a valuable and low-complexity enhancement.