Lower Default terminated-pod-gc-threshold to Prevent Excessive Accumulation of Failed Pods
#760
andy108369
started this conversation in
General
Replies: 1 comment
-
I have a hunch that this sort of issue may have contributed to (not caused, but contributed to) the following issues:
I did see the
e.g. there were https://gist.github.com/andy108369/b277e5b27fd9d18f089bc47914c5b2b8
The Kubernetes change should be rather simple:
-
The current default value of terminated-pod-gc-threshold in Kubernetes (12500) allows a large number of failed pods to accumulate in a cluster, which can lead to operational inefficiencies. While pod failures, such as those caused by exceeding ephemeral storage limits, are expected in scenarios where tenants allocate less storage than applications require, the excessive accumulation of failed pods introduces unnecessary overhead.
Observed Issue:
Pods in Error or ContainerStatusUnknown states are retained unnecessarily, creating clutter in namespaces and increasing etcd storage usage, which can degrade API performance and complicate operational management.
Real example of the Issue:
Suggested Change:
Lower the default value of terminated-pod-gc-threshold to 10 to:
Rationale:
This adjustment ensures that the garbage collection mechanism aligns with real-world scenarios where pod failures are anticipated but should not result in unbounded accumulation of terminated pods. A lower threshold strikes a balance between retaining recent pod history for troubleshooting and maintaining an efficient cluster state.
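To make the effect of the threshold concrete, here is a minimal sketch of how a threshold-based collector could decide what to delete. It is a simplified model, not the actual Kubernetes controller code: the Pod class and the pods_to_collect function are hypothetical names used only for illustration, and the model assumes that only the oldest terminated pods beyond the threshold are removed and that a non-positive threshold disables collection.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Pod:
    # Hypothetical, simplified stand-in for a pod object.
    name: str
    phase: str          # e.g. "Running", "Succeeded", "Failed"
    created: datetime


def pods_to_collect(pods, threshold):
    """Return the terminated pods a threshold-based GC would delete, oldest first."""
    if threshold <= 0:
        # Assumption: a non-positive threshold disables collection.
        return []
    terminated = [p for p in pods if p.phase in ("Succeeded", "Failed")]
    excess = len(terminated) - threshold
    if excess <= 0:
        # Nothing is deleted until the terminated count exceeds the threshold.
        return []
    # Delete only the oldest pods, keeping the most recent `threshold` for troubleshooting.
    return sorted(terminated, key=lambda p: p.created)[:excess]


start = datetime(2024, 1, 1)
pods = [Pod(f"failed-{i}", "Failed", start + timedelta(minutes=i)) for i in range(25)]
pods.append(Pod("web", "Running", start))

print(len(pods_to_collect(pods, 12500)))  # 0: all 25 failed pods are retained
print(len(pods_to_collect(pods, 10)))     # 15: only the 10 newest failed pods remain
```

In this model, the 25 failed pods stay around indefinitely with a threshold of 12500, while a threshold of 10 keeps only the most recent ones.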
Would appreciate feedback from the community on the feasibility and implications of this proposed change.