[Issue:3308] SIGTERM Graceful shutdown functionality #3340

Open
wants to merge 1 commit into
base: main


@jaswanthikolla jaswanthikolla commented Jun 14, 2024

This makes the runner compatible with Karpenter on Kubernetes and, more generally, with k8s pod movement. It fixes #3308 by handling graceful shutdown of the runner on SIGTERM. It does the following:

  1. If the runner is idle and just listening for jobs, it exits immediately.
  2. If the runner is running a job, it waits until the job completes or RUNNER_GRACEFUL_STOP_TIMEOUT seconds have elapsed, whichever happens first, and then terminates.
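
For anyone skimming the thread, the two cases above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the code in this PR: `GracefulStopper`, `read_timeout`, `run_until_sigterm`, and the 15-second fallback are hypothetical names and values picked for the example. Only `RUNNER_GRACEFUL_STOP_TIMEOUT` comes from the description.

```python
import os
import signal
import threading

# Assumption: this fallback is illustrative; the PR description does not state a default.
DEFAULT_TIMEOUT_SECONDS = 15.0


def read_timeout(env=os.environ) -> float:
    """Read RUNNER_GRACEFUL_STOP_TIMEOUT (seconds), using the assumed fallback if unset."""
    return float(env.get("RUNNER_GRACEFUL_STOP_TIMEOUT", DEFAULT_TIMEOUT_SECONDS))


class GracefulStopper:
    """Hypothetical helper: tracks job state and decides how to stop after SIGTERM."""

    def __init__(self, timeout_seconds: float):
        self.timeout_seconds = timeout_seconds
        self._job_done = threading.Event()
        self._job_done.set()  # no job yet, so the runner starts out idle

    def job_started(self) -> None:
        self._job_done.clear()

    def job_finished(self) -> None:
        self._job_done.set()

    def stop(self) -> str:
        """Apply the two cases from the description and report which one happened."""
        if self._job_done.is_set():
            return "idle-exit"  # case 1: idle, exit right away
        if self._job_done.wait(self.timeout_seconds):
            return "job-completed"  # case 2: the job finished before the timeout
        return "timed-out"  # case 2: the timeout elapsed first


def run_until_sigterm(stopper: GracefulStopper) -> str:
    """Block the main thread until SIGTERM arrives, then stop gracefully (Unix only).

    Call this before starting other threads so they inherit the blocked signal mask.
    """
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})
    signal.sigwait({signal.SIGTERM})
    return stopper.stop()
```

A wrapper would call `job_started()` and `job_finished()` around each job and exit once `run_until_sigterm(GracefulStopper(read_timeout()))` returns. On Kubernetes the pod's `terminationGracePeriodSeconds` would need to be at least as long as the timeout, otherwise the kubelet sends SIGKILL before the wait finishes.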

@jaswanthikolla jaswanthikolla requested a review from a team as a code owner June 14, 2024 02:44
@jaswanthikolla
Author

Any ETA on when we can expect a review of this PR?

@ccincotti3

This would be really great to get in, assuming it works. We're also experiencing this.

@moosh3

moosh3 commented Oct 27, 2024

Would love to see this merged

@joosangkim

This PR is an essential bug fix for using the GitHub runner with Karpenter.

@jaswanthikolla
Author

jaswanthikolla commented Nov 20, 2024

Karpenter support is essential for achieving significant cost savings across companies. We easily save $300k+ per year. Scaled across thousands of tech companies, Karpenter support could easily save $50 million+, along with the associated CO2 emissions.

Can we prioritize reviewing and merging this PR?

@velkovb

velkovb commented Dec 12, 2024

Upvote for the PR. We ended up building a custom image and baking in the script. However, we noticed that it does not behave properly in dind runners: the signal is only handled in the runner container, so the Docker socket dies. Moving dind to a sidecar container solved it for us: actions/actions-runner-controller#3842

@alec-drw

@velkovb could I ask what errors you saw when the runner did not capture the signal correctly? I have observed ephemeral PVCs getting stuck in the Released state after Docker fails to shut down cleanly, which eventually breaks the storage provisioner.

I have been leaning towards using the Kubernetes buildkit driver as the solution, but a sidecar would certainly be easier.

@velkovb

velkovb commented Dec 12, 2024

We were seeing errors that the connection to the Docker socket was lost during an image build. When the pod gets a SIGTERM, the runner container handles it properly, but the dind container does not and terminates, so the Docker host disappears and the build breaks.
