A cancelled run may leave you with a broken .NET installation #501
Comments
Hello @prplecake, thank you for reaching out and providing information about your issue. I'm unable to reproduce it. To help us understand it better, could you share a link to a repository with minimal code (or a workflow file) that reproduces the issue? Please feel free to reach out to us with any further questions. |
It shouldn't matter what the repo contents are. The workflow was cancelled in the middle of an archive extraction. Then subsequent attempts to run the workflow would fail because setup-dotnet thinks .NET finished installing since the directory is present, I assume.
However, since the directory is only half-extracted, dotnet did not finish installing, so trying to use it threw a bunch of CS0006 errors. I really can't provide more information than that since that's all of it! Get a workflow to cancel in the middle of a dotnet extraction - that's the reproduction. It's possible the chances of this happening are low enough I was just extremely unlucky to have been affected. There needs to be a better "cleanup process" for cancelled workflows. |
Hello @prplecake, the issue you encountered, where cancelling a workflow during the setup-dotnet action left the runner in a broken state, appears to be transient and specific to that particular run. GitHub Actions is designed so that each job starts in a clean state: runners are cleaned up automatically between jobs, which removes any changes made to the runner's environment during job execution. The issue should therefore not persist across runs, and a new run should start with a clean, operational runner. In this case, however, the cancellation appears to have happened at a critical point during the archive extraction, which led to the unexpected state. We could not reproduce the issue on our side: when we cancelled a run in the middle of the archive extraction, subsequent runs installed the requested .NET version afresh, regardless of the earlier cancellation. In this URL, we cancelled the setup-dotnet step in the middle of the archive extraction, and the .NET installation succeeded immediately in the next run. Please feel free to reach out to us with any other concerns. Thank you! |
Got it. I appreciate the explanation. It's probably mostly a result of my self-hosted runner being a bit less ephemeral than normal runners. |
Hello @prplecake, thank you for the confirmation! We are closing this issue for now because it does not recur: subsequent runs successfully reinstall the files that a cancelled run left partially installed. Please feel free to reach out to us with any further concerns. |
Hello @shaanmugapriya, Today we encountered this exact same error. A cancelled workflow cancelled at exactly the right moment, and we ended up with a
When we inspected this runner manually we confirmed the
Note the 12 seconds between All subsequent executions of
Had it attempted to repair the missing SDKs we would not have observed this issue. We use GitHub Enterprise Server and for a variety of performance reasons we have long-lived GHA runners that only rotate out on failed health checks or a 24 hour time limit. Once we identified the problematic runner and the specific issue of the incomplete
Possible repro steps
We have not tested this, though we believe it would simulate the problem we encountered.
Suggested low-tech solution: installation lockfiles
If setup-dotnet attempts to install an SDK and does not run to successful completion, it should not mark the SDK as actually installed. Preferably it should not appear in
A straightforward fix would be a lockfile in the |
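The lockfile idea above can be sketched as a completion marker. This is only an illustration, not setup-dotnet's actual implementation: the marker name `.install-complete`, the helper names, and the `extract` callable are all hypothetical, and Python is used purely for brevity. A directory is treated as installed only if the marker exists, and the marker is written as the very last step.

```python
import shutil
from pathlib import Path

# Hypothetical marker name; an assumption for this sketch, not something
# setup-dotnet is known to write.
MARKER = ".install-complete"


def is_installed(sdk_dir: Path) -> bool:
    """Treat an SDK directory as installed only if the marker file exists."""
    return (sdk_dir / MARKER).is_file()


def install(sdk_dir: Path, extract) -> None:
    """Install into sdk_dir, discarding a leftover partial install first.

    `extract` is a caller-supplied callable that populates the directory
    (a stand-in for the real download and extraction step).
    """
    if is_installed(sdk_dir):
        return  # a previous run finished; nothing to do
    if sdk_dir.exists():
        # Directory without a marker: an earlier run was interrupted.
        shutil.rmtree(sdk_dir)
    sdk_dir.mkdir(parents=True)
    extract(sdk_dir)
    # Written last, so a cancellation at any earlier point leaves no marker.
    (sdk_dir / MARKER).write_text("ok")
```

With this shape, a run cancelled mid-extraction leaves a directory without a marker, and the next run removes it and installs again instead of assuming the SDK is present.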
Hello everyone, we are reopening this issue to implement a lockfile mechanism that handles incomplete .NET SDK installations caused by cancelled jobs. |
Description:
Cancelling a workflow, manually or due to concurrency group rules, while it's trying to set up a .NET installation may leave the runner in a broken state.
Task version:
v4
Platform:
Runner type:
Repro steps:
Cancel a workflow while it's running the setup-dotnet action, possibly specifically while it's extracting an archive.
setup-dotnet log
Specifically, this left me with a half-extracted directory at C:\Program Files\dotnet\packs\Microsoft.NETCore.App.Ref\6.0.26. Deleting the 6.0.26 directory, then re-running the workflow was successful.
Expected behavior:
Either finish the archive extraction, or roll back unfinished changes.
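One way to get the "roll back unfinished changes" behavior is to extract into a staging directory and rename it into place only once extraction has finished. The sketch below is a minimal illustration under stated assumptions: it handles a zip archive that maps to a single target directory, the function name and the `.partial-` prefix are hypothetical, and it does not reflect how setup-dotnet actually performs its installation.

```python
import os
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_atomically(archive: Path, target: Path) -> None:
    """Extract `archive` so that `target` appears only once fully populated."""
    if target.exists():
        raise FileExistsError(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stage next to the target so the final rename stays on one filesystem.
    staging = Path(
        tempfile.mkdtemp(prefix=target.name + ".partial-", dir=str(target.parent))
    )
    try:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(staging)
        # The target name becomes visible only after extraction has finished.
        os.rename(staging, target)
    except BaseException:
        # Roll back on any failure, including KeyboardInterrupt.
        shutil.rmtree(staging, ignore_errors=True)
        raise
```

A process that is killed outright cannot run the cleanup branch, so a stale `.partial-` directory may remain, but the target path itself is never left half-extracted. A later run could sweep such leftovers before installing.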
Actual behavior:
Workflow stops immediately, leaving runner in broken state.