Workers don't terminate after tests finish #164

Open
avik-pal opened this issue Jul 19, 2024 · 0 comments

I have been seeing this specifically on GPU tests. See the logs at https://buildkite.com/julialang/luxlib-dot-jl/builds/797#0190cc64-0b5a-4e2a-9e47-795d8fa7176e/309-616

The Batch Normalization, Group Normalization, and Instance Normalization tests are reported as "DONE", but their workers never terminate, which eventually causes the job to time out. The problem doesn't show up when the same tests run on GitHub Actions, where only CPU tests are run.

If I set the number of workers so that the tests don't run in parallel, the tests finish as expected. I have ReTestItems set up to run GPU tests on other repos (and they work perfectly), so I am not sure what is causing this issue.

P.S. This repo is amazing, it has cut down on our CI timings a great deal (and makes local testing so much easier)!
