-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cygwin: signal: Do not handle signal when __SIGFLUSHFAST is sent #251
Cygwin: signal: Do not handle signal when __SIGFLUSHFAST is sent #251
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This member was renamed on master but not in 3.5 branch
After the commit d243e51, zsh sometimes hangs at startup. This occurs because SIGCHLD, which should trigger sigsuspend(), is handled in cygwait() that is used to wait for a wakeup event in sig_send(), even when __SIGFLUSHFAST is sent. Despite __SIGFLUSHFAST being required to return before handling the signal, this does not happen. With this patch, if the signal currently being sent is __SIGFLUSHFAST, do not handle the received signal and keep it asserted after the cygwait() for the wakeup event. Apply the same logic to the cygwait() in the retrying loop for WriteFile() as well. Applied-from: https://inbox.sourceware.org/cygwin-patches/20241223013332.1269-1-takashi.yano@nifty.ne.jp Addresses: https://cygwin.com/pipermail/cygwin/2024-December/256954.html Fixes: d243e51 ("Cygwin: signal: Fix deadlock between main thread and sig thread") Reported-by: Daisuke Fujimura <booleanlabel@gmail.com> Reviewed-by: Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
6326293
to
f1affcc
Compare
Thanks @jeremyd2019! The build is now successful. |
I tested the msys-2.0.dll from the CI artifact on the arm64 autobuild, rebuilding imagemagick which hung 3 times in a row with 3.5.5-1, and it succeeded. |
Hrm. This is apparently still not good enough. I noticed that Git's test suite would time out after 6h with MSYS2 runtime v3.5.5, opened git-for-windows/msys2-runtime#80 to apply this here fix, and then triggered another Git test suite run. It looks like it is hanging again (albeit in less matrix jobs, which is good). |
Have you reported this to upstream Cygwin/Takashi? |
@jeremyd2019 No, I didn't have time to investigate yet... so I do not even have a reproducer. |
I spent quite a while on trying to reproduce, and unsurprisingly it is hard. It is almost certainly a race condition, indicating that some sort of locking is needed and just isn't there. I'll just jot down some notes to be able to jog my memory later: The test script in Git's test suite that I zoomed in on is I actually ran Apparently this was "lucky" because re-running that test script individually did not cause a hang. Then, I ran Now, this is where it gets interesting: both of those jobs got stuck in the first "test case" (Git's test suite uses a home-grown test framework that lacks modern test frameworks' features like "setup" and "teardown" functions, therefore the first test case frequently serves as a common "setup" step and it is quite normal, unfortunately so, for the test cases to depend on side effects of previous test cases; As you can tell, I am absolutely not a fan of the way Git's test suite is architected), when running The When I had Ctrl+C'ed the hung test script and there were still a couple of left-over processes blocking the working directory of that test from being deleted, I had actually tried to make some sense of the process topology when that
I had to stop, manually (via This is an unsatisfying place to take a break from my investigation, but it'll have to do for now. Potential next steps:
|
This is odd, so far the issues that this patch helped could be killed with SIGTERM within Cygwin. |
There were two stuck jobs here right now: https://github.com/msys2/msys2-autobuild/actions/runs/12514784401/job/34911395541 |
After the commit d243e51, zsh sometimes hangs at startup. This occurs because SIGCHLD, which should trigger sigsuspend(), is handled in cygwait() that is used to wait for a wakeup event in sig_send(), even when __SIGFLUSHFAST is sent. Despite __SIGFLUSHFAST being required to return before handling the signal, this does not happen. With this patch, if the signal currently being sent is __SIGFLUSHFAST, do not handle the received signal and keep it asserted after the cygwait() for the wakeup event. Apply the same logic to the cygwait() in the retrying loop for WriteFile() as well.
Applied-from: https://inbox.sourceware.org/cygwin-patches/20241223013332.1269-1-takashi.yano@nifty.ne.jp Addresses: https://cygwin.com/pipermail/cygwin/2024-December/256954.html Fixes: d243e51 ("Cygwin: signal: Fix deadlock between main thread and sig thread")
Reported-by: Daisuke Fujimura booleanlabel@gmail.com
Reviewed-by: