-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix double-fork hang on Windows/ARM64 #73
Conversation
I tried the detach (the patch in the comment above) first. and that blew up pretty spectacularly. I think I tried something like |
I should learn how to read: @jeremyd2019 already reported in https://inbox.sourceware.org/cygwin-developers/23f23b0a-e60e-e3ff-4c1e-295599fdc813@jdrake.com/ that:
|
@jeremyd2019 FWIW I just tried to reproduce the hang in a VS Code terminal, running |
I think if you tried the double-fork reproducer I had (https://gist.github.com/jeremyd2019/3156721497096d0bba00ef19a507f619) over and over and over, eventually you would see a crash. For me, the QC710 seemed to have an issue sooner than the other devices I have (2023 Dev Kit & Raspberry Pi 4 running Windows 10 - necessarily i686 cygwin 3.3 on that one). Particularly, the crash showed up in the exit status check https://gist.github.com/jeremyd2019/3156721497096d0bba00ef19a507f619#file-testfork-c-L56 |
@jeremyd2019 right you are; the |
588f46a
to
6469fc6
Compare
@jeremyd2019 I hope that you don't mind that inspired by your report, I lifted your commit from |
The more testing the better! Personally, I was planning to wait until upstream had a chance to look at it before opening PRs for msys2-runtime. |
Good news! I have been running the reproducer for almost an hour now, without reproducing the dead-lock. Will keep it running for a while, but this is exciting news! |
@dennisameling would you have time to craft a git-sdk-arm64 PR that downloads this PR build's artifact, replaces |
Thanks for looking into this! My agenda is jam-packed for the rest of the week, so earliest I can do is Saturday. Will block some time then and will post an update here when I have something 👍🏼 |
Oh, no worries, if you're even more jam-packed than I am, I'll try to get to it later today. |
…indows/ARM64 deadlocks I updated git-for-windows/msys2-runtime#73 to build the newest iteration of the dead-lock workaround, which worked 100% in my tests. Let's verify that it works around the issues in `update-via-pacman.ps1`, too. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…indows/ARM64 deadlocks I updated git-for-windows/msys2-runtime#73 to build the newest iteration of the dead-lock workaround, which worked 100% in my tests. Let's verify that it works around the issues in `update-via-pacman.ps1`, too. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…indows/ARM64 deadlocks I updated git-for-windows/msys2-runtime#73 to build the newest iteration of the dead-lock workaround, which worked 100% in my tests. Let's verify that it works around the issues in `update-via-pacman.ps1`, too. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…indows/ARM64 deadlocks I updated git-for-windows/msys2-runtime#73 to build the newest iteration of the dead-lock workaround, which worked 100% in my tests. Let's verify that it works around the issues in `update-via-pacman.ps1`, too. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…indows/ARM64 deadlocks I updated git-for-windows/msys2-runtime#73 to build the newest iteration of the dead-lock workaround, which worked 100% in my tests. Let's verify that it works around the issues in `update-via-pacman.ps1`, too. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…indows/ARM64 deadlocks I updated git-for-windows/msys2-runtime#73 to build the newest iteration of the dead-lock workaround, which worked 100% in my tests. Let's verify that it works around the issues in `update-via-pacman.ps1`, too. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…indows/ARM64 deadlocks I updated git-for-windows/msys2-runtime#73 to build the newest iteration of the dead-lock workaround, which worked 100% in my tests. Let's verify that it works around the issues in `update-via-pacman.ps1`, too. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…indows/ARM64 deadlocks I updated git-for-windows/msys2-runtime#73 to build the newest iteration of the dead-lock workaround, which worked 100% in my tests. Let's verify that it works around the issues in `update-via-pacman.ps1`, too. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…indows/ARM64 deadlocks I updated git-for-windows/msys2-runtime#73 to build the newest iteration of the dead-lock workaround, which worked 100% in my tests. Let's verify that it works around the issues in `update-via-pacman.ps1`, too. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…indows/ARM64 deadlocks I updated git-for-windows/msys2-runtime#73 to build the newest iteration of the dead-lock workaround, which worked 100% in my tests. Let's verify that it works around the issues in `update-via-pacman.ps1`, too. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Well, this was more tedious than I wanted, but I finally got through all the usual GitHub Actions debugging until I got the workflow definition right. And it works as I hoped! @dennisameling with this, I am willing to start that beta phase of Git for Windows/ARM64. @jeremyd2019 This is another piece of evidence that your patch is the long-awaited work-around for our problems! |
/open pr The workflow run was started |
Watch out for msys2/msys2-runtime#234 (comment), potential fixup msys2/msys2-runtime#236 |
This addresses an extremely difficult to debug deadlock when running under emulation on ARM64. A relatively easy way to trigger this bug is to call `fork()`, then within the child process immediately call another `fork()` and then `exit()` the intermediate process. It would seem that there is a "code emulation" lock on the wait thread at this stage, and if the thread is terminated too early, that lock still exists albeit without a thread, and nothing moves forward. It seems that a `SuspendThread()` combined with a `GetThreadContext()` (to force the thread to _actually_ be suspended, for more details see https://devblogs.microsoft.com/oldnewthing/20150205-00/?p=44743) makes sure the thread is "booted" from emulation before it is suspended. Hopefully this means it won't be holding any locks or otherwise leave emulation in a bad state when the thread is terminated. Also, attempt to use `CancelSynchonousIo()` (as seen in `flock.cc`) to avoid the need for `TerminateThread()` altogether. This doesn't always work, however, so was not a complete fix for the deadlock issue. Cherry picked from commit e09c64ef65 (cygthread: suspend thread before terminating., 2024-11-11) from msys2/msys2-runtime. Addresses: https://cygwin.com/pipermail/cygwin-developers/2024-May/012694.html Signed-off-by: Jeremy Drake <cygwin@jdrake.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Suppress error output if ReadFile on child wait pipe returns ERROR_OPERATION_ABORTED due to addition of CancelSynchronousIo call. Cherry picked from commit eafd9a22bc (fixup! cygthread: suspend thread before terminating., 2024-11-13) from msys2/msys2-runtime. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
6469fc6
to
54a7252
Compare
/open pr The workflow run was started |
On Windows/ARM64, running the 64-bit version of Git for Windows could infrequently cause deadlocked threads (see e.g. [this report](msys2/msys2-autobuild#62) or [this one](https://inbox.sourceware.org/cygwin-developers/78f294de-4c94-242a-722e-fd98e51edff9@jdrake.com/)), [which was addressed](git-for-windows/msys2-runtime#73). Signed-off-by: gitforwindowshelper[bot] <gitforwindowshelper-bot@users.noreply.github.com>
This is yet another new attempt to address the hangs we observed e.g. in msys2/msys2-autobuild#62, based on the new insight that code-cache locks causes the hangs because the associated threads are already gone.
The idea here is to suspend the
wait_thread
(callingGetThreadContext()
to actually ensure that it is suspended before continuing) before terminating it.