Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FOR-TESTING] Merge the cygwin-3_5-branch #78

Draft
wants to merge 51 commits into
base: main
Choose a base branch
from
Draft

Conversation

dscho
Copy link
Member

@dscho dscho commented Dec 7, 2024

This is purely for some early testing of changes that are slated to go into Cygwin v3.5.5.

github-cygwin and others added 30 commits August 25, 2024 11:59
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
When win32-input-mode (which is supported by Windows Termainal) is
set by "\033[?9001h", cons_master_thread does not work properly and
consumes larger and larger memory space. This is because sending
event by WriteConsoleInput() is translated into the sequence that
is used by win32-input-mode. Due to this behaviour, write-back
of the INPUT_RECORDs does not work as expected. With this patch,
cons_master_thread is disabled on win32-input-mode where the signal
keys such as Ctrl-C, Ctrl-Z etc. never comes.

Addresses: https://cygwin.com/pipermail/cygwin/2024-August/256380.html
Fixes: ff4440f ("Cygwin: console: Introduce new thread which handles input signal.")
Reported-by: Adamyg Mob <adamyg.mob@gmail.com>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit 84d77e5918e18170c393407d477140fcf5d3e432)
After the commit 7f3c22532577, writing to pipe extremely slows down.
This is because cygwait(select_sem, 10, cw_cancel) is called even
when write operation is already completed. With this patch, the
cygwait() is called only if the write operation is not completed.

Addresses: https://cygwin.com/pipermail/cygwin/2024-August/256398.html
Fixes: 7f3c22532577 ("Cygwin: pipe: handle signals explicitely in raw_write")
Reported-by: Jim Reisert AD1C <jjreisert@alum.mit.edu>
Reviewed-by: Corinna Vinschen <corinna@vinschen.de>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit f78009cb1ccf84cc343cf2441c76196461d87532)
All three warnings produced with -Og are false positives.
But given we're using -Werror unconditionally it's better
to be safe than sorry.

Reported-by: Kevin Ushey <kevinushey@gmail.com>
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
(cherry picked from commit 2a2a6486a089b90d05b258bd853d136f00cb7c69)
If a cygwin app is executed from a non-cygwin app and the cygwin
app exits, the read pipe remains in the non-blocking mode because
of the commit fc691d0246b9. Due to this behaviour, the non-cygwin
app cannot read the pipe correctly after that. Similarly, if a
non-cygwin app is executed from a cygwin app and the non-cygwin
app exits, the read pipe remains in the blocking mode. With this
patch, the blocking mode of the read pipe is stored into a variable
was_blocking_read_pipe on set_pipe_non_blocking() when the cygwin
app starts and restored on close(). In addition, the pipe mode is
set to non-blocking mode in raw_read() if the mode is blocking
mode by referring the variable is_blocking_read_pipe as well.
is_blocking_read_pipe is a member of fhandler_pipe class and is set
by set_pipe_non_blocking(), so if other process sets the pipe mode
to blocking mode, the current process cannot know the pipe is
blocking mode. Therefore, is_blocking_read_pipe is also set on the
signal __SIGNONCYGCHLD, which is sent to the process group when
non-cygwin app is started.

Addresses: git-for-windows/git#5115
Fixes: fc691d0246b9 ("Cygwin: pipe: Make sure to set read pipe non-blocking for cygwin apps.");
Reported-by: isaacag, Johannes Schindelin <Johannes.Schindelin@gmx.de>
Reviewed-by: Corinna Vinschen <corinna@vinschen.de>, Ken Brown <kbrown@cornell.edu>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
If the parent process has already used pread() or pwrite(), these
functions fail with EBADF if used on the inherited fd.  Ensure that
fix_after_fork() is called to invalidate the prw_handle.  This issue
has been detected by 'stress-ng --pseek 1'.

Fixes: c36cd56 ("* fhandler.cc (fhandler_base::open): Drop local create_options variable.")
Signed-off-by: Christian Franke <christian.franke@t-online.de>
timer_delete() always returned failure.  This issue has been
detected by 'stress-ng --hrtimers 1'.

Fixes: 229ea3f ("Cygwin: posix timers: reimplement using OS timer")
Signed-off-by: Christian Franke <christian.franke@t-online.de>
The decomposition needs to be into 12+24 bits of precision for extra-
precision multiplication, but was into 13+24 bits. On i386 with -O1 the
bug was hidden by accidental extra precision, but on amd64, in 2^32
trials the bug caused about 200000 errors of more than 1 ulp, with a
maximum error of about 80 ulps. Now the maximum error in 2^32 trials
on amd64 is 0.8573 ulps. It is still 0.8316 ulps on i386 with -O1.

The nearby decomposition of 1/ln2 and the decomposition of 2/(3ln2) in
the double precision version seem to be sub-optimal but not broken.

Reference: freebsd/freebsd-src@b4437c3
Original Author: Bruce Evans
(1) The bit for the 1.0 part of bp[k] was right shifted by 4.  This
    seems to have been caused by a typo in converting e_pow.c to
    e_powf.c.
(2) The lower 12 bits of ax+bp[k] were not discarded, so t_h was
    actually plain ax+bp[k].  This seems to have been caused by a logic
    error in the conversion.

These bugs gave wrong results like:

    powf(-1.1, 101.0) = -15158.703 (should be -15158.707)
      hex values: BF8CCCCD 42CA0000 C66CDAD0 C66CDAD4

Fixing (1) gives a result wrong in the opposite direction
(hex C66CDAD8), and fixing (2) gives the correct result.

ucbtest has been reporting this particular wrong result on i386 systems
with unpatched libraries for 9 years.  I finally figured out the extent
of the bugs.  On i386's they are normally hidden by extra precision.
We use the trick of representing floats as a sum of 2 floats (one much
smaller) to get extra precision in intermediate calculations without
explicitly using more than float precision.  This trick is just a
pessimization when extra precision is available naturally (as it always
is when dealing with IEEE single precision, so the float precision part
of the library is mostly misimplemented).  (1) and (2) break the trick
in different ways, except on i386's it turns out that the intermediate
calculations are done in enough precision to mask both the bugs and
the limited precision of the float variables (as far as ucbtest can
check).

ucbtest detects the bugs because it forces float precision, but this
is not a normal mode of operation so the bug normally has little effect
on i386's.

On systems that do float arithmetic in float precision, e.g., amd64's,
there is no accidental extra precision and the bugs just give wrong
results.

Reference: freebsd/freebsd-src@12be4e0
Original Author: Bruce Evans
Fixed another precision bug in powf(). This one is in the computation
[t=p_l+p_h High]. We multiply t by lg2_h, and want the result to be
exact. For the bogus float case of the high-low decomposition trick, we
normally discard the lowest 12 bits of the fraction for the high part,
keeping 12 bits of precision. That was used for t here, but it doesnt't
work because for some reason we only discard the lowest 9 bits in the
fraction for lg2_h.  Discard another 3 bits of the fraction for t to
compensate.

This bug gave wrong results like:

powf(0.9999999, -2.9999995) = 1.0000002 (should be 1.0000001)
hex values: 3F7FFFFF C03FFFFE 3F800002 3F800001

As explained in the log for the previous commit, the bug is normally
masked by doing float calculations in extra precision on i386's, but is
easily detected by ucbtest on systems that don't have accidental extra
precision.

Reference: freebsd/freebsd-src@5f20e5c
Original Author: Bruce Evans
Previously, adding a new lock by lockf() over multiple existing locks
failed. This is due to a bug that lf_setlock() tries to create a lock
that has already been created. This patch fixes the issue.

Addresses: https://cygwin.com/pipermail/cygwin/2024-October/256528.html
Fixes: a998dd7 ("* flock.cc: Implement all advisory file locking here.")
Reported-by: Christian Franke <Christian.Franke@t-online.de>
Reviewed-by: Corinna Vinschen <corinna@vinschen.de>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
Previously, lockf() printed a warning message when the number of locks
per file exceeds the limit (MAX_LOCKF_CNT). This patch makes lockf()
return ENOLCK in that case rather than printing the warning message.

Addresses: https://cygwin.com/pipermail/cygwin/2024-October/256528.html
Fixes: 31390e4 ("(inode_t::get_all_locks_list): Use pre-allocated buffer in i_all_lf instead of allocating every lock.  Return pointer to start of linked list of locks.")
Reported-by: Christian Franke <Christian.Franke@t-online.de>
Reviewed-by: Corinna Vinschen <corinna@vinschen.de>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
There was a long-standing issue that pseudo console ownership could
not hand over from the process whose ctty is /dev/cons* rather than
/dev/pty*. This problem happens when a cygwin app starts non-cygwin
app in a pty, then the non-cygwin app starts multiple cygwin apps,
and the non-cygwin app ends before the second cygwin apps end.
In this case, the stub process of the non-cygwin app hands over the
ownership of pcon to one of the second cygwin apps, however, this
app does not hand over the ownership of pcon to another second
cygwin app. This is due to the fact that the hand-over feature is
implemented only in fhandler_pty_slave but not in fhandler_console.

With this patch, the second cygwin apps check if their console device
is inside a pseudo console, and if so, it tries to hand over the
ownership of the pseudo console to anther process that is attached
to the same pseudo console.

Addresses: https://cygwin.com/pipermail/cygwin/2024-February/255388.html
Fixes: 253352e ("Cygwin: pty: Allow multiple apps to enable pseudo console simultaneously.")
Reported-by: lmari Lauhakangas <ilmari.lauhakangas@libreoffice.org>, Hossein Nourikhah <hossein@libreoffice.org>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
The expression computing the next-less-power of 2 for the next write
when the pipe buffer is getting filled up allows negative shift values.
This works on Intel CPUs because the shift expression only evaluates the
5 LSBs, but it's undefined behaviour per the C standard.  Use the
correct expression to get a positive shift value.

Fixes: 170e6badb621 ("Cygwin: pipe: improve writing when pipe buffer is almost full")
Reported-by: Takashi Yano <takashi.yano@nifty.ne.jp>
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Previously, sigfe had a long-standing problem that the signal handler
destroys fpu states. This is caused by fninit instruction in sigdelayed.
With this patch, instead of fnstcw/fldcw and fninit, fnstenv/fldenv
are used to maintain fpu states.
Addresses: https://cygwin.com/pipermail/cygwin/2024-October/256503.html

Fixes: ed89fbc ("* gendef (sigdelayed (x86_64)): Save and restore FPU control word.")
Reported-by: Christian Franke <Christian.Franke@t-online.de>
Reviewed-by:
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
Currently, open() tries to attach to the console which is owned by
the console owner process. However, when the owner process calls
exec(), AttachConsole() to dwProcessId may sometimes fail due to
unlucky timing. With this patch, open() tries to attach also to
exec_dwProcessId if attaching to dwProcessId fails. That is, open()
tries to attach to both the stub process and target process to
prevent the above situation.

Fixes: 3721a75 ("Cygwin: console: Make the console accessible from other terminals.")
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
Previous fix (commit df0953aa298c) fixes only a part of the problem.
Since exec() overrides the cygwin pid of the caller process, it makes
console owner handling complex. This patch makes console use Windows
pid as the owner pid (con.owner) instead of cygwin pid to make the
handling simpler.

Fixes: df0953aa298c ("Cygwin: console: Fix open() failure when the console owner calls exec().")
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit 90ddab98780da0e888ff09987d4651107c0f48d6)
…ssId

The commit 90ddab98780d uses myself->dwProcessId to get windows pid.
However, it will be overridden in stub process if exec() is called.
With this patch, GetCurrentProcessId() instead of myself->dwProcessId.

Fixes: 90ddab98780d ("Cygwin: console: Re-fix open() failure on exec() by console owner")
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit 8ee8b0c974d723d618182fbbdeb1e9186b69c6f3)
Previously, the condition to clean up input/output mode was based
on wrong premise. This patch fixes that.

Fixes: 8ee8b0c974d7 ("Cygwin: console: Use GetCurrentProcessId() instead of myself->dwProcessId")
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit 30d266947842fec82cae9a190bc8b5bf2e108cd0)
…ntax errors

This commit revises `pthread_cleanup_push` and `pthread_cleanup_pop`
macros to use a `do { ... } while(0)` wrapper, preventing syntax errors
when used in certain contexts. The original code could fail when they
are wrapped within a `do { ... } while(0)`, causing unintended behavior
or compilation issues.

Example of error:

  #include <pthread.h>

  #define pthread_cleanup_push_wrapper(_fn, _arg) do { \
    pthread_cleanup_push(_fn, _arg); \
  } while (0)

  #define pthread_cleanup_pop_wrapper(_execute) do { \
    pthread_cleanup_pop(_execute); \
  } while (0)

  void cleanup_fn (void *arg) {}

  void *thread_func (void *arg)
  {
    pthread_cleanup_push_wrapper(cleanup_fn, NULL);
    pthread_cleanup_pop_wrapper(1);
    return NULL;
  }

  int main (int argc, char **argv) {
    pthread_t thread_id;
    pthread_create(&thread_id, NULL, thread_func, NULL);
  }

This would fail due to unmatched braces in the macro expansion. The new
structure ensures the macro expands correctly in all cases.

Fixes: 007276b ("* cygwin.din: Add _pthread_cleanup_push and _pthread_cleanup_pop.")
Signed-off-by: Shaobo Song <shnusongshaobo@gmail.com>
Change the first parameter of pthread_sigqueue() to be a thread id rather
than a thread pointer. The change is to match the Linux implementation of
this function.

The user-visible function prototype is changed in include/pthread.h.
The pthread_sigqueue() function is modified to work with a passed-in thread
id rather than an indirect thread pointer as before.  (It used to be
"pthread_t *thread", i.e., class pthread **.)  The release note for Cygwin
3.5.5 is updated.

Reported-by: Christian Franke <Christian.Franke@t-online.de>
Addresses: https://cygwin.com/pipermail/cygwin/2024-September/256439.html
Signed-off-by: Mark Geisert <mark@maxrnd.com>
Fixes: 50350ca ("* cygwin.din (pthread_sigqueue): Export.")
(cherry picked from commit 1e8c92e21d386d2e4a29fa92e8258979ff19ae6b)
The pointer pfni gets allocated the buffer at the begin,
and is used in the NtQueryDirectoryFile call before the loops.
In the loop the pointer pfni is also used as iterator.
Therefore it holds no longer the initial buffer at the call
to NtQueryDirectoryFile in the while conditition at the bottom.

Fixes: 28fa2a7 ("* syscalls.cc (check_dir_not_empty): Check surplus directory entries")
Co-authored-by: Corinna Vinschen <corinna@vinschen.de>
Signed-off-by: Bernhard Übelacker <bernhardu@mailbox.org>
(cherry picked from commit dbb8069df56cb68ea1167b3bc0ceb66fa6c35d3f)
The commit ae181b0ff122 has a bug that the pointer is referred bofore
NULL check in the function lf_clearlock(). This patch fixes that.

Addresses: https://cygwin.com/pipermail/cygwin/2024-November/256750.html
Fixes: ae181b0ff122 ("Cygwin: lockf: Make lockf() return ENOLCK when too many locks")
Reported-by: Sebastian Feld <sebastian.n.feld@gmail.com>
Reviewed-by: Corinna Vinschen <corinna@vinschen.de>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit e7ef920d7d0dcff8cfe7a0c914f803b8c78900bb)
Currently, create_lock_obj() can create multiple locks with the same
lock range that have different version number. However, lf_setlock()
and lf_clearlock() cannot handle this case appropriately. With this
patch, make lf_setlock() and lf_clearlock() find overlap again even
when ovcase = 1 (lock and overlap have the same lock range).

Addresses: https://cygwin.com/pipermail/cygwin/2024-November/256750.html
Fixes: 2e560a0 ("* flock.cc (LOCK_OBJ_NAME_LEN): Change to accommodate extra lf_ver field.")
Reported-by: Sebastian Feld <sebastian.n.feld@gmail.com>
Reviewed-by: Corinna Vinschen <corinna@vinschen.de>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit 8dee07a1f12682307be07e12a7fd8d5c8ecc1e2b)
The following debug message occassionally shows up in strace output:

  SetThreadName: SetThreadDescription() failed. 00000000 10000000

The HRESULT of 0x10000000 is not an error, rather the set bit just
indicates that this HRESULT has been created from an NTSTATUS value.

Use the IS_ERROR() macro instead of just checking for S_OK.

Fixes: d4689b9 ("Cygwin: Set threadnames with SetThreadDescription()")
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
(cherry picked from commit 21a2c9db6952954608cdf92638b411b15e7606c6)
Previously, two bugs exist in sigtimedwait(). One is, that since
_my_tls.sigwait_mask was left non-zero if the signal arrives after
the timeout, sigpacket::process() would wrongly try to handle it.
The other is if a timeout occurs after sigpacket::process() is
called, but not completed yet, the signal handler can be called
accidentally. If the signal handler is set to SIG_DFL or SIG_IGN,
access violation will occur in both cases.

With this patch, in sigwait_common(), check if sigwait_mask == 0
to confirm that sigpacket::process() cleared it. In this case,
do not treat WAIT_TIMEOUT, but call cygwait() again to retrieve
the signal. Furthermore, sigpacket::process() checks whether
timeout occurs in sigwait_common() and if timeout already happens,
do not treat the signal as waited. In both cases, to avoid race
issues, the code is guarded by cygtls::lock().

Addresses: https://cygwin.com/pipermail/cygwin/2024-November/256762.html
Fixes: 24ff42d ("Cygwin: Implement sigtimedwait")
Reported-by: Christian Franke <Christian.Franke@t-online.de>
Reviewed-by: Corinna Vinschen <corinna@vinschen.de>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit 26144e4008cd8f7288f3387eea697bba4006e16f)
Fixes: 6b2a2aa ("Add missing files.")
Signed-off-by: Christian Franke <christian.franke@t-online.de>
Behave like sched_setparam() if the requested policy is identical
to the fixed value (SCHED_FIFO) returned by sched_getscheduler().

Fixes: 9a08b2c ("* sched.cc: New file.  Implement sched*.")
Signed-off-by: Christian Franke <christian.franke@t-online.de>
(cherry picked from commit 522f3e921aa034c6a146021a918ce8b479a9111e)
Currently, the buffer of 128KB is passed to GetConsoleProcessList().
This causes page fault in the select() loop for console due to:
microsoft/terminal#18264
because the previous code calls GetConsoleProcessList() with large
buffer and PeekConsoleInput() with small buffer alternately.
With this patch, the minimum buffer size is used that is determined
by GetConsoleProcessList() with small buffer passed.

Addresses: https://cygwin.com/pipermail/cygwin/2024-December/256841.html
Fixes: 7277014 ("Cygwin: pty: Prevent pty from changing code page of parent console.")
Reported-by: Steven Buehler <buehlersj@outlook.com>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit 1a49c17840660bc0e493b7db2d9ce5289906049d)
Blackhex and others added 21 commits December 4, 2024 19:08
(cherry picked from commit d636b11d2e167cbcc310000ac1a977219b819fee)
Signed-off-by: Radek Bartoň <radek.barton@microsoft.com>
(cherry picked from commit c79954439e6d91dc6f421a761953f0d935c5f3d5)
Since commit 314c2d2 ("* syscalls.cc (try_to_bin): Handle remote
shares as well.") try_to_bin() transposes the .cyg prefix for temporary
files to invalid low surrogate halfs on filesystems setting the
FILE_UNICODE_ON_DISK flag.

This works on NTFS, but not necessarily on other filesystems, which often
require all chars in a filename to be valid Unicode chars.  Fix this by
transposing into the private use area instead.

Fixes: 314c2d2 ("* syscalls.cc (try_to_bin): Handle remote shares as well.")
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
(cherry picked from commit 0924d5f1078b368b6e290d11c9f3d57bc7767576)
That according to C99/POSIX, nextafter(x,y) should return y if x==y.

[1] NetBSD fix for this: IIJ-NetBSD/netbsd-src@3bc6852
[2] glibc fix for this: bminor/glibc@bc9f600#diff-bcc0628a39c3c2003047dcb5a40a8b50c00f01a74b1c8c1100d770a8e48b1ce2
[3] Linux man page: https://man7.org/linux/man-pages/man3/nextafter.3.html

(cherry picked from commit 7ccfe49e8ce908c581f82c8dbf0f9064df488a24)
Previously, a deadlock happened if many SIGSTOP/SIGCONT signals were
received rapidly. If the main thread sends __SIGFLUSH at the timing
when SIGSTOP is handled by the sig thread, but not is handled by the
main thread yet (sig_handle_tty_stop() not called yet), and if SIGCONT
is received, the sig thread waits for cygtls::current_sig (is SIGSTOP
now) cleared. However, the main thread waits for the pack.wakeup using
WaitForSingleObject(), so the main thread cannot handle SIGSTOP. This
is the mechanism of the deadlock. This patch uses cygwait() instead of
WaitForSingleObject() to be able to handle the pending SIGSTOP.

Addresses: https://cygwin.com/pipermail/cygwin/2024-November/256744.html
Fixes: 7759daa ("(sig_send): Fill out sigpacket structure to send to signal thread rather than racily sending separate packets.")
Reported-by: Christian Franke <Christian.Franke@t-online.de>
Reviewed-by: Corinna Vinschen <corinna@vinschen.de>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit d243e51ef1d30312ba1e21b4d25a1ca9a8dc1f63)
With the previous code, the queued signal is tried to resend only when
a new signal arrives or pending_signals::pending() is called.
With this patch, if the signal is queued and the retry flag is not set
and the new signal is not received yet, the sig thread tries to handle
the queued signal again. Without this patch, the chance to handle the
queue would be delayed.

Addresses: https://cygwin.com/pipermail/cygwin/2024-November/256744.html
Fixes: 5e31c80 ("(pending_signals::pending): Force an additional loop through wait_sig by setting retry whenever this function is called.")
Reported-by: Christian Franke <Christian.Franke@t-online.de>
Reviewed-by: Corinna Vinschen <corinna@vinschen.de>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit e10f822a2b39bade20858ba3c704a8bb7d965cb4)
Previously, the retry flag was always set when pending_signal::pending()
was called. However, if the queue is empty sig thread tries to flush
the queue even though it is not necessary. With this patch, the retry
flag is set only if the queue is not empty.

Addresses: https://cygwin.com/pipermail/cygwin/2024-November/256744.html
Fixes: 5e31c80 ("(pending_signals::pending): Force an additional loop through wait_sig by setting retry whenever this function is called.")
Reported-by: Christian Franke <Christian.Franke@t-online.de>
Reviewed-by: Corinna Vinschen <corinna@vinschen.de>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit 57ce5f1e0bf4a6fef7173b2549edc4f2090dd0e7)
In _cygtls::handle_SIGCONT(), the sig thread waits for the main thread
to process the signal without unlocking the TLS area. This causes a
deadlock if the main thread tries to acquire a lock for the TLS area
in the meantime. With this patch, unlock the TLS before calling yield()
in handle_SIGCONT().

Addresses: https://cygwin.com/pipermail/cygwin/2024-November/256744.html
Fixes: 26158dc("* exceptions.cc (sigpacket::process): Lock _cygtls area of thread before accessing it.")
Reported-by: Christian Franke <Christian.Franke@t-online.de>
Reviewed-by: Corinna Vinschen <corinna@vinschen.de>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit 9ae51bcc51a7901559c476e6301597760c2726fd)
Various forms of describing what we do with the stacklock are
used. Try to be consistent.

Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
(cherry picked from commit 63804a28b330ccda7e1ef1c04d2d1c5e22a8e40d)
This patch calls Sleep(0) in the wait loop in lock() to increase the
chance of being unlocked in other threads. The lock(), unlock() and
locked() are moved from sigfe.s to cygtls.h so that allows inline
expansion.

Addresses: https://cygwin.com/pipermail/cygwin/2024-November/256744.html
Fixes: 6152219 ("* Merge in cygwin-64bit-branch.")
Reported-by: Christian Franke <Christian.Franke@t-online.de>
Reviewed-by: Corinna Vinschen <corinna@vinschen.de>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit 9b7a84d24aa17bdd727eb402cc544fa36f4d812e)
The main thread waits for the sig thread to read the signal pipe by
calling Sleep(10) if writing to the signal pipe has failed. However,
if the signal thread waiting for another signal being handled in the
main thread, the sig thread does not read the signal pipe. To avoid
such a situation, this patch replaces Sleep(10) to cygwait().

Addresses: https://cygwin.com/pipermail/cygwin/2024-November/256744.html
Fixes: 6f05b32 ("(sig_send): Retry WriteFiles which fail when there is no error but packbytes have not been sent.")
Reported-by: Christian Franke <Christian.Franke@t-online.de>
Reviewed-by: Corinna Vinschen <corinna@vinschen.de>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit 2544e753963e9c15a9e3fe35188d8dea7b39d748)
Previously, the sig thread ran in THREAD_PRIORITY_HIGHEST priority.
This causes a critical delay in the signal handling in the main
thread if too many signals are received rapidly and the CPU is very
busy. In this case, most of the CPU time is allocated to the sig
thread, so the main thread cannot have a chance of handling signals.
With this patch, to avoid such a situation, the priority of the sig
thread is set to THREAD_PRIORITY_NORMAL priority.

Addresses: https://cygwin.com/pipermail/cygwin/2024-November/256744.html
Fixes: 53ad6f1 ("(cygthread::cygthread): Use three only arguments for detached threads, and start the thread via QueueUserAPC/async_create.")
Reported-by: Christian Franke <Christian.Franke@t-online.de>
Reviewed-by: Corinna Vinschen <corinna@vinschen.de>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit 9a274a967d1fe9a28d0d658749b08e65d09d3ee0)
If the process() fails and the signal remains in the queue, the most
possible reason is that the target thread is already armed by another
signal and does not handle it yet. With this patch, to increase the
chance of handling it in the other threads, call yield() before
retrying process().

Addresses: https://cygwin.com/pipermail/cygwin/2024-November/256744.html
Fixes: e10f822a2b39 ("Cygwin: signal: Handle queued signal without explicit __SIGFLUSH")
Reported-by: Christian Franke <Christian.Franke@t-online.de>
Reviewed-by:
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit c48d58d838d90df6247123c487686699742d75de)
Commit 0b6fbd3 ("* exceptions.cc (_cygtls::interrupt_now): Revert
to checking for "spinning" when choosing to defer signal.") introduced
a bug in the loop inside the stabilize_sig_stack subroutine:

First, stabilize_sig_stack grabs the stacklock. The _cygtls::incyg
flag is then incremented before checking if a signal has to be handled
for the current thread.

If no signal waits, the code simply jumps out, decrements _cygtls::incyg
and returns to the caller, which eventually releases the stacklock.

However, if a signal is waiting, stabilize_sig_stack releases the
stacklock, calls _cygtls::call_signal_handler(), and returns to
the start of the subroutine, trying to grab the lock.

After grabbing the lock, it increments _cygtls::incyg... wait...
again?

The loop does not decrement _cygtls::incyg after
_cygtls::call_signal_handler(), which returns with _cygtls::incyg
set to 1.  So it increments incyg to 2.  If no other signal is
waiting, stabilize_sig_stack jumps out and decrements _cygtls::incyg
to 1.  Eventually, setjmp or longjmp both will return to user
code with _cygtls::incyg set to 1.  This *may* be fixed at some later
point when signals arrive, but there will be a time when the application
runs in user code with broken signal handling.

Fixes: 0b6fbd3 ("* exceptions.cc (_cygtls::interrupt_now): Revert to checking for "spinning" when choosing to defer signal.")
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
(cherry picked from commit 41e1013e6846f1774dfec085ff983990f67e6437)
Given that spinning is only checked once at the start of a
_cygtls::interrupt_now() which is called in a loop, it's probably
not necessary to mark _cygtls::spinning as volatile.

However, spinning is changed from assembler code and we don't
want the compiler to make funny assumptions, so, better safe than
sorry.

Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
(cherry picked from commit 3a9fb7c561d9e8a78e444dd3920c57e857a92baa)
Per the comment in _cygtls::interrupt_now(), the spinning flag
is supposed to guard that the targeted thread is about to enter
the Cygwin DLL.  Setting spinning has then been added to _sigfe,
_sigbe, sigdelayed and stabilize_sig_stack, the latter being called
from setjmp/longjmp.

However, setjmp/longjmp only enter the DLL in case of a pending
signal, calling _cygtls::call_signal_handler(). This in turn is
already guarded by setting the incyg flag, and there's no other
action in stabilize_sig_stack which might interfere with the
signal setup.  All the rest of setjmp/longjmp is plain userspace.

Therefore, drop setting the spinning flag from stabilize_sig_stack,
because it results in dropped signals in tight longjmp loops.

Fixes: edc4f86 ("* Makefile.in (clean): Remove sigfe.s.")
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
(cherry picked from commit 96d856320a1d740546eaf8a6c0ddb3d489e10492)
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
(cherry picked from commit f71550e0dbef51e0a4a706edae49c33c5372db19)
The queue is cleaned up by removing the entries having si_signo == 0
while processing the queued signals, however, sigpacket::process() may
set si_signo in the queue to 0 of the entry already processed but not
succeed by calling sig_clear(). This patch ensures the sig_clear()
to remove the entry from the queue chain. For this purpose, the pointer
prev has been added to the sigpacket. This is to handle the following
case appropriately.

Consider the queued signal chain of:
A->B->C->D
without pointer prev. Assume that the pointer 'q' and 'qnext' point to
C, and process() is processing C. If B is cleared in process(), A->next
should be set to to C in sigpacket::clear().

Then, if process() for C succeeds, C should be removed from the queue,
so A->next should be set to D. However, we cannot do that because we do
not have the pointer to A in the while loop in wait_sig().

With the pointer prev, we can easily access A and C in sigpacket::clear()
as well as A and D in the while loop in wait_sig() using the pointer prev
and next without pursuing the chain.

Addresses: https://cygwin.com/pipermail/cygwin/2024-November/256744.html
Fixes: 9d21550 ("(wait_sig): Define variable q to be the start of the signal queue.  Just iterate through sigq queue, deleting processed or zeroed signals")
Reported-by: Christian Franke <Christian.Franke@t-online.de>
Reviewed-by: Corinna Vinschen <corinna@vinschen.de>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit d565aca46f06117ef16ec37c51767a5e140ee9e2)
Currently, the signal queue is touched by the thread sig as well as
other threads that call sigaction_worker(). This potentially has
a possibility to destroy the signal queue chain. A possible worst
result may be a self-loop chain which causes infinite loop. With
this patch, lock()/unlock() are introduce to avoid such a situation.

Fixes: 474048c ("* sigproc.cc (pending_signals::add): Just index directly into signal array rather than treating the array as a heap.")
Suggested-by: Corinna Vinschen <corinna@vinschen.de>
Reviewed-by: Corinna Vinschen <corinna@vinschen.de>
Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>
(cherry picked from commit 496fa7b2ce0052550eab8900723ebb59c33d25e7)
(cherry picked from commit cfadd852aa5787eb29403efc75bfa637028bfd40)
This is more like an experiment, to see what v3.5.5 will bring when it
lands. And hoping that some of the gifts might have the result of fixing
those hangs/deadlocks I frequently experience when trying to Ctrl+C
something in a `tmux` session.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho self-assigned this Dec 7, 2024
@dscho
Copy link
Member Author

dscho commented Dec 9, 2024

Seems as if at least some deadlocks/races are gone with this version; At least in my test, git show (which exercises a spawned non-MSYS process that spawns an MSYS process) in a tmux session no longer ends up with garbled text, and running vim.exe in the same tmux session also does not require a force-redraw to show indented long lines correctly.

tl;dr I am really looking forward to v3.5.5.

@dscho
Copy link
Member Author

dscho commented Dec 12, 2024

git show (which exercises a spawned non-MSYS process that spawns an MSYS process) in a tmux session no longer ends up with garbled text

Never mind, this is still an issue. Example: showing git-for-windows/build-extra@a190017 (which is short enough that less -X quits immediately after showing the content because it is less than a page long, but it also has a line that is longer than the width of the window) results in this:

image

This shows only the latter part of the added (long) line. Note that this problem only occurs when the text needs to scroll up, the diff is shown correctly after Ctrl+L to clear the screen (i.e. when the text does not need to be scrolled up):

image

This must be caused by something in Cygwin v3.5; I cannot remember seeing the same issue in v3.4 (but then, I also switched my setup from disable_pcon to enable_pcon with that version upgrade, so maybe it was always broken with Cygwin's pseudo console support).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants