
error: "cysignals must be compiled without _FORTIFY_SOURCE" #194

Open
orlitzky opened this issue Dec 1, 2023 · 26 comments

@orlitzky commented Dec 1, 2023

#73 is back for the third time. When building with clang on Gentoo,

$ CC=clang python setup.py build
running build
running build_py
...
In file included from build/src/cysignals/signals.c:2365:
build/src/cysignals/implementation.c:27:2: error: "cysignals must be compiled without _FORTIFY_SOURCE"
#error "cysignals must be compiled without _FORTIFY_SOURCE"

The constant __USE_FORTIFY_LEVEL=3 is built into the compiler, so the two different hacks we have to work around this are ineffective (and yes, the result still crashes if I remove the guard). I have yet to find a third workaround.
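To see which fortification level a given toolchain actually applies, a small probe like the one below may help. It is a sketch, not cysignals code. It assumes glibc, where `<features.h>` (pulled in by any standard header) derives `__USE_FORTIFY_LEVEL` from `_FORTIFY_SOURCE` and the optimization level. `DEMO_REQUIRE_NO_FORTIFY` is a hypothetical opt-in macro that mimics the kind of guard that fires at implementation.c:27.

```c
#include <assert.h>
#include <stdio.h>  /* on glibc this includes <features.h>, which sets __USE_FORTIFY_LEVEL */

/* Hypothetical opt-in switch for this sketch; not a cysignals macro. */
#ifdef DEMO_REQUIRE_NO_FORTIFY
#  if defined(__USE_FORTIFY_LEVEL) && __USE_FORTIFY_LEVEL > 0
#    error "this file must be compiled without _FORTIFY_SOURCE"
#  endif
#endif

/* Report the fortification level the C library headers settled on for this
 * translation unit. glibc always defines __USE_FORTIFY_LEVEL (0 when
 * fortification is off); libcs such as musl do not define it, so report 0. */
static int fortify_level(void)
{
#if defined(__USE_FORTIFY_LEVEL)
    return __USE_FORTIFY_LEVEL;
#else
    return 0;
#endif
}
```

On glibc with a recent GCC or clang, building with `-O2 -D_FORTIFY_SOURCE=2` should make `fortify_level()` return 2, and building with `-O2 -U_FORTIFY_SOURCE` should make it return 0, unless the toolchain forces fortification some other way, which is the situation described above.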

This is also Gentoo bug 918934.

@user202729 (Contributor)

For what it's worth I use the workaround explained in #80 and it works.

$ CC=clang CFLAGS="-Wp,-U_FORTIFY_SOURCE" pip install cysignals

Not sure how to specify compiler flags in pyproject.toml or meson.build etc.

@orlitzky (Author)

Does it fail without those CFLAGS? You would need a clang where the source fortification constant is built in to the compiler to reproduce it. (You also need glibc, in case you are on Void, Alpine, Gentoo, or some other distro that offers musl.)

longjmp seems to be causing us a lot of problems:

@orlitzky (Author)

With newer compilers 🤦:

 * QA Notice: Package triggers severe warnings which indicate that it
 *            may exhibit random runtime failures.
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:19608:22: warning: variable '_save' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:19267:13: warning: variable '__pyx_v_atexit' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:19272:13: warning: variable '__pyx_t_1' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:19273:13: warning: variable '__pyx_t_2' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:19033:22: warning: variable '_save' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:18198:13: warning: variable '__pyx_t_1' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:18199:13: warning: variable '__pyx_t_2' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:18200:13: warning: variable '__pyx_t_3' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:24449:23: warning: variable 'exc_info' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:17813:13: warning: variable '__pyx_r' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:17815:13: warning: variable '__pyx_t_1' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:17816:13: warning: variable '__pyx_t_2' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:24449:23: warning: variable 'exc_info' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:17348:13: warning: variable '__pyx_t_1' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:17349:13: warning: variable '__pyx_t_2' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:17350:13: warning: variable '__pyx_t_3' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:24449:23: warning: variable 'exc_info' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:16134:66: warning: variable '__pyx_v_x' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:16015:13: warning: variable '__pyx_t_2' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:16017:13: warning: variable '__pyx_t_4' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:14482:8: warning: variable '__pyx_v_i' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:14488:8: warning: variable '__pyx_t_3' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:24449:23: warning: variable 'exc_info' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:13691:22: warning: variable '_save' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:13193:22: warning: variable '_save' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:12783:22: warning: variable '_save' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:12659:22: warning: variable '_save' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:12407:22: warning: variable '_save' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:12155:22: warning: variable '_save' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:11903:22: warning: variable '_save' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:11651:22: warning: variable '_save' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:11399:22: warning: variable '_save' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:10822:13: warning: variable '__pyx_t_2' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:10092:22: warning: variable '_save' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:9788:22: warning: variable '_save' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:9642:22: warning: variable '_save' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:8278:22: warning: variable '_save' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:8026:22: warning: variable '_save' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:7783:22: warning: variable '_save' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:16943:13: warning: variable '__pyx_t_1' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:16944:13: warning: variable '__pyx_t_2' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:16945:13: warning: variable '__pyx_t_3' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:24449:23: warning: variable 'exc_info' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:14189:13: warning: variable '__pyx_t_1' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:14190:13: warning: variable '__pyx_t_2' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:14191:13: warning: variable '__pyx_t_3' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:24449:23: warning: variable 'exc_info' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:15659:13: warning: variable '__pyx_t_1' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:15660:13: warning: variable '__pyx_t_2' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:15675:13: warning: variable '__pyx_t_17' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:15676:13: warning: variable '__pyx_t_18' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:15677:13: warning: variable '__pyx_t_19' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:24449:23: warning: variable 'exc_info' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
 * src/cysignals/tests.cpython-312-riscv64-linux-musl.so.p/src/cysignals/tests.pyx.c:10445:13: warning: variable '__pyx_r' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]

@dimpase (Member) commented Dec 21, 2024

Are these compiler warnings meaningful? I see that gcc has a long-standing issue with them:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21161

@dimpase (Member) commented Dec 21, 2024

At least some of these random bugs in the libgap interface are due to missing volatile declarations. Newer compilers optimise function calls more aggressively, passing arguments in registers, which get clobbered on longjmp. But those registers may hold a reference to a freshly created GAP object, oops...

@orlitzky (Author)

I think they are meaningful in this case and illustrate nicely the fundamental problem. All cython functions that eventually call sig_on() and sig_off() get some cython boilerplate like,

static PyObject *blahblahblah(...) { // the C implementation of test_signal_during_malloc()
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  PyObject *__pyx_t_2 = NULL;
  PyObject *__pyx_t_3 = NULL;
  int __pyx_t_4;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  ...

before they eventually get to the sig_on(). And this is for a function that declares no local variables! How will we ever declare those volatile? There's no special data flow that invalidates the warning. I think those variables can really be clobbered, and the warning is emphasizing how hard it will be to properly use setjmp/longjmp from cython.

@user202729 (Contributor)

Actually it is not necessary to declare everything volatile, only the local variables that are changed between setjmp and longjmp need to be declared volatile.

I don't know what these temporary _t_ variables are used for (probably for temporary variables), but worst case their values are indeterminate.

But, yes, ideally the whole toolchain (cython, gcc) would be setjmp/longjmp-aware.
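To make the rule about which locals need `volatile` concrete, here is a minimal sketch in plain C. The names are hypothetical; `demo_fail()` stands in for any library routine that reports errors with `longjmp`.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf demo_env;

/* Stand-in for a library routine that reports errors by longjmp-ing. */
static void demo_fail(void)
{
    longjmp(demo_env, 1);
}

/* Counts loop iterations until demo_fail() fires at iteration fail_at.
 * `steps` is modified after setjmp() and read after longjmp() lands here,
 * so the C standard (C11 7.13.2.1) makes its value indeterminate unless it
 * is volatile. n and fail_at are never modified, so they need no qualifier. */
static int steps_before_failure(int n, int fail_at)
{
    volatile int steps = 0;
    if (setjmp(demo_env) == 0) {
        for (int i = 0; i < n; i++) {
            steps++;
            if (i == fail_at)
                demo_fail();   /* does not return */
        }
    }
    return steps;              /* reached on both the normal and the longjmp path */
}
```

Called as `steps_before_failure(10, 2)`, this returns 3. Without `volatile`, the standard no longer guarantees that value, which is the situation `-Wclobbered` is warning about.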

@dimpase (Member) commented Dec 22, 2024

> Actually it is not necessary to declare everything volatile, only the local variables that are changed between setjmp and longjmp need to be declared volatile.
>
> I don't know what these temporary _t_ variables are used for (probably for temporary variables), but worst case their values are indeterminate.
>
> But, yes, ideally the whole toolchain (cython, gcc) would be setjmp/longjmp-aware.

the point is that with this problem, things like
libgap.Foo(libgap.Baz()) are not safe, as Foo() may throw a GAP error, and this will clobber the pointer to the result of Baz() kept in a register.

@user202729 (Contributor)

But if you call libgap.Foo(libgap.Baz()) and Foo throws an error then the result of Baz is discarded anyway?

Are there guards to delete them afterwards somewhere?

@orlitzky (Author)

> But if you call libgap.Foo(libgap.Baz()) and Foo throws an error then the result of Baz is discarded anyway?

It isn't that we need the value of Baz(), but that if you clobber the pointer to it with random data, you'll get a segfault when you try to free() it. (The fact that a pointer is used here should be an internal cython detail.) In a more complicated example you might actually try to use the return value, which could dereference the nonsense pointer. In python you should be able to catch the error and respond intelligently with something like,

x = libgap.Baz()
try:
    y = libgap.Foo(x)
except:
    # x might be garbage!
    print("Foo error with x=", x)

There are actually two closely-related problems here, since libgap uses setjmp/longjmp internally too. In either case, the situation really sucks for the caller because he has to know where and how setjmp/longjmp are used in these third-party libraries in order to use them safely. And even beyond that, he has to know what C code cython is going to generate, to make sure that its internal details can't be clobbered by longjmp, too.
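The pointer case can be sketched in plain C. A buffer allocated after `setjmp()` plays the role of the result of `Baz()`, and `maybe_fail()` plays the role of `Foo()` raising an error via `longjmp`. All names are hypothetical; the point is where `volatile` goes.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

static jmp_buf err_env;

/* Stand-in for a library call that may bail out with longjmp. */
static void maybe_fail(int fail)
{
    if (fail)
        longjmp(err_env, 1);
}

/* Allocates a buffer after setjmp(), then optionally fails. Returns 1 if the
 * error path ran and freed the buffer, 0 on the normal path, -1 if malloc
 * failed. The pointer itself is assigned after setjmp(), so it is declared
 * `char *volatile`; `volatile char *` would only qualify the pointee and
 * leave the pointer free to live in a register that longjmp clobbers. */
static int alloc_then_call(int fail)
{
    char *volatile buf = NULL;
    if (setjmp(err_env) != 0) {
        free(buf);             /* safe: buf still holds the value stored before longjmp */
        return 1;
    }
    buf = malloc(64);
    if (buf == NULL)
        return -1;
    strcpy(buf, "demo payload");
    maybe_fail(fail);
    free(buf);
    return 0;
}
```

`alloc_then_call(1)` takes the error path and returns 1 without leaking or freeing a garbage pointer.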

@dimpase (Member) commented Dec 23, 2024

These problems (the ones that need volatile) only started to manifest themselves as segfaults, rather than just memory leaks, with Python 3.12, which changed the code path for Python error handling. (Perhaps they weren't even memory leaks, and the objects were garbage-collectable by GAP? Or perhaps refcounted "handles", meant to point to GAP objects created in the libgap Cython interface, were still being left behind.)

@culler commented Dec 23, 2024

This discussion does not match my understanding of how sig_on and sig_off are used by sage.

The primary use case is making it possible for Sage to exit cleanly after a segfault occurs in an external library function.

In particular, Sage's longjmp call should never occur in any sort of normal error processing or exception handling by, say, libgap. The call to longjmp and subsequent variable clobbering should only happen if there is a signal which is not handled by any code in the external library (usually SIGSEGV).

Is there an example code snippet that is known to generate the segfault with libgap that is being discussed here? I suspect that the real problem is a segfault in libgap, not some design flaw in sig_on or sig_off. Of course, compiler errors would be a different story.

@dimpase (Member) commented Dec 23, 2024

GAP uses longjmp to handle its errors.
Handling nested setjmp/longjmp pairs is fun.

See e.g.
sagemath/sage#37026
and
sagemath/sage#37951
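Nested setjmp/longjmp handlers only work if each level restores the previous handler on every exit path. A minimal sketch of that discipline, assuming a single global "current handler" pointer (a hypothetical stand-in, not GAP's actual mechanism):

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Innermost active error handler (hypothetical). */
static jmp_buf *current_handler = NULL;

/* Jump to the innermost handler; only valid while one is installed. */
static void raise_error(void)
{
    longjmp(*current_handler, 1);
}

/* Runs fn under a fresh handler. Returns 0 if fn returned normally and 1 if
 * it raised. The previous handler is restored on both paths, so a later
 * error jumps to the enclosing frame instead of this (by then dead) one. */
static int run_protected(void (*fn)(void))
{
    jmp_buf here;
    jmp_buf *saved = current_handler;   /* not modified after setjmp: no volatile needed */
    int raised = 0;
    current_handler = &here;
    if (setjmp(here) == 0)
        fn();
    else
        raised = 1;
    current_handler = saved;
    return raised;
}

/* Usage: an inner error caught by the inner handler, then an outer error. */
static int inner_result = -1;

static void inner_fails(void)
{
    raise_error();
}

static void outer_body(void)
{
    inner_result = run_protected(inner_fails);  /* caught by the inner handler */
    raise_error();                              /* must reach the outer handler */
}
```

Here `run_protected(outer_body)` returns 1: the inner error is caught by the inner handler, and the outer error reaches the outer frame only because the inner call restored `current_handler`.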

@culler commented Dec 23, 2024

I am still not seeing any evidence that sig_on and sig_off are involved in these crashes. A setjmp with no corresponding longjmp should be harmless. And Sage only calls longjmp from signal handlers installed by sig_on. So Sage's longjmp should only be able to cause trouble in the case when a signal is being generated for which sig_on had installed a handler. If GAP is calling longjmp from its own signal handler then it would have had to install that handler on top of Sage's, and Sage's handler would not be invoked. If GAP is calling longjmp in some other way then Sage's longjmp should not get called. The crash suggests to me that there is a SIGSEGV signal being generated by the GAP code. The fact that this SIGSEGV is being handled by the handler installed by sig_on only means that sig_on is working as it was designed to work.

@culler commented Dec 23, 2024

> longjmp seems to be causing us a lot of problems:
>
> * this
>
> * Incompatibility with signal handling on Windows (@culler mentioned this on the mailing list last month)

While it was last month that I mentioned that longjmp cannot be called from a Windows signal handler, this is not a new phenomenon. It has been at least a decade since we first ran into this with cypari.

Marc

@orlitzky (Author) commented Dec 23, 2024

> I am still not seeing any evidence that sig_on and sig_off are involved in these crashes.

They may not be responsible for some GAP crashes, because libgap uses its own setjmp/longjmp. With respect to cysignals, I was mainly pointing out that this sort of pattern is doomed:

  • Do arbitrary stuff
  • Call sig_on
  • Do arbitrary stuff
  • Call sig_off

because the person writing cython code needs to be aware of C implementation details not only in cysignals, but in cython itself for this to be safe.

FWIW I think the most common use of sig_on and sig_off is to catch Ctrl-C in long-running computations. If libfoo was segfaulting then obviously we could fix the segfault to avoid the longjmp from sig_on/sig_off, but there's nothing we can do to avoid the user pressing Ctrl-C to kill a computation that's taking too long.
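The four-step pattern above, applied to Ctrl-C, can be sketched as a toy: `sigsetjmp()` plays the role of sig_on(), the SIGINT handler jumps back with `siglongjmp()`, and clearing a flag plays the role of sig_off(). This is a hypothetical illustration assuming POSIX signals; the names are invented and cysignals' real implementation is considerably more involved. Jumping out of the handler is only well-defined here because the interrupted code is a plain loop, not an async-signal-unsafe function such as `malloc()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Toy sig_on()/sig_off(); all names are hypothetical, not cysignals API. */
static sigjmp_buf toy_env;
static volatile sig_atomic_t toy_armed = 0;

static void toy_handler(int sig)
{
    (void)sig;
    if (toy_armed) {
        toy_armed = 0;
        siglongjmp(toy_env, 1);   /* also restores the mask saved by sigsetjmp(.., 1) */
    }
}

/* Loops n times; at iteration interrupt_at it raises SIGINT to simulate
 * Ctrl-C. Returns n on completion, or -1 if the loop was interrupted. */
static long toy_compute(long n, long interrupt_at)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = toy_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, &old);

    if (sigsetjmp(toy_env, 1) != 0) {     /* "sig_on()": landing site after an interrupt */
        sigaction(SIGINT, &old, NULL);
        return -1;
    }
    toy_armed = 1;

    long i;
    for (i = 0; i < n; i++) {
        if (i == interrupt_at)
            raise(SIGINT);                /* stands in for the user pressing Ctrl-C */
    }

    toy_armed = 0;                        /* "sig_off()" */
    sigaction(SIGINT, &old, NULL);
    return i;
}
```

`toy_compute(100, 10)` returns -1 and `toy_compute(100, 1000)` returns 100. Note that `old` is safe to read after the jump only because it is not modified between `sigsetjmp()` and the handler's `siglongjmp()`.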

@dimpase (Member) commented Dec 24, 2024

By the way, our way to trigger this libgap crash is to call variadic GAP functions with wrong arguments. Perhaps GAP's own problems with these functions and modern C standards are at play here? I don't know.

tobiasdiez added a commit to tobiasdiez/sage that referenced this issue Dec 24, 2024

@user202729 (Contributor)

Not sure what this discussion about crashes/timeouts/etc. in SageMath has to do with _FORTIFY_SOURCE, but I think fundamentally longjmp from a signal handler causes undefined behavior anyway.

In particular,

> If a signal handler interrupts the execution of an unsafe function, and the handler terminates via a call to longjmp(3) or siglongjmp(3) and the program subsequently calls an unsafe function, then the behavior of the program is undefined.

In particular, if you longjmp from inside a free(), then call free() afterwards, the behavior is undefined. This is almost unavoidable.

In theory, the best approach would be for libraries to provide the equivalent of sig_check() hooks that can cleanly interrupt whatever computation is running. In practice, modifying the libraries would be prohibitive, so the least we can do is to implement something like _clean_up_states_and_release_internal_mutex() for each library.

(I just got a deadlock from an interrupt inside __lll_lock_wait_private (LLL = low-level lock) inside free.)
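The sig_check()-style alternative can be sketched as cooperative interruption: the handler only sets a `volatile sig_atomic_t` flag, and the computation polls it at points where its state is consistent, so no longjmp ever leaves `free()` or a lock half-finished. `check_interrupt()` and `cooperative_sum()` are hypothetical names, and `raise()` stands in for the user pressing Ctrl-C.

```c
#include <assert.h>
#include <signal.h>

/* Set asynchronously by the handler; the only thing the handler does. */
static volatile sig_atomic_t interrupt_requested = 0;

static void request_interrupt(int sig)
{
    (void)sig;
    interrupt_requested = 1;   /* async-signal-safe: no longjmp, no malloc/free */
}

/* Hypothetical sig_check()-style hook, polled only where the computation's
 * state is consistent (no locks held, no half-built objects). */
static int check_interrupt(void)
{
    if (interrupt_requested) {
        interrupt_requested = 0;
        return 1;
    }
    return 0;
}

/* Sums 0..n-1, raising SIGINT at iteration interrupt_at to simulate Ctrl-C.
 * Returns -1 if interrupted, otherwise the sum. */
static long long cooperative_sum(long n, long interrupt_at)
{
    void (*old)(int) = signal(SIGINT, request_interrupt);
    long long sum = 0;
    for (long i = 0; i < n; i++) {
        if (i == interrupt_at)
            raise(SIGINT);
        if (check_interrupt()) {      /* safe point: clean up and bail out */
            signal(SIGINT, old);
            return -1;
        }
        sum += i;
    }
    signal(SIGINT, old);
    return sum;
}
```

`cooperative_sum(10, 100)` runs to completion and returns 45, while `cooperative_sum(10, 3)` returns -1. The cost of this design is that the library has to contain the polling calls, which is exactly what makes it hard to retrofit.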

@culler commented Dec 24, 2024

I agree with @orlitzky that handling Ctrl-C interrupts is another basic application of the sig_on/sig_off machinery. And we certainly don't want to respond to a Ctrl-C interrupt by creating a segfault. I also agree with @user202729 that the interrupt does not really work unless there is some way for the SIGINT handler to clean up any memory allocations that were left hanging when the external function call was interrupted. I think it does work with many PARI calls since interrupting a PARI function just leaves a mess on the stack, which is easy to clean up. I would guess that for other libraries, like libgap, interrupting a call to a library function call creates a memory leak.
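Why PARI interrupts are cheap to clean up can be illustrated with a bump ("stack") allocator: record the top of the stack before the protected call and rewind to it if the call is interrupted. The allocator, names and sizes below are hypothetical stand-ins for PARI's stack. A `malloc()`-based library offers no single pointer to rewind, which is why interrupting it tends to leak instead.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* A tiny bump allocator standing in for a PARI-style stack (hypothetical). */
static unsigned char arena[4096];
static size_t arena_top = 0;

static void *arena_alloc(size_t n)
{
    if (n > sizeof arena - arena_top)
        return NULL;
    void *p = &arena[arena_top];
    arena_top += n;
    return p;
}

static jmp_buf arena_env;

/* Simulated library call: allocates `chunks` blocks of 32 bytes, then is
 * "interrupted" (longjmp) before it can tidy up if interrupt is nonzero.
 * On the normal path its allocations remain as the call's result. */
static void library_call(int chunks, int interrupt)
{
    for (int i = 0; i < chunks; i++)
        arena_alloc(32);
    if (interrupt)
        longjmp(arena_env, 1);
}

/* Returns 1 if the call was interrupted (and the arena rewound), else 0. */
static int call_with_rewind(int chunks, int interrupt)
{
    size_t mark = arena_top;      /* not modified after setjmp, so no volatile needed */
    if (setjmp(arena_env) != 0) {
        arena_top = mark;         /* discard everything the interrupted call left behind */
        return 1;
    }
    library_call(chunks, interrupt);
    return 0;
}
```

After an interrupted `call_with_rewind(5, 1)` the arena top is back where it started; after a completed `call_with_rewind(2, 0)` it has grown by 64 bytes.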

@orlitzky (Author)

> Not sure what this discussion about crashes/timeouts/etc. in SageMath has to do with _FORTIFY_SOURCE, but I think fundamentally longjmp from a signal handler causes undefined behavior anyway.

If you build cysignals with _FORTIFY_SOURCE, it crashes because glibc detects a problem:

>>> import cysignals
*** longjmp causes uninitialized stack frame ***: python terminated

I myself am not so certain that this is a false positive.

@user202729 (Contributor) commented Dec 25, 2024

Some preliminary investigation

  • in the glibc source code:

    ```
    #define CHECK_INVALID_LONGJMP \
        cmp %R8_LP, %RSP_LP; \
        jbe .Lok; \
        ...
    ```

    if we're only removing stack frames (the current RSP ≤ R8, which holds the new RSP restored from the jmp_buf; the stack grows downward), we jump to the label .Lok.

     stack: [            |#############]
                         │    ↑
                         └────┘   this longjmp is okay
    
    
     stack: [            |#############]
                  ↑      │
                  └──────┘        this longjmp needs further checking
    
    
     normal stack: [          |##########] // alternative stack: [            |#############]
                                  ↑                                           │
                                  └───────────────────────────────────────────┘    this longjmp is probably okay
    
  • in issue #73 ("longjmp causes uninitialized stack frame"), jdemeyer says:

    The problem is that the "longjmp causes uninitialized stack frame" is really a false positive. I'm abusing the stack in ways beyond the imagination of glibc.

  • in implementation.c the hack is explained

     /* A trampoline to jump to after handling a signal.
      *
      * The jump to sig_on() uses cylongjmp(), which does not restore the
      * signal context. This is done for efficiency, as cysetjmp() is
      * significantly faster this way. But in order to get away from our alt
      * stack after handling a signal, we need an additional siglongjmp()
      * call to restore the signal context. This is the call from the signal
      * handler to this trampoline function.
      *
      * Setting this up requires some trickery:
      * (A) create a separate stack for this trampoline function
      * (B) start a new thread using this stack
      * (C) set a jump point on the trampoline stack using cysetjmp()
      * (D) exit the thread
      * (E) back in the main thread, jump to the point set at (C). Now we are
      *     on the trampoline stack
      * (F) set a jump point with savesigs=1. This is where we will jump to
      *     after handling a signal
      * (G) jump back to the main program
      *
      * NOTE: it may look strange to use threads for this, but there are not
      * a lot of good ways to get code running on an arbitrary stack. In
      * fact, POSIX recommends threads in
      * http://pubs.opengroup.org/onlinepubs/009695299/functions/makecontext.html
      */
    
  • man sigsetjmp:

     If, and only if, the savesigs argument provided to sigsetjmp() is nonzero, the process's
     current signal mask is saved in env and will be restored if a siglongjmp() is later
     performed with this env.
    
  • This raises the question: why can't sigprocmask simply be used, where available? (@jdemeyer?)

    sigprocmask appears to do exactly what we want without the hack:

    The sigprocmask function is used to examine or change the calling process’s signal mask.

    — from https://www.gnu.org/software/libc/manual/html_node/Process-Signal-Mask.html
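The glibc check from the first bullet, together with the three stack pictures, can be modelled in plain C. This is a simplified sketch assuming the behaviour the diagrams describe; the real check is per-architecture assembly that queries `sigaltstack()`, and the struct and function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What the checker would learn from sigaltstack(); names are hypothetical. */
typedef struct {
    uintptr_t base;      /* lowest address of the alternate signal stack */
    uintptr_t size;      /* its size in bytes */
    bool on_altstack;    /* currently executing on it (SS_ONSTACK) */
} altstack_info;

/* Simplified model: the stack grows downward, cur_sp is the current stack
 * pointer (RSP) and target_sp the one restored from the jmp_buf (R8).
 * Returns false where glibc would abort with
 * "longjmp causes uninitialized stack frame". */
static bool longjmp_check_passes(uintptr_t cur_sp, uintptr_t target_sp,
                                 altstack_info alt)
{
    if (cur_sp <= target_sp)          /* cmp %R8_LP, %RSP_LP; jbe .Lok */
        return true;                  /* only popping frames: fine */
    if (!alt.on_altstack)
        return false;                 /* jumping "down" into unused stack */
    bool target_in_alt = target_sp >= alt.base && target_sp < alt.base + alt.size;
    return !target_in_alt;            /* leaving the alt stack for another stack */
}
```

In this model, cysignals' jump from the alternate signal stack to a trampoline stack at a lower address (the third picture) passes, while a jump deeper into the same stack fails.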

@dimpase (Member) commented Dec 25, 2024

Jeroen Demeyer left the project about 5 years ago. I am not sure anyone here keeps in touch with him; maybe @videlec does.

@user202729 (Contributor)

Understandable. I guess someone could contribute a version using sigprocmask if it seems worthwhile, although I don't think that's the cause of the crashes/deadlocks.

@jdemeyer (Collaborator)

I don't recall, but it was probably some portability issue. We needed to support different OS'es (Linux, Darwin, Solaris, ...) and getting something to work on all OS'es was much harder than simply getting it to work on Linux.

I vaguely recall that some OS would remember that the thread is running a signal handler (and treating it specially in some way). This "thread is running inside a signal handler" flag would be cleared by siglongjmp, but not by other mechanisms.

@jdemeyer (Collaborator) commented Dec 25, 2024

TL;DR: there is no guarantee on all OS'es that siglongjmp is functionally equivalent to longjmp + sigprocmask.

@jdemeyer (Collaborator)

A possible alternative I've looked into was makecontext and friends: http://pubs.opengroup.org/onlinepubs/009695299/functions/makecontext.html but that says

The obsolescent functions getcontext(), makecontext(), and swapcontext() can be replaced using POSIX threads functions.

(which isn't actually true, I wouldn't be using them to implement threading)

But I don't know how portable it would be to use those functions...
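For reference, running a function on a caller-supplied stack with those functions looks roughly like the sketch below. It assumes a platform that still ships `<ucontext.h>` (glibc does; the functions were removed from POSIX.1-2008 and are deprecated on macOS), uses hypothetical names, and does not attempt the signal-mask part of the cysignals trampoline.

```c
#define _XOPEN_SOURCE 700   /* needed for <ucontext.h> on some platforms */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t caller_ctx, side_ctx;
static unsigned char *side_stack;
static size_t side_stack_size;
static volatile int ran_on_side_stack = 0;

/* Entry point executed on side_stack; returning resumes uc_link. */
static void side_entry(void)
{
    unsigned char marker;             /* lives on whatever stack we are running on */
    uintptr_t here = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)side_stack;
    ran_on_side_stack = (here >= lo && here < lo + side_stack_size);
}

/* Runs side_entry() on a freshly allocated stack of `size` bytes. Returns 1
 * if it really ran there, 0 if not, -1 on error. */
static int run_on_separate_stack(size_t size)
{
    side_stack = malloc(size);
    if (side_stack == NULL)
        return -1;
    side_stack_size = size;
    ran_on_side_stack = 0;

    if (getcontext(&side_ctx) == -1) {
        free(side_stack);
        return -1;
    }
    side_ctx.uc_stack.ss_sp = side_stack;
    side_ctx.uc_stack.ss_size = size;
    side_ctx.uc_link = &caller_ctx;   /* where to continue when side_entry returns */
    makecontext(&side_ctx, side_entry, 0);

    int rc = swapcontext(&caller_ctx, &side_ctx);  /* save here, switch there */
    free(side_stack);
    return rc == -1 ? -1 : ran_on_side_stack;
}
```

With a 64 KiB stack, `run_on_separate_stack(64 * 1024)` should return 1. Unlike the thread-based trick in implementation.c, no thread is created, but portability is exactly the open question raised above.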
