-
-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
error: "cysignals must be compiled without _FORTIFY_SOURCE" #194
Comments
For what it's worth I use the workaround explained in #80 and it works. $ CC=clang CFLAGS="-Wp,-U_FORTIFY_SOURCE" pip install cysignals Not sure how to specify compiler flags in pyproject.toml or meson.build etc. |
Does it fail without those longjmp seems to be causing us a lot of problems:
|
With newer compilers 🤦:
|
are these compiler warnings meaningful? I see gcc having a long-standing issue with them |
At least a part of these random bugs in libgap interface are due to missing |
I think they are meaningful in this case and illustrate nicely the fundamental problem. All cython functions that eventually call static PyObject *blahblahblah(...) // the C implementation of test_signal_during_malloc()
PyObject *__pyx_r = NULL;
__Pyx_RefNannyDeclarations
PyObject *__pyx_t_1 = NULL;
PyObject *__pyx_t_2 = NULL;
PyObject *__pyx_t_3 = NULL;
int __pyx_t_4;
int __pyx_lineno = 0;
const char *__pyx_filename = NULL;
int __pyx_clineno = 0;
... before they eventually get to the |
Actually it is not necessary to declare everything volatile, only the local variables that are changed between I don't know what these temporary But, yes, ideally the whole toolchain (cython, gcc) would be setjmp/longjmp-aware. |
the point is that with this problem, things like |
But if you call Are there guards to delete them afterwards somewhere? |
It isn't that we need the value of x = libgap.Baz()
try:
y = libgap.Foo(x)
except:
# x might be garbage!
print("Foo error with x=", x) There are actually two closely-related problems here, since libgap uses setjmp/longjmp internally too. In either case, the situation really sucks for the caller because he has to know where and how setjmp/longjmp are used in these third-party libraries in order to use them safely. And even beyond that, he has to know what C code cython is going to generate, to make sure that its internal details can't be clobbered by longjmp, too. |
these (needing volatile) only started manifest themselves as segfaults (rather than just memory leaks - or perhaps these weren't even memory leaks, and were garbage-collectable by GAP? Or perhaps this still was leaving refcounted "handles" meant to be pointing to GAP objects created in libgap Cython interface behind) since Python 3.12, which changed the codepath for Pythnon error handling. |
This discussion does not match my understanding of how sig_on and sig_off are used by sage. The primary use case is making it possible for Sage to exit cleanly after a segfault occurs in an external library function. In particular, Sage's longjmp call should never occur in any sort of normal error processing or exception handling by, say, libgap. The call to longjmp and subsequent variable clobbering should only happen if there is a signal which is not handled by any code in the external library (usually SIGSEGV). Is there an example code snippet that is known to generate the segfault with libgap that is being discussed here? I suspect that real problem is a seqfault in libgap, not some design flaw in sig_on or sig_off. Of course compiler errors would be a different story. |
GAP uses longjmp to handle its errors. See e.g. |
I am still not seeing any evidence that sig_on and sig_off are involved in these crashes. A setjmp with no corresponding longjmp should be harmless. And Sage only calls longjmp from signal handlers installed by sig_on. So Sage's longjmp should only be able to cause trouble in the case when a signal is being generated for which sig_on had installed a handler. If GAP is calling longjmp from its own signal handler then it would have had to install that handler on top of Sage's, and Sage's handler would not be invoked. If GAP is calling longjmp in some other way then Sage's longjmp should not get called. The crash suggests to me that there is a SIGSEGV signal being generated by the GAP code. The fact that this SIGSEGV is being handled by the handler installed by sig_on only means that sig_on is working as it was designed to work. |
While it was last month when I mentioned that longjmp cannot be called from a Windows signal handler, this is not a new phenomenon. It has been at least a decade since we first ran into this with cypari.
|
They may not be responsible for some GAP crashes because libgap uses its own setjmp/longjmp. With respect to cysignals I was moreso pointing out that this sort of pattern is doomed:
because the person writing cython code needs to be aware of C implementation details not only in cysignals, but in cython itself for this to be safe. FWIW I think the most common use of |
By the way, our way to trigger this libgap crash is via using variadic GAP functions with wrong arguments. Perhaps the GAP's own problems with these functions and modern C standard are at play here? I don't know. |
Not sure what this discussion about crashes/timeouts/etc. in SageMath has to do with In particular,
In particular, if you In theory, the best way would be for libraries to provide the equivalent of (I just get a deadlock from interrupt inside |
I agree with @orlitzky that handling Ctrl-C interrupts is another basic application of the sig_on/sig_off machinery. And we certainly don't want to respond to a Ctrl-C interrupt by creating a segfault. I also agree with @user202729 that the interrupt does not really work unless there is some way for the SIGINT handler to clean up any memory allocations that were left hanging when the external function call was interrupted. I think it does work with many PARI calls since interrupting a PARI function just leaves a mess on the stack, which is easy to clean up. I would guess that for other libraries, like libgap, interrupting a call to a library function call creates a memory leak. |
If you build cysignals with
I myself am not so certain that this is a false positive. |
Some preliminary investigation
|
Jeroen Demeyer had left the project about 5 years ago. I am not sure anyone here keeps in touch with him, maybe @videlec |
Understandable. I guess someone can contribute a version using |
I don't recall, but it was probably some portability issue. We needed to support different OS'es (Linux, Darwin, Solaris, ...) and getting something to work on all OS'es was much harder than simply getting it to work on Linux. I vaguely recall that some OS would remember that the thread is running a signal handler (and treating it specially in some way). This "thread is running inside a signal handler" flag would be cleared by siglongjmp, but not by other mechanisms. |
TL;DR: there is no guarantee on all OS'es that siglongjmp is functionally equivalent to longjmp + sigprocmask. |
A possible alternative I've looked in to was
(which isn't actually true, I wouldn't be using them to implement threading) But I don't know how portable it would be to use those functions... |
#73 is back for the third time. When building with clang on Gentoo,
The constant
__USE_FORTIFY_LEVEL=3
is built in to the compiler, so the two different hacks that we have to work around this are ineffective (and yes, the result still crashes if I remove the guard). I have yet to find a third workaround.This is also Gentoo bug 918934.
The text was updated successfully, but these errors were encountered: