c++ code frequency dlclose/dlopen *.so compiled by rust cause crash #134820

Open
Rust401 opened this issue Dec 27, 2024 · 1 comment
Labels
A-dynamic-library Area: Dynamic/Shared Libraries A-linkage Area: linking into static, shared libraries and binaries C-discussion Category: Discussion or questions that doesn't represent real issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

Rust401 commented Dec 27, 2024

Reproduction code is uploaded to this repo.

Scenario:

  1. Use Rust to compile a staticlib with cxx build; the target is aarch64-linux-android.
  2. Link the staticlib (*.a) into a .so compiled from C++.
  3. Use dlopen/dlclose to call the symbol in this .so, repeatedly.
  4. Run the binary on Android (which uses bionic libc).
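A minimal sketch of the host-side loop in steps 3 and 4, written in C against the POSIX dlopen API. The library path and the entry-point name are hypothetical (the repro's real exported symbol is not shown here), and the entry point is assumed to take no arguments and return nothing.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical entry point type; the real exported symbol is not shown in the issue. */
typedef void (*entry_fn)(void);

/* Runs up to n dlopen -> dlsym -> call -> dlclose cycles against `path`.
   Returns how many cycles completed; stops at the first failure. */
int run_dlopen_cycles(const char *path, const char *symbol, int n) {
    int done = 0;
    for (int i = 0; i < n; i++) {
        void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
        if (handle == NULL) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            break;
        }
        entry_fn fn = (entry_fn)dlsym(handle, symbol);
        if (fn == NULL) {
            fprintf(stderr, "dlsym failed: %s\n", dlerror());
            dlclose(handle);
            break;
        }
        fn();            /* first call into Rust code in this freshly loaded copy */
        dlclose(handle); /* drops the reference; pthread keys created inside are not released */
        done++;
    }
    return done;
}
```

With the setup above, `run_dlopen_cycles("/data/local/libdude.so", "dude_hello", 128)` would roughly correspond to `./test_dlopen 128`; `dude_hello` is a made-up name.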

Then a segmentation fault occurs:

Hello from Rust!
dude loop 125
Hello from C++!
Hello from Rust!
dude loop 126
Hello from C++!
Hello from Rust!
dude loop 127
Hello from C++!
Segmentation fault

Info from logcat:

Cmdline: ./test_dlopen 128
pid: 14405, tid: 14405, name: test_dlopen  >>> ./test_dlopen <<<
uid: 0
tagged_addr_ctrl: 0000000000000001
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7fc28c7ff8
Cause: stack pointer is in a non-existent map; likely due to stack overflow.
#00 pc 00000000000c4cdc  /data/local/libdude.so (std::sys_common::thread_local_key::StaticKey::lazy_init::h713657dd8d2d4621+36)
#01 pc 000000000009216c  /data/local/libdude.so (core::ops::function::FnOnce::call_once::h726b8069cd1e002c+80)
#02 pc 00000000000b7868  /data/local/libdude.so (std::panicking::rust_panic_with_hook::h2748add3cd52cde1+84)
#03 pc 00000000000b77dc  /data/local/libdude.so (std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::hf339e6c238ee80b2+144)
#04 pc 00000000000b51c4  /data/local/libdude.so (std::sys_common::backtrace::__rust_end_short_backtrace::hf829d410f7587982+8)
#05 pc 00000000000b7550  /data/local/libdude.so (rust_begin_unwind+48)
#06 pc 00000000000d94d4  /data/local/libdude.so (core::panicking::panic_fmt::h955ec3f09bb74715+40)
#07 pc 00000000000d992c  /data/local/libdude.so (core::panicking::assert_failed_inner::hd795eb67b74b452d+276)
#08 pc 0000000000094cb4  /data/local/libdude.so (core::panicking::assert_failed::h2f68f007dd54e097+44)
#09 pc 00000000000c4d74  /data/local/libdude.so (std::sys_common::thread_local_key::StaticKey::lazy_init::h713657dd8d2d4621+188)
#10 pc 000000000009216c  /data/local/libdude.so (core::ops::function::FnOnce::call_once::h726b8069cd1e002c+80)
#11 pc 00000000000b7868  /data/local/libdude.so (std::panicking::rust_panic_with_hook::h2748add3cd52cde1+84)
#12 pc 00000000000b77dc  /data/local/libdude.so (std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::hf339e6c238ee80b2+144)

Based on my analysis, this is caused by a pthread_key_create call in emutls.c whose key is never deleted.
The process is as follows:

__emutls_get_address =>
    emutls_get_index =>
        pthread_once(&once, emutls_init) =>
            emutls_init =>
                abort =>
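The call chain above boils down to a once-guarded key creation that aborts on failure. The C sketch below is a simplified model of that shape, not compiler-rt's actual emutls.c; the `_sketch` names are hypothetical. When dlclose really unloads the .so and a later dlopen maps it again, statics like these start over, so the next first call creates yet another key.

```c
#include <pthread.h>
#include <stdlib.h>

/* Per-copy-of-the-library state: reset whenever the .so is mapped afresh. */
static pthread_once_t once = PTHREAD_ONCE_INIT;
static pthread_key_t emutls_key;

static void emutls_init_sketch(void) {
    /* No pthread_key_delete is ever paired with this call. */
    if (pthread_key_create(&emutls_key, NULL) != 0)
        abort(); /* key table exhausted: the process dies here */
}

/* Every thread-local access goes through here; only the first one creates a key. */
pthread_key_t emutls_get_key_sketch(void) {
    pthread_once(&once, emutls_init_sketch);
    return emutls_key;
}
```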

Simply put:
1. After dlopen, the first call into a Rust function creates a thread-local variable with pthread_key_create, but the matching pthread_key_delete is never called on dlclose.
2. Each dlopen -> call -> dlclose cycle occupies one more slot in the key map until BIONIC_PTHREAD_KEY_COUNT is reached.
3. Then the abort happens.
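Steps 2 and 3 can be reproduced without any dlopen at all: creating keys in a loop and never deleting them runs into the same per-process limit. This C sketch (the function name is hypothetical) does exactly that, records the error `pthread_key_create` returns, and then deletes its own keys so the demo leaves the process usable. In the real scenario nothing keeps the leaked keys, so they can never be deleted.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Creates keys without deleting them until pthread_key_create fails.
   Returns how many keys could be created; *err_out receives the failing error. */
int leak_keys_until_exhausted(int *err_out) {
    size_t cap = 64, count = 0;
    pthread_key_t *keys = malloc(cap * sizeof *keys);
    if (keys == NULL) { *err_out = ENOMEM; return -1; }
    int err;
    for (;;) {
        if (count == cap) {
            cap *= 2;
            pthread_key_t *grown = realloc(keys, cap * sizeof *keys);
            if (grown == NULL) { err = ENOMEM; break; }
            keys = grown;
        }
        err = pthread_key_create(&keys[count], NULL);
        if (err != 0) break; /* POSIX specifies EAGAIN once the per-process limit is hit */
        count++;
    }
    *err_out = err;
    /* Demo-only cleanup: return every key we took. */
    for (size_t i = 0; i < count; i++)
        pthread_key_delete(keys[i]);
    free(keys);
    return (int)count;
}
```

The count is however many keys were still free in the calling process, so it varies by libc and by what else the process has already allocated (glibc's `PTHREAD_KEYS_MAX` is 1024).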

But the same code runs happily on an x86 Linux machine.
We hacked on libc (both bionic for Android and glibc 2.35 on my Ubuntu 22.04).
We found that on Android (bionic on an arm64 CPU), the first call into a Rust function after dlopen creates a thread-local variable via pthread_key_create (code in bionic libc).
Linux (glibc on an x86 CPU), however, never calls pthread_key_create (perhaps glibc uses another mechanism to manage thread-local variables).
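One likely explanation for the difference, offered as an assumption rather than something confirmed in this thread, is native versus emulated TLS. In C, a `_Thread_local` variable compiled with native ELF TLS (the normal case on x86-64 Linux) is resolved through the dynamic linker's TLS machinery and needs no pthread key; compiling the same file with clang's `-femulated-tls` routes each access through `__emutls_get_address`, the path traced above. The sketch below (all names hypothetical) only shows that each thread gets its own copy of the variable.

```c
#include <pthread.h>
#include <stdint.h>

/* One independent copy per thread; native TLS needs no pthread key for this. */
static _Thread_local int per_thread_counter = 0;

static void *bump(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000; i++)
        per_thread_counter++;
    return (void *)(intptr_t)per_thread_counter; /* this thread's own value */
}

/* Runs `bump` on a new thread and returns the value that thread observed. */
int counter_seen_by_new_thread(void) {
    pthread_t t;
    void *result = NULL;
    if (pthread_create(&t, NULL, bump, NULL) != 0)
        return -1;
    pthread_join(t, &result);
    return (int)(intptr_t)result;
}

/* The calling thread's copy, untouched by the worker threads. */
int counter_seen_by_caller(void) { return per_thread_counter; }
```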

So, my final questions are:

  1. Is my scenario (dlopen a wrapped Rust cdylib, call the function, then dlclose, repeated n times) reasonable?
  2. Is there any way to release the thread-local variables generated by Rust on dlclose?
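For keys that the library's own C/C++ glue creates, one pattern (a hypothetical sketch, not something from the repro) is to pair creation and deletion with the GCC/Clang `constructor`/`destructor` attributes, which run when the .so is loaded and when it is actually unloaded. This has no handle on keys created privately inside Rust's std or emutls, and `pthread_key_delete` does not run per-thread destructors, so values still stored in other threads are simply abandoned.

```c
#include <pthread.h>

static pthread_key_t glue_key;
static int glue_key_valid = 0;

/* Runs on load (dlopen or program start). */
__attribute__((constructor)) static void glue_init(void) {
    glue_key_valid = (pthread_key_create(&glue_key, NULL) == 0);
}

/* Runs on unload (a dlclose that really unmaps the .so, or process exit). */
__attribute__((destructor)) static void glue_fini(void) {
    if (glue_key_valid) {
        pthread_key_delete(glue_key); /* returns the slot to the process */
        glue_key_valid = 0;
    }
}

int glue_store(void *value) {
    return glue_key_valid ? pthread_setspecific(glue_key, value) : -1;
}

void *glue_load(void) {
    return glue_key_valid ? pthread_getspecific(glue_key) : NULL;
}
```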
@Rust401 Rust401 added the C-bug Category: This is a bug. label Dec 27, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Dec 27, 2024
@Noratrieb (Member)

The standard library's thread-locals do not support being dlclose'd. You should never use dlclose in general, but especially not when thread-local variables are used.
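If the library must stay behind a dlopen boundary, one mitigation consistent with this advice, sketched here as an assumption rather than a verified fix for this repro, is to open it with `RTLD_NODELETE` so that dlclose never actually unloads it. The library's statics, including once-flags and any pthread keys already created, then survive, and a later dlopen of the same path should reuse the loaded copy instead of creating new keys. `open_pinned` is a hypothetical helper.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* make GNU extensions in <dlfcn.h> visible on glibc */
#endif
#include <dlfcn.h>
#include <stdio.h>

/* Opens `path` (NULL means the main program) so that a later dlclose() does
   not unload it; statics and any pthread keys inside it stay alive. */
void *open_pinned(const char *path) {
    void *handle = dlopen(path, RTLD_NOW | RTLD_NODELETE);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s) failed: %s\n",
                path ? path : "(main program)", dlerror());
    return handle;
}
```

Calling dlopen once and keeping the handle for the lifetime of the process achieves the same thing; check that your NDK's `<dlfcn.h>` defines `RTLD_NODELETE` before relying on it on Android.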

On a different note, you say you're running your binary on Android. Did you use the proper Android targets for that?

@Noratrieb Noratrieb added A-FFI Area: Foreign function interface (FFI) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. C-discussion Category: Discussion or questions that doesn't represent real issues. A-linkage Area: linking into static, shared libraries and binaries A-dynamic-library Area: Dynamic/Shared Libraries and removed C-bug Category: This is a bug. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. A-FFI Area: Foreign function interface (FFI) labels Dec 27, 2024
3 participants