Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle RPATH in user program #48

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vchuravy
Copy link

Running heaptrack --use-inject julia fails due to
the dlopen injection breaks rpath/runpath in the user binary.

I encountered this previously in openucx/ucx#4001
and I took inspiration from their solution.

With this patch I can successfully trace malloc use in the Julia runtime.

@vchuravy
Copy link
Author

With heaptrack v1.4.0 the error I get is:

heaptrack output will be written to "/home/vchuravy/heaptrack.julia.205212.zst"
/usr/lib/heaptrack/libheaptrack_inject.so
starting application, this might take some time...
fatal: error thrown and no exception handler available.
InitError(mod=:OpenBLAS_jll, error=ErrorException("could not load library "libopenblas64_.so"
libopenblas64_.so: cannot open shared object file: No such file or directory"))
ijl_errorf at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/rtutils.c:77
ijl_load_dynamic_library at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/dlload.c:369
#dlopen#3 at ./libdl.jl:117
dlopen at ./libdl.jl:116 [inlined]
dlopen at ./libdl.jl:116 [inlined]
__init__ at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/OpenBLAS_jll/src/OpenBLAS_jll.jl:70
jfptr___init___56870.clone_1 at /home/vchuravy/.julia/juliaup/julia-1.9.1+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
jl_module_run_initializer at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/toplevel.c:75
_finish_julia_init at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/init.c:855
julia_init at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/init.c:804
jl_repl_entrypoint at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/jlapi.c:711
main at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/cli/loader_exe.c:59
unknown function (ip: 0x7fb224ccd84f)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
heaptrack stats:
	allocations:          	213927
	leaked allocations:   	54281
	temporary allocations:	39048
Heaptrack finished! Now run the following to investigate the data:

with this patch:

heaptrack output will be written to "/home/vchuravy/src/heaptrack/build/heaptrack.julia.205373.zst"
starting application, this might take some time...
heaptrack stats:
        allocations:            10845
        leaked allocations:     6498
        temporary allocations:  3463
Heaptrack finished! Now run the following to investigate the data:

  heaptrack --analyze "/home/vchuravy/src/heaptrack/build/heaptrack.julia.205373.zst"

heaptrack_gui detected, automatically opening the file...
Warning: Ignoring XDG_SESSION_TYPE=wayland on Gnome. Use QT_QPA_PLATFORM=wayland to run on Wayland anyway.

The other question I had was could we also track mmap/madvise and maybe even mprotect?

Copy link
Collaborator

@milianw milianw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey, this is cool but can you please document this better?

at the very least give an explanation and summary in your commit message. then ideally also add comments to the new functionality in the code.

finally, is there a way to test this? can we create a test binary that would fail in a similar way like julia when we don#t do whatever is done here?

Dl_serinfo serinfo_size;
void *module;

module = original(module_path, RTLD_LAZY);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto module = original(...);

void *module;

module = original(module_path, RTLD_LAZY);
if (module == NULL) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (!module)


int res = dlinfo(module, RTLD_DI_SERINFOSIZE, &serinfo_size);
if (res) {
dlclose(module);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output an error?

*serinfo = serinfo_size;
res = dlinfo(module, RTLD_DI_SERINFO, serinfo);
if (res) {
::free(serinfo);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output an error?

static void* hook(const char* filename, int flag) noexcept
{
auto ret = original(filename, flag);
void *ret = NULL;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please extract this code into a separate function with a descriptive name

char file_path[PATH_MAX];
for (unsigned int i = 0; i < serinfo->dls_cnt; i++) {
file_path[0] = 0;
strcat(file_path, serinfo->dls_serpath[i].dls_name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strcat is unsafe and can overflow, please use strncat and track buffer size, or can we just use std::string instead?

@@ -176,9 +176,80 @@ struct dlopen
static constexpr auto name = "dlopen";
static constexpr auto original = &::dlopen;

static Dl_serinfo* load_serinfo(const char* module_path) noexcept
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of returning an memory allocated serinfo, can't you take that Dl_serinfo * as an out-param, and construct it on the stack of the caller and return bool instead?

if (!res) {
fallback = true;
} else {
Dl_serinfo *serinfo;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above: here just do Dl_serinfo serinfo; and then pass &serinfo to the load_serinfo call below?

if (res) {
continue;
}
::free(serinfo);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

potentially leaked, has to be freed outside - better use std::unique_ptr if the heap allocation is needed, but best use the stack

} else {
struct stat file_stat;
char file_path[PATH_MAX];
for (unsigned int i = 0; i < serinfo->dls_cnt; i++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add documentation to explain what this loop does - it seems to stat some files and build some path, can you give some examples for how the path is build?

@vchuravy
Copy link
Author

vchuravy commented Oct 6, 2023

Sorry for the long silence, I currently don't have time to work on this and put it up in a bit of haste during a hackathon.

The general observation is that the dynamic linker resolves RPATH/RUNPATH with respect to the library calling dlopen.
By rewriting dlopen as heaptrack does (using --use-inject) the path taken for RPATH resolution is now the heaptrack binary
instead of the program under observation. This means programs can suddenly fail to find their dependencies.

Maybe I should have opened an issue first, before opening a PR ;)

@milianw
Copy link
Collaborator

milianw commented Oct 7, 2023

OK, but why does this only apply the injection mechanism and not the LD_PRELOAD one?

IIUC then a MWE would simply call dlopen("mylib.so") or similar. As long as that lib isn't in the same path as the heaptrack lib that overrides dlopen this would fail, correct?

@vchuravy
Copy link
Author

vchuravy commented Oct 7, 2023

Yes, as long as the library you are dlopening has an rpath and is linking against another library, the behavior should be triggered.

The LD_PRELOAD mechanism simply failed for me when I tried it with Julia so I hadn't dug in if a similar change might be needed there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants