Page MenuHomePhabricator

Make sure the interpreter module was loaded before making checks against it
ClosedPublic

Authored by aadsm on Feb 12 2021, 2:37 PM.

Details

Summary

This issue was introduced in https://reviews.llvm.org/D92187.
The guard I'm changing were is supposed to act when linux is loading the linker for the second time (due to differences in paths like symlinks).
This is done by checking module_sp != m_interpreter_module.lock() however this will be true when m_interpreter_module wasn't initialized, making linux unload the linker module (the most visible result here is that lldb will stop getting notified about new modules loaded by the process, because it can't set the rendezvous breakpoint again after the stepping over it once).
The m_interpreter_module is not getting initialize when it goes through this path: https://github.com/llvm/llvm-project/blob/dbfdb139f75470a9abc78e7c9faf743fdd963c2d/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp#L332, which happens when lldb was able to read the address from the dynamic section of the executable.

What I'm not sure about though, is if when we go through this path if we still load the linker twice on linux. If that's the case then it means we need to somehow set the m_interpreter_module instead of the fix I provide here. I've only tested this on Android.

Diff Detail

Event Timeline

aadsm requested review of this revision.Feb 12 2021, 2:37 PM
aadsm created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2021, 2:37 PM

I'd like @mgorny to confirm this, but I think this should be fine -- the m_rendezvous.IsValid() branch does not do any funky module loading, so funky unloading should also be unnecessary.

From the description is seems like trigerring this bug should be relatively simple. All it takes is to attach to a process (after it has finished setting up the dynamic section). And indeed, from my simple experiments, it seems that we are indeed failing to load modules in that scenarios.

=> We should have a test for this.

And thanks for catching this. :)

aadsm added a comment.Feb 16 2021, 6:46 PM

We should have a test for this.

how do you recommend doing this? I spent a couple of hours on this but got no where. From what I understood we should prefer lit tests, so I was thinking of creating a binary that dlopens a module. However, I wasn't able to create a binary that I can start and capture its pid address so that I can attach to. Here's what I've tried so far:

// RUN: cp %s %s.cpp
// RUN: %clang -g -O0 --target=x86_64-linux-gnu %s.cpp -o %s.out
// RUN: PID=$(%s.out)
// RUN: %lldb -p $PID -b -o 'target list' | FileCheck %s
// RUN: kill -9 $PID
// CHECK: foo

#include <stdio.h>
#include <unistd.h>

int main() {
    pid_t pid = fork();
    if (pid > 0) {
        // parent process, print child pid
        printf("%d", pid);
        return 0;
    } else if (pid < 0) {
        printf("Unable to fork\n");
        return -1;
    }
    // child process
    pause();
}

The lit test get stuck on // RUN: PID=$(%s.out). Not sure why, the parent process shouldn't wait on its children..

We should have a test for this.

how do you recommend doing this? I spent a couple of hours on this but got no where. From what I understood we should prefer lit tests, so I was thinking of creating a binary that dlopens a module. However, I wasn't able to create a binary that I can start and capture its pid address so that I can attach to. Here's what I've tried so far:

// RUN: cp %s %s.cpp
// RUN: %clang -g -O0 --target=x86_64-linux-gnu %s.cpp -o %s.out
// RUN: PID=$(%s.out)
// RUN: %lldb -p $PID -b -o 'target list' | FileCheck %s
// RUN: kill -9 $PID
// CHECK: foo

#include <stdio.h>
#include <unistd.h>

int main() {
    pid_t pid = fork();
    if (pid > 0) {
        // parent process, print child pid
        printf("%d", pid);
        return 0;
    } else if (pid < 0) {
        printf("Unable to fork\n");
        return -1;
    }
    // child process
    pause();
}

The lit test get stuck on // RUN: PID=$(%s.out). Not sure why, the parent process shouldn't wait on its children..

I would do an end to end test for this. We have many attach tests that should be easy to modify and pause() and then try to load a local dylib that is dlopen'ed. Unless Pavel has a differing opinion?

Yeah, I'm with Greg. Although I would recommend using lit tests in general, I don't think they're a good fit for anything that involves attaching, or other kinds of inter-process synchronization. Once you start dealing with subprocesses you're entering very messy (and unportable) waters. Just make this a dotest test. You can base this off of one of the existing attach tests there...

The lit test get stuck on // RUN: PID=$(%s.out). Not sure why, the parent process shouldn't wait on its children..

I don't think this does what you think it does. The $() doesn't give you the process id of anything -- it substitutes a string by the result of running that string as a shell command. So, the PID variable would get the (entire) stdout of %s.out. Obviously, the command has to terminate in order for it to be able to compute that...

aadsm updated this revision to Diff 324866.Feb 18 2021, 8:32 PM

Add api test

aadsm added a comment.Feb 18 2021, 8:36 PM

I don't think this does what you think it does. The $() doesn't give you the process id of anything -- it substitutes a string by the result of running that string as a shell command. So, the PID variable would get the (entire) stdout of %s.out

I'm confused here, "the PID variable would get the (entire) stdout of %s.out" is exactly what I'm expecting to happen, the stdout of the program is its pid.

I was finally able to figure out what the issue was. I thought pause() would continue once the debugger attached because it sends a signal, but that doesn't seem to be the case?

I don't think this does what you think it does. The $() doesn't give you the process id of anything -- it substitutes a string by the result of running that string as a shell command. So, the PID variable would get the (entire) stdout of %s.out

I'm confused here, "the PID variable would get the (entire) stdout of %s.out" is exactly what I'm expecting to happen, the stdout of the program is its pid.

I was finally able to figure out what the issue was. I thought pause() would continue once the debugger attached because it sends a signal, but that doesn't seem to be the case?

It does on mac , and I don't think it does on linux.

clayborg added inline comments.Feb 19 2021, 5:07 PM
lldb/test/API/functionalities/module_load_attach/TestModuleLoadAttach.py
29

Is this racy? What happens on a really slow system? Can we fail to attach? If we do attach, are we guaranteed to be at a place where we can set "flip_to_1_to_continue = 1"? The nice thing is it is a global variable that we should be able to set no matter where we stop.

33–34

Don't we need to break before the dlopen and make sure we don't have a libfeature.so in our module list, then run over the dlopen and verify we do see it afterwards? Wasn't this bug that we will see shared libraries correctly one time when we attach, but just not get any updates after this??

aadsm added inline comments.Feb 20 2021, 10:36 AM
lldb/test/API/functionalities/module_load_attach/TestModuleLoadAttach.py
29

Is this racy?

I don't think so because we already have a pid at that point in time, so we should always be able to attach.

If we do attach, are we guaranteed to be at a place where we can set "flip_to_1_to_continue = 1"?

yeah, that's exactly why I made it global. I could also wait until there's a flip_to_1_to_continue in the scope if you think it's worthwhile.

33–34

that was a completely different bug and I have a different test for that situation as well.
Something that I could test though, is that before we got an update for an unresolved breakpoint to make sure we did indeed transitioned from unresolved -> resolved.

I'll add that.

aadsm added inline comments.Feb 20 2021, 10:44 AM
lldb/test/API/functionalities/module_load_attach/TestModuleLoadAttach.py
33–34

Forget this, I shouldn't be answering comments in the morning.

that was a completely different bug and I have a different test for that situation as well.
Something that I could test though, is that before we got an update for an unresolved breakpoint to make sure we did indeed transitioned from unresolved -> resolved.

I'll add that.

aadsm updated this revision to Diff 325221.Feb 20 2021, 11:11 AM

Checks the module is not loaded right after we attach

This revision was not accepted when it landed; it landed in state Needs Review.Feb 21 2021, 9:28 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
aadsm added a comment.Feb 21 2021, 9:37 AM

oh no, I picked the wrong commit to land :(. I think this is fine because I already addressed the comments, but if there's still something I should work on here, I'll put another diff up.

I have some improvements to the test suite -- it would be great if you could incorporate them into the next version of the patch.

BTW, it would be nice if the revert commit message included a (brief) explanation of why the patch is being reverted.

lldb/test/API/functionalities/module_load_attach/Makefile
2

Delete, and use self.registerSharedLibrariesWithTarget in python code

5

feature sounds very specific and unusual. I guess that is inspired by whatever was the original use case that caused you to find this bug, but maybe you could pick a more generic name here: liba, or libload_after_attach, ...

lldb/test/API/functionalities/module_load_attach/TestModuleLoadAttach.py
29

I don't think so because we already have a pid at that point in time, so we should always be able to attach.

Attaching -- yes, but I think that if we attach _really_ early we may not be able to flip the variable, as the loader has not yet finished setting up the main module. It will also make the test nondeterministic, as (depending on how early we attach) we may or may not be getting notifications about the loading of dependent libraries (libc and stuff). Other attach tests use synchronization by having the inferior create a file when it's ready to be attached, and the test waits for this via lldbutil.wait_for_file_on_target. It would be good to use that here too..

32–33

With the other modifications, you should be able to drop these. The way this test is phrased, it should run everywhere, so it'd be a pity to not make use of that.

42

Use self.platformContext.shlib_prefix and .shlib_extension instead of "lib" and ".so".

lldb/test/API/functionalities/module_load_attach/main.c
1

replace with "dylib.h" and use dylib_open instead of dlopen