This is an archive of the discontinued LLVM Phabricator instance.

Fix rendezvous for rebase_exec=true case
ClosedPublic

Authored by emrekultursay on Sep 14 2021, 5:05 PM.

Details

Summary

When rebase_exec=true in DidAttach(), all modules are loaded
before the rendezvous breakpoint is set, which means the
LoadInterpreterModule() method is not called and m_interpreter_module
is not initialized.

This causes the very first rendezvous breakpoint hit with
m_initial_modules_added=false to accidentally unload the
module_sp that corresponds to the dynamic loader.

This bug (introduced in D92187) was causing the rendezvous
mechanism to not work in Android 28. The mechanism works
fine on older/newer versions of Android.

Test: Verified rendezvous on Android 28 and 29
Test: Added dlopen test

Diff Detail

Event Timeline

emrekultursay requested review of this revision.Sep 14 2021, 5:05 PM
emrekultursay created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptSep 14 2021, 5:06 PM
emrekultursay edited the summary of this revision. (Show Details)Sep 14 2021, 5:13 PM
emrekultursay added reviewers: mgorny, labath.

I'm going to test this on FreeBSD in a minute. Could you fix clang-formatting in the meantime?

Ok, I don't see any obvious regressions while testing, so LGTM modulo formatting changes.

Applied clang-format

Since the meaning of m_initial_modules_added is basically "should I do a full scan through the linked list" and LoadAllCurrentModules (called from DidAttach) does a full scan, it seems to me the real bug is that this function does not set m_initial_modules_added = true. Would that fix your issue?

Also, what makes Android 28 special? Is it the presence/absence of a dynamic linker symlink? I'm wondering if we could reproduce (test) this more generally by ensuring we launch a binary through a symlink (or not).

Added m_initial_modules_added=true into LoadAllCurrentModules.

Since the meaning of m_initial_modules_added is basically "should I do a full scan through the linked list" and LoadAllCurrentModules (called from DidAttach) does a full scan, it seems to me the real bug is that this function does not set m_initial_modules_added = true. Would that fix your issue?

Yes, it does fix my bug, thanks for the suggestion. Would you like me to remove the lines 445-447 that I added, or keep them as a secondary protection (since module_sp != m_interpreter_module.lock() condition doesn't seem to be strong enough).

Also, what makes Android 28 special? Is it the presence/absence of a dynamic linker symlink? I'm wondering if we could reproduce (test) this more generally by ensuring we launch a binary through a symlink (or not).

On Android, GetExecutableModule() returns GetModuleAtIndex(0). For Android 27/29, this returns app_process64 (correct), but for Android 28 it returns [vdso], which is incorrect. This causes ResolveExecutableModule() to fall through due to module_spec (app_process64) not matching module_sp ([vdso]). This causes target.SetExecutableModule to be called, which triggers ClearModules. Since all modules are cleared, we cannot get the load address of the .dynamic section when we check whether we should rebase (i.e., addr.GetLoadAddress(&target) returns invalid address), thus rebase_exec=true. The rest is in CL description.

To test this, we would need a binary that (1) doesn't have eTypeExecutable in any of its modules (2) has a module other than the executable module at location 0.

Since the meaning of m_initial_modules_added is basically "should I do a full scan through the linked list" and LoadAllCurrentModules (called from DidAttach) does a full scan, it seems to me the real bug is that this function does not set m_initial_modules_added = true. Would that fix your issue?

Yes, it does fix my bug, thanks for the suggestion. Would you like me to remove the lines 445-447 that I added, or keep them as a secondary protection (since module_sp != m_interpreter_module.lock() condition doesn't seem to be strong enough).

I suppose keeping it is fine. It will make sure m_interpreter_module is up to date, if anyone needs to access it in the future.

Also, what makes Android 28 special? Is it the presence/absence of a dynamic linker symlink? I'm wondering if we could reproduce (test) this more generally by ensuring we launch a binary through a symlink (or not).

On Android, GetExecutableModule() returns GetModuleAtIndex(0). For Android 27/29, this returns app_process64 (correct), but for Android 28 it returns [vdso], which is incorrect. This causes ResolveExecutableModule() to fall through due to module_spec (app_process64) not matching module_sp ([vdso]). This causes target.SetExecutableModule to be called, which triggers ClearModules. Since all modules are cleared, we cannot get the load address of the .dynamic section when we check whether we should rebase (i.e., addr.GetLoadAddress(&target) returns invalid address), thus rebase_exec=true. The rest is in CL description.

To test this, we would need a binary that (1) doesn't have eTypeExecutable in any of its modules (2) has a module other than the executable module at location 0.

I'm not sure all of this is relevant. I've just tried to debug a very simple (linux) executable that does a dlopens a library. If I attach to the executable before the dlopen call, lldb does not notice the new library. This patch fixes the issue. So I guess the test could be just that. You could do something similar to TestLoadUnload -- either add a new test case which attaches to that binary, or use it as inspiration to write a new test.

Add new dlopen test

Minor touch-ups to the newly added test.

I'm not sure all of this is relevant. I've just tried to debug a very simple (linux) executable that does a dlopens a library. If I attach to the executable before the dlopen call, lldb does not notice the new library. This patch fixes the issue. So I guess the test could be just that. You could do something similar to TestLoadUnload -- either add a new test case which attaches to that binary, or use it as inspiration to write a new test.

PTAL. I added a new dlopen test for this. I can only test on Linux at the moment, so LMK if there's something I need to do to support more platforms or remove them from testing matrix.

Update test to pass on MacOS

Minor fix in test.

emrekultursay edited the summary of this revision. (Show Details)Sep 21 2021, 2:01 PM

plabath/mgorny: This change is ready for another round of review. Thanks.

The test looks mostly fine. I made some comments on how to make it compatible with windows. I'm not sure if it would be enough to make it run there, but I think it has a fair chance. I might just let it run and then disable it if it turns out to be failing.

lldb/test/API/functionalities/dlopen/TestDlopen.py
14–16 ↗(On Diff #374027)

see self.platformContext.shlib_prefix/extension for how to portably build this path (TestLoadUnload uses a mixture of these, as only those tests which work/make sense on windows have been updated).

lldb/test/API/functionalities/dlopen/main.cpp
8–10 ↗(On Diff #374027)

why this argument dance? can we remove it?

26 ↗(On Diff #374027)

see dylib.h and the functions within (the inferior of TestLoadUnload uses them) for a windows-compatible way to load shared libraries.

emrekultursay edited the summary of this revision. (Show Details)

Make test also run on Windows.

emrekultursay marked 3 inline comments as done.Sep 22 2021, 7:57 AM

PTAL. The test now passes on Linux, MacOS, and Windows.

lldb/test/API/functionalities/dlopen/main.cpp
26 ↗(On Diff #374027)

Since we are attaching to an already running process, which cannot find the dynamic library unless I pass the full path to dlopen(). That's why I couldn't use dylib.h (which doesn't add full path), but created my own version here.

labath accepted this revision.Sep 23 2021, 3:47 AM

lgtm, modulo comments.

lldb/test/API/functionalities/dlopen/TestDlopen.py
1 ↗(On Diff #374237)

I'd put this under functionalities/load_after_attach/TestLoadAfterAttach.py to better fit into the existing naming scheme, and to emphasize the attach aspect of the test (as that is what really makes this test special).

lldb/test/API/functionalities/dlopen/b.cpp
1–4 ↗(On Diff #374237)

What's up with the whitespace?

lldb/test/API/functionalities/dlopen/main.cpp
11–17 ↗(On Diff #374237)

std::this_thread::sleep_for

26 ↗(On Diff #374027)

Ok, what I think you're saying is that when we run a process for attaching, we skip the code paths which set ((DY)LD_LIBRARY_)PATH, which is what makes the relative imports work. It shouldn't be too hard to extend the launch infrastructure to do that. Let's commit this in this form, and I'll do that as a follow-up.

This revision is now accepted and ready to land.Sep 23 2021, 3:47 AM
emrekultursay marked 4 inline comments as done.

Renamed test to load_after_attach.

Done. Can you also submit it please?

lldb/test/API/functionalities/dlopen/main.cpp
26 ↗(On Diff #374027)

Yes. That SGTM. Thanks.

This revision was automatically updated to reflect the committed changes.