This is an archive of the discontinued LLVM Phabricator instance.

LLDB does not respect platform sysroot when loading core on Linux
ClosedPublic

Authored by EugeneBi on Jul 23 2018, 11:23 AM.

Details

Summary

The situation arises when we are trying to load a core dump which was generated on a different machine
running a different version of Linux. The problem is that shared libraries differ on these machines and
LLDB either fails to load some libraries or loads wrong ones.

How it should work:

  1. We copy the directory tree with libraries from target machine into some local directory
  2. in LLDB, create a platform setting sysroot to that directory
  3. Create target using that platform

The fix is to pass additional "sysroot" argument from lldb_private::Platform::GetSharedModule to
lldb_private::ModuleList::GetSharedModule

Diff Detail

Event Timeline

EugeneBi created this revision.Jul 23 2018, 11:23 AM
labath edited subscribers, added: lldb-commits; removed: llvm-commits.Jul 24 2018, 2:47 AM

Which version is this patch based on? The line numbers don't seem to match what I see on master.

Could you rebase the patch to master, and upload the patch with a full diff (e.g. via git diff -U9999, see https://llvm.org/docs/Phabricator.html#id3).

include/lldb/Core/ModuleList.h
545 ↗(On Diff #156833)

Please make this an llvm::StringRef (and then change AsCString to GetStringRef below).

clayborg requested changes to this revision.Jul 24 2018, 7:19 AM

I think doing this in the module list is not the right place. Why? Some platforms might have multiple sysroot to check. iOS for instance has a directory for each device that Xcode has connected to which can be checked. I am fine with adding this ability to lldb_private::Platform, but I would just do it in there. Try GetRemoteSharedModule with the spec, if it fails, try again after modifying the spec to prepend the sysroot path. Possible even just check the sysroot path + path first if m_sdk_sysroot is filled in. I don't see the need to change ModuleList itself.

include/lldb/Core/ModuleList.h
544–545 ↗(On Diff #156833)

Revert this? See my main comment

source/Core/ModuleList.cpp
710–714 ↗(On Diff #156833)

Revert

766–770 ↗(On Diff #156833)

Revert

source/Target/Platform.cpp
228

Revert

230

Here just make a module spec that has m_sdk_sysroot prepended if m_sdk_sysroot is set, and check for that file first. If that succeeds, return that, else do the same code as before.

This revision now requires changes to proceed.Jul 24 2018, 7:19 AM

The problem is that shared libraries differ on these machines and
LLDB either fails to load some libraries *or loads wrong ones*.

Not finding the modules is not surprising but the latter (loading the wrong
modules) is a bit concerning. Do you know why the module build-id is not
used when searching for the local binary?

We should never be loading the wrong shared libraries. The module spec we fill out must contain the UUID of the file are looking for. If it doesn't we have no chance of every really loading the right stuff.

We should never be loading the wrong shared libraries. The module spec we fill out must contain the UUID of the file are looking for. If it doesn't we have no chance of every really loading the right stuff.

Well, that's what it does on my machine. So prepending sysroot *after* trying to load module without it will cause problems to me. Also, I assume that if you specified sysroot for a platform, we should not try to load solibs without it - these paths are just not a part of the platform.

I think doing this in the module list is not the right place. Why? Some platforms might have multiple sysroot to check. iOS for instance has a directory for each device that Xcode has connected to which can be checked. I am fine with adding this ability to> lldb_private::Platform, but I would just do it in there. Try GetRemoteSharedModule with the spec, if it fails, try again after modifying the spec to prepend the sysroot path. Possible even just check the sysroot path + path first if m_sdk_sysroot is filled in. I don't see the need to change ModuleList itself.

I do not see how this prepend sysroot could be done in the platform... Essentially, my fix is doing what you suggest: the platform supplies sysroot to the module list, and two different platforms (for two XCode devices, etc.) would supply different sysroots. What do I miss?

We should never be loading the wrong shared libraries. The module spec we fill out must contain the UUID of the file are looking for. If it doesn't we have no chance of every really loading the right stuff.

Well, that's what it does on my machine. So prepending sysroot *after* trying to load module without it will cause problems to me. Also, I assume that if you specified sysroot for a platform, we should not try to load solibs without it - these paths are just not a part of the platform.

I think doing this in the module list is not the right place. Why? Some platforms might have multiple sysroot to check. iOS for instance has a directory for each device that Xcode has connected to which can be checked. I am fine with adding this ability to> lldb_private::Platform, but I would just do it in there. Try GetRemoteSharedModule with the spec, if it fails, try again after modifying the spec to prepend the sysroot path. Possible even just check the sysroot path + path first if m_sdk_sysroot is filled in. I don't see the need to change ModuleList itself.

I do not see how this prepend sysroot could be done in the platform... Essentially, my fix is doing what you suggest: the platform supplies sysroot to the module list, and two different platforms (for two XCode devices, etc.) would supply different sysroots. What do I miss?

You would just move:

auto resolved_module_spec(module_spec);
  if (sysroot != nullptr)
    resolved_module_spec.GetFileSpec().PrependPathComponent(sysroot);

into the code in Platform.cpp and do it all there.

EugeneBi updated this revision to Diff 157088.Jul 24 2018, 11:21 AM

Rebased to recent master.
Included the whole file in diff.

This comment was removed by EugeneBi.

You would just move:

auto resolved_module_spec(module_spec);
  if (sysroot != nullptr)
    resolved_module_spec.GetFileSpec().PrependPathComponent(sysroot);

into the code in Platform.cpp and do it all there.

Ah, I see. Will do, thanks.

EugeneBi updated this revision to Diff 157135.Jul 24 2018, 2:49 PM
EugeneBi marked 6 inline comments as done.

Code review followup:

  • Restricted change to Platform.cpp
  • Restricted change only to remote platforms.
clayborg accepted this revision.Jul 24 2018, 3:08 PM

This will help, but not fix us loading incorrect versions of the shared library. I wonder if there is anything in the core file that could help use get the build ID/UUID of each binary. I doubt it, but might be worth checking. Linux core files are really useless unless they can be processed with the exact binaries and that is a shame. The windows minidump files that breakpad can create are much better than standard core files since they do contain the information we need (path + version + UUID) and they are very similar to core files but just don't always save the same amount of memory (just the thread stacks, not all mapped memory), though that can easily be fixed.

This revision is now accepted and ready to land.Jul 24 2018, 3:08 PM
EugeneBi added inline comments.Jul 24 2018, 3:28 PM
source/Target/Platform.cpp
252

A bug here. must be resolved_spec if it succeeds.

EugeneBi updated this revision to Diff 157146.Jul 24 2018, 3:36 PM

Fix a bug with resolved_spec path.

EugeneBi marked an inline comment as done.Jul 24 2018, 3:36 PM
EugeneBi added inline comments.Jul 24 2018, 3:43 PM
source/Target/Platform.cpp
252

Now I have my doubts: what "platform file spec" really means? I mean, which code
is correct: current or previous one?

Could you also add a test for this?

Could you also add a test for this?

I never ran LLDB tests, not sure where they are and what they are.
Also, how would you test that? I know now my open core dump works, but I cannot share it.

Could you also add a test for this?

I never ran LLDB tests, not sure where they are and what they are.
Also, how would you test that? I know now my open core dump works, but I cannot share it.

Good question. Is this functionality specific to shared libraries, or could it be used to locate the main exe too?

If yes, then it should be sufficient to take one of the existing core&exe files we have, copy the exe into a path matching what the core file expects, set the sysroot, and make sure the exe gets found when we open up the core file. You can look at the existing tests in test/testcases/functionalities/postmortem/elf-core for examples.

If not, then it might get trickier because we'll need new core files, and it's hard to generate small ones with shared libraries.

Could you also add a test for this?

I never ran LLDB tests, not sure where they are and what they are.
Also, how would you test that? I know now my open core dump works, but I cannot share it.

Good question. Is this functionality specific to shared libraries, or could it be used to locate the main exe too?

If yes, then it should be sufficient to take one of the existing core&exe files we have, copy the exe into a path matching what the core file expects, set the sysroot, and make sure the exe gets found when we open up the core file. You can look at the existing tests in test/testcases/functionalities/postmortem/elf-core for examples.

If not, then it might get trickier because we'll need new core files, and it's hard to generate small ones with shared libraries.

It is specific to shared libraries. Opening the executable and core dump follows different code path.

It is specific to shared libraries. Opening the executable and core dump follows different code path.

Which path is that? Is the path different even when you don't specify an executable when opening the core file (target create --core /my/file.core). I believe that in this case we should end up here https://github.com/llvm-mirror/lldb/blob/7c07b8a9314eef118f95b57b49fa099be0808eac/source/Plugins/Process/elf-core/ProcessElfCore.cpp#L255, which should eventually call Platform::GetSharedModule. Is that not the case?

It is specific to shared libraries. Opening the executable and core dump follows different code path.

Which path is that? Is the path different even when you don't specify an executable when opening the core file (target create --core /my/file.core). I believe that in this case we should end up here https://github.com/llvm-mirror/lldb/blob/7c07b8a9314eef118f95b57b49fa099be0808eac/source/Plugins/Process/elf-core/ProcessElfCore.cpp#L255, which should eventually call Platform::GetSharedModule. Is that not the case?

I did not try not to specify executable - what's the point in doing that? LLDB would fail to find symbols.

Here is the stack I captured when I debugged the problem. I believe that DynamicLoaderPOSIXDYLD::LoadAllCurrentModules did not try to load the executable.

(gdb) bt
#0 lldb_private::ModuleList::GetSharedModule (module_spec=..., module_sp=std::shared_ptr (empty) 0x0,

module_search_paths_ptr=0x83ad60, old_module_sp_ptr=0x7fffffffbb50, did_create_ptr=0x7fffffffbb07,
always_create=false) at /home/eugene/llvm/tools/lldb/source/Core/ModuleList.cpp:710

#1 0x00007fffedc2d130 in lldb_private::Platform::<lambda(const lldb_private::ModuleSpec&)>::operator()(const lldb_private::ModuleSpec &) const (__closure=0x8581b0, spec=...)

at /home/eugene/llvm/tools/lldb/source/Target/Platform.cpp:234

#2 0x00007fffedc34ff2 in std::_Function_handler<lldb_private::Status(const lldb_private::ModuleSpec&), lldb_private::Platform::GetSharedModule(const lldb_private::ModuleSpec&, lldb_private::Process*, lldb::ModuleSP&, const lldb_private::FileSpecList*, lldb::ModuleSP*, bool*)::<lambda(const lldb_private::ModuleSpec&)> >::_M_invoke(const std::_Any_data &, const lldb_private::ModuleSpec &) (functor=..., args#0=...) at /usr/include/c++/5/functional:1857
#3 0x00007fffedc37978 in std::function<lldb_private::Status (lldb_private::ModuleSpec const&)>::operator()(lldb_private::ModuleSpec const&) const (this=0x7fffffffba80, __args#0=...) at /usr/include/c++/5/functional:2267
#4 0x00007fffedc3375a in lldb_private::Platform::GetRemoteSharedModule(lldb_private::ModuleSpec const&, lldb_private::Process*, std::shared_ptr<lldb_private::Module>&, std::function<lldb_private::Status (lldb_private::ModuleSpec const&)> const&, bool*) (this=0x839330, module_spec=..., process=0x84d310, module_sp=std::shared_ptr (empty) 0x0,

module_resolver=..., did_create_ptr=0x7fffffffbb07)
at /home/eugene/llvm/tools/lldb/source/Target/Platform.cpp:1628

#5 0x00007fffedc2d2cd in lldb_private::Platform::GetSharedModule (this=0x839330, module_spec=...,

process=0x84d310, module_sp=std::shared_ptr (empty) 0x0, module_search_paths_ptr=0x83ad60,
old_module_sp_ptr=0x7fffffffbb50, did_create_ptr=0x7fffffffbb07)
at /home/eugene/llvm/tools/lldb/source/Target/Platform.cpp:240

#6 0x00007fffedc9957c in lldb_private::Target::GetSharedModule (this=0x846960, module_spec=..., error_ptr=0x0)

at /home/eugene/llvm/tools/lldb/source/Target/Target.cpp:1952

#7 0x00007fffef8e0d11 in lldb_private::DynamicLoader::LoadModuleAtAddress (this=0x9a0a70, file=...,

link_map_addr=139700267943784, base_addr=139700263510016, base_addr_is_offset=true)
at /home/eugene/llvm/tools/lldb/source/Core/DynamicLoader.cpp:171

#8 0x00007fffedd8fb55 in DynamicLoaderPOSIXDYLD::LoadAllCurrentModules (this=0x9a0a70)

at /home/eugene/llvm/tools/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp:537

#9 0x00007fffedd8de52 in DynamicLoaderPOSIXDYLD::DidAttach (this=0x9a0a70)

at /home/eugene/llvm/tools/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp:171

#10 0x00007fffedc476d9 in lldb_private::Process::LoadCore (this=0x84d310)

at /home/eugene/llvm/tools/lldb/source/Target/Process.cpp:2853

Hmm... I never thought I can do that :)
Anyway, with the change as it is now, LLDB would try to load executable in the sysroot, fail that, then open it without the sysroot.

lemo added a comment.Jul 27 2018, 11:35 AM

I never *ran LLDB tests*, not sure where they are and what they are.

I hope you're planning to look into this before submitting the change :)

Hmm... I never thought I can do that :)
Anyway, with the change as it is now, LLDB would try to load executable in the sysroot, fail that, then open it without the sysroot.

Does that mean it is possible to exercise this code path (by placing the executable in the sysroot for lldb to find it)?

I never *ran LLDB tests*, not sure where they are and what they are.

I hope you're planning to look into this before submitting the change :)

Good idea :)

Scanning dependencies of target check-lldb
[100%] Built target check-lldb

Hmm... I never thought I can do that :)
Anyway, with the change as it is now, LLDB would try to load executable in the sysroot, fail that, then open it without the sysroot.

Does that mean it is possible to exercise this code path (by placing the executable in the sysroot for lldb to find it)?

I looked at the tests - is it all in Python? Not sure I have time to learn a new language... Is there anything in C++?

I looked at the tests - is it all in Python? Not sure I have time to learn a new language... Is there anything in C++?

We have unit tests in c++, but it's going to be quite hard to tickle this code path from there.

FWIW, I don't think you really need to *know* python to write a test like this. You should be able to fudge it by cargo-culting some code from existing tests and some basic python examples. I expect the test should be something like:

# copy core file an .exe into an appropriate directory tree
self.runCmd("platform select remote-linux --sysroot '%s'" % sysroot)
target = self.dbg.CreateTarget(None)
process = target.LoadCode(core)
self.assertEquals(1, target.GetNumModules())
self.assertEquals(exe, target.GetModuleAtIndex(0).GetFileSpec())

I looked at the tests - is it all in Python? Not sure I have time to learn a new language... Is there anything in C++?

We have unit tests in c++, but it's going to be quite hard to tickle this code path from there.

FWIW, I don't think you really need to *know* python to write a test like this. You should be able to fudge it by cargo-culting some code from existing tests and some basic python examples. I expect the test should be something like:

# copy core file an .exe into an appropriate directory tree
self.runCmd("platform select remote-linux --sysroot '%s'" % sysroot)
target = self.dbg.CreateTarget(None)
process = target.LoadCode(core)
self.assertEquals(1, target.GetNumModules())
self.assertEquals(exe, target.GetModuleAtIndex(0).GetFileSpec())

I am going on vacation till the end of August and I still have something to do at work. Most likely I will not have a chance to look at it until I am back.

@labath Eugene asked me to help add a unit test for this. I have the updated diff, but I can't seem to attach it to this code review -- it must be because I'm not the original author? I'll attach the diff to this comment and maybe you can try to update the review. Otherwise we can wait until Eugene gets back on Monday.

The diff makes the following changes:

  • Add an executable: test/testcases/functionalities/postmortem/elf-core/mock_sysroot/home/labath/test/a.out
  • This is a copy of linux-i386.out (which is why it's in mock_sysroot/home/labath...). Let me know if this is ok, or if I should try to create a new binary/core
  • Implement the test you suggested: set platform and sysroot, open the core, check threads/stacks are valid

I verified that the test fails without the changes to Platform.cpp and passes after the changes.

@labath Eugene asked me to help add a unit test for this. I have the updated diff, but I can't seem to attach it to this code review -- it must be because I'm not the original author? I'll attach the diff to this comment and maybe you can try to update the review. Otherwise we can wait until Eugene gets back on Monday.

The diff makes the following changes:

  • Add an executable: test/testcases/functionalities/postmortem/elf-core/mock_sysroot/home/labath/test/a.out
  • This is a copy of linux-i386.out (which is why it's in mock_sysroot/home/labath...). Let me know if this is ok, or if I should try to create a new binary/core
  • Implement the test you suggested: set platform and sysroot, open the core, check threads/stacks are valid

I verified that the test fails without the changes to Platform.cpp and passes after the changes.

Mark, thanks a lot for doing this. I have one request though: let's not add new binaries to the repo. Instead, the test should create a temporary directory somewhere (in /tmp, I guess) and copy necessary files there. After the test is done, wipe it out.

You can always run a.out through obj2yaml and check that in. There is test suite support for using a yaml file that converts it to a binary and debugs it

functionalities/gdb_remote_client/gdbclientutils.py
429:    def createTarget(self, yaml_path):
431:        Create a target by auto-generating the object based on the given yaml
437:        yaml_base, ext = os.path.splitext(yaml_path)
438:        obj_path = self.getBuildArtifact(yaml_base)
439:        self.yaml2obj(yaml_path, obj_path)
EugeneBi updated this revision to Diff 161520.Aug 20 2018, 11:15 AM

Mark added the test. Please let us know if this is OK.

labath accepted this revision.Aug 21 2018, 12:09 PM

Thank you for writing the test. This is more-or-less what I had in mind. Just make sure we don't put the test files in /tmp (e.g., because windows doesn't have it).

It's unfortunate that the core file has my username embedded in it (I should have build this in /tmp or somewhere), but if that doesn't bother you then I'm fine with it too (you can find leftover usernames in some llvm tests too, although, with this test, this becomes a bit more apparent). Alternatively, we can regenerate the core file and the matching executable. (We could use yaml2obj on the executable, but it unfortunately doesn't support (elf) core files at the moment).

packages/Python/lldbsuite/test/functionalities/postmortem/elf-core/TestLinuxCore.py
215 ↗(On Diff #161520)

Please put this in the build folder (self.getBuildDir) instead of /tmp.

231–232 ↗(On Diff #161520)

Then, this will not be necessary.

EugeneBi updated this revision to Diff 161959.Aug 22 2018, 7:55 AM

Do not use /tmp directory in the test

EugeneBi closed this revision.Aug 22 2018, 7:55 AM
EugeneBi marked an inline comment as done.
EugeneBi added inline comments.
packages/Python/lldbsuite/test/functionalities/postmortem/elf-core/TestLinuxCore.py
215 ↗(On Diff #161520)

I uploaded new diff and closed revision.
Do I need to do anything else?

labath added inline comments.Aug 25 2018, 12:41 PM
packages/Python/lldbsuite/test/functionalities/postmortem/elf-core/TestLinuxCore.py
215 ↗(On Diff #161520)

Normally, the revision would be automatically closed when the patch is committed. Do you have commit access or you need someone to commit this for you?

EugeneBi marked an inline comment as done.Aug 27 2018, 10:24 AM
EugeneBi added inline comments.
packages/Python/lldbsuite/test/functionalities/postmortem/elf-core/TestLinuxCore.py
215 ↗(On Diff #161520)

I do not have commit access.
Could you commit it, please?