This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Report old modules from ModuleList::ReplaceEquivalent
ClosedPublic

Authored by JosephTremoulet on Oct 9 2020, 12:16 PM.

Details

Summary

This allows the Target to update its module list when loading a shared
module replaces an equivalent one.

A testcase is added which hits this codepath -- without the fix, the
target reports libbreakpad.so twice in its module list.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald Transcript
JosephTremoulet requested review of this revision.Oct 9 2020, 12:16 PM

The change makes sense.

If I run lldb on a binary, hit a breakpoint, run to it, then change a file, rebuild the executable outside lldb and rerun, I only see one instance of the executable. Why does that path not need your fix?

lldb/include/lldb/Core/ModuleList.h
142–148

You added another parameter, so you should add it to the docs.

  • Comment new parameter
JosephTremoulet marked an inline comment as done.Oct 13 2020, 1:20 PM
lldb/source/Core/ModuleList.cpp
779–781

If I run lldb on a binary, hit a breakpoint, run to it, then change a file, rebuild the executable outside lldb and rerun, I only see one instance of the executable. Why does that path not need your fix?

The codepath where I'm inserting this comment (ModuleList.cpp lines 779-781/784-786) is the one that handles that when I run those steps under the debugger.

lldb/source/Core/ModuleList.cpp
779–781

(see also D89156 which this change depends on)

I'd been hoping to get this bugfix into 11.0.1, but it (and D89156, which is just some cleanup that occurred to me while working on this code) have been stalled for a few weeks, and I don't have any insight into the reason. If there's anything i can do to help either one along, please let me know. One idea I had is that I could rebase this on master so it doesn't depend on the other, would that help?

Thanks.

jingham accepted this revision.Oct 28 2020, 3:26 PM

I can't see anything wrong with this, it seems to follow the practice of other parts of GetSharedModule. I'm still not quite certain why I've never seen an instance where the absence of your fix has caused problems, and it isn't clear from the patch why that is. If there something special about handling breakpad files that moves you out of the normal code path for replacing overridden modules? So I still have a lurking feeling I'm missing something?

OTOH, seems like any time you find that one module is a newer version of one you held onto you should mention it, so this seems okay.

This revision is now accepted and ready to land.Oct 28 2020, 3:26 PM

Thanks for the review!

I'm still unsure how to proceed w.r.t. D89156 -- did you intend to include that when you approved this change, or do you have a sense when you'll have time to look at it, or should I change this to not depend on it, or find another reviewer, or ... ? I'd appreciate any advice.

I'm still not quite certain why I've never seen an instance where the absence of your fix has caused problems, and it isn't clear from the patch why that is. If there something special about handling breakpad files that moves you out of the normal code path for replacing overridden modules? So I still have a lurking feeling I'm missing something?

I'm confused by that too. When reducing the repro / creating the testcase, I was surprised how tough it was to hit exactly this codepath. I don't know what the scenario was that these calls to ReplaceEquivalent were originally put in for (I couldn't figure it out from source history) to be able to say why it didn't matter there about not reporting the old module. There is some fallback code in Target::GetOrCreateModule to deal with the case that GetSharedModule didn't report the old module:

// GetSharedModule is not guaranteed to find the old shared module, for
// instance in the common case where you pass in the UUID, it is only
// going to find the one module matching the UUID.  In fact, it has no
// good way to know what the "old module" relevant to this target is,
// since there might be many copies of a module with this file spec in
// various running debug sessions, but only one of them will belong to
// this target. So let's remove the UUID from the module list, and look
// in the target's module list. Only do this if there is SOMETHING else
// in the module spec...
if (!old_module_sp) {
  if (module_spec.GetUUID().IsValid() &&
      !module_spec.GetFileSpec().GetFilename().IsEmpty() &&
      !module_spec.GetFileSpec().GetDirectory().IsEmpty()) {
    ModuleSpec module_spec_copy(module_spec.GetFileSpec());
    module_spec_copy.GetUUID().Clear();

    ModuleList found_modules;
    m_images.FindModules(module_spec_copy, found_modules);
    if (found_modules.GetSize() == 1)
      old_module_sp = found_modules.GetModuleAtIndex(0);
  }

ProcessMinidump::ReadModuleList calls into GetOrCreateModule without a UUID because it needs to handle the case that the thing reported in the uuid field of the minidump is actually a hash of the text section rather than a standard uuid, so it's possible that other cases usually have uuids and get saved by that fallback code, perhaps?

Another thing that's potentially unique for minidumps is that they will often try to resolve the executable twice -- once in ResolveExecutable, but then a second time if the executable appears in the module list in the minidump (which it should).

I've looked a bit at the codepaths used when loading ELF core dumps, and they don't have this module list iteration but rather call DidAttach on the dynamic loader to populate the target module list. I'm not familiar with the details of the dynamic loader code or how it avoids running into this issue.

I'm also under the impression that opening dumps with user-specified sysroots on Linux is not a frequently-used scenario, because this is the fourth bug in that area that my team has discovered, and I don't think we were otherwise doing anything particularly exotic. This patch isn't sysroot-specific, but having a sysroot specified alters how GetSharedModule is called (with different directories and search paths), and I wasn't hitting the key codepath when I tried to reduce the repro by removing the sysroot specification.

All the parts of D89156 that are just being consistent about always passing old_modules as a vector seem okay. But then you get to a point where you shouldn't really have multiple modules replacing a single one so you aren't really sure what to do about it. That part makes me a little uneasy.

But then you get to a point where you shouldn't really have multiple modules replacing a single one so you aren't really sure what to do about it. That part makes me a little uneasy.

Yeah, I wouldn't claim that the handling of the multiple-old-module case there is ideal. My thinking is that it makes an incremental step in the right direction, though -- the same potential for having multiple old modules is there with or without the change; the change makes the issue more apparent to readers of the code, makes note of it in the relevant log if we hit the issue at runtime, and refactors the code so there is one place to handle the issue. If you think it's best to hold off on that until we have a better way to actually handle the case, I can see the logic in that, just let me know and I'll rebase this to fix the bug in the meantime.

Thanks!

But then you get to a point where you shouldn't really have multiple modules replacing a single one so you aren't really sure what to do about it. That part makes me a little uneasy.

Yeah, I wouldn't claim that the handling of the multiple-old-module case there is ideal. My thinking is that it makes an incremental step in the right direction, though -- the same potential for having multiple old modules is there with or without the change; the change makes the issue more apparent to readers of the code, makes note of it in the relevant log if we hit the issue at runtime, and refactors the code so there is one place to handle the issue. If you think it's best to hold off on that until we have a better way to actually handle the case, I can see the logic in that, just let me know and I'll rebase this to fix the bug in the meantime.

Thanks!

Since we'd just be dropping all the other modules that we don't know how they are coming about on the floor before your patch, that seems strictly better. Can you emit more info when this happens, like the module that's replacing this set of old modules, and the UUID of the duplicate entries. That way when this happens for some reason we didn't expect - which will invariably be on some proprietary code that we can't debug locally - we have a better chance of figuring out what went wrong. Then this should be fine.

This revision was landed with ongoing or failed builds.Oct 30 2020, 12:14 PM
This revision was automatically updated to reflect the committed changes.