This is an archive of the discontinued LLVM Phabricator instance.

Minidump debugging using the native PDB reader
Needs RevisionPublic

Authored by lemo on Nov 30 2018, 12:32 PM.

Details

Summary

A few changes required to enable the use of the native PDB reader when debugging minidumps:

  1. Consume PDBs even if the module binary is not available Implementing a PlaceholderObjectFile to complement the existing PlaceholderModule
  1. Use the minidump exception record if present
  1. Add support for "target symbols add" with PDBs

Overall this is a modest step towards improving the minidump debugging experience. But more importantly,
it enables the basic consumption of PDBs using the NativePDB(TM) reader as a foundation to add more functionality incrementally.

Diff Detail

Event Timeline

lemo created this revision.Nov 30 2018, 12:32 PM

This looks good to me, save for a couple comment nits.

source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp
160

Every apparently valid PDB or certain ones? If it's particular ones, then perhaps we should create a bug report with a repro to make it easier to investigate in the future.

source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
133

Did you mean "directory" instead of "directly"?

Also, this is worded as a TODO, but it describes the current situation. Describing the current situation is fine, but I'd like the TODO to more clearly say what's do be done. I gather that's "implement a proper SymbolVendor."

zturner added inline comments.Nov 30 2018, 4:59 PM
source/Plugins/Process/minidump/ProcessMinidump.cpp
60

Should we return something valid for this function?

source/Plugins/SymbolFile/NativePDB/CompileUnitIndex.h
92

This method name is a little bit confusing. Is it referring to a numeric index like 1, 2, 3, 4, 5. Or a CompilandIndexItem? I would call this something like HasCompilandInfo(uint16_t modi)

source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp
136–139

What kind of symbols have a segment and offset of 0?

155–162

If we're going to comment it out, then just delete the code IMO.

Do you have some llvm-pdbutil output that demonstrates this on a valid PDB? I guess we'd be looking for 2 symbols with the same Virtual Address. Seems odd to have that, but maybe an example of where it's happening would make it clear whether this is actually a valid case.

source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
106–107

Perhaps obj_file should be a reference just to clarify that it can't be null.

114–117

Instead of trying to load it as a PE/COFF fail and then trying something else if it fails (hoping that it's a minidump), perhaps we could use llvm::identify_magic() to figure out what it is actually is first. That function currently cannot detect a Windows minidump, but we coudl teach it to support that. I think that would make this code more logical.

if (type == exe) {
} else if (type == minidump) {
} else {
  // actual error
}
labath added a comment.Dec 3 2018, 1:08 AM

I don't see any tests :(.

Also, the three bullet points in the description sound like rather independent pieces of functionality. Would it be possible to split them up into separate patches? That would make things easier to review, particularly for those who don't look at this code very often :).

lemo marked 9 inline comments as done.Dec 5 2018, 10:31 AM

I don't see any tests :(.
Also, the three bullet points in the description sound like rather independent pieces of functionality. Would it be possible to split them up into separate patches? That would make things easier to review, particularly for those who don't look at this code very often :).

I agree with both comments. The intention is to add some tests but I wanted to get the review out early to surface concerns, if any. I also needed more time to investigate a few complex failures uncovered by this change (ex. https://bugs.llvm.org/show_bug.cgi?id=39882 and https://bugs.llvm.org/show_bug.cgi?id=39897)

Yes, this change can also be split in three parts: the reason it's bundled up in this review is that all three parts are required to enable the basic functionality (and overall it's a relatively small change). Maybe it was better if I sent out the parts separately, but right now I'd like to preserve the continuity in the review comments.
I'm about to send out a new revision and once this review satisfies all the comments I'll split it out and send individual reviews.

source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp
136–139
 68 | S_LDATA32 [size = 44] `trace_event_unique_atomic140`
      type = 0x0013 (__int64), addr = 0000:0000
112 | S_LDATA32 [size = 44] `trace_event_unique_atomic171`
      type = 0x0013 (__int64), addr = 0000:0000
156 | S_LDATA32 [size = 44] `trace_event_unique_atomic196`
      type = 0x0013 (__int64), addr = 0000:0000

...

Tracked by : https://bugs.llvm.org/show_bug.cgi?id=39882

155–162

I spent more time to investigate this and opened a bug to track it: https://bugs.llvm.org/show_bug.cgi?id=39897

With very large PDBs it's not easy to have a definitive verdict so a 2nd opinion is welcome, but as I noted in the bug is seems that ICF is one case where we can end up with multiple symbols at the same address. This means that the data structure needs to be revisited.

For now, I think it's better to suppress the assert so we can exercise the rest of the native PDB reader (I updated the comment to point to the new bug I just opened)

source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
106–107

That would make sense. Unfortunately, obj_file can't be const (ObjectFile::GetUUID() is non const and it's not easy to surgically fix it)

So we'd have to pass by non-const ref, which would read "out-parameter", which IMO is more confusing than the non-null part is worth.

114–117

We need to load the binary anyway (this part was kept from the original code). If we can't find the binary we just look for the PDB in the current directly, none of which is specific to minidumps.

I'm sure there's a more elegant solution but again, this is a (hopefully short lived) stop-gap that should go away.

lemo updated this revision to Diff 176853.Dec 5 2018, 10:31 AM
lemo marked an inline comment as done.

This seems like a step in the right direction.

labath added a comment.Dec 6 2018, 1:04 AM

I agree with both comments. The intention is to add some tests but I wanted to get the review out early to surface concerns, if any. I also needed more time to investigate a few complex failures uncovered by this change (ex. https://bugs.llvm.org/show_bug.cgi?id=39882 and https://bugs.llvm.org/show_bug.cgi?id=39897)

Yes, this change can also be split in three parts: the reason it's bundled up in this review is that all three parts are required to enable the basic functionality (and overall it's a relatively small change). Maybe it was better if I sent out the parts separately, but right now I'd like to preserve the continuity in the review comments.
I'm about to send out a new revision and once this review satisfies all the comments I'll split it out and send individual reviews.

Sounds good. I think it would be nice to see the isolated patches standing next to their tests.

source/Plugins/Process/minidump/ProcessMinidump.cpp
134–136

I don't know whether you'll run into any problems because of this, but the way this works for "normal" object files is that the each object file has a list of "own" sections, and then the Module has a "unified" list, which contains the sections of the main object file possibly combined with sections from the symbol object files. Here you are adding a section to the unified list without adding it to the object file, which is a bit nonstandard. I think it would be better to just add it to both places.

source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
106–107

I don't think the ObjectFile& would be an out-parameter any more than the existing BumpPtrAllocator& is an out-parameter. So making this a reference would also make things locally consistent.

150

make this sizeof(guid) for consistency.

171–174

Mainly out of curiosity, what's the complication here? The llvm interface does not provide the means to retrieve the age?

lemo updated this revision to Diff 177329.Dec 7 2018, 2:51 PM
lemo marked 3 inline comments as done.

Incorporating feedback + adding a test case

lemo added inline comments.Dec 7 2018, 2:52 PM
source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
171–174

Yes, the interface doesn't expose the age, although that's not not really the problem (the age is a detail - part of the generic "Debug ID")

We just need to make the handling of PE/COFF & VC UUIDs more consistent: right now most places expect a stripped version (just the GUID). This is wrong since a correct PDB match should take the age into account, but it's outside the scope of this change.

BTW: check my changes in: https://reviews.llvm.org/D55522
It will be interesting to you since it parses the linux maps info if it is available in breakpad generated minidump files. This will give us enough info to create correct sections for object files when we have no ELF file or no symbol file (ELF/DWARF or breakpad).

This looks fine to me. The test is a bit more coarse-grained than I'd like, though I can't think of a substantially better approach right now. One of the pdb folks should ok this too.

source/Plugins/Process/minidump/ProcessMinidump.cpp
386

Since you're changing this, you might as well change the type to something more appropriate (size_t?).

lemo updated this revision to Diff 177576.Dec 10 2018, 11:32 AM
lemo marked an inline comment as done.

How large is the PDB file here?

FYI: my approach to getting symbols to load was a bit different. I always let a simple PlaceholderModule be created, but I played some tricks in the GetObjectFile() method if someone had setting symbols for the module with "target symbols add ..". I will attach my PlaceholderModule so you can see what I had done. Since these modules always must be associated with a target, I keep a weak pointer to the target in the constructor. Then later, if someone does "target symbols add ..." the module will have Module:: m_symfile_spec filled in, so I set the m_platform_file to be m_file, and then move m_symfile_spec into m_file and then call Module::GetObjectFile(). This is nice and clean because you don't have to make up fake symbols to fake sections. When you create the placeholder module it needs the target:

auto placeholder_module =
    std::make_shared<PlaceholderModule>(module_spec, GetTarget());

Here is the copy of the PlaceholderModule that does what was discussed above:

//------------------------------------------------------------------
/// A placeholder module used for minidumps, where the original
/// object files may not be available (so we can't parse the object
/// files to extract the set of sections/segments)
///
/// This placeholder module has a single synthetic section (.module_image)
/// which represents the module memory range covering the whole module.
//------------------------------------------------------------------
class PlaceholderModule : public Module {
public:
  PlaceholderModule(const ModuleSpec &module_spec, Target& target) :
    Module(module_spec.GetFileSpec(), module_spec.GetArchitecture()),
    m_target_wp(target.shared_from_this()),
    m_base_of_image(LLDB_INVALID_ADDRESS), m_size_of_image(0) {
    if (module_spec.GetUUID().IsValid())
      SetUUID(module_spec.GetUUID());
  }

  // Creates a synthetic module section covering the whole module image (and
  // sets the section load address as well)
  void CreateImageSection(const MinidumpModule *module) {
    m_base_of_image = module->base_of_image;
    m_size_of_image = module->size_of_image;
    TargetSP target_sp = m_target_wp.lock();
    if (!target_sp)
      return;
    const ConstString section_name(".module_image");
    lldb::SectionSP section_sp(new Section(
        shared_from_this(),     // Module to which this section belongs.
        nullptr,                // ObjectFile
        0,                      // Section ID.
        section_name,           // Section name.
        eSectionTypeContainer,  // Section type.
        module->base_of_image,  // VM address.
        module->size_of_image,  // VM size in bytes of this section.
        0,                      // Offset of this section in the file.
        module->size_of_image,  // Size of the section as found in the file.
        12,                     // Alignment of the section (log2)
        0,                      // Flags for this section.
        1));                    // Number of host bytes per target byte
    section_sp->SetPermissions(ePermissionsExecutable | ePermissionsReadable);
    GetSectionList()->AddSection(section_sp);
    target_sp->GetSectionLoadList().SetSectionLoadAddress(
        section_sp, module->base_of_image);
  }

  ObjectFile *GetObjectFile() override {
    // Since there is no object file for these place holder modules, check
    // if the symbol file spec has been set, and if so, then transfer it over
    // to the file spec so the module can make a real object file out of it.
    if (m_symfile_spec) {
      // We need to load the sections once. We check of m_objfile_sp is valid.
      // If it is, we already have loaded the sections. If it isn't, we will
      // load the sections.
      const bool load_sections = !m_objfile_sp;
      if (load_sections) {
        m_platform_file = m_file;
        m_file = m_symfile_spec;
      }
      ObjectFile *obj_file = Module::GetObjectFile();
      if (load_sections && obj_file) {
        TargetSP target_sp = m_target_wp.lock();
        // The first time we create the object file from the external symbol
        // file, we must load its sections and unload the ".module_image"
        // section
        bool changed;
        SetLoadAddress(*target_sp, m_base_of_image, false, changed);
      }
      return obj_file;
    }
    return nullptr;
  }

  SectionList *GetSectionList() override {
    return Module::GetUnifiedSectionList();
  }
protected:
  // In case we need to load the sections in PlaceholderModule::GetObjectFile()
  // after a symbol file has been specified, we might need the target.
  lldb::TargetWP m_target_wp;
  lldb::addr_t m_base_of_image;
  lldb::addr_t m_size_of_image;
};
lemo added a comment.Dec 10 2018, 3:13 PM

How large is the PDB file here?

~100kb

zturner added inline comments.Dec 11 2018, 10:38 AM
source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
139–144

We talked about this offline, but bringing the discussion back here. Can you describe the use case that this is addressing? As you mention, this is a temporary hack until we have proper symbol searching logic, but proper symbol searching logic will do more than just look up symbols in a symbol server. It will also, for example, look in the same directory as the executable file. If we changed this logic to do that, would your use case still be addressed? At least that way, the logic we're adding is not temporary, even if it will eventually live in a different place (e.g. the SymbolVendor).

How large is the PDB file here?

~100kb

We have a couple of tests in LLVM where PDB files are checked in, but they are very few. We cannot explode the repo with large numbers of binary files. So this is probably fine, but if this becomes a pattern, we will need to come up with a different solution.

lemo updated this revision to Diff 177898.Dec 12 2018, 1:08 PM
lemo edited the summary of this revision. (Show Details)
lemo marked an inline comment as done.Dec 12 2018, 1:18 PM
lemo added inline comments.
source/Commands/CommandObjectTarget.cpp
4246 ↗(On Diff #177898)

note I had to bypass this check: we don't (yet) have a ObjectFilePDB so the SymbolFileNativePDB always points to the associated PE binary.

the check itself seems valuable as a diagnostic but not strictly required. Should I add a TODO comment and/or open a bug to revisit this?

labath added inline comments.Dec 13 2018, 3:29 AM
source/Commands/CommandObjectTarget.cpp
4246 ↗(On Diff #177898)

I not sure this is a good idea. Isn't this the only way of providing feedback about whether the symbols were actually added? If we are unable to load the symbol file specified (perhaps because the user made a typo, or the file is corrupted), then the symbol vendor will just create a default SymbolFile backed by the original object file. Doesn't that mean this will basically always return true now?

I think this is strictly worse that the previous solution as it lets the objectless-symbol-file hack leak out of SymbolFilePDB.

clayborg requested changes to this revision.Dec 13 2018, 1:38 PM

Just need a way to verify symbols are good. See my inline comment.

source/Commands/CommandObjectTarget.cpp
4246 ↗(On Diff #177898)

We need to add some sanity check where we ask the symbol file if it is valid. It should be virtual function in SymbolFile that defaults to:

virtual bool SymbolFile::IsValid() const {
  return GetObjectFile() != nullptr;
}

And we can override for SymbolFile subclasses that arenb't objfile based. How does this sound?

This revision now requires changes to proceed.Dec 13 2018, 1:38 PM

I thought about what my "ideal" design would look like. I'm not suggesting anyone should actually go off and do this, but since we're brainstorming design anyway, it doesn't hurt to consider what an optimal one should look like (and for the record, there may be things I haven't considered that cuase this design to not work or be suboptimal or not work at all). Just throwing it out there.

  1. Rename ObjectFile to ModuleFile to indicate that a) it makes no sense without a Module, and b) it doesn't make sense for symbol-only files.
  1. Rename SymbolVendor to SymbolFileLocator. Mostly cosmetic, but more clearly differentiates the responsibilities of the two classes. (e.g. wouldn't you expect SymbolFiles to be able to "vend" symbols)?
  1. SymbolFileLocator returns a unique_ptr<SymbolFile>. Nothing more. It has no other methods.
  1. Target contains a vector<unique_ptr<SymbolFileLocator>> as well as methods AddSymbolFileLocator(unique_ptr<SymbolFileLocator>) and unique_ptr<SymbolFile> LocateSymboleFile(Module&). The former gives things like the Platform or Process the ability to customize the behavior. The latter, when called, will iterate through the list of SymbolFileLocators, trying to find one that works.
  1. Module contains a unique_ptr<SymbolFile>. When GetSymbolFile() is called for the first time, it calls Target::LocateSymbolFile(*this) and updates its member variable.
  1. ModuleFile is updated to support "capabilities" much like SymbolFile is today. One of these capabilities can be "has unwind info". Likewise, "has unwind info is added to SymbolFile capabilities as well. Module can then select the best provider of unwind info using this approach.

And finally

  1. SymbolFile now stores a variable m_module instead of m_obj_file since this is usually what the SymbolFile requires anyway, and it also eliminates the dependency on SymbolFile being backed by an ObjectFile. Note that in this case, a SymbolFileBreakpad for example would have its m_module variable set to the actual module that its providing symbols for.

Some interesting ideas here. I like some of them, but I have reservations about others. I'm going to go through them one by one.

I thought about what my "ideal" design would look like. I'm not suggesting anyone should actually go off and do this, but since we're brainstorming design anyway, it doesn't hurt to consider what an optimal one should look like (and for the record, there may be things I haven't considered that cuase this design to not work or be suboptimal or not work at all). Just throwing it out there.

  1. Rename ObjectFile to ModuleFile to indicate that a) it makes no sense without a Module, and b) it doesn't make sense for symbol-only files.

This is the one I have most problems with, actually, as it goes against my "ideal" design. I would actually like to have ObjectFile (or whatever it's called) be completely independent of the Module class. The motivation for this is, unsurprisingly, lldb-server, which needs to have access to very lightweight info about an object file (on the level of: UUID, architecture, maybe entry point, etc.), but nothing like the fancy interface the Module class provides. So ideally, I'd like for the ObjectFile to live on a completely different level, so that it can be used in lldb-server without pulling in Module and everything that goes with it. So, I don't agree with your (a) goal here. The (b) goal seems completely reasonable though.

  1. Rename SymbolVendor to SymbolFileLocator. Mostly cosmetic, but more clearly differentiates the responsibilities of the two classes. (e.g. wouldn't you expect SymbolFiles to be able to "vend" symbols)?

+1000. It has always bugged me that SymbolVendor has all these methods that don't do anything and just forward their arguments to the SymbolFile (after performing a dozen null checks). I think it would be better if the symbol file interface was called directly. If some SymbolVendor/SymbolLocator ever wants to take an existing SymbolFile and alter its behavior slightly, it can always use the composition pattern and return it's own delegating SymbolFile object instead.

  1. SymbolFileLocator returns a unique_ptr<SymbolFile>. Nothing more. It has no other methods.

That seems to be the logical consequence of the previous item.

  1. Target contains a vector<unique_ptr<SymbolFileLocator>> as well as methods AddSymbolFileLocator(unique_ptr<SymbolFileLocator>) and unique_ptr<SymbolFile> LocateSymboleFile(Module&). The former gives things like the Platform or Process the ability to customize the behavior. The latter, when called, will iterate through the list of SymbolFileLocators, trying to find one that works.

I am not sure if Target is a good receptacle for this functionality, because one of lldb's goals (which seems completely reasonable to me) is to be able to reuse the same Module instance if the module happens to be loaded in multiple targets. If the target is responsible for locating the symbol files, then this opens the door for one target to unwittingly alter the symbols of another target. One way to get out of this would be to say that targets share a Module only if they (independently) choose the same symbol file. This might enable us to also solve the current problem, where "target symbols add" executed in one debugger, alters the behavior of another debugger instance. Another would be to store the list of SymbolLocators in a more global place. This would actually be pretty similar to what we have now.

  1. Module contains a unique_ptr<SymbolFile>. When GetSymbolFile() is called for the first time, it calls Target::LocateSymbolFile(*this) and updates its member variable.

Nothing to argue about here.

  1. ModuleFile is updated to support "capabilities" much like SymbolFile is today. One of these capabilities can be "has unwind info". Likewise, "has unwind info is added to SymbolFile capabilities as well. Module can then select the best provider of unwind info using this approach.

I don't believe a single bit is enough to describe the suitability of unwind info, as the choice of unwind strategy may vary from function to function and depend on other circumstances too (e.g., whether we are stopped at a call site). Instead, I'd try to have much more granular capabilities. E.g., one of those capabilities might be "has function boundaries". Then if either entity provides the information about function boundaries, the Module can create an InstructionEmulationUnwinder, which uses these boundaries to create emulated unwind plans. And even this capability might not be just a "bit", but rather a pooling of function boundary information gathered from various sources.

And finally

  1. SymbolFile now stores a variable m_module instead of m_obj_file since this is usually what the SymbolFile requires anyway, and it also eliminates the dependency on SymbolFile being backed by an ObjectFile. Note that in this case, a SymbolFileBreakpad for example would have its m_module variable set to the actual module that its providing symbols for.

Sounds good.

teemperor added inline comments.
lit/Minidump/Windows/Sigsegv/sigsegv.test
7

This test fails for me because lldb is picking up my custom lldbinit (which changes the format of the frame output). Not sure what's the right way to fix it, but setting a custom fixed frameformat in the sigsegv.lldbinit seems like an obvious solution.

labath added inline comments.Jan 9 2019, 6:48 AM
lit/Minidump/Windows/Sigsegv/sigsegv.test
7

Are you sure you have the right patch here?

Regardless, I actually think the right solution would be to pass --no-lldb-init as a part of the %lldb substitution (I'm surprised we don't do that already) to make sure no test sources random commands from users' .lldbinit files.

labath resigned from this revision.Aug 9 2019, 2:01 AM