This is an archive of the discontinued LLVM Phabricator instance.

Refactor CommandObjectTargetSymbolsAdd::AddModuleSymbols
ClosedPublic

Authored by amccarth on Jan 28 2020, 4:59 PM.

Details

Summary
  • [NFC] Renamed local matching_module_list to matching_modules for conciseness.
  • [NFC] Eliminated redundant local variable num_matches to reduce the risk that changes get it out of sync with matching_modules.GetSize().
  • Used an early return from case where the symbol file specified matches multiple modules. This is a slight behavior change, but it's an improvement: It didn't make sense to tell the user that the symbol file simultaneously matched multiple modules and no modules.
  • [NFC] Used an early return from the case where no matches are found, to better align with LLVM coding style.
  • [NFC] Simplified call of AppendWarningWithFormat("%s", stuff) to AppendWarning(stuff). I don't think this adds any copies. It does construct a StringRef, but it was going to have to scan the string for the length anyway.
  • [NFC] Removed unnecessary comments and reworded others for clarity.
  • Used an early return if the symbol file could not be loaded. This is a behavior change because previously it could fail silently.
  • Used an early return if the object file could not be retrieved from the symbol file. Again, this is a change because now there's an error message.
  • If we successfully load a symbol file that seems to match, but then detect a file name mismatch, now we only issue a warning. We've already determined they match, so it seems pointless to change our minds.
  • [NFC] Eliminated a namespace alias that wasn't particularly helpful.

Diff Detail

Event Timeline

amccarth created this revision.Jan 28 2020, 4:59 PM
labath added a subscriber: labath.Jan 29 2020, 12:45 AM
labath added inline comments.
lldb/source/Commands/CommandObjectTarget.cpp
4102

This is not unreasonable in the non-pdb world. You can have a stripped version of a file somewhere prepared for deployment to some device (or downloaded from the device), and then you'll also have an unstripped version of that file (with the same name) in some build folder.

4147
4173–4176

This part is not good. Everywhere else except pdbs this means that we were in fact unable to load the symbol file. What happens is that if we cannot load the object file representing the symbol file (at SymbolVendor.cpp:48), we fall back to creating a SymbolFile using the object file of the original module (line 52).

The effect of that is that the symbol file creation will always succeed, the previous checks are more-or-less useless, and the only way to check if we are really using the symbols from the file the user specified is to compare the file specs.

Now, PDB symbol files are abusing this fallback, and reading the symbols from the pdb files even though they were in fact asked to read them from the executable file. This is why this may sound like a "discrepancy" coming from the pdb world, but everywhere else this just means that the symbol file was not actually loaded.

clayborg added inline comments.Jan 29 2020, 11:21 AM
lldb/source/Commands/CommandObjectTarget.cpp
4147

This assert will never trigger with the above code already checking for empty on line 4123 and then checking if size > 1 on line 4140. Remove it?

4173

This could also fail if the symbol file spec was a bundle which got resolved when the module was loaded. If the user specified "/path/to/foo.dSYM" and the underlying code was able to resolve the symbol file in the dSYM bundle to "/path/to/foo.dSYM/Contents/Resources/DWARF/foo".

amccarth marked 3 inline comments as done.Jan 29 2020, 11:43 AM

Thanks for the feedback. Obviously I'm confused about how LLDB handles split debug info, so I need more clarification about how to proceed.

lldb/source/Commands/CommandObjectTarget.cpp
4102

Sure, if you've got two files with the same name in two directories that's fine.

But the loop tries peeling the extension off of the supplied file name and comparing it to the target's file name. I guess it's trying to match something like foo.unstripped to foo. Is that a typical thing?

4173–4176

Interesting. I did not expect that fallback behavior from the API.

PDB symbol files are abusing this fallback

I'm not sure there's abuse going on here. I'm not understanding something about how lldb models this situation. Consider the case where the user explicitly asks lldb to load symbols from a PDB:

(lldb) target add symbols D:\my_symbols\foo.pdb

Then symbol_file is the PDB and object_file is the executable or DLL. I would expect those file specs to match only in the case where the symbols are in the binary (in other words, when symbol file and object file are the same file). Even ignoring the PDB case, it seems this wouldn't even work in the case you mentioned above where the object file is stripped and the symbol file is an unstripped copy of the object file (possibly with a different name). You can also get here if the symbols were matched by UUID rather than file spec. Or when the symbols were matched by the basename minus some extension.

Given that dsyms work, I must be misunderstanding something here. Can you help me understand?

What's the right thing to do here? If I just undo this refactor, then we're back to silent failures when the symbols exist in a different file than the object.

amccarth marked an inline comment as done.Jan 29 2020, 11:51 AM
amccarth added inline comments.
lldb/source/Commands/CommandObjectTarget.cpp
4147

It's true that this assert won't happen given the early returns above, so I will remove it if you like.

But I did it intentionally: (1) to document the requirement and (2) to protect against future changes accidentally breaking the assumptions. If the function were much shorter, then it might be obvious enough as is. But early returns, especially halfway down a long function, can be easy to miss.

clayborg added inline comments.Jan 29 2020, 1:11 PM
lldb/source/Commands/CommandObjectTarget.cpp
4147

Gotcha. It is fine to leave it in, just switch to assert() as Pavel mentioned.

amccarth marked 2 inline comments as done.Jan 29 2020, 1:27 PM
amccarth added inline comments.
lldb/source/Commands/CommandObjectTarget.cpp
4173

In your example, the file specs do not match.

  1. The old code would therefore reject the symbol file and return an error.
  2. The new code would therefore emit a warning but proceed with the selected symbol file and return success.

Do you want either of these behaviors or something else?

clayborg added inline comments.Jan 29 2020, 1:52 PM
lldb/source/Commands/CommandObjectTarget.cpp
4173

all I know is I can and have typed:

(lldb) target symbols add /path/to/foo.dSYM

And it would successfully associate it with the binary, probably because the code on 4071 would see a UUID and accept it before it gets to this point.

What is the point of the warning? When would this happen?

amccarth marked an inline comment as done.Jan 29 2020, 2:28 PM
amccarth added inline comments.
lldb/source/Commands/CommandObjectTarget.cpp
4173

There are examples in the thread with Pavel just below this thread. On Windows, your symbols may come from a file called foo.pdb for an executable called foo.exe. The old code would successfully load the PDB file, then decide it was no good because the file specs don't match. (Heck, at this point, we've just gone through a bunch of gyrations to allow matching of differing file specs.) So you'd get a mostly silent failure for code that actually worked.

My intent was to simply warn about such cases without actually forcing a failure, and to do so with some explicit feedback so that the user understands.

I'll wait to hear what Pavel says. If necessary, I'll just roll back this part of the change (so that this remains just a refactor), and I'll figure out a way around the problem in a later patch.

labath added inline comments.Jan 30 2020, 1:08 AM
lldb/source/Commands/CommandObjectTarget.cpp
4102

Ah, sorry, I missed that part.

Yes, this is also thing that's used sometimes (probably even more often than the first thing). One of the conventions for automatically finding separate debug files involves appending .debug to your file and then placing it in a special folder. I guess this was added to enable one to do "target symbols add /whereever/foo.debug" and have it match "foo".

That said, stripping the extension in a loop does seem like overkill.

4173–4176

Greg's use case will work because the bundle path will be resolved to the actual file (in PlatformDarwin::ResolveSymbolFile) before we get to this place.

Overall, I think changing this to a warning would be acceptable. Thinking about this more, I am not sure if the scenario I described below could happen (easily -- it might for some very weird elf files where we were able to extract the uuid, but cannot create an actual ObjectFileELF instance). So calling this a discrepancy would be fine.

And I do think that the current pdb behavior is a big discrepancy, which should be fixed. Either we should go the breakpad way, and introduce an "ObjectFilePDB" so that the pdb symbol file plugin(s) can behave like the dwarf code does, or we should change the SymbolFile class so that it does not require an object file instance in order to read symbols from a separate file (and then we can delete ObjectFileBreakpad).

Having a more explicit method of checking whether the symbol file creation was successful (instead of checking file specs) would not hurt either...

amccarth marked 3 inline comments as done.Jan 30 2020, 10:35 AM

I backed out the TODO and the warn-and-continue behavior to expedite this patch which is just supposed to be a refactor for readability.

We can revisit the issue of debug symbol file names not matching the object file names in the future (if necessary).

lldb/source/Commands/CommandObjectTarget.cpp
4102

OK, I'm removing the TODO in favor of a comment with the foo and foo.debug you gave. Whether it should be a loop or not can be addressed later. Whether we should do something that would also work for Windows (e.g., foo.exe matching foo.pdb) can also be considered later.

This patch is just supposed to be a refactor for readability/debuggability.

4173–4176

Thanks for all the clarification.

I am planning to introduce an ObjectFilePDB in a future patch to solve other problems.

To keep this patch a behavior-preserving refactor, I'll back out the warning message. Once PDBs can be loaded reliably, I can circle back here to see if there's a better way to detect failure and/or to communicate matching problems to the user.

amccarth updated this revision to Diff 241510.Jan 30 2020, 10:37 AM
amccarth marked an inline comment as done.
  • Backed out the warn-and-continue behavior change. It should be addressed later.
  • Changed lldb_assert to assert.
  • Improved comment in place of the proposed TODO.
clayborg accepted this revision.Jan 30 2020, 7:00 PM

Looks reasonable to me. Any objections from anyone else? We do have tests for this right?

This revision is now accepted and ready to land.Jan 30 2020, 7:00 PM
labath accepted this revision.Jan 31 2020, 12:42 AM

Looks reasonable to me. Any objections from anyone else? We do have tests for this right?

LGTM too. There are tests for this command already, but of course, we could always use a bunch more, particularly for the "error" cases...

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 3 2020, 2:23 PM