This is an archive of the discontinued LLVM Phabricator instance.

[Symbolizer] Add Build ID flag to llvm-symbolizer.
ClosedPublic

Authored by mysterymath on Jan 31 2022, 10:30 AM.

Details

Summary

This adds a --build-id=<hex build ID> flag to llvm-symbolizer. If --obj
is unspecified, this will attempt to look up the provided build ID using
whatever mechanisms are available to the Symbolizer (typically,
debuginfod). The semantics are then as if the found binary were given
using the --obj flag.

Diff Detail

Event Timeline

mysterymath created this revision.Jan 31 2022, 10:30 AM

Update docs.

Consistent caps for "build ID."

mysterymath published this revision for review.Jan 31 2022, 10:54 AM
mysterymath added reviewers: phosek, mcgrathr.
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2022, 10:54 AM
phosek added inline comments.Feb 1 2022, 12:09 AM
llvm/docs/CommandGuide/llvm-symbolizer.rst
189

I wonder whether it wouldn't be less error prone to diagnose the case when user uses both --obj and --build-id as an error?

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
282

This is only invoked from one location so I'd consider inlining it.

jhenderson added inline comments.Feb 1 2022, 7:32 AM
llvm/docs/CommandGuide/llvm-symbolizer.rst
189

+1: I think it should be an error if a user specifies both, unless you have a good use-case for it.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
395–396

Don't think you want this blank line.

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
385–388

This is untested.

Also, the error format is not consistent with the LLVM coding standards or what llvm-symbolizer prints when there's a missing file, as far as I can see.

ormris removed a subscriber: ormris.Feb 1 2022, 9:41 AM
mysterymath marked 5 inline comments as done.
  • Refactored build ID lookup functionality into LLVMSymbolizer. This makes build IDs a first class means to specify modules. In particular, this produces consistent error behavior between paths and build IDs. This will also make it straightforward to extend the stdin syntax to support alternating build IDs and paths.
  • Made --build-id and --obj mutually exclusive.
  • Added tests.
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
385–388

There were some considerable semantic divergences between --obj and --build-id; the former would return EXIT_SUCCESS and keep going, producing only one error, while the latter would immediately exit with a nonzero status.

The current approach also doesn't extend well to handling multiple build IDs specified on a line-by-line basis, which is a near-term TODO.

Accordingly, I've reworked this functionality to take place inside the Symbolizer, rather than in the wrapping tool. This makes BuildIDs a first-class mechanism that can be used to specify modules; it thus intrinsically has similar error behavior to the other ways of specifying modules. This will also make it easy to build syntax into the stdin mechanism to switch out the module specifier type.

Thanks for the rework. I've added some basic comments, but I think @phosek (or someone else) would be better to continue this review beyond these points.

llvm/docs/CommandGuide/llvm-symbolizer.rst
189

I'd delete "family" since "-e" etc are just aliases, and therefore it's implied.

240
llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
61–66

Not necessarily something that you need to do right now, but nonetheless worth thinking about: it's clear by adding a third variation of each of these methods that some sort of refactoring would be appropriate to simplify extensibility of this code. In this case, I'd probably make each of the different input types (object, module name, build ID) a separate class, with the relevant methods attached. symbolizeCode would then just become something like:

template <typename T>
Expected<DILineInfo> symbolizeCode(const T &Input, object::SectionedAddress ModuleOffset) {
  return Input.symbolizeCode(ModuleOffset, ...);
}

and so on. Adding a new input kind wouldn't then require modifying this class again, keeping this class stable, and just adding new classes for each input kind.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
389
mysterymath marked 3 inline comments as done.

Address nits.

llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
61–66

Yeah, this occurred to me too. I'll take a stab at this afterwards and see if I can design a clean abstraction.

jhenderson accepted this revision.Feb 4 2022, 12:35 AM

LGTM, but please wait for @phosek.

This revision is now accepted and ready to land.Feb 4 2022, 12:35 AM
phosek added inline comments.Feb 4 2022, 12:56 AM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
439

IIUC this slightly changes the semantics. Currently, even if we failed to download a file (for example because of the intermittent network error), we would try again on then next symbolization request. With this change, we would cache the error result (that is, an empty string) after the first attempt and never retry. Maybe it would be better to drop this line to preserve the current semantics?

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
368

It might be more efficient to check if both --obj and --build-id was specified before parsing --build-id and allocating memory for build ID, but it only makes a difference for the error path so it's not particularly important.

Retry build ID lookups each time, but only report failures once.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
439

This is slightly tricky: we should retry the network request if it failed previously, but we probably shouldn't spam the console with repeated error messages for the build ID, since that differs from the way errors work for paths (one per path).

I've tried to keep the error behavior while still retrying the lookup under the hood. If a retry succeeds, the data becomes available for all future symbolization request; otherwise, the error is silenced.

Always call through to getOrFindDebugBinary, even if there were a previous
error.

Make the missing build ID test treat stdout and stderr separately.

Fix the bugs this uncovered.

phosek accepted this revision.Feb 8 2022, 1:24 AM

LGTM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
439

Personally I'd prefer getting an error on every unsuccessful attempt; that may be more spammy but avoids hiding potentially useful information from the user. If the errors go to stderr, user can always redirect it to /dev/null if needed. Either solution is fine with me though.

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
367–368

I'd use hasArg.

mysterymath marked an inline comment as done.
  • Sync from main.
  • Use hasArg.
  • Report buildID lookup failures every time.
mysterymath added inline comments.Feb 8 2022, 12:31 PM
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
367–368

There's a use to having a value that means "flag unset": you can unset flags via concatenation, which can help when using utilities in scripts. E.g., an environment variable is set up with a ton of flags outside your control, and you want to remove exactly one of them. This can require either some tricky string manipulation or just appending "--obj= ", depending on the semantics of '--obj'.

That being said, I did a quick random sampling of flags in LLVM, and it's wildly variable whether or not a given flag offers this property. At least some of that is probably due to GCC backcompat concerns, but it's definitely not a property a user can rely on across the board, so doesn't seem to be much point in making more flags behave that way.

This revision was landed with ongoing or failed builds.Feb 8 2022, 3:08 PM
This revision was automatically updated to reflect the committed changes.
MaskRay added inline comments.Feb 9 2022, 2:09 PM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
532

There was a -Wunused-variable warning in -DLLVM_ENABLE_ASSERTIONS=off builds. Fixed by f8701a30f648dd07d472c108f90fa675c1198c46