This adds a --build-id=<hex build ID> flag to llvm-symbolizer. If --obj
is unspecified, this will attempt to look up the provided build ID using
whatever mechanisms are available to the Symbolizer (typically,
debuginfod). The semantics are then as if the found binary were given
using the --obj flag.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/docs/CommandGuide/llvm-symbolizer.rst | ||
---|---|---|
189 | +1: I think it should be an error if a user specifies both, unless you have a good use-case for it. | |
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
395–396 | Don't think you want this blank line. | |
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp | ||
385–388 | This is untested. Also, the error format is not consistent with the LLVM coding standards or what llvm-symbolizer prints when there's a missing file, as far as I can see. |
- Refactored build ID lookup functionality into LLVMSymbolizer. This makes build IDs a first class means to specify modules. In particular, this produces consistent error behavior between paths and build IDs. This will also make it straightforward to extend the stdin syntax to support alternating build IDs and paths.
- Made --build-id and --obj mutually exclusive.
- Added tests.
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp | ||
---|---|---|
385–388 | There were some considerable semantic divergences between --obj and --build-id; the former would return EXIT_SUCCESS and keep going, producing only one error, while the latter would immediately exit with a nonzero status. The current approach also doesn't extend well to handling multiple build IDs specified on a line-by-line basis, which is a near-term TODO. Accordingly, I've reworked this functionality to take place inside the Symbolizer, rather than in the wrapping tool. This makes BuildIDs a first-class mechanism that can be used to specify modules; it thus intrinsically has similar error behavior to the other ways of specifying modules. This will also make it easy to build syntax into the stdin mechanism to switch out the module specifier type. |
Thanks for the rework. I've added some basic comments, but I think @phosek (or someone else) would be better to continue this review beyond these points.
llvm/docs/CommandGuide/llvm-symbolizer.rst | ||
---|---|---|
189 | I'd delete "family" since "-e" etc are just aliases, and therefore it's implied. | |
240 | ||
llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h | ||
61–66 | Not necessarily something that you need to do right now, but nonetheless worth thinking about: it's clear by adding a third variation of each of these methods that some sort of refactoring would be appropriate to simplify extensibility of this code. In this case, I'd probably make each of the different input types (object, module name, build ID) a separate class, with the relevant methods attached. symbolizeCode would then just become something like: template <typename T> Expected<DILineInfo> symbolizeCode(const T &Input, object::SectionedAddress ModuleOffset) { return Input.symbolizeCode(ModuleOffset, ...); } and so on. Adding a new input kind wouldn't then require modifying this class again, keeping this class stable, and just adding new classes for each input kind. | |
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
389 |
Address nits.
llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h | ||
---|---|---|
61–66 | Yeah, this occurred to me too. I'll take a stab at this afterwards and see if I can design a clean abstraction. |
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
439 | IIUC this slightly changes the semantics. Currently, even if we failed to download a file (for example because of the intermittent network error), we would try again on then next symbolization request. With this change, we would cache the error result (that is, an empty string) after the first attempt and never retry. Maybe it would be better to drop this line to preserve the current semantics? | |
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp | ||
368 | It might be more efficient to check if both --obj and --build-id was specified before parsing --build-id and allocating memory for build ID, but it only makes a difference for the error path so it's not particularly important. |
Retry build ID lookups each time, but only report failures once.
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
439 | This is slightly tricky: we should retry the network request if it failed previously, but we probably shouldn't spam the console with repeated error messages for the build ID, since that differs from the way errors work for paths (one per path). I've tried to keep the error behavior while still retrying the lookup under the hood. If a retry succeeds, the data becomes available for all future symbolization request; otherwise, the error is silenced. |
Make the missing build ID test treat stdout and stderr separately.
Fix the bugs this uncovered.
LGTM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
439 | Personally I'd prefer getting an error on every unsuccessful attempt; that may be more spammy but avoids hiding potentially useful information from the user. If the errors go to stderr, user can always redirect it to /dev/null if needed. Either solution is fine with me though. | |
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp | ||
367–368 | I'd use hasArg. |
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp | ||
---|---|---|
367–368 | There's a use to having a value that means "flag unset": you can unset flags via concatenation, which can help when using utilities in scripts. E.g., an environment variable is set up with a ton of flags outside your control, and you want to remove exactly one of them. This can require either some tricky string manipulation or just appending "--obj= ", depending on the semantics of '--obj'. That being said, I did a quick random sampling of flags in LLVM, and it's wildly variable whether or not a given flag offers this property. At least some of that is probably due to GCC backcompat concerns, but it's definitely not a property a user can rely on across the board, so doesn't seem to be much point in making more flags behave that way. |
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
532 | There was a -Wunused-variable warning in -DLLVM_ENABLE_ASSERTIONS=off builds. Fixed by f8701a30f648dd07d472c108f90fa675c1198c46 |
I wonder whether it wouldn't be less error prone to diagnose the case when user uses both --obj and --build-id as an error?