This is an archive of the discontinued LLVM Phabricator instance.

[symbolizer] Support symbol+offset lookup
AbandonedPublic

Authored by sepavloff on Dec 12 2022, 10:17 AM.

Details

Summary

[symbolizer] Support symbol+offset lookup

GNU addr2line supports lookup by symbol name in addition to the existing
address lookup. llvm-symbolizer starting from e144ae54dcb96838a6176fd9eef21028935ccd4f
supports lookup by symbol name. This change extends this lookup with
possibility to specify optional offset.

Now the address for which source information is searched for can be
specified with offset:

llvm-symbolize --obj=abc.so "SYMBOL func_22+0x12"

It decreases the gap in features of llvm-symbolizer and GNU addr2line.

Diff Detail

Event Timeline

sepavloff created this revision.Dec 12 2022, 10:17 AM
Herald added a project: Restricted Project. · View Herald Transcript
sepavloff requested review of this revision.Dec 12 2022, 10:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 12 2022, 10:17 AM
tschuett added inline comments.
llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
91 ↗(On Diff #482181)

Please use std::optional. The LLVM variant is going away.

Does unix addr2line support this, or is there other precedent we should consider when evaluating this new feature?

sepavloff updated this revision to Diff 482482.Dec 13 2022, 7:43 AM

Replace llvm::Optional with std::optional

sepavloff marked an inline comment as done.Dec 13 2022, 7:51 AM

Does unix addr2line support this, or is there other precedent we should consider when evaluating this new feature?

Documentation on addr2line (https://sourceware.org/binutils/docs/binutils/addr2line.html) says:

addr2line translates addresses or `symbol+offset` into file names and line numbers. Given an address or symbol+offset in an executable or an offset in a section of a relocatable object, it uses the debugging information to figure out which file name and line number are associated with it.

Resolving symbol+offset is available in binutils 2.39. Resolving symbols only also work, at least in some cases.

Does unix addr2line support this, or is there other precedent we should consider when evaluating this new feature?

Documentation on addr2line (https://sourceware.org/binutils/docs/binutils/addr2line.html) says:

addr2line translates addresses or `symbol+offset` into file names and line numbers. Given an address or symbol+offset in an executable or an offset in a section of a relocatable object, it uses the debugging information to figure out which file name and line number are associated with it.

Resolving symbol+offset is available in binutils 2.39. Resolving symbols only also work, at least in some cases.

Could you give an example usage of how this feature is supported, i.e. what the invocation and output look like when using GNU addr2line, please. We should aim to be compatible with them, where possible.

llvm/unittests/ProfileData/MemProfTest.cpp
48–49

These tidy ups should be in a separate NFC commit, rather than adding noise to this patch.

Make support of symbol+offset lookup closer to GNU addr2line

Hi @sepavloff,

It looks like you've ignored both my requests from my previous comment (the need to show how GNU addr2line works, and to make cleanups in separate patches). Please address these points, and then I'll potentially be able to review this code.

sepavloff planned changes to this revision.Mar 28 2023, 9:57 PM

Hi @sepavloff,

It looks like you've ignored both my requests from my previous comment (the need to show how GNU addr2line works, and to make cleanups in separate patches). Please address these points, and then I'll potentially be able to review this code.

Sorry for misunderstanding. I am preparing a set of prerequisite patches.

sepavloff updated this revision to Diff 516614.Apr 24 2023, 8:55 PM

Updated patch

sepavloff retitled this revision from [llvm-symbolize] Allow finding location of a symbol to [symbolizer] Support symbol+offset lookup.Apr 24 2023, 9:00 PM
sepavloff edited the summary of this revision. (Show Details)

In your patch description, please note the tool name is llvm-symbolizer, not llvm-symbolize. Please update accordingly.

Does GNU addr2line support just using a symbol without the offset? If so, I think it would be good to split the offset lookup functionality into a follow-on patch. It helps reduce the surface area of this patch into more digestible pieces for review and testing.

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
222

Why has this changed?

sepavloff planned changes to this revision.May 3 2023, 9:20 AM

Does GNU addr2line support just using a symbol without the offset? If so, I think it would be good to split the offset lookup functionality into a follow-on patch. It helps reduce the surface area of this patch into more digestible pieces for review and testing.

D149759 implements simplified version of the lookup support, restricted to symbol names only. Files related to building test relocatable file are moved into separated revision D149757. Hopefully it can simplify review.

Update implementation after symbol lookup was committed

sepavloff edited the summary of this revision. (Show Details)Mon, Nov 20, 10:34 AM
jhenderson added inline comments.Tue, Nov 21, 12:29 AM
llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
107–108

Please don't abbreviate Offset to Ofs. It makes it harder to understand what the variable actually is for no real benefit. Applies throughout.

llvm/test/tools/llvm-symbolizer/symbol-search.test
36

Please make sure there is testing for hex as well as decimal numbers.

37

Same throughout, for consitency with the existing tests.

44

It's slightly offputting that the same set of arguments don't exist for both the llvm-symbolizer and llvm-addr2line test cases. Could you rationalise them, please, so that they both test the same set?

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
162

Why has this name changed?

256

"probably" implies most of the time, but there's no guarantee that's how the option is used.

271

A trailing + needs testing. Also something of the form notanumber+alsonotanumber and notanumber+12notanumber.

I forgot to ask, does the GNU version allow negative offsets?

sepavloff updated this revision to Diff 558145.Wed, Nov 22, 5:39 AM

Address review notes

  • add tests,
  • change some names.
sepavloff marked 4 inline comments as done.Wed, Nov 22, 7:51 AM
sepavloff added inline comments.
llvm/test/tools/llvm-symbolizer/symbol-search.test
44

Actually they are the same, the test is fixed accordingly.

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
162

Now this parameter may be assigned either a module offset, or a displacement relative to a symbol, so the old name is not precise. More suitable name is Offset but it requires to rename the variable Offset used in this function body. Now the names and comments must be more consistent.

jhenderson added inline comments.Thu, Nov 23, 12:19 AM
llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
111

Why has this changed from StringRef?

llvm/test/tools/llvm-symbolizer/symbol-search.test
44

Thanks, this is much easier to read now.

Sorry for late response. Unfortunately Phabricator stopped sending mail notifications and I overlooked your note.

llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
111

This is a small optimization.

Template function LLVMSymbolizer::findSymbolCommon accepts ModuleSpecifier as const & and passes it to getOrCreateModuleInfo, which expects the module name as const std::string &. Before this change LLVMSymbolizer::findSymbol called str() to convert the name passed as StringRef. Making the argument const std::string & avoids creation of the extra string instance.