This is an archive of the discontinued LLVM Phabricator instance.

DebugInfo/Symbolize: Retrieve filename from the preceding STT_FILE for .symtab symbolization
ClosedPublic

Authored by MaskRay on Feb 2 2021, 10:40 PM.

Details

Summary

The ELF spec says:

STT_FILE: Conventionally, the symbol's name gives the name of the source file associated with the object file. A file symbol has STB_LOCAL binding, its section index is SHN_ABS, and it precedes the other STB_LOCAL symbols for the file, if it is present.

For a local symbol, the preceding STT_FILE symbol is almost always in the same
file[1]. GNU addr2line uses this heuristic to retrieve the filename associated
with a local symbol (e.g. internal linkage functions in C/C++).

GNU addr2line can assign STT_FILE filename to a non-local symbol, too, but the trick
only works if no regular symbol precede STT_FILE. This patch does not implement this corner case
(not useful for most executables which have more than one files).

In case of filename mismatch between .debug_line & .symtab, arbitrarily make .debug_line win.

[1]: LLD does not synthesize STT_FILE symbols
(https://bugs.llvm.org/show_bug.cgi?id=48023 see also
https://sourceware.org/bugzilla/show_bug.cgi?id=26822). An assembly file
without .file directives can cause mis-attribution. This is an edge case.

Diff Detail

Event Timeline

MaskRay created this revision.Feb 2 2021, 10:40 PM
MaskRay requested review of this revision.Feb 2 2021, 10:40 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 2 2021, 10:40 PM
MaskRay edited the summary of this revision. (Show Details)Feb 2 2021, 10:59 PM
dblaikie accepted this revision.Feb 6 2021, 2:25 PM
dblaikie added inline comments.
llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp
240

Could do the pair removal/rolling the StringRef into the struct as a preliminary patch to focus this patch a bit more. Not absolutely necessary by any means, though.

llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.h
87

Separate patch, but this might be more tidily written as return std::tie(Addr, Size) < std::tie(RHS.Addr, RHS.Size);

This revision is now accepted and ready to land.Feb 6 2021, 2:25 PM

There are some pre-merge bot failures looking at this patch, although I guess some of them might be a patch ordering issue.

Do we have test cases for the following negative aspects where the symbol isn't used?

  1. The symbol name can't be looked up due to a corrupt st_name value.
  2. There are no STT_FILE symbols before the local symbol in question (possibly there may be some after).
  3. The symbol is a global symbol, with an STT_FILE symbol before it.
  4. The symbol has a corresponding STT_FILE symbol, but the name from .debug_line can be found.

There are some pre-merge bot failures looking at this patch, although I guess some of them might be a patch ordering issue.

Do we have test cases for the following negative aspects where the symbol isn't used?

  1. The symbol name can't be looked up due to a corrupt st_name value.
  2. There are no STT_FILE symbols before the local symbol in question (possibly there may be some after).
  3. The symbol is a global symbol, with an STT_FILE symbol before it.
  4. The symbol has a corresponding STT_FILE symbol, but the name from .debug_line can be found.

The current --use-symbol-table=true (which allows .symtab information to override debug info) has no test coverage. The previous patch and this patch added more coverage in this area.

There are some interesting cases but I think the coverage can be added separately.

Making sure .file & debug info have consistent information is the responsibility of other components (DebugInfo/AsmPrinter/ELFObjectWriter/etc). The additional tests can check what we behave when inconsistent information is present.

There are some pre-merge bot failures looking at this patch, although I guess some of them might be a patch ordering issue.

Do we have test cases for the following negative aspects where the symbol isn't used?

  1. The symbol name can't be looked up due to a corrupt st_name value.
  2. There are no STT_FILE symbols before the local symbol in question (possibly there may be some after).
  3. The symbol is a global symbol, with an STT_FILE symbol before it.
  4. The symbol has a corresponding STT_FILE symbol, but the name from .debug_line can be found.

The current --use-symbol-table=true (which allows .symtab information to override debug info) has no test coverage. The previous patch and this patch added more coverage in this area.

There are some interesting cases but I think the coverage can be added separately.

All of the cases I highlighted are new cases introduced by this patch, since the STT_FILE symbols were not used to look up file names previously. Whether or not the rest of --use-symbol-table=true functionality is tested is rather irrelevant to this, I think. We shouldn't be creating additional technical debt by adding more areas that aren't tested properly.

Making sure .file & debug info have consistent information is the responsibility of other components (DebugInfo/AsmPrinter/ELFObjectWriter/etc). The additional tests can check what we behave when inconsistent information is present.

The Symbolizer library needs testing for consumption of both consistent and inconsistent information. I don't think the responsibility of ensuring this information is consistent is relevant to that.

MaskRay updated this revision to Diff 322571.Feb 9 2021, 7:37 PM

more tests

jhenderson accepted this revision.Feb 10 2021, 1:18 AM

LGTM, with a few nits.

llvm/test/DebugInfo/Symbolize/ELF/symtab-file-conflict.s
10

Nit: Do we actually need the CHECK-EMPTY line?

llvm/test/DebugInfo/Symbolize/ELF/symtab-file2.yaml
19

Same comment as above - do we need the trailing CHECK-EMPTY?

48

This is a bit unfortunate - possibly worth adding a TODO saying that we shouldn't throw away the symbol's name?

56

Should this be CHECK2-NEXT?

58

Unnecessary CHECK-EMPTY?

MaskRay updated this revision to Diff 322712.Feb 10 2021, 9:44 AM
MaskRay marked 5 inline comments as done.

comments

This revision was landed with ongoing or failed builds.Feb 10 2021, 9:47 AM
This revision was automatically updated to reflect the committed changes.

Hi @MaskRay,

Looks like this commit caused some test failures on linux build machines. Buildbots for reference:
http://lab.llvm.org:8011/#/builders/94/builds/2238
http://lab.llvm.org:8011/#/builders/105/builds/5805

Could I have you take a look?
-Justice

MaskRay added a comment.EditedFeb 10 2021, 2:21 PM

Hi @MaskRay,

Looks like this commit caused some test failures on linux build machines. Buildbots for reference:
http://lab.llvm.org:8011/#/builders/94/builds/2238
http://lab.llvm.org:8011/#/builders/105/builds/5805

Could I have you take a look?
-Justice

Thanks. The filename in the llvm-symbolizer output changed. I temporarily changed the test's CHECK line and will check with sanitizer maintainers.

edit: vitalybuka said the filename change is fine.

@MaskRay the sanitizer buildbots are still broken by this: http://lab.llvm.org:8011/#/builders/121