This is an archive of the discontinued LLVM Phabricator instance.

DebugInfo/Symbolize: Allow STT_NOTYPE/STT_GNU_IFUNC symbols for .symtab symbolization
ClosedPublic

Authored by MaskRay on Feb 2 2021, 7:45 PM.

Details

Summary

In assembly files, omitting .type foo,@function is common. Such functions have
type STT_NOTYPE and llvm-symbolizer reports ?? for them.

An ifunc symbol usually has an associated resolver symbol which is defined at
the same address. Returning either one is fine for symbolization. The resolver
symbol may not end up in the symbol table if (object file) .L is used (linked
image) .symtab is stripped while .dynsym is retained.

This patch allows non SymbolRef::ST_Data non SymbolRef::ST_Function symbols,
thus allowing ELF STT_NOTYPE/STT_GNU_IFUNC symbols for .symtab symbolization.

I have left TODO in the test files for an unimplemented STT_FILE heuristic.

Diff Detail

Event Timeline

MaskRay created this revision.Feb 2 2021, 7:45 PM
MaskRay requested review of this revision.Feb 2 2021, 7:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 2 2021, 7:45 PM

Note: There are some interesting tests which can be made easy by allowing ld.lld, e.g. multiple STT_FILE. It is possible to construct such extra tests with yaml2obj but they can be really verbose.
https://lists.llvm.org/pipermail/llvm-dev/2021-January/148048.html the debuginfo-test idea

Note: There are some interesting tests which can be made easy by allowing ld.lld, e.g. multiple STT_FILE.

Couldn't that be done with an assembly file with multiple .file directives? (as much as possible, it'd be good to keep test coverage in the project where the code is - with the cross-project lit testing area more for end-to-end/extra integration (though targeted/narrowly scoped) coverage)

llvm/test/DebugInfo/Symbolize/ELF/symtab-notype.s
14–15

what are these two lines of output from? (if the gap is covered by the checks on 18 and 19?

27–35

Is this consistent with gnu addr2line behavior? Seems slightly weird to symbolize addresses outside the size of the section, I'd have thought?

45–46

any reason for the variety of instructions? I'd have thought nops might be the clearest/most uninteresting instructions to use here (& in the other test)

MaskRay updated this revision to Diff 320982.Feb 2 2021, 8:41 PM
MaskRay marked 3 inline comments as done.

comment

Couldn't that be done with an assembly file with multiple .file directives? (as much as possible, it'd be good to keep test coverage in the project where the code is - with the cross-project lit testing area more for end-to-end/extra integration (though targeted/narrowly scoped) coverage)

In GNU as, the placement of .file appears to affect its symbol table position. However, MC ELFObjectWriter places STT_FILE before all other symbols.

llvm/test/DebugInfo/Symbolize/ELF/symtab-notype.s
14–15

(I forgot to write | FileCheck...)

27–35

Added a TODO.

dblaikie accepted this revision.Feb 2 2021, 9:24 PM

Couldn't that be done with an assembly file with multiple .file directives? (as much as possible, it'd be good to keep test coverage in the project where the code is - with the cross-project lit testing area more for end-to-end/extra integration (though targeted/narrowly scoped) coverage)

In GNU as, the placement of .file appears to affect its symbol table position. However, MC ELFObjectWriter places STT_FILE before all other symbols.

Sounds like that might be worth fixing for correctness and testability.

llvm/test/DebugInfo/Symbolize/ELF/symtab-notype.s
50

retq that could be a nop so as not to draw undue attention/questions (could cause a reader to wonder if there's some importance to this difference - if there is no important difference, it'd be more readable if it weren't different)

This revision is now accepted and ready to land.Feb 2 2021, 9:24 PM
MaskRay updated this revision to Diff 321008.Feb 2 2021, 10:36 PM
MaskRay edited the summary of this revision. (Show Details)

Improve test for the next patch

Some time ago, I filed https://bugs.llvm.org/show_bug.cgi?id=45729, which I think this will address? Is there a particular reason to pay any attention to the symbol type at all, other than to pick the "best" one if there are multiple candidates? Even if someone is using DATA or CODE, I'm not sure we should pay any attention to the symbol type - just pick the symbol that matches the requested address, possibly restricting it to one in the matching symbol. (I haven't checked GNU behaviour to make sure what I'm saying makes sense compared to addr2line, mind you).

llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp
156
185

The last line of this comment seems messed up? I've suggested what I think is intended.

llvm/test/DebugInfo/Symbolize/ELF/symtab-file.yaml
2

Top-level test comment saying what this test is trying to show? Same in the other tests.

MaskRay updated this revision to Diff 321497.Feb 4 2021, 9:45 AM
MaskRay edited the summary of this revision. (Show Details)

Simplify
Allow all symbol tables

MaskRay marked an inline comment as done.Feb 4 2021, 9:52 AM

Some time ago, I filed https://bugs.llvm.org/show_bug.cgi?id=45729, which I think this will address?

Seems so.

Is there a particular reason to pay any attention to the symbol type at all, other than to pick the "best" one if there are multiple candidates? Even if someone is using DATA or CODE, I'm not sure we should pay any attention to the symbol type - just pick the symbol that matches the requested address, possibly restricting it to one in the matching symbol. (I haven't checked GNU behaviour to make sure what I'm saying makes sense compared to addr2line, mind you).

I think ST_Function/ST_Data were designed for BFD types. They are not good use outside of tools such as llvm-objdump/llvm-nm. I only kept ST_Data just to not duplicate Objects entries in Functions.
I am mildly confident that being more permissive won't regress other binary formats' behavior. For ELF there is some interesting case to consider, the relevant test coverage can be added separately.

jhenderson added inline comments.Feb 5 2021, 1:16 AM
llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp
192

Looks like some of my previous comment wasn't addressed.

llvm/test/DebugInfo/Symbolize/ELF/symtab-file.yaml
2

Ping this in this particular file?

MaskRay updated this revision to Diff 321807.Feb 5 2021, 9:41 AM
MaskRay marked 5 inline comments as done.

comments

So I just thought about this - in the event we have two symbols at the same location with different types, e.g. STT_NOTYPE and STT_FUNC, I think it would make more sense to use the STT_FUNC symbols. I haven't checked GNU behaviour for this though. What does GNU do in that situation?

MaskRay added a comment.EditedFeb 8 2021, 12:31 AM

So I just thought about this - in the event we have two symbols at the same location with different types, e.g. STT_NOTYPE and STT_FUNC, I think it would make more sense to use the STT_FUNC symbols. I haven't checked GNU behaviour for this though. What does GNU do in that situation?

addrline just picks the first symbol. I don't think this heuristic is useful. In the case where several aliases are defined at the same address, it is rare that they have different symbol types. (Even for symbol assignments, see lld D86263)

An arguably slightly more useful heuristic is to prefer a non-local symbol to a local symbol, but personally I don't think it is worthy to implement, either. addrline does not implement this heuristic.

jhenderson accepted this revision.Feb 8 2021, 3:10 AM

So I just thought about this - in the event we have two symbols at the same location with different types, e.g. STT_NOTYPE and STT_FUNC, I think it would make more sense to use the STT_FUNC symbols. I haven't checked GNU behaviour for this though. What does GNU do in that situation?

addrline just picks the first symbol. I don't think this heuristic is useful. In the case where several aliases are defined at the same address, it is rare that they have different symbol types. (Even for symbol assignments, see lld D86263)

An arguably slightly more useful heuristic is to prefer a non-local symbol to a local symbol, but personally I don't think it is worthy to implement, either. addrline does not implement this heuristic.

Having given it more thought, I agree that it probably isn't a useful heuristic - if two symbols are at the same location, it doesn't really matter which is picked, as the information provided should be plenty for the situation.

There is one other situation related to this, however: symbols with address 0. There could be both defined and undefined symbols at address 0. We probably want to restrict it to defined symbols only (I also wondered about absolute symbols, as they could be confused with the real function, but in some limited cases, absolute symbols could actually be the real function). Probably this should be a separate change though?

So I just thought about this - in the event we have two symbols at the same location with different types, e.g. STT_NOTYPE and STT_FUNC, I think it would make more sense to use the STT_FUNC symbols. I haven't checked GNU behaviour for this though. What does GNU do in that situation?

addrline just picks the first symbol. I don't think this heuristic is useful. In the case where several aliases are defined at the same address, it is rare that they have different symbol types. (Even for symbol assignments, see lld D86263)

An arguably slightly more useful heuristic is to prefer a non-local symbol to a local symbol, but personally I don't think it is worthy to implement, either. addrline does not implement this heuristic.

Having given it more thought, I agree that it probably isn't a useful heuristic - if two symbols are at the same location, it doesn't really matter which is picked, as the information provided should be plenty for the situation.

There is one other situation related to this, however: symbols with address 0. There could be both defined and undefined symbols at address 0. We probably want to restrict it to defined symbols only (I also wondered about absolute symbols, as they could be confused with the real function, but in some limited cases, absolute symbols could actually be the real function). Probably this should be a separate change though?

Yes. STT_SECTION can have 0 address. STT_TLS can have a small address overlapping other addresses. But I think these cases which could regress things are rare enough to be worried in this patch for now.

I also started a topic with internal sanitizer folks why the "DATA" command was needed and whether it is still useful today. It'd be best if we can make data & functions behave the same way and drop "DATA" all together.

MaskRay updated this revision to Diff 322189.Feb 8 2021, 12:28 PM

Add ignored symbol type test
Hard code ELF only. Allowing ST_Debug/ST_Other can break Windows tests.

This revision was landed with ongoing or failed builds.Feb 8 2021, 12:29 PM
This revision was automatically updated to reflect the committed changes.

I also started a topic with internal sanitizer folks why the "DATA" command was needed and whether it is still useful today. It'd be best if we can make data & functions behave the same way and drop "DATA" all together.

Please don't remove publicly facing features like this. We document this to our downstream users, and whilst I cannot say for certain whether we have any who use the DATA command, removing an option like this could potentially break people's workflows in non-trivial ways.

This change broke arm32 llvm-symbolizer which triggered some sanitizer regression. Previously, ARM mapping symbols [1] were ignored but after this change they are shown as the function name.

For instance:

$ ./bin/llvm-symbolizer --obj=projects/compiler-rt/test/asan/ARMHFLinuxConfig/TestCases/Output/invalid-pointer-pairs.cpp.tmp --functions 0xcbf48
f(char, char*, char*)
[...]/llvm/llvm-project/compiler-rt/test/asan/TestCases/invalid-pointer-pairs.cpp:17:14

After this:

./bin/llvm-symbolizer --obj=projects/compiler-rt/test/asan/ARMHFLinuxConfig/TestCases/Output/invalid-pointer-pairs.cpp.tmp --functions 0xcbf48
$a.2
/home/adhemerval.zanella/llvm/llvm-project/compiler-rt/test/asan/TestCases/invalid-pointer-pairs.cpp:17:14

This change broke arm32 llvm-symbolizer which triggered some sanitizer regression. Previously, ARM mapping symbols [1] were ignored but after this change they are shown as the function name.

For instance:

$ ./bin/llvm-symbolizer --obj=projects/compiler-rt/test/asan/ARMHFLinuxConfig/TestCases/Output/invalid-pointer-pairs.cpp.tmp --functions 0xcbf48
f(char, char*, char*)
[...]/llvm/llvm-project/compiler-rt/test/asan/TestCases/invalid-pointer-pairs.cpp:17:14

After this:

./bin/llvm-symbolizer --obj=projects/compiler-rt/test/asan/ARMHFLinuxConfig/TestCases/Output/invalid-pointer-pairs.cpp.tmp --functions 0xcbf48
$a.2
/home/adhemerval.zanella/llvm/llvm-project/compiler-rt/test/asan/TestCases/invalid-pointer-pairs.cpp:17:14

Fix D96617

This broke debug info for RISC-V. Test case with broken symbolization:

$ cat test.c
void _start() {}
$ clang --target=riscv64 --gcc-toolchain=<path> -nostdlib -g test.c
$ addr=$((0x`readelf -Ws a.out | grep " _start" | awk '{print $2}'` + 4))
$ hexaddr=`printf '%x\n' $addr`
$ r=`llvm-addr2line -f -e a.out $hexaddr | head -n 1`
[ -z "$r" ] && echo "bad" || echo "good"

Apologies for the terrible shell script, it was the most expedient thing I could come up with.
That test case as written needs the GNU toolchain because LLD will complain about the R_RISCV_ALIGN relaxation relocation, and using -mno-relax breaks the test.

MaskRay added a comment.EditedMar 15 2021, 3:54 PM

This broke debug info for RISC-V. Test case with broken symbolization:

$ cat test.c
void _start() {}
$ clang --target=riscv64 --gcc-toolchain=<path> -nostdlib -g test.c
$ addr=$((0x`readelf -Ws a.out | grep " _start" | awk '{print $2}'` + 4))
$ hexaddr=`printf '%x\n' $addr`

(llvm-symbolizer works with a decimal address. So $hexaddr is not needed. However, addr2line parses the value in hexadecimal even if there is no 0x prefix.)

$ r=llvm-addr2line -f -e a.out $hexaddr | head -n 1
[ -z "$r" ] && echo "bad" || echo "good"

Apologies for the terrible shell script, it was the most expedient thing I could come up with.
That test case as written needs the GNU toolchain because LLD will complain about the `R_RISCV_ALIGN` relaxation relocation, and using `-mno-relax` breaks the test.

It can be argued that addr2line has the same problem.

With -g or -fasynchronous-unwind-tables, .debug_frame/.eh_frame has a label below _start. Its address is _start + 2 and its name is empty.
Neither addr2line nor llvm-symbolizer skips such a symbol with empty name.

% ~/Dev/binutils-gdb/out/riscv64/binutils/addr2line -fe a.out 0x100e8        
_start
??:?
% ~/Dev/binutils-gdb/out/riscv64/binutils/addr2line -fe a.out 0x100ea
??
test.c:?

In ARM, such symbols have names ($a $t $d and they are modeled as SF_FormatSpecific.
In binutils, such symbols are not printed by nm.
I will file a feature request about empty symbols on binutils and probably make such symbols SF_FormatSpecific.

With -g or -fasynchronous-unwind-tables, .debug_frame/.eh_frame has a label below _start. Its address is _start + 2 and its name is empty.
Neither addr2line nor llvm-symbolizer skips such a symbol with empty name.

But D98669 doesn't fix my original test case...?

With -g or -fasynchronous-unwind-tables, .debug_frame/.eh_frame has a label below _start. Its address is _start + 2 and its name is empty.
Neither addr2line nor llvm-symbolizer skips such a symbol with empty name.

But D98669 doesn't fix my original test case...?

Could you recheck? (I am quite certain it fixes your case)

addr2line PR: https://sourceware.org/bugzilla/show_bug.cgi?id=27585

Without that patch (my fllvm-symbolizer is an alias)

% fllvm-symbolizer -fe a.out $(nm a.out | awk '/ _start/{printf "%#x",strtonum("0x"$1)+4}')                                 

a.c:0:0

With that patch

% fllvm-symbolizer -fe a.out $(nm a.out | awk '/ _start/{printf "%#x",strtonum("0x"$1)+4}')
_start
??:0:0

Could you recheck? (I am quite certain it fixes your case)

My apologies, I must have messed up something. It does fix it. Thanks for the quick turn-around!