This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objdump] Fix reliability of call target disassembling
AbandonedPublic

Authored by thopre on Mar 4 2020, 11:27 AM.

Details

Summary

When disassembling a call, llvm-objdump searches for the symbol for the
target of the call among the symbols of the last section whose start
address is smaller or equal to the target address of the call,
irregardless of the type of sections. Since this considers non
allocatable sections, there's possibility of the wrong section being
selected, i.e. if the target address is 0 where non allocatable section
are likely to have 0 for value. This commit restrict the search to
section that contain text.

With this patch test tools/llvm-objdump/X86/disassemble-functions.test
starts to fail because of the call to foo in main that gets disassembled
as "callq <foo>" while there's an implicit-check-not=foo. The test is
adapted to use "<foo>:" and similarly "<somedata>:" in the
implicit-check-not.

With this patch, test ELF/pre_init_fini_array.s starts to fail because
it calls the initializer and finilizer arrays as if it were functions
while it's only arrays of function pointers. Test is adapted to use
indirect call instead and use the --syms option to cross-check the
address of the call targets.

Diff Detail

Event Timeline

thopre created this revision.Mar 4 2020, 11:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2020, 11:27 AM
Herald added a subscriber: rupprecht. · View Herald Transcript

@MaskRay just made a change to some disassembly output (see D75713). I'm not sure if it's going to impact this change or not, but you should check. Also, could you clarify what you mean by "sometimes disassembled" please? Why does it change?

thopre updated this revision to Diff 248736.Mar 6 2020, 8:03 AM

Rebase and adapt after D75713

thopre added a comment.Mar 6 2020, 8:08 AM

@MaskRay just made a change to some disassembly output (see D75713). I'm not sure if it's going to impact this change or not, but you should check. Also, could you clarify what you mean by "sometimes disassembled" please? Why does it change?

I'm not sure why does it change since I've only seen dump from the CI and haven't been able to reproduce the failure locally. However the dump I get shows "callq </tmp/a.c>" which seems like an arbitrary choice (it's not the first symbol for that address, it's a STT_FILE instead of STT_FUNC). Also if anything the right output should be callq <foo> which would fail with the current --implicit-check-not. I've updated it to use <foo>: instead of "foo:" given the D75713 change you pointed at. By the way it makes me wonder if there isn't some CHECK-NOT or implicit check not relying on the old output which are never going to be triggered even if the output goes wrong.

By the way it makes me wonder if there isn't some CHECK-NOT or implicit check not relying on the old output which are never going to be triggered even if the output goes wrong

That thought had occurred to me too. @MaskRay, did you do any check to make sure you haven't invalidated any CHECK-NOT/--implicit-check-not patterns?

I'm not sure why does it change since I've only seen dump from the CI and haven't been able to reproduce the failure locally. However the dump I get shows "callq </tmp/a.c>" which seems like an arbitrary choice (it's not the first symbol for that address, it's a STT_FILE instead of STT_FUNC)

I don't know the ins and outs of the LLVM disassembly logic, but it might be that disassembling references just pick an arbitrary symbol with a given address and doesn't pay any attention to the symbol type (in other words, an STT_FILE symbol with addess 0 is just as valid as a function symbol with that address).

To help investigate the CI issue, is it always the same build configuration, or are there different configurations being used that could have an impact? Did it start changing at some point, or has it always been an issue as far as you know?

The change looks good to me regardless, but i'd like to understand the issue causing the variability in behaviour before giving formal approval, as non-determinism like this is bad, and might indicate something like an unitialised variable somewhere or some other bug.

llvm/test/tools/llvm-objdump/X86/disassemble-functions.test
6 ↗(On Diff #248736)

"somedata" might want to become "<somedata>:" here and below?

By the way it makes me wonder if there isn't some CHECK-NOT or implicit check not relying on the old output which are never going to be triggered even if the output goes wrong

That thought had occurred to me too. @MaskRay, did you do any check to make sure you haven't invalidated any CHECK-NOT/--implicit-check-not patterns?

I wrote a dumb script to update CHECK lines in ~350 tests changed by D75713. I manually fixed ~50 tests. I did not make extra efforts to verify any implicit-check-not= is still kept for the tests updated by my script.

implicit-check-not should be used carefully. Usually, only when a positive pattern also occurs, a negative pattern can be used. Otherwise a negative pattern can easily become stale after the output format changes.

As an example, --implicit-check-not=somedata: is a bad pattern.

I'm not sure why does it change since I've only seen dump from the CI and haven't been able to reproduce the failure locally. However the dump I get shows "callq </tmp/a.c>" which seems like an arbitrary choice (it's not the first symbol for that address, it's a STT_FILE instead of STT_FUNC)

I don't know the ins and outs of the LLVM disassembly logic, but it might be that disassembling references just pick an arbitrary symbol with a given address and doesn't pay any attention to the symbol type (in other words, an STT_FILE symbol with addess 0 is just as valid as a function symbol with that address).

To help investigate the CI issue, is it always the same build configuration, or are there different configurations being used that could have an impact? Did it start changing at some point, or has it always been an issue as far as you know?

The change looks good to me regardless, but i'd like to understand the issue causing the variability in behaviour before giving formal approval, as non-determinism like this is bad, and might indicate something like an unitialised variable somewhere or some other bug.

I will also be nervous to approve this change, without understanding how the test fails.

The change looks good to me regardless, but i'd like to understand the issue causing the variability in behaviour before giving formal approval, as non-determinism like this is bad, and might indicate something like an unitialised variable somewhere or some other bug.

I will also be nervous to approve this change, without understanding how the test fails.

Agreed.

-eric

thopre marked an inline comment as done.Mar 24 2020, 3:42 PM

The change looks good to me regardless, but i'd like to understand the issue causing the variability in behaviour before giving formal approval, as non-determinism like this is bad, and might indicate something like an unitialised variable somewhere or some other bug.

I will also be nervous to approve this change, without understanding how the test fails.

Agreed.

-eric

Sorry for taking so long to come back to this. TL;DR: while I cannot reproduce the test failure on my own computer there's inherent behavior variation due to qsort not guaranteeing any specific order for elements that are equivalents [1]. I think algorithm to determine target name could be improved.

[1] https://en.cppreference.com/w/cpp/algorithm/qsort

Long summary:

Test llvm/test/tools/llvm-objdump/X86/Inputs/simple-executable-x86_64.yaml translates to an executable file. For non relocatable files, llvm-objdump finds the last section in the qsort'ed list of sections whose address is below or equal to the target of the call. On my machine that ends up being .strtab. Since there's no associated symbols with that section, it looks among absolute symbols for the last one with an address less or equal to the target of the call. It finds only 1 absolute symbol: the STT_FILE symbol /tmp/a.c. While I cannot reproduce the test failure on my machine I presume on some system qsort will sort the sections with the same target address (0) differently leading to very different code executing.

Given the nature of the symbol to be found (target of a call) I think it would make sense to restrict the section search to SHT_PROGBITS + SHF_ALLOC|SHF_EXECINSTR. Since there's isText() available in SectionRef I've used that instead so it only checks for SHF_EXECINSTR. That makes the test fail as it should since the call's target in main is foo.

thopre updated this revision to Diff 252446.Mar 24 2020, 3:43 PM

Fix source of variability in test

thopre retitled this revision from [test] Fix reliability of disassemble-functions.test to [llvm-objdump] Fix reliability of call target disassembling.Mar 24 2020, 3:43 PM
thopre edited the summary of this revision. (Show Details)
thopre marked an inline comment as done.
thopre added inline comments.
lld/test/ELF/pre_init_fini_array.s
148–153

Suggestion welcome on how to get better output here. I tried replacing the call by leaq with some .align here and there to avoid overlap of start and end symbols but that only works for GNU objdump, llvm-objdump doesn't print the target of lea.

This still needed after D76739?

davidb added inline comments.Mar 25 2020, 8:22 AM
lld/test/ELF/pre_init_fini_array.s
148–153

Should be an lea right? from my host libc. Changing it to lea will make this test unrelated to your fix, as that only does symbol lookups for calls/branches.

   2:	4c 8d 3d 00 00 00 00 	lea    0x0(%rip),%r15        # 9 <__libc_csu_init+0x9>
			5: R_X86_64_PC32	__preinit_array_start-0x4
   9:	41 56                	push   %r14
   b:	4c 8d 35 00 00 00 00 	lea    0x0(%rip),%r14        # 12 <__libc_csu_init+0x12>
			e: R_X86_64_PC32	__preinit_array_end-0x4
davidb added inline comments.Mar 25 2020, 8:29 AM
llvm/tools/llvm-objdump/llvm-objdump.cpp
1503

This will cause an unused variable warning in release? (it is only used in assert).

I'm not sure this assert is really valuable. In. fact, this might trigger for binaries linked with OVERLAY segments (See D44780).

This still needed after D76739?

D76739 will make tools/llvm-objdump/X86/disassemble-functions.test status consistent accross targets but I still think it doesn't make sense to look for the target of a call in anything but code related sections.

thopre updated this revision to Diff 252590.Mar 25 2020, 8:45 AM
thopre marked an inline comment as done.

Remove assert

davidb added inline comments.Mar 25 2020, 8:49 AM
llvm/tools/llvm-objdump/llvm-objdump.cpp
1497

Can we check the section isText() in here rather than having CodeSectionAddresses?

thopre marked 2 inline comments as done.Mar 25 2020, 10:10 AM
thopre added inline comments.
llvm/tools/llvm-objdump/llvm-objdump.cpp
1497

We can't because partition_point expects a range where the predicate is true at the beginning and false later. If we check isText it'll alternate between true and false. Since it's a predicate, there's unfortunately no way to return a "skip" value.

jhenderson added inline comments.Mar 26 2020, 1:54 AM
lld/test/ELF/pre_init_fini_array.s
148–153

I think here the instruction chosen isn't important to the test. What is important is to show that the references to the start/end symbols have been patched up correctly, which can be done by using the disassembly as it used to. In your latest version, this isn't the case any more. Using another instruction would be the way to fix it, assuming that there's another instruction where the disassembler does look ups of symbols.

llvm-objdump doesn't print the target of lea.

This sounds like something that should be fixed in llvm-objdump to me.

llvm/tools/llvm-objdump/llvm-objdump.cpp
1503

FWIW, @MaskRay was saying in another review that llvm builds with the unused variable warning off.

thopre updated this revision to Diff 252834.Mar 26 2020, 7:09 AM
thopre marked 3 inline comments as done.

Cross-check address of call targets in lld/test/ELF/pre_init_fini_array.s against section address with --syms

thopre edited the summary of this revision. (Show Details)Mar 26 2020, 7:09 AM
thopre marked an inline comment as done.Mar 26 2020, 7:16 AM
thopre added inline comments.
lld/test/ELF/pre_init_fini_array.s
73–137

This is somewhat redundant with the DISASM checks below, except for the cross-check of the address with the section that would be lost. Should this be removed noneless? Alternatively I could remove the whole readobj section and translate into llvm-objdump equivalent instead.

davidb added inline comments.Mar 26 2020, 7:23 AM
lld/test/ELF/pre_init_fini_array.s
148–153

The instruction chosen should be important as this change only affects calls/branches and shouldn't affect the symbol resolution of general address loads (lea), as the former checks isText(), and the latter shouldn't....

Though as mentioned above, llvm-objdump doesn't appear to dump symbols for data....

llvm/tools/llvm-objdump/llvm-objdump.cpp
1503

So are variables used for asserts encouraged? Seems like an odd choice of warning to disable.

thopre marked 2 inline comments as done.Mar 26 2020, 7:57 AM
thopre added inline comments.
lld/test/ELF/pre_init_fini_array.s
148–153

This specific test is an existing test which is not aimed at testing the disassembly of a call instruction. It just happen to start failing with the patch because it's calling something that is data, not code. What @jhenderson meant is that this test could use leaq instead of call and still test what it's supposed to test (that a reference to those initializer and finalizers are resolved correctly).

davidb requested changes to this revision.Mar 26 2020, 8:43 AM

Test change looks good (address lookup rather than by name), but it makes the objdump change completely irrelevant to this now, so I'd revert the objdump change.

This revision now requires changes to proceed.Mar 26 2020, 8:43 AM
thopre abandoned this revision.Mar 26 2020, 9:17 AM
thopre marked an inline comment as done.

Test change looks good (address lookup rather than by name), but it makes the objdump change completely irrelevant to this now, so I'd revert the objdump change.

Alright, moved the test change to D76852. Dropping this patch.