This is an archive of the discontinued LLVM Phabricator instance.

[DWARF] Inlined variables with no location should not have a DW_TAG_variable
ClosedPublic

Authored by jmorse on Jan 28 2021, 7:17 AM.

Details

Summary

Discussed in this thread:

https://lists.llvm.org/pipermail/llvm-dev/2021-January/148139.html

DwarfDebug::collectEntityInfo accidentally distinguishes between variable locations that never have a location specified, and variable locations that have an empty location specified. The latter leads to the creation of an empty variable referring to the abstract origin:

DW_TAG_formal_parameter
    DW_AT_abstract_origin       (0x0000005a "bar")

Which is needless and potentially misleading. This patch makes collectEntityInfo behave the same way for both ways an empty variable can be represented, by only emitting a concrete variable if it actually has a location. The added test exercises this, and checks variables are created for "normal" functions too.

The NVPTX tests fail with this; this is because they were compiled to LLVM-IR before retainedNodes was correctly tracked, apparently. I've made the following changes to reflect changes in output:

  • debug-addr-class.ll: Added the list of retained nodes to the input. This causes two types to shift position in the output for reasons I don't understand.
  • debug-info.ll: The size of the unit changes because we cease emitting two parameters with abstract-origins as a result of this patch
  • debug-loc-offset.ll: adding retainedNodes to this makes the abbrev numbering change, and one new variable to be emitted.

Diff Detail

Event Timeline

jmorse created this revision.Jan 28 2021, 7:17 AM
jmorse requested review of this revision.Jan 28 2021, 7:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 28 2021, 7:17 AM
djtodoro added inline comments.Feb 1 2021, 2:46 AM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1789–1800

Can we refactor this into a predicate function (according to the CodingStandards)?

1803

Are we covering inlined variables only with this?

jmorse updated this revision to Diff 320797.Feb 2 2021, 8:17 AM

Shift "does this list have a location?" predicate into a function

jmorse marked an inline comment as done.Feb 2 2021, 8:20 AM
jmorse added inline comments.
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1803

This filter covers everything -- otherwise we risk having some other distinction between inlined and non-inlined variables with no location IMO.

There's a loop at the bottom of this function to create concrete entities for non-inlined variables that weren't given a location, and back in endFunctionImpl there's a loop over all abstract scopes to ensure they get abstract entities created.

djtodoro accepted this revision.Feb 3 2021, 1:05 AM

lgtm, thanks!

NOTE: I will try to address this in llvm-dwarfdump --stats/locstats as well (asap).
This revision is now accepted and ready to land.Feb 3 2021, 1:05 AM
jmorse updated this revision to Diff 321379.Feb 4 2021, 4:34 AM

I reverted this as a builtbot objected:

http://lab.llvm.org:8011/#builders/105/builds/5486

The failing test, missing-abstract-variable.ll, is a known bad behaver on certain platforms. In the past it's been XFailed on those platforms because certain inlined parameters get optimised out. That xfail was deleted after these two commits:

https://reviews.llvm.org/rGff2073a51fe283bde05f48c300e3a9ee807b1b6d
https://reviews.llvm.org/rG93faeecd8fa1fc148f2ee0d0cb64f11c3d41ce49

I believe the latter was allowing more DBG_VALUEs to pass through isel, which are then optimised out in this test. The behaviour that lets this test pass on (mips, powerpc64, s390x, sparc) is thus exactly the behaviour this patch is trying to disable, and it's fine to re-apply the XFail.

I'll leave this as re-opened but accepted for a ~day, I'm confident about the fix but wanted somewhere to write the explanation, plus leave a little time in case there's a wider view.

jmorse reopened this revision.Feb 4 2021, 4:34 AM
This revision is now accepted and ready to land.Feb 4 2021, 4:34 AM

I reverted this as a builtbot objected:

http://lab.llvm.org:8011/#builders/105/builds/5486

The failing test, missing-abstract-variable.ll, is a known bad behaver on certain platforms. In the past it's been XFailed on those platforms because certain inlined parameters get optimised out. That xfail was deleted after these two commits:

https://reviews.llvm.org/rGff2073a51fe283bde05f48c300e3a9ee807b1b6d
https://reviews.llvm.org/rG93faeecd8fa1fc148f2ee0d0cb64f11c3d41ce49

I believe the latter was allowing more DBG_VALUEs to pass through isel, which are then optimised out in this test. The behaviour that lets this test pass on (mips, powerpc64, s390x, sparc) is thus exactly the behaviour this patch is trying to disable, and it's fine to re-apply the XFail.

I'll leave this as re-opened but accepted for a ~day, I'm confident about the fix but wanted somewhere to write the explanation, plus leave a little time in case there's a wider view.

In a separate/preparatory commit, perhaps you could add checks that these inlined formal parameters have DW_AT_location. (& adding the xfail for the architectures that don't have locations on these variables/parameters)

& then this change won't have to touch that test, I think.

MaskRay added a subscriber: MaskRay.EditedFeb 6 2021, 2:06 PM

Reproduce: https://gist.github.com/MaskRay/4c67b29bb038d5a016260437c9e25a7a

# git revert -n 56fa34ae3570a34fd0f4c2cf1bfaf095da01a959 # revert the workaround
% cat run.sh 
#!/bin/sh -e
llc -split-dwarf-file x.dwo -filetype=obj $1 -o x.o
if llvm-dwarfdump -debug-gnu-pubnames x.o 2> log; then
  exit 1
fi
grep 'name lookup table at' log
% ./run.sh a.ll
error: name lookup table at offset 0x12cc has a terminator at offset 0x12da before the expected end at 0x12fd
jmorse added a comment.Feb 8 2021, 2:23 AM

In a separate/preparatory commit, perhaps you could add checks that these inlined formal parameters have DW_AT_location. (& adding the xfail for the architectures that don't have locations on these variables/parameters)

& then this change won't have to touch that test, I think.

Added in e05c10380ce7, I'll leave it to soak for a bit, then put this patch back in.

[...]

error: name lookup table at offset 0x12cc has a terminator at offset 0x12da before the expected end at 0x12fd

I think this was destined for D94976 (the _other_ dwarf patch of mine that's going in and out), I'll copy it there and start investigating, thanks for the reproducer.

thakis added a subscriber: thakis.Feb 8 2021, 5:27 PM

Added in e05c10380ce7, I'll leave it to soak for a bit, then put this patch back in.

That change makes the test fail on arm (or at least on mac/arm): http://45.33.8.238/macm1/3175/step_10.txt Please take a look.

jmorse added a comment.Feb 9 2021, 2:10 AM

Hi Nico,

That change makes the test fail on arm (or at least on mac/arm): http://45.33.8.238/macm1/3175/step_10.txt Please take a look.

I've rolled back the change; this is curious though, because it was passing on a 32-bit arm bot [0], and I added aarch64 to the list of XFails so would imagine that arm-macs would be caught by that. What triple does that bot target?

[0] http://lab.llvm.org:8011/#/builders/111/builds/1545 <--- note this has unrelated test failures

thakis added a comment.Feb 9 2021, 5:14 AM

Hi Nico,

That change makes the test fail on arm (or at least on mac/arm): http://45.33.8.238/macm1/3175/step_10.txt Please take a look.

I've rolled back the change; this is curious though, because it was passing on a 32-bit arm bot [0], and I added aarch64 to the list of XFails so would imagine that arm-macs would be caught by that. What triple does that bot target?

[0] http://lab.llvm.org:8011/#/builders/111/builds/1545 <--- note this has unrelated test failures

I won't be near my arm mac for a few more hours, but de1bca4b36deb2a405bed85f7a7c71c09ab3c586 suggests that the triple is arm64-apple.

thakis added a comment.Feb 9 2021, 1:05 PM

I won't be near my arm mac for a few more hours, but de1bca4b36deb2a405bed85f7a7c71c09ab3c586 suggests that the triple is arm64-apple.

Here's what cmake prints when running it on an m1 mbp:

-- LLVM host triple: arm64-apple-darwin20.2.0
-- LLVM default target triple: arm64-apple-darwin20.2.0
jmorse added a comment.Feb 9 2021, 1:23 PM
  • LLVM host triple: arm64-apple-darwin20.2.0
  • LLVM default target triple: arm64-apple-darwin20.2.0

Hmmmmm. I'll stick "arm64" into the XFail list when re-applying that patch, there's very little chance of the term "arm64" ever referring to another architecture. To state the obvious, it's not ideal that there are two names for the same architecture, but I'll move that discussion to the mailing list rather than code review.

Oh, interesting note I just realized in some other code I was investigating (for https://bugs.llvm.org/show_bug.cgi?id=49769 ) - this sort of change should be made for non-inline functions too. WEll, specifically non-inline functions that have an abstract origin.

eg:

void do_stuff();
__attribute__((always_inline)) void f1() {
  int i;
  do_stuff();
}
void f2() { f1(); }
$ clang++-tot -O1 test.cpp -g -c && llvm-dwarfdump-tot test.o
...
0x0000002a:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0000000000000000)
                DW_AT_high_pc   (0x0000000000000008)
                DW_AT_frame_base        (DW_OP_reg7 RSP)
                DW_AT_GNU_all_call_sites        (true)
                DW_AT_abstract_origin   (0x0000005b "_Z2f1v")

0x0000003d:     DW_TAG_variable
                  DW_AT_abstract_origin (0x00000067 "i")

0x00000042:     DW_TAG_GNU_call_site
                  DW_AT_abstract_origin (0x00000050 "_Z8do_stuffv")
                  DW_AT_low_pc  (0x0000000000000006)

0x0000004f:     NULL
ro added a subscriber: ro.Feb 7 2022, 3:50 AM

no-empty-child-vars.ll FAILs on SPARC: the first

; CHECK:     DW_TAG_subprogram
; CHECK:       DW_AT_abstract_origin (0x{{[0-9a-f]*}} "foo")

is missing completely and both qux and croix lack DW_TAG_inlined_subroutine.

Any suggestion what to do about this?

Thanks.

jmorse added a comment.Feb 8 2022, 5:28 AM
In D95617#3300552, @ro wrote:

no-empty-child-vars.ll FAILs on SPARC: the first

; CHECK:     DW_TAG_subprogram
; CHECK:       DW_AT_abstract_origin (0x{{[0-9a-f]*}} "foo")

is missing completely and both qux and croix lack DW_TAG_inlined_subroutine.

Any suggestion what to do about this?

Hi @ro,

I took a look at the DWARF produced when passing "-mtriple sparc--", to llc, which I've assumed is a good enough triple as it's replicated what you're describing. In the test, there's a single instruction with several inlining scopes:

%add.i = add nsw i32 %quux, 12, !dbg !33

The debug-info describes it as being from the "foo" function, inlined into croix, and that's what the test expects to be in the DWARF. However, sparc applies some optimisations to the add instruction which makes it disappear from the line tables, which stops the inlining information from appearing in the DWARF, hence the test failure.

The simplest and most straightforwards fix is to change the "!dbg !34" attachment on the IR "ret" instruction in croix to "!dbg !33", and repeat for the other two functions (!16 -> !15 in foo, !25 -> !24 in qux). That makes the test pass for me locally, and matches the intention of the check, which is to look at the DWARF that's emitted. The changes avoids relying on the add instruction appearing in the output, which sparc seems to optimise away.

Before taking that path though, could I ask whether the sparc line table is correct: I've no good understanding of what's going on on sparc (there are instruction bundles, I think?). At the end of compilation, the MIR looks like this:

RETL 8, implicit $o0, debug-location !29; test.c:2:16 @[ test.c:12:15 ] {
  $o0 = nsw ADDri killed $o0, 12, debug-location !29; test.c:2:16 @[ test.c:12:15 ]
}

Where the ADDri instruction has the correct location + inlining information attached to it. Looking at the line table produced, the ADDri instruction has it's location dropped, and it gets the line number from the RETL instruction. If that's expected and the correct behaviour for sparc, then we should alter the test, however if that ADDri is supposed to get the inlined line number then there's a bug somewhere else.

In D95617#3300552, @ro wrote:

no-empty-child-vars.ll FAILs on SPARC: the first

; CHECK:     DW_TAG_subprogram
; CHECK:       DW_AT_abstract_origin (0x{{[0-9a-f]*}} "foo")

is missing completely and both qux and croix lack DW_TAG_inlined_subroutine.

Any suggestion what to do about this?

Thanks.

Did the add instructions get somehow lost in Sparc? (like they got merged into the ret instruction somehow, or something in the Sparc backend transformed the add instructions and lost the debug locations along the way? Perhaps that could/should be fixed - but also, I think using call instructions rather than add instructions is more robust for this sort of testing (call to an opaque/declared-but-not-defined function is unlikely to get transformed in interesting ways or lose its debug location), so the test case could be changed to do that)

ro added a comment.Feb 8 2022, 12:31 PM

Did the add instructions get somehow lost in Sparc? (like they got merged into the ret instruction somehow, or something in the Sparc backend transformed the add instructions and lost the debug locations along the way? Perhaps that could/should be fixed - but also, I think using call instructions rather than add instructions is more robust for this sort of testing (call to an opaque/declared-but-not-defined function is unlikely to get transformed in interesting ways or lose its debug location), so the test case could be changed to do that)

The add insns didn't get lost at all, but were placed in the delay slots of retl. Which reminded me: this seems to be yet another instance of Issue #46473.

ro added a comment.Feb 11 2022, 7:24 AM

I just found the -disable-sparc-delay-filler option in llvm/lib/Target/Sparc/DelaySlotFiller.cpp. As expected, with it added to the llc invocation the unmodified test PASSes.

I wonder what the best way forward is: just XFAIL the test on SPARC as was done for other instances of Issue #46473, add that option as a workaround (it's silently ignored on non-SPARC targets), or go for the solution/workaround @jmorse suggested?

ro added a comment.Feb 21 2022, 1:56 AM

I've now updated Issue #46473 for this failure and posted D120238 to XFAIL the test. I think that's the best solution for now and in line with what was done for the other affected testcases. This way, when/if the bug gets fixed, all affected tests start to XPASS and can be adjusted.