Page MenuHomePhabricator

[DWARF] Allow cross-CU references of subprogram definitions
ClosedPublic

Authored by vsk on Fri, Nov 15, 4:05 PM.

Details

Summary

This allows a call site tag in CU A to reference a callee DIE in CU B
without resorting to creating an incomplete duplicate DIE for the callee
inside of CU A.

We already allow cross-CU references of subprogram declarations, so it
doesn't seem like definitions ought to be special.

This improves entry value evaluation and tail call frame synthesis in
the LTO setting. During LTO, it's common for cross-module inlining to
produce a call in some CU A where the callee resides in a different CU,
and there is no declaration subprogram for the callee anywhere. In this
case llvm would (unnecessarily, I think) emit an empty DW_TAG_subprogram
in order to fill in the call site tag. That empty 'definition' defeats
entry value evaluation etc., because the debugger can't figure out what
it means.

As a follow-up, maybe we could add a DWARF verifier check that a
DW_TAG_subprogram at least has a DW_AT_name attribute.

rdar://46577651

Diff Detail

Event Timeline

vsk created this revision.Fri, Nov 15, 4:05 PM
dblaikie added inline comments.Fri, Nov 15, 4:22 PM
llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll
8–12

By no means mandatory suggestion - my usual approach here would be:

  • use a function call to an optnone function instead of a volatile increment (single call instruction, simple/clear to read in IR, etc)
  • add an always_inline (or alwaysinline? I never remember the spelling) attribute on 'foo' so as not to rely on optimizations?

Neither make a huge difference & this is nice and simple as-is, but just some ideas.

aprantl added inline comments.Fri, Nov 15, 4:33 PM
llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
190–191

Technically DISubprogram that are function definitions aren't part of the type system (as opposed to function declarations). Maybe just the comment needs to be reworded, but I wonder if this could have some unintended consequences when llvm-linking two llvm::Modules that both define different functions with the same name.

What decision is isShareableAcrossCUs() used for?

vsk updated this revision to Diff 229666.Fri, Nov 15, 4:35 PM
vsk marked 2 inline comments as done.
vsk edited the summary of this revision. (Show Details)

Simplify the test.

llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll
8–12

Thanks!

vsk planned changes to this revision.Fri, Nov 15, 4:51 PM
vsk marked an inline comment as done.
vsk added inline comments.
llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
190–191

isShareableAcrossCUs is used to figure out whether to whether a DIE can be retrieved from the DwarfFile backing a DwarfUnit, it looks like.

If there were two modules which both defined functions with the same name, I'd expect the definitions to either be merged, or for at least one to be TU-local. If we were to treat all definitions as shareable, we might expose the TU-local one (not sure what/if any harm that would cause -- maybe messed up call site tags? -- but we should probably just sidestep the issue).

vsk updated this revision to Diff 229676.Fri, Nov 15, 6:49 PM

@aprantl I think the patch is correct as-written, but not for the reasons I expected :). It would be incorrect to not share TU-local subprogram definitions, I think. That's because such a definition may be inlined into a different TU, in which case a call to it would still need to be described. I've added a test case to illustrate this. I've also updated the comment to clarify the new situation.

vsk marked an inline comment as done.Fri, Nov 15, 6:51 PM
vsk added inline comments.
llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
190–191

My comment here is wrong, we actually do want to share all definitions (explanation + test case provided in the last patch update).

In D70350#1748673, @vsk wrote:

@aprantl I think the patch is correct as-written, but not for the reasons I expected :). It would be incorrect to not share TU-local subprogram definitions, I think. That's because such a definition may be inlined into a different TU, in which case a call to it would still need to be described. I've added a test case to illustrate this. I've also updated the comment to clarify the new situation.

That's correct. Though I'll leave it up to @aprantl to acknowledge/agree with this & provide final approval/resolution/closure there.

aprantl accepted this revision.Fri, Nov 22, 2:54 PM
This revision is now accepted and ready to land.Fri, Nov 22, 2:54 PM
vsk planned changes to this revision.Fri, Nov 22, 5:42 PM

This breaks the dwarf produced in an LTO build of xnu. With the patch applied I start seeing a lot of verification failures, mostly about the ranges in a DIE lying outside of the parent DIE's ranges.

In D70350#1757579, @vsk wrote:

This breaks the dwarf produced in an LTO build of xnu. With the patch applied I start seeing a lot of verification failures, mostly about the ranges in a DIE lying outside of the parent DIE's ranges.

Given what's happening here, I would suspect the verification might be incorrect rather than the DWARF. Could you provide a small (or any sized, really) example of the DWARF that's being flagged as invalid?

vsk added a comment.Mon, Dec 2, 9:34 AM
In D70350#1757579, @vsk wrote:

This breaks the dwarf produced in an LTO build of xnu. With the patch applied I start seeing a lot of verification failures, mostly about the ranges in a DIE lying outside of the parent DIE's ranges.

Given what's happening here, I would suspect the verification might be incorrect rather than the DWARF. Could you provide a small (or any sized, really) example of the DWARF that's being flagged as invalid?

Apologies for the rushed update. Fred and I took a look and believe that the DWARF forms used for cross-CU references are not right, e.g.:

error: DW_FORM_ref4 CU offset 0x00006ecb is invalid (must be less than CU size of 0x00003c30):

0x00001ebf: DW_TAG_inlined_subroutine
              DW_AT_abstract_origin     (0x00006ecb) // Should be DW_FORM_ref_addr.

Additionally, it looks like low/high PCs are not being set properly, e.g.:

error: DIE address ranges are not contained in its parent's ranges:
0x00006786: DW_TAG_subprogram
              DW_AT_low_pc      (0x0000000000000001)
              DW_AT_high_pc     (0x0000000000000081)

0x000067a9:   DW_TAG_lexical_block
                DW_AT_low_pc    (0x00000000000030f0)
                DW_AT_high_pc   (0x0000000000003140)

With this patch applied, the following assertion in addLocalLabelAddress (used by attachLowHighPC) does fire:

void DwarfCompileUnit::addLocalLabelAddress(DIE &Die,
                                            dwarf::Attribute Attribute,
                                            const MCSymbol *Label) {
  assert(!Die.getUnit() || Die.getUnit() == getUnitDie().getUnit());

I plan on digging into this today.

In D70350#1765463, @vsk wrote:
In D70350#1757579, @vsk wrote:

This breaks the dwarf produced in an LTO build of xnu. With the patch applied I start seeing a lot of verification failures, mostly about the ranges in a DIE lying outside of the parent DIE's ranges.

Given what's happening here, I would suspect the verification might be incorrect rather than the DWARF. Could you provide a small (or any sized, really) example of the DWARF that's being flagged as invalid?

Apologies for the rushed update. Fred and I took a look and believe that the DWARF forms used for cross-CU references are not right, e.g.:

error: DW_FORM_ref4 CU offset 0x00006ecb is invalid (must be less than CU size of 0x00003c30):

0x00001ebf: DW_TAG_inlined_subroutine
              DW_AT_abstract_origin     (0x00006ecb) // Should be DW_FORM_ref_addr.

Yep, cross-CU references should use FORM_ref_addr ( https://godbolt.org/z/zQZcyf shows these in use in cross-CU inlining). I'm surprised that there would be a bug here in this case - the code that deals with this seems pretty generic: https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/AsmPrinter/DwarfUnit.cpp#L391

vsk added a comment.Mon, Dec 2, 5:01 PM

This is the problem: https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp#L954.

When llvm emits call site info, it creates the callee DIE if one doesn't exist. In an LTO build this only happens in CUs where a definition DISubprogram for the callee is available [1]. The issue with this is that the definition DIE is attached to the wrong parent/context DIE. The following assert catches the problem:

 DIE *DwarfUnit::getOrCreateContextDIE(const DIScope *Context) {
-  if (!Context || isa<DIFile>(Context))
+  if (!Context || isa<DIFile>(Context)) {
+    assert((!Context || CUNode->getFile() == Context) &&
+           "Can't find a suitable context DIE");
     return &getUnitDie();
+  }

Once the definition is created, we do things like attach low/high pc ranges and/or inlined subroutines to it, assuming that it's nested within the correct CU when it's not. While this is a pre-existing issue, this patch surfaced the verifier failures because subprogram definitions were not shared pre-patch.

Some possible solutions:

  1. Lazily get or create the correct context CU for definition subprograms, so that the definitions can be constructed properly. Not sure if it'd be a layering issue to call DwarfDebug::getOrCreateDwarfCompileUnit from within DwarfCompileUnit.
  2. When a DIE for a subprogram is unavailable, insist on creating a fresh declaration DIE nested within the caller's CU (even if we have a definition DISubprogram at hand). Compared to (1) we'd emit a little extra debug info.
  3. Maintain a list of unfinished call sites (pairs of {DIE *, DISubprogram *}) in DwarfDebug. Set DW_AT_call_origin within the unfinished call sites after all subprogram definition DIEs have been created.

Please let me know if you have alternate/preferred approaches. I think I'll try (3) tomorrow.

[1] When a declaration DIE for a callee is available, the call site info is emitted correctly today already. DIEs for declarations are created ahead of time in DwarfDebug::getOrCreateDwarfCompileUnit.

Shooting from the hip/casual opinion/haven't looked at the details - cross-CU inlining should have the same problem/this should be solved in the same way. (in cross-CU inlining, do we create the source CU (if it has no non-inlined functions, etc) even if nothing's inlined from it (then it'd be an empty CU)? Is it made on-demand when an inlining is needed?)

vsk added a comment.Tue, Dec 3, 10:51 AM

Shooting from the hip/casual opinion/haven't looked at the details - cross-CU inlining should have the same problem/this should be solved in the same way. (in cross-CU inlining, do we create the source CU (if it has no non-inlined functions, etc) even if nothing's inlined from it (then it'd be an empty CU)? Is it made on-demand when an inlining is needed?)

Yes, cross-CU inlining does solve the same problem. To answer your first question, the CU defining an inlined function is constructed lazily. My rough understanding is that:

  • In DwarfDebug::endFunctionImpl(MachineFunction &MF), we gather all the inlined function scopes used in MF, then lazily create abstract definition subprograms for the inlined callees (lazily creating their CUs if needed, see DwarfDebug::constructAbstractSubprogramScopeDIE)
  • Once that's done we emit any necessary TAG_inlined_subroutines.

That sounds similar to suggestion (1) I described in my last comment, except that DwarfDebug is responsible for lazily creating the callee's CU.

In D70350#1767483, @vsk wrote:

Shooting from the hip/casual opinion/haven't looked at the details - cross-CU inlining should have the same problem/this should be solved in the same way. (in cross-CU inlining, do we create the source CU (if it has no non-inlined functions, etc) even if nothing's inlined from it (then it'd be an empty CU)? Is it made on-demand when an inlining is needed?)

Yes, cross-CU inlining does solve the same problem. To answer your first question, the CU defining an inlined function is constructed lazily. My rough understanding is that:

  • In DwarfDebug::endFunctionImpl(MachineFunction &MF), we gather all the inlined function scopes used in MF, then lazily create abstract definition subprograms for the inlined callees (lazily creating their CUs if needed, see DwarfDebug::constructAbstractSubprogramScopeDIE)
  • Once that's done we emit any necessary TAG_inlined_subroutines.

    That sounds similar to suggestion (1) I described in my last comment, except that DwarfDebug is responsible for lazily creating the callee's CU.

Yep - I'd certainly suggest starting down that direction, at least. I don't know of any gotchas right off the bat, but there might be some complications lurking there.

vsk updated this revision to Diff 231992.Tue, Dec 3, 2:27 PM

I've taken @dblaikie's advice and introduced DwarfDebug::constructSubprogramDefinitionDIE to ensure that a callee DIE is available when emitting call site tags. I tested this out by verifying the DWARF in justlto.o from an LTO build of xnu.

This revision is now accepted and ready to land.Tue, Dec 3, 2:27 PM
vsk updated this revision to Diff 231994.Tue, Dec 3, 2:29 PM

Add braces around an if to prevent a warning in NDEBUG mode.

dblaikie added inline comments.Wed, Dec 4, 11:29 AM
llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
201

Oh, hmm - seems this function isn't used for cross-CU inlining. Perhaps it should be refactored to go through here?

Also - any idea what the history was about the !Definition test in this function? What was that designed to avoid/handle?

llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll
6–27

This test case still seems a bit complicated - I'd probably avoid having "main" in it if not needed (because it has arguments and return values that then complicate the DWARF, etc - and extra semantics that probably aren't relevant to the test/readers trying to understand it)

Probably leave functions declared-but-not-defined (since you don't need to link the whole thing into an executable - juts to an object file, to look at the DWARF) rather than optnone, for one thing. & probably doesn't need to be compiled with optimizations enabled? (always_inline is probably enough to get the inlinings you're interested in) & some functions have "()" and others have "(void)" - prefer the former uniformly. You can use llvm-link to link together two llvm IR files, then run it back through clang (to go IR->bitcode-or-IR & just get the always_inline behavior at -O0), rather than needing save-temps, for what it's worth.

What are the cases this test is intended to exercise?

vsk updated this revision to Diff 232247.Wed, Dec 4, 6:20 PM
vsk marked 2 inline comments as done.

Try to shorten the test.

vsk added a subscriber: manmanren.Wed, Dec 4, 6:20 PM
vsk added inline comments.
llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
201

Do you mean that the cross-CU inlining logic doesn't check that an abstract origin isShareableAcrossCUs before creating a cross-CU reference? It's possible some logic is duplicated there. I'll look into that.

Re: history, @manmanren introduced isShareableAcrossCUs in r193779. Based on the commit message, I'm not sure whether the goal was to share just types/declarations between CUs, and there's no mention of subprogram definitions. There's a reference to added test coverage, but I couldn't find any in that commit or in close-by ones. Certainly no tests broke when removing the !cast<DISubprogram>(D)->isDefinition() guard. I understand it would be more reassuring to know why the guard came into being in the first place, but I think the test case from this patch provides good justification (briefly: when generating a TAG_call_site for the call to noinline_in_a in b.c:main, the definition of noinline_in_a must be shareable across CUs for the getDIE call to succeed).

llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll
6–27

I think that the "main" in the test case adds some valuable coverage, as it lets us test that cross-CU references in call site tags are well formed. I'm not sure if that answers your last question fully, but that's the case the test is intended to exercise. The specific sub-cases tested by "main" are: same-CU ref, cross-CU ref to an external def, and cross-CU ref to a non-external def inlined via an external def. Additionally, the DWARF for noinline_in_a is checked to make sure cross-CU references can go in the other direction.

The arguments/return values in "main" do complicate the DWARF, but those bits of DWARF won't/shouldn't be checked here as that would be distracting. As for optimizations, clang won't emit call site info unless they are on, and they do tidy up the IR. However, -Xclang -femit-debug-entry-values can be dropped as that's not tested here -- I'll remove that.

vsk updated this revision to Diff 232249.Wed, Dec 4, 6:31 PM
  • Add back missing changes dropped due to a bad diff.
dblaikie added inline comments.Thu, Dec 5, 1:05 PM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
757

Why is this only for the definition case? Is there any chance of coalescing it with the declaration case?

llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
201

Do you mean that the cross-CU inlining logic doesn't check that an abstract origin isShareableAcrossCUs before creating a cross-CU reference? It's possible some logic is duplicated there. I'll look into that.

Yeah, cross-CU inlining makes the cross-CU choice in DwarfDebug::constructAbstractSubprogramScopeDIE here: https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L523

re: history/lack of testing - yeah, think we'll just have to accept that. Thanks for looking into it a bit further.

llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll
6–27

Still got a few "(void)" in here - please replace those with "()", I think (or make them all "(void)")? (though I guess we don't have an authoritative documentation on C style formatting, etc)

Ah, sorry - what I meant about "main" was basically "why main specifically?" why not another arbitrary function name with no parameters/return value? The code doesn't need to link into a valid executable - it can be only two IR modules linked together into one, that's how I've usually tested IR linking behavior in the past since it has fewer constraints like this, and being able to call undefined external functions is a nice hard-stop to the optimizer without needing optnone attributes (though the attributes can be useful in some cases, for sure)

Perhaps some comments at the top of the file, or within the source code comments explaining which cases are being tested would be helpful? (and/or renaming the functions in the source to self-document their purpose) or splitting each test case, while within the same two files, into separate function groups that don't interact with each other so they're more isolated/easier to read (eg: f1_* functions are all related to testing one situation, f2_* functions are for another situation, etc)

vsk updated this revision to Diff 232471.Thu, Dec 5, 4:31 PM
vsk marked 2 inline comments as done.

Address review feedback.

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
757

The DIEs for declarations are created in DwarfDebug::getOrCreateDwarfCompileUnit. Those cannot be constructed lazily as they are here, because they may be needed by other CUs.

llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll
6–27

I see, I've gotten rid of the '(void)', dropped the dependence on main/ld64/-flto, and renamed the test functions so that they more clearly reflect the functionality being tested (& added some more localized comments).

dblaikie accepted this revision.Thu, Dec 5, 7:41 PM

Thanks a bunch for describing the test cases in more detail!

(personally, I'm not sure the "foo" cases add much value - they motivate the functionality, the need for cross-CU references, but they're more complicated (because of the need for the always_inline extra layer of indirection) & don't add more test coverage, I think? (nothing we could realistically do would break the foo cases but keep the other cases of cross-CU call site references working correctly, for instance, similarly with the other inlining cases - that's usually my bar for testing): So in conclusion, I /think/ showing a_from_b and b_from_a would be sufficient for coverage, and the inlining and static foo aren't pulling their weight as testing/complexity in this test?)

In any case, I know I'm a bit esoterically pedantic about this sort of thing, so I'll leave it at that/up to you.

Assuming those duplicate declaration DIEs in A are not related to this patch/will be fixed by further incremental improvements, that is.

llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll
6–8

This is specifically only for subprogram definitions, right? (you can have cross-CU references for subprogram declarations - but that's already handled without this patch?) so probably add "definitions in there ("references to subprogram >definitions< within call site tags are well-formed"...)

11–12

You need -O2 to get the call site debug info? Is there any way to ask for that specifically without enabling optimizations? Might make it clearer (not like the optimizations can do anything given the optnone constraints, but might be a bit simpler if the optnone could be removed and the -ON could be removed too).

14

Should be able to do this at -O0 (that'd still run the always_inliner)?

57

You can probably remove the "0x{{0+}}" from these lines (& let those parts be included in the NOINLINE_FUNC_IN_A match implicitly (eg, write this line as:

; CHECK: [[NOINLINE_FUNC_IN_A:.*]]: DW_TAG_subprogram

& the usage would change from this:

; CHECK-NEXT: DW_AT_call_origin (0x{{0+}}[[NOINLINE_FUNC_IN_A]])

to this:

; CHECK-NEXT: DW_AT_call_origin ([[NOINLINE_FUNC_IN_A]])

Similarly with the other matches.

85–91

These two look out of place - shouldn't they be referencing the DIEs in A's CU rather than emitting declarations here in B? (perhaps this is a separate bug to be fixed in another change, I don't know)

vsk updated this revision to Diff 232590.Fri, Dec 6, 9:37 AM
vsk marked 6 inline comments as done.

Thanks for the reviews!

llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll
6–8

Yes, I'll add this in.

11–12

Currently there is no way to force call site tag emission without optimizations (see CGDebugInfo::getCallSiteRelatedAttrs), although it might be simpler to use -O1 here.

14

Not quite. AlwaysInliner does run at -O0, but because the frontend isn't generating IR in -flto mode here, calls to an external (always_inline) function do not get the "always_inline" attribute. So the inliner declines to inline with:

Inliner visiting SCC: always_inline_helper_in_a_that_calls_foo: 1 call sites.
    NOT Inlining (cost=never): no alwaysinline attribute, Call:   call fastcc void @foo.2(), !dbg !20
57

This can be done in some places, but I think the net effect is to confuse things a bit, as the {{0+}} can't be left out of the match in all cases. E.g. it cannot be left out when the matched DIE is referenced by both CU's: in this case, there cannot be a {{0+}} in the check of the same-CU ref, as this would match too many zeros. Better to use the same check/match pattern everywhere, imo, instead of requiring the reader to figure out why the patterns change.

85–91

I think this is a separate bug, in which DwarfDebug::getOrCreateCompileUnit (over-)eagerly emits declaration DIEs. IIUC it doesn't have to emit these declarations in the LTO case, or when call site info has been disabled.

vsk added a comment.Fri, Dec 6, 9:51 AM

Oh, I forgot to address the point about the 'foo' test case. We briefly considered not sharing definitions that were local to a translation unit (DISubprogram::isLocalToUnit). The 'foo' case demonstrates why that doesn't work.

This revision was automatically updated to reflect the committed changes.