This is an archive of the discontinued LLVM Phabricator instance.

Differential D70350

[DWARF] Allow cross-CU references of subprogram definitions
ClosedPublic

Authored by vsk on Nov 15 2019, 4:05 PM.

Details

Reviewers

dblaikie
aprantl
dexonsmith

Commits

rGbeeb3ab0fc3d: Reland (again): [DWARF] Allow cross-CU references of subprogram definitions
rGe08f205f5c2c: Reland (again): [DWARF] Allow cross-CU references of subprogram definitions
rG79daafc90308: Reland: [DWARF] Allow cross-CU references of subprogram definitions
rG30038da15b18: [DWARF] Allow cross-CU references of subprogram definitions

Summary

This allows a call site tag in CU A to reference a callee DIE in CU B
without resorting to creating an incomplete duplicate DIE for the callee
inside of CU A.

We already allow cross-CU references of subprogram declarations, so it
doesn't seem like definitions ought to be special.

This improves entry value evaluation and tail call frame synthesis in
the LTO setting. During LTO, it's common for cross-module inlining to
produce a call in some CU A where the callee resides in a different CU,
and there is no declaration subprogram for the callee anywhere. In this
case llvm would (unnecessarily, I think) emit an empty DW_TAG_subprogram
in order to fill in the call site tag. That empty 'definition' defeats
entry value evaluation etc., because the debugger can't figure out what
it means.

As a follow-up, maybe we could add a DWARF verifier check that a
DW_TAG_subprogram at least has a DW_AT_name attribute.

rdar://46577651

Event Timeline

vsk created this revision. Nov 15 2019, 4:05 PM

Herald added a project: Restricted Project. Nov 15 2019, 4:05 PM

Herald added subscribers: llvm-commits, steven_wu, hiraditya, mehdi_amini.

dblaikie added inline comments. Nov 15 2019, 4:22 PM

llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll
8–12	By no means a mandatory suggestion - my usual approach here would be:
- use a function call to an optnone function instead of a volatile increment (single call instruction, simple/clear to read in IR, etc)
- add an always_inline (or alwaysinline? I never remember the spelling) attribute on 'foo' so as not to rely on optimizations?
Neither makes a huge difference & this is nice and simple as-is, but just some ideas.

Herald added a subscriber: ormris. Nov 15 2019, 4:22 PM

aprantl added inline comments. Nov 15 2019, 4:33 PM

llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
190–191	Technically DISubprogram that are function definitions aren't part of the type system (as opposed to function declarations). Maybe just the comment needs to be reworded, but I wonder if this could have some unintended consequences when `llvm-link`ing two llvm::Modules that both define different functions with the same name. What decision is `isShareableAcrossCUs()` used for?

Simplify the test.

llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll
8–12	Thanks!

vsk planned changes to this revision. Nov 15 2019, 4:51 PM

vsk marked an inline comment as done.

vsk added inline comments.

llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
190–191	`isShareableAcrossCUs` is used to figure out whether a DIE can be retrieved from the DwarfFile backing a DwarfUnit, it looks like. If there were two modules which both defined functions with the same name, I'd expect the definitions to either be merged, or for at least one to be TU-local. If we were to treat all definitions as shareable, we might expose the TU-local one (not sure what/if any harm that would cause -- maybe messed up call site tags? -- but we should probably just sidestep the issue).

@aprantl I think the patch is correct as-written, but not for the reasons I expected :). It would be incorrect to not share TU-local subprogram definitions, I think. That's because such a definition may be inlined into a different TU, in which case a call to it would still need to be described. I've added a test case to illustrate this. I've also updated the comment to clarify the new situation.

vsk marked an inline comment as done. Nov 15 2019, 6:51 PM

vsk added inline comments.

llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
190–191	My comment here is wrong, we actually do want to share all definitions (explanation + test case provided in the last patch update).

Ping.

In D70350#1748673, @vsk wrote:

@aprantl I think the patch is correct as-written, but not for the reasons I expected :). It would be incorrect to not share TU-local subprogram definitions, I think. That's because such a definition may be inlined into a different TU, in which case a call to it would still need to be described. I've added a test case to illustrate this. I've also updated the comment to clarify the new situation.

That's correct. Though I'll leave it up to @aprantl to acknowledge/agree with this & provide final approval/resolution/closure there.

aprantl accepted this revision. Nov 22 2019, 2:54 PM

This revision is now accepted and ready to land. Nov 22 2019, 2:54 PM

This breaks the dwarf produced in an LTO build of xnu. With the patch applied I start seeing a lot of verification failures, mostly about the ranges in a DIE lying outside of the parent DIE's ranges.

In D70350#1757579, @vsk wrote:

This breaks the dwarf produced in an LTO build of xnu. With the patch applied I start seeing a lot of verification failures, mostly about the ranges in a DIE lying outside of the parent DIE's ranges.

Given what's happening here, I would suspect the verification might be incorrect rather than the DWARF. Could you provide a small (or any sized, really) example of the DWARF that's being flagged as invalid?

In D70350#1757600, @dblaikie wrote:

In D70350#1757579, @vsk wrote:

This breaks the dwarf produced in an LTO build of xnu. With the patch applied I start seeing a lot of verification failures, mostly about the ranges in a DIE lying outside of the parent DIE's ranges.

Given what's happening here, I would suspect the verification might be incorrect rather than the DWARF. Could you provide a small (or any sized, really) example of the DWARF that's being flagged as invalid?

Apologies for the rushed update. Fred and I took a look and believe that the DWARF forms used for cross-CU references are not right, e.g.:

error: DW_FORM_ref4 CU offset 0x00006ecb is invalid (must be less than CU size of 0x00003c30):

0x00001ebf: DW_TAG_inlined_subroutine
              DW_AT_abstract_origin     (0x00006ecb) // Should be DW_FORM_ref_addr.

Additionally, it looks like low/high PCs are not being set properly, e.g.:

error: DIE address ranges are not contained in its parent's ranges:
0x00006786: DW_TAG_subprogram
              DW_AT_low_pc      (0x0000000000000001)
              DW_AT_high_pc     (0x0000000000000081)

0x000067a9:   DW_TAG_lexical_block
                DW_AT_low_pc    (0x00000000000030f0)
                DW_AT_high_pc   (0x0000000000003140)
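The second verifier error can be illustrated with a minimal sketch of a range-containment check. The `AddressRange` type and both helpers are hypothetical stand-ins, not the verifier's actual code, and the check is simplified: it requires each child range to fit inside a single parent range.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open address range [Low, High), modelling DW_AT_low_pc/DW_AT_high_pc.
// Hypothetical type for illustration only.
struct AddressRange {
  uint64_t Low;
  uint64_t High;
};

// True if Inner lies entirely within Outer.
bool containsRange(const AddressRange &Outer, const AddressRange &Inner) {
  assert(Outer.Low <= Outer.High && Inner.Low <= Inner.High);
  return Outer.Low <= Inner.Low && Inner.High <= Outer.High;
}

// True if every child range is covered by at least one parent range.
// Simplification: a child spanning two adjacent parent ranges is rejected.
bool childRangesContained(const std::vector<AddressRange> &Parent,
                          const std::vector<AddressRange> &Child) {
  for (const AddressRange &C : Child) {
    bool Covered = false;
    for (const AddressRange &P : Parent)
      Covered = Covered || containsRange(P, C);
    if (!Covered)
      return false;
  }
  return true;
}
```

With the values from the dump above, a parent range of [0x1, 0x81) does not contain the lexical block's [0x30f0, 0x3140), so a check of this shape would flag the DW_TAG_subprogram.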

With this patch applied, the following assertion in addLocalLabelAddress (used by attachLowHighPC) does fire:

void DwarfCompileUnit::addLocalLabelAddress(DIE &Die,
                                            dwarf::Attribute Attribute,
                                            const MCSymbol *Label) {
  assert(!Die.getUnit() || Die.getUnit() == getUnitDie().getUnit());

I plan on digging into this today.

In D70350#1765463, @vsk wrote:
In D70350#1757600, @dblaikie wrote:

In D70350#1757579, @vsk wrote:

This breaks the dwarf produced in an LTO build of xnu. With the patch applied I start seeing a lot of verification failures, mostly about the ranges in a DIE lying outside of the parent DIE's ranges.

Given what's happening here, I would suspect the verification might be incorrect rather than the DWARF. Could you provide a small (or any sized, really) example of the DWARF that's being flagged as invalid?

Apologies for the rushed update. Fred and I took a look and believe that the DWARF forms used for cross-CU references are not right, e.g.:
error: DW_FORM_ref4 CU offset 0x00006ecb is invalid (must be less than CU size of 0x00003c30):

0x00001ebf: DW_TAG_inlined_subroutine
              DW_AT_abstract_origin     (0x00006ecb) // Should be DW_FORM_ref_addr.

Yep, cross-CU references should use FORM_ref_addr ( https://godbolt.org/z/zQZcyf shows these in use in cross-CU inlining). I'm surprised that there would be a bug here in this case - the code that deals with this seems pretty generic: https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/AsmPrinter/DwarfUnit.cpp#L391
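For readers less familiar with the two forms: in DWARF, DW_FORM_ref4 holds an offset relative to the start of the unit containing the reference, while DW_FORM_ref_addr holds an offset from the start of the .debug_info section. The following is a rough sketch of how a form could be chosen and why a unit-relative value larger than the unit is rejected. `UnitModel`, `DieModel`, and the helper functions are hypothetical and are not LLVM's classes.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for illustration; these are not LLVM's classes.
enum class RefForm { Ref4, RefAddr };

struct UnitModel {
  uint64_t SectionOffset; // where the unit starts within .debug_info
  uint64_t Size;          // size of the unit in bytes
};

struct DieModel {
  const UnitModel *Unit;  // unit that owns the DIE
  uint64_t OffsetInUnit;  // DIE offset relative to the start of that unit
};

// Same-unit references can be unit-relative; cross-unit ones cannot.
RefForm chooseRefForm(const UnitModel &From, const DieModel &Target) {
  return Target.Unit == &From ? RefForm::Ref4 : RefForm::RefAddr;
}

// Value stored in the attribute for a reference made from unit From.
uint64_t encodeRef(const UnitModel &From, const DieModel &Target) {
  assert(Target.OffsetInUnit < Target.Unit->Size && "DIE must lie in its unit");
  if (chooseRefForm(From, Target) == RefForm::Ref4)
    return Target.OffsetInUnit;
  return Target.Unit->SectionOffset + Target.OffsetInUnit;
}

// The sanity check behind the "must be less than CU size" error.
bool isValidRef4(const UnitModel &From, uint64_t Value) {
  return Value < From.Size;
}
```

In this model, a section-relative value such as 0x6ecb written with the unit-relative form into a unit of size 0x3c30 fails `isValidRef4`, which matches the shape of the error quoted above.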

This is the problem: https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp#L954.

When llvm emits call site info, it creates the callee DIE if one doesn't exist. In an LTO build this only happens in CUs where a definition DISubprogram for the callee is available [1]. The issue with this is that the definition DIE is attached to the wrong parent/context DIE. The following assert catches the problem:

 DIE *DwarfUnit::getOrCreateContextDIE(const DIScope *Context) {
-  if (!Context || isa<DIFile>(Context))
+  if (!Context || isa<DIFile>(Context)) {
+    assert((!Context || CUNode->getFile() == Context) &&
+           "Can't find a suitable context DIE");
     return &getUnitDie();
+  }

Once the definition is created, we do things like attach low/high pc ranges and/or inlined subroutines to it, assuming that it's nested within the correct CU when it's not. While this is a pre-existing issue, this patch surfaced the verifier failures because subprogram definitions were not shared pre-patch.

Some possible solutions:

1. Lazily get or create the correct context CU for definition subprograms, so that the definitions can be constructed properly. Not sure if it'd be a layering issue to call DwarfDebug::getOrCreateDwarfCompileUnit from within DwarfCompileUnit.
2. When a DIE for a subprogram is unavailable, insist on creating a fresh declaration DIE nested within the caller's CU (even if we have a definition DISubprogram at hand). Compared to (1) we'd emit a little extra debug info.
3. Maintain a list of unfinished call sites (pairs of {DIE *, DISubprogram *}) in DwarfDebug. Set DW_AT_call_origin within the unfinished call sites after all subprogram definition DIEs have been created.
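The third option, deferring DW_AT_call_origin until all definition DIEs exist, could look roughly like the sketch below. This is an illustration of the idea only: `SubprogramModel`, `DieModel`, `DefinitionMap`, and `DeferredCallSites` are hypothetical names, and nothing here reflects what eventually landed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for a DISubprogram; not LLVM's class.
struct SubprogramModel {
  std::string Name;
};

// Stand-in for a DIE. CallOrigin models the DW_AT_call_origin attribute.
struct DieModel {
  std::string Tag;
  const DieModel *CallOrigin = nullptr;
  explicit DieModel(std::string T) : Tag(std::move(T)) {}
};

using DefinitionMap =
    std::unordered_map<const SubprogramModel *, const DieModel *>;

class DeferredCallSites {
  // Pairs of {call site DIE, callee subprogram} still missing an origin.
  std::vector<std::pair<DieModel *, const SubprogramModel *>> Unfinished;

public:
  void noteCallSite(DieModel &CallSite, const SubprogramModel &Callee) {
    Unfinished.emplace_back(&CallSite, &Callee);
  }

  // Run once every definition DIE has been created. Returns how many call
  // sites could not be resolved because no definition DIE was found.
  std::size_t finalize(const DefinitionMap &Definitions) {
    std::size_t Unresolved = 0;
    for (const auto &Entry : Unfinished) {
      auto It = Definitions.find(Entry.second);
      if (It == Definitions.end()) {
        ++Unresolved;
        continue;
      }
      assert(It->second && "definition DIE must not be null");
      Entry.first->CallOrigin = It->second;
    }
    Unfinished.clear();
    return Unresolved;
  }
};
```

The trade-off in this shape is that call site DIEs stay incomplete until `finalize` runs, so anything that inspects them earlier sees no call origin.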

Please let me know if you have alternate/preferred approaches. I think I'll try (3) tomorrow.

[1] When a declaration DIE for a callee is available, the call site info is emitted correctly today already. DIEs for declarations are created ahead of time in DwarfDebug::getOrCreateDwarfCompileUnit.

Shooting from the hip/casual opinion/haven't looked at the details - cross-CU inlining should have the same problem/this should be solved in the same way. (in cross-CU inlining, do we create the source CU (if it has no non-inlined functions, etc) even if nothing's inlined from it (then it'd be an empty CU)? Is it made on-demand when an inlining is needed?)

In D70350#1766238, @dblaikie wrote:

Shooting from the hip/casual opinion/haven't looked at the details - cross-CU inlining should have the same problem/this should be solved in the same way. (in cross-CU inlining, do we create the source CU (if it has no non-inlined functions, etc) even if nothing's inlined from it (then it'd be an empty CU)? Is it made on-demand when an inlining is needed?)

Yes, cross-CU inlining does solve the same problem. To answer your first question, the CU defining an inlined function is constructed lazily. My rough understanding is that:

1. In DwarfDebug::endFunctionImpl(MachineFunction &MF), we gather all the inlined function scopes used in MF, then lazily create abstract definition subprograms for the inlined callees (lazily creating their CUs if needed, see DwarfDebug::constructAbstractSubprogramScopeDIE)
2. Once that's done we emit any necessary TAG_inlined_subroutines.

That sounds similar to suggestion (1) I described in my last comment, except that DwarfDebug is responsible for lazily creating the callee's CU.
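A minimal sketch of the "owner lazily creates the callee's CU" pattern described here, assuming a central object that owns all units. `DebugModel`, `UnitModel`, and `DieModel` are hypothetical stand-ins for DwarfDebug, DwarfCompileUnit, and DIE, and units are keyed by a plain name for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct UnitModel;

// Stand-in for a subprogram DIE; remembers which unit it was created in.
struct DieModel {
  std::string Name;
  const UnitModel *Unit;
};

// Stand-in for a compile unit that owns its subprogram DIEs.
struct UnitModel {
  std::string Name;
  std::map<std::string, std::unique_ptr<DieModel>> Subprograms;

  explicit UnitModel(std::string N) : Name(std::move(N)) {}

  DieModel &getOrCreateSubprogram(const std::string &Fn) {
    std::unique_ptr<DieModel> &Slot = Subprograms[Fn];
    if (!Slot)
      Slot.reset(new DieModel{Fn, this});
    return *Slot;
  }
};

// Stand-in for the object that owns every unit and creates them on demand.
class DebugModel {
  std::map<std::string, std::unique_ptr<UnitModel>> Units;

public:
  UnitModel &getOrCreateUnit(const std::string &CUName) {
    std::unique_ptr<UnitModel> &Slot = Units[CUName];
    if (!Slot)
      Slot.reset(new UnitModel(CUName));
    return *Slot;
  }

  // The definition DIE is always built in the unit that defines the
  // function, regardless of which unit contains the call.
  DieModel &getOrCreateDefinition(const std::string &DefiningCU,
                                  const std::string &Fn) {
    DieModel &Die = getOrCreateUnit(DefiningCU).getOrCreateSubprogram(Fn);
    assert(Die.Unit->Name == DefiningCU && "DIE attached to the wrong unit");
    return Die;
  }

  std::size_t numUnits() const { return Units.size(); }
};
```

Because the owner performs the lookup, a caller in one unit never constructs a DIE for a function defined in another unit under its own unit DIE, which is the failure mode described earlier in the thread.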

In D70350#1767483, @vsk wrote:

In D70350#1766238, @dblaikie wrote:

Shooting from the hip/casual opinion/haven't looked at the details - cross-CU inlining should have the same problem/this should be solved in the same way. (in cross-CU inlining, do we create the source CU (if it has no non-inlined functions, etc) even if nothing's inlined from it (then it'd be an empty CU)? Is it made on-demand when an inlining is needed?)

Yes, cross-CU inlining does solve the same problem. To answer your first question, the CU defining an inlined function is constructed lazily. My rough understanding is that:

In DwarfDebug::endFunctionImpl(MachineFunction &MF), we gather all the inlined function scopes used in MF, then lazily create abstract definition subprograms for the inlined callees (lazily creating their CUs if needed, see DwarfDebug::constructAbstractSubprogramScopeDIE)

Once that's done we emit any necessary TAG_inlined_subroutines.

That sounds similar to suggestion (1) I described in my last comment, except that DwarfDebug is responsible for lazily creating the callee's CU.

Yep - I'd certainly suggest starting down that direction, at least. I don't know of any gotchas right off the bat, but there might be some complications lurking there.

I've taken @dblaikie's advice and introduced DwarfDebug::constructSubprogramDefinitionDIE to ensure that a callee DIE is available when emitting call site tags. I tested this out by verifying the DWARF in justlto.o from an LTO build of xnu.

This revision is now accepted and ready to land. Dec 3 2019, 2:27 PM

Add braces around an if to prevent a warning in NDEBUG mode.

dblaikie added inline comments. Dec 4 2019, 11:29 AM

llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
201	Oh, hmm - seems this function isn't used for cross-CU inlining. Perhaps it should be refactored to go through here? Also - any idea what the history was about the !Definition test in this function? What was that designed to avoid/handle?
llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll
6–27	This test case still seems a bit complicated - I'd probably avoid having "main" in it if not needed (because it has arguments and return values that then complicate the DWARF, etc - and extra semantics that probably aren't relevant to the test/readers trying to understand it) Probably leave functions declared-but-not-defined (since you don't need to link the whole thing into an executable - just to an object file, to look at the DWARF) rather than optnone, for one thing. & probably doesn't need to be compiled with optimizations enabled? (always_inline is probably enough to get the inlinings you're interested in) & some functions have "()" and others have "(void)" - prefer the former uniformly. You can use llvm-link to link together two llvm IR files, then run it back through clang (to go IR->bitcode-or-IR & just get the always_inline behavior at -O0), rather than needing save-temps, for what it's worth. What are the cases this test is intended to exercise?

Try to shorten the test.

vsk added a subscriber: manmanren. Dec 4 2019, 6:20 PM

vsk added inline comments.

llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
201	Do you mean that the cross-CU inlining logic doesn't check that an abstract origin `isShareableAcrossCUs` before creating a cross-CU reference? It's possible some logic is duplicated there. I'll look into that. Re: history, @manmanren introduced `isShareableAcrossCUs` in r193779. Based on the commit message, I'm not sure whether the goal was to share just types/declarations between CUs, and there's no mention of subprogram definitions. There's a reference to added test coverage, but I couldn't find any in that commit or in close-by ones. Certainly no tests broke when removing the `!cast<DISubprogram>(D)->isDefinition()` guard. I understand it would be more reassuring to know why the guard came into being in the first place, but I think the test case from this patch provides good justification (briefly: when generating a TAG_call_site for the call to `noinline_in_a` in b.c:main, the definition of `noinline_in_a` must be shareable across CUs for the `getDIE` call to succeed).
llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll
6–27	I think that the "main" in the test case adds some valuable coverage, as it lets us test that cross-CU references in call site tags are well formed. I'm not sure if that answers your last question fully, but that's the case the test is intended to exercise. The specific sub-cases tested by "main" are: same-CU ref, cross-CU ref to an external def, and cross-CU ref to a non-external def inlined via an external def. Additionally, the DWARF for `noinline_in_a` is checked to make sure cross-CU references can go in the other direction. The arguments/return values in "main" do complicate the DWARF, but those bits of DWARF won't/shouldn't be checked here as that would be distracting. As for optimizations, clang won't emit call site info unless they are on, and they do tidy up the IR. However, `-Xclang -femit-debug-entry-values` can be dropped as that's not tested here -- I'll remove that.

Add back missing changes dropped due to a bad diff.

dblaikie added inline comments. Dec 5 2019, 1:05 PM

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
758 ↗	(On Diff #232249)	Why is this only for the definition case? Is there any chance of coalescing it with the declaration case?
llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
201	> Do you mean that the cross-CU inlining logic doesn't check that an abstract origin isShareableAcrossCUs before creating a cross-CU reference? It's possible some logic is duplicated there. I'll look into that.
Yeah, cross-CU inlining makes the cross-CU choice in DwarfDebug::constructAbstractSubprogramScopeDIE here: https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L523
re: history/lack of testing - yeah, think we'll just have to accept that. Thanks for looking into it a bit further.
llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll
6–27	Still got a few "(void)" in here - please replace those with "()", I think (or make them all "(void)")? (though I guess we don't have an authoritative documentation on C style formatting, etc) Ah, sorry - what I meant about "main" was basically "why main specifically?" why not another arbitrary function name with no parameters/return value? The code doesn't need to link into a valid executable - it can be only two IR modules linked together into one, that's how I've usually tested IR linking behavior in the past since it has fewer constraints like this, and being able to call undefined external functions is a nice hard-stop to the optimizer without needing optnone attributes (though the attributes can be useful in some cases, for sure) Perhaps some comments at the top of the file, or within the source code comments explaining which cases are being tested would be helpful? (and/or renaming the functions in the source to self-document their purpose) or splitting each test case, while within the same two files, into separate function groups that don't interact with each other so they're more isolated/easier to read (eg: f1_* functions are all related to testing one situation, f2_* functions are for another situation, etc)

Address review feedback.

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
758 ↗	(On Diff #232249)	The DIEs for declarations are created in `DwarfDebug::getOrCreateDwarfCompileUnit`. Those cannot be constructed lazily as they are here, because they may be needed by other CUs.
llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll
6–27	I see, I've gotten rid of the '(void)', dropped the dependence on main/ld64/-flto, and renamed the test functions so that they more clearly reflect the functionality being tested (& added some more localized comments).

Thanks a bunch for describing the test cases in more detail!

(personally, I'm not sure the "foo" cases add much value - they motivate the functionality, the need for cross-CU references, but they're more complicated (because of the need for the always_inline extra layer of indirection) & don't add more test coverage, I think? (nothing we could realistically do would break the foo cases but keep the other cases of cross-CU call site references working correctly, for instance, similarly with the other inlining cases - that's usually my bar for testing): So in conclusion, I /think/ showing a_from_b and b_from_a would be sufficient for coverage, and the inlining and static foo aren't pulling their weight as testing/complexity in this test?)

In any case, I know I'm a bit esoterically pedantic about this sort of thing, so I'll leave it at that/up to you.

Assuming those duplicate declaration DIEs in A are not related to this patch/will be fixed by further incremental improvements, that is.

llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll
6–8	This is specifically only for subprogram definitions, right? (you can have cross-CU references for subprogram declarations - but that's already handled without this patch?) so probably add "definitions" in there ("references to subprogram >definitions< within call site tags are well-formed"...)
11–12	You need -O2 to get the call site debug info? Is there any way to ask for that specifically without enabling optimizations? Might make it clearer (not like the optimizations can do anything given the optnone constraints, but might be a bit simpler if the optnone could be removed and the -ON could be removed too).
14	Should be able to do this at -O0 (that'd still run the always_inliner)?
57	You can probably remove the "0x{{0+}}" from these lines (& let those parts be included in the NOINLINE_FUNC_IN_A match implicitly), e.g., write this line as:
; CHECK: [[NOINLINE_FUNC_IN_A:.*]]: DW_TAG_subprogram
& the usage would change from this:
; CHECK-NEXT: DW_AT_call_origin (0x{{0+}}[[NOINLINE_FUNC_IN_A]])
to this:
; CHECK-NEXT: DW_AT_call_origin ([[NOINLINE_FUNC_IN_A]])
Similarly with the other matches.
85–91	These two look out of place - shouldn't they be referencing the DIEs in A's CU rather than emitting declarations here in B? (perhaps this is a separate bug to be fixed in another change, I don't know)

Thanks for the reviews!

llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll
6–8	Yes, I'll add this in.
11–12	Currently there is no way to force call site tag emission without optimizations (see `CGDebugInfo::getCallSiteRelatedAttrs`), although it might be simpler to use -O1 here.
14	Not quite. AlwaysInliner does run at -O0, but because the frontend isn't generating IR in -flto mode here, calls to an external (always_inline) function do not get the "always_inline" attribute. So the inliner declines to inline with:
Inliner visiting SCC: always_inline_helper_in_a_that_calls_foo: 1 call sites.
NOT Inlining (cost=never): no alwaysinline attribute, Call: call fastcc void @foo.2(), !dbg !20
57	This can be done in some places, but I think the net effect is to confuse things a bit, as the `{{0+}}` can't be left out of the match in all cases. E.g. it cannot be left out when the matched DIE is referenced by both CU's: in this case, there cannot be a `{{0+}}` in the check of the same-CU ref, as this would match too many zeros. Better to use the same check/match pattern everywhere, imo, instead of requiring the reader to figure out why the patterns change.
85–91	I think this is a separate bug, in which DwarfDebug::getOrCreateCompileUnit (over-)eagerly emits declaration DIEs. IIUC it doesn't have to emit these declarations in the LTO case, or when call site info has been disabled.

Oh, I forgot to address the point about the 'foo' test case. We briefly considered not sharing definitions that were local to a translation unit (DISubprogram::isLocalToUnit). The 'foo' case demonstrates why that doesn't work.

Closed by commit rG30038da15b18: [DWARF] Allow cross-CU references of subprogram definitions (authored by vsk). Dec 10 2019, 2:12 PM

This revision was automatically updated to reflect the committed changes.

vsk mentioned this in rGfa4701e19795: [DWARF] Defer creating declaration DIEs until we prepare call site info. Dec 20 2019, 3:28 PM

Just FYI, @vsk, I work on a large codebase that uses Clang to build several projects with (monolithic, not thin) LTO. We have a nightly build that uses LLVM trunk (with some patches internal to us rebased on top) to compile these monolithic LTO projects. Since this patch landed on Dec 20 2019 as rG79daafc9030, each of these LTO builds (>10 in total) began failing during the LTO stage. The stack trace, which I pasted in P8180, seems to indicate there's a problem with debug info, and that's what led me to this diff. I've reverted this diff in our internal LLVM fork, and doing so has our LTO builds succeeding again.

Unfortunately I haven't narrowed down the issue to a reproducible test case, and I can't share the internal projects that are failing. Hopefully the stack trace is useful to you, but if not, let me know if there's anything else I can share! I may not be able to find the time necessary to reduce this to a small test case, but I'd like to help in any other way I can.

wenlei added a subscriber: wenlei. Jan 3 2020, 12:00 PM

Hey @modocache, thanks for the report and stack trace. It looks like clang crashes because it found a DISubprogram without an associated DIE. I'm not sure how this could happen: the function constructCallSiteEntryDIEs should ensure that the associated DIE exists. Are you aware of any other internal debug info-related changes which might break this assumption?

FWIW, after adding an assert that guards against this to constructCallSiteEntryDIEs, I could build all of LNT using the ReleaseLTO-g cmake cache without issue. I'll kick off LTO builds of our internal projects to exercise this code path more. Meanwhile, is there any chance you could verify that the existing assert at DwarfCompileUnit.cpp:972 fires in your LTO build (https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp#L972)?

djtodoro mentioned this in D80369: [DebugInfo][CallSites] Remove decl subprograms from 'retainedTypes:'. May 21 2020, 2:18 AM

djtodoro mentioned this in rG40a3fcb05c83: [DebugInfo][CallSites] Remove decl subprograms from 'retainedTypes:'. Jun 1 2020, 12:29 AM

jmorse mentioned this in D94976: [DWARF] Create subprogram's DIE in the unit specified by its DISubprogram. Jan 19 2021, 9:12 AM

jmorse mentioned this in rGef0dcb506300: [DWARF] Create subprogram's DIE in DISubprogram's unit. Jan 27 2021, 4:38 AM

jmorse mentioned this in rG8998f5843503: Re-land D94976 after revert in e29552c5aff6. Feb 4 2021, 3:17 AM

@vsk As commented in https://bugs.llvm.org/show_bug.cgi?id=48790, this seems to impact us when building a large app with LTO even in XCode.
I don't have a cutdown but it showed up frequently blocking us stack traces with symbolication. Have we considered fixing this or backing out this change?

In D70350#2867905, @kyulee wrote:

@vsk As commented in https://bugs.llvm.org/show_bug.cgi?id=48790, this seems to impact us when building a large app with LTO even in XCode.
I don't have a cutdown but it showed up frequently blocking us stack traces with symbolication. Have we considered fixing this or backing out this change?

@jmorse has a patch up that should address an issue with definition DIEs not being created in the unit referenced by their subprogram: D94976. I haven't caught up with the discussion there.

The impact of backing out this change would be that backtraces including artificial tail call frames and parameter entry value evaluation would to an appreciable extent stop working in some of our key internal projects.

In D70350#2868253, @vsk wrote:

In D70350#2867905, @kyulee wrote:

@vsk As commented in https://bugs.llvm.org/show_bug.cgi?id=48790, this seems to impact us when building a large app with LTO even in XCode.
I don't have a cutdown but it showed up frequently blocking us stack traces with symbolication. Have we considered fixing this or backing out this change?

@jmorse has a patch up that should address an issue with definition DIEs not being created in the unit referenced by their subprogram: D94976. I haven't caught up with the discussion there.

The impact of backing out this change would be that backtraces including artificial tail call frames and parameter entry value evaluation would to an appreciable extent stop working in some of our key internal projects.

Unfortunately D94976 was reverted by c1d45abda5c8e1b00b12ae81461c0e3705d88666. I found another discussion in https://groups.google.com/g/llvm-dev/c/V4pU7FRreNw/m/EmMPvGJyAAAJ which also reverted this as workaround.
For now, we had to revert this internally to avoid symbolication crash. I wonder if there is a right fix over this.

jmorse mentioned this in D107076: [DWARF] Revert sharing subprograms across CUs. Jul 29 2021, 7:25 AM

Revision Contents

Path	Size
llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp	9 lines
llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll	145 lines

Diff 229676

llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp

Show First 20 Lines • Show All 181 Lines • ▼ Show 20 Lines	if (DD->getDwarfVersion() >= 5)
return 1;		return 1;
break;		break;
}		}

return -1;		return -1;
}		}

/// Check whether the DIE for this MDNode can be shared across CUs.		/// Check whether the DIE for this MDNode can be shared across CUs.
bool DwarfUnit::isShareableAcrossCUs(const DINode *D) const {		bool DwarfUnit::isShareableAcrossCUs(const DINode *D) const {
// When the MDNode can be part of the type system, the DIE can be shared		// When the MDNode can be part of the type system (this includes subprogram
		aprantlUnsubmitted Not Done Reply Inline Actions Technically DISubprogram that are function definitions aren't part of the type system (as opposed to function declarations). Maybe just the comment needs to be reworded, but I wonder if this could have some unintended consequences when `llvm-link`ing two llvm::Modules that both define different functions with the same name. What decision is `isShareableAcrossCUs()` used for? aprantl: Technically DISubprogram that are function definitions aren't part of the type system (as…
		vskAuthorUnsubmitted Done Reply Inline Actions `isShareableAcrossCUs` is used to figure out whether to whether a DIE can be retrieved from the DwarfFile backing a DwarfUnit, it looks like. If there were two modules which both defined functions with the same name, I'd expect the definitions to either be merged, or for at least one to be TU-local. If we were to treat all definitions as shareable, we might expose the TU-local one (not sure what/if any harm that would cause -- maybe messed up call site tags? -- but we should probably just sidestep the issue). vsk: `isShareableAcrossCUs` is used to figure out whether to whether a DIE can be retrieved from the…
		vskAuthorUnsubmitted Done Reply Inline Actions My comment here is wrong, we actually do want to share all definitions (explanation + test case provided in the last patch update). vsk: My comment here is wrong, we actually do want to share all definitions (explanation + test case…
// across CUs.		// declarations and subprogram definitions, even local definitions), the
		// DIE must be shared across CUs.
// Combining type units and cross-CU DIE sharing is lower value (since		// Combining type units and cross-CU DIE sharing is lower value (since
// cross-CU DIE sharing is used in LTO and removes type redundancy at that		// cross-CU DIE sharing is used in LTO and removes type redundancy at that
// level already) but may be implementable for some value in projects		// level already) but may be implementable for some value in projects
// building multiple independent libraries with LTO and then linking those		// building multiple independent libraries with LTO and then linking those
// together.		// together.
if (isDwoUnit() && !DD->shareAcrossDWOCUs())		if (isDwoUnit() && !DD->shareAcrossDWOCUs())
return false;		return false;
return (isa<DIType>(D) ||		return (isa<DIType>(D) || isa<DISubprogram>(D)) && !DD->generateTypeUnits();
		dblaikie: Oh, hmm - seems this function isn't used for cross-CU inlining. Perhaps it should be refactored to go through here? Also - any idea what the history was about the !Definition test in this function? What was that designed to avoid/handle?
		vsk (author): Do you mean that the cross-CU inlining logic doesn't check that an abstract origin `isShareableAcrossCUs` before creating a cross-CU reference? It's possible some logic is duplicated there. I'll look into that.
		Re: history, @manmanren introduced `isShareableAcrossCUs` in r193779. Based on the commit message, I'm not sure whether the goal was to share just types/declarations between CUs, and there's no mention of subprogram definitions. There's a reference to added test coverage, but I couldn't find any in that commit or in close-by ones. Certainly no tests broke when removing the `!cast<DISubprogram>(D)->isDefinition()` guard.
		I understand it would be more reassuring to know why the guard came into being in the first place, but I think the test case from this patch provides good justification (briefly: when generating a TAG_call_site for the call to `noinline_in_a` in b.c:main, the definition of `noinline_in_a` must be shareable across CUs for the `getDIE` call to succeed).
		dblaikie:
		> Do you mean that the cross-CU inlining logic doesn't check that an abstract origin isShareableAcrossCUs before creating a cross-CU reference? It's possible some logic is duplicated there. I'll look into that.
		Yeah, cross-CU inlining makes the cross-CU choice in DwarfDebug::constructAbstractSubprogramScopeDIE here: https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L523
		Re: history/lack of testing - yeah, I think we'll just have to accept that. Thanks for looking into it a bit further.
(isa<DISubprogram>(D) && !cast<DISubprogram>(D)->isDefinition())) &&
!DD->generateTypeUnits();
}		}

DIE *DwarfUnit::getDIE(const DINode *D) const {		DIE *DwarfUnit::getDIE(const DINode *D) const {
if (isShareableAcrossCUs(D))		if (isShareableAcrossCUs(D))
return DU->getDIE(D);		return DU->getDIE(D);
return MDNodeToDieMap.lookup(D);		return MDNodeToDieMap.lookup(D);
}		}
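
The effect of widening `isShareableAcrossCUs` on the `getDIE` lookup can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Node`, `Die`, `ToyFile`, `ToyUnit`, `shareableBefore`, and `shareableAfter` are hypothetical names invented here, not LLVM classes, and the `isDwoUnit()` / `generateTypeUnits()` conditions are deliberately left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Toy stand-ins for metadata node kinds; NOT the real DINode hierarchy.
enum class NodeKind { Type, SubprogramDecl, SubprogramDef, Other };

struct Node {
  NodeKind Kind;
  std::string Name;
};

struct Die {
  std::string Label;
};

// Model of the old predicate: types and subprogram declarations only.
inline bool shareableBefore(const Node &N) {
  return N.Kind == NodeKind::Type || N.Kind == NodeKind::SubprogramDecl;
}

// Model of the new predicate: subprogram definitions are shareable too.
inline bool shareableAfter(const Node &N) {
  return N.Kind == NodeKind::Type || N.Kind == NodeKind::SubprogramDecl ||
         N.Kind == NodeKind::SubprogramDef;
}

using DieMap = std::unordered_map<const Node *, Die *>;

// Hypothetical model of the file-level map that every unit can see.
struct ToyFile {
  DieMap Shared;
};

// Hypothetical model of one compile unit with its private map.
struct ToyUnit {
  ToyFile *File;
  DieMap Local;

  // Record a DIE in the shared map if the node is shareable, else locally.
  template <typename Pred>
  void insertDie(const Node &N, Die &D, Pred Shareable) {
    if (Shareable(N))
      File->Shared[&N] = &D;
    else
      Local[&N] = &D;
  }

  // Look a DIE up in whichever map the predicate selects.
  template <typename Pred>
  Die *getDie(const Node &N, Pred Shareable) const {
    const DieMap &M = Shareable(N) ? File->Shared : Local;
    auto It = M.find(&N);
    return It == M.end() ? nullptr : It->second;
  }
};
```

In this model, a definition inserted by unit A is invisible to unit B under `shareableBefore`, because it lands in A's private map; under `shareableAfter` the same lookup from B finds A's DIE. That mirrors the situation described in the review, where the call site in b.c needs to find the callee's definition DIE that lives in a.c's CU.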


llvm/test/DebugInfo/X86/lto-cross-cu-call-origin-ref.ll

This file was added.

				; RUN: llc -mtriple=x86_64-apple-darwin -filetype=obj < %s | llvm-dwarfdump - \
				; RUN: | FileCheck %s -implicit-check-not=DW_TAG_subprogram

				; Source:
				; // a.c
				; __attribute__((optnone)) void bar() {}
				; __attribute__((optnone)) static void baz() {}
				; __attribute__((always_inline)) void foo() {
				dblaikie: This is specifically only for subprogram definitions, right? (You can have cross-CU references for subprogram declarations, but that's already handled without this patch?) So probably add "definitions" in there ("references to subprogram >definitions< within call site tags are well-formed"...).
				vsk (author): Yes, I'll add this in.
				; bar();
				; baz();
				; }
				; // b.c
				dblaikie: By no means a mandatory suggestion - my usual approach here would be:
				* use a function call to an optnone function instead of a volatile increment (single call instruction, simple/clear to read in IR, etc)
				* add an always_inline (or alwaysinline? I never remember the spelling) attribute on 'foo' so as not to rely on optimizations?
				Neither makes a huge difference & this is nice and simple as-is, but just some ideas.
				vsk (author): Thanks!
				dblaikie: You need -O2 to get the call site debug info? Is there any way to ask for that specifically without enabling optimizations? Might make it clearer (not like the optimizations can do anything given the optnone constraints, but might be a bit simpler if the optnone could be removed and the -ON could be removed too).
				vsk (author): Currently there is no way to force call site tag emission without optimizations (see `CGDebugInfo::getCallSiteRelatedAttrs`), although it might be simpler to use -O1 here.
				; extern void foo();
				; __attribute__((optnone)) void baz() {}
				dblaikie: Should be able to do this at -O0 (that'd still run the always_inliner)?
				vsk (author): Not quite. AlwaysInliner does run at -O0, but because the frontend isn't generating IR in -flto mode here, calls to an external (always_inline) function do not get the "always_inline" attribute. So the inliner declines to inline with:
				    Inliner visiting SCC: always_inline_helper_in_a_that_calls_foo: 1 call sites.
				    NOT Inlining (cost=never): no alwaysinline attribute, Call: call fastcc void @foo.2(), !dbg !20
				; int main() {
				; foo();
				; baz();
				; return 0;
				; }

				; Command:
				; clang -O2 -Xclang -femit-debug-entry-values -g -flto -o a.o -c a.c
				; clang -O2 -Xclang -femit-debug-entry-values -g -flto -o b.o -c b.c
				; clang -O2 -Xclang -femit-debug-entry-values -g -flto -o main a.o b.o -Wl,-object_path_lto,lto.o,-save-temps

				; === CU for b.c ===

				dblaikie: This test case still seems a bit complicated - I'd probably avoid having "main" in it if not needed (because it has arguments and return values that then complicate the DWARF, etc - and extra semantics that probably aren't relevant to the test/readers trying to understand it).
				Probably leave functions declared-but-not-defined (since you don't need to link the whole thing into an executable - just to an object file, to look at the DWARF) rather than optnone, for one thing.
				& probably doesn't need to be compiled with optimizations enabled? (always_inline is probably enough to get the inlinings you're interested in)
				& some functions have "()" and others have "(void)" - prefer the former uniformly.
				You can use llvm-link to link together two llvm IR files, then run it back through clang (to go IR->bitcode-or-IR & just get the always_inline behavior at -O0), rather than needing save-temps, for what it's worth.
				What are the cases this test is intended to exercise?
				vsk (author): I think that the "main" in the test case adds some valuable coverage, as it lets us test that cross-CU references in call site tags are well formed. I'm not sure if that answers your last question fully, but that's the case the test is intended to exercise. The specific sub-cases tested by "main" are: same-CU ref, cross-CU ref to an external def, and cross-CU ref to a non-external def inlined via an external def. Additionally, the DWARF for `noinline_in_a` is checked to make sure cross-CU references can go in the other direction.
				The arguments/return values in "main" do complicate the DWARF, but those bits of DWARF won't/shouldn't be checked here as that would be distracting. As for optimizations, clang won't emit call site info unless they are on, and they do tidy up the IR. However, `-Xclang -femit-debug-entry-values` can be dropped as that's not tested here -- I'll remove that.
				dblaikie: Still got a few "(void)" in here - please replace those with "()", I think (or make them all "(void)")? (though I guess we don't have an authoritative documentation on C style formatting, etc)
				Ah, sorry - what I meant about "main" was basically "why main specifically?" Why not another arbitrary function name with no parameters/return value? The code doesn't need to link into a valid executable - it can be only two IR modules linked together into one. That's how I've usually tested IR linking behavior in the past since it has fewer constraints like this, and being able to call undefined external functions is a nice hard-stop to the optimizer without needing optnone attributes (though the attributes can be useful in some cases, for sure).
				Perhaps some comments at the top of the file, or within the source code comments explaining which cases are being tested, would be helpful? (and/or renaming the functions in the source to self-document their purpose) Or splitting each test case, while within the same two files, into separate function groups that don't interact with each other so they're more isolated/easier to read (eg: f1_* functions are all related to testing one situation, f2_* functions are for another situation, etc).
				vsk (author): I see, I've gotten rid of the '(void)', dropped the dependence on main/ld64/-flto, and renamed the test functions so that they more clearly reflect the functionality being tested (& added some more localized comments).
				; CHECK: DW_TAG_compile_unit
				; CHECK: DW_AT_name ("b.c")

				; "foo" should still be present in "b.c" as a declaration.
				; CHECK: DW_TAG_subprogram
				; CHECK: DW_AT_name ("foo")
				; CHECK: DW_AT_declaration (true)

				; Check for an external definition subprogram for "baz".
				; CHECK: 0x{{0+}}[[BAZ_IN_B_DIE:.*]]: DW_TAG_subprogram
				; CHECK: DW_AT_call_all_calls (true)
				; CHECK: DW_AT_name ("baz")
				; CHECK: DW_AT_external (true)

				; Check that "main" references, in order:
				;
				; 1) The definition of "bar" in "a.c". (Previously we would not share
				; definition subprograms across CUs, so the call site info logic would create an
				; empty 'definition' subprogram for "bar" which never got filled in. This breaks
				; both entry value evaluation & artificial tail call frame synthesis in the LTO
				; setting.)
				;
				; 2) The non-external definition of "baz" in "a.c". This is inlined via "foo".
				;
				; 3) The external definition of "baz" in "b.c".
				; CHECK: DW_TAG_subprogram
				; CHECK: DW_AT_name ("main")
				; CHECK: DW_TAG_call_site
				; CHECK-NEXT: DW_AT_call_origin (0x{{0+}}[[BAR_IN_A_DIE:.*]])
				; CHECK: DW_TAG_call_site
				dblaikie: You can probably remove the "0x{{0+}}" from these lines (& let those parts be included in the NOINLINE_FUNC_IN_A match implicitly). E.g., write this line as:
				    ; CHECK: [[NOINLINE_FUNC_IN_A:.*]]: DW_TAG_subprogram
				& the usage would change from this:
				    ; CHECK-NEXT: DW_AT_call_origin (0x{{0+}}[[NOINLINE_FUNC_IN_A]])
				to this:
				    ; CHECK-NEXT: DW_AT_call_origin ([[NOINLINE_FUNC_IN_A]])
				Similarly with the other matches.
				vsk (author): This can be done in some places, but I think the net effect is to confuse things a bit, as the `{{0+}}` can't be left out of the match in all cases. E.g. it cannot be left out when the matched DIE is referenced by both CUs: in this case, there cannot be a `{{0+}}` in the check of the same-CU ref, as this would match too many zeros. Better to use the same check/match pattern everywhere, imo, instead of requiring the reader to figure out why the patterns change.
				; CHECK-NEXT: DW_AT_call_origin (0x{{0+}}[[BAZ_IN_A_DIE:.*]])
				; CHECK: DW_TAG_call_site
				; CHECK-NEXT: DW_AT_call_origin (0x{{0+}}[[BAZ_IN_B_DIE]])

				; === CU for a.c ===

				; CHECK: DW_TAG_compile_unit
				; CHECK: DW_AT_name ("a.c")

				; Check for the definitions expected in a.c.
				; CHECK: 0x{{.*}}[[BAR_IN_A_DIE]]: DW_TAG_subprogram
				; CHECK: DW_AT_call_all_calls (true)
				; CHECK: DW_AT_name ("bar")

				; CHECK: 0x{{.*}}[[BAZ_IN_A_DIE]]: DW_TAG_subprogram
				; CHECK: DW_AT_call_all_calls (true)
				; CHECK: DW_AT_name ("baz")

				; Match & ignore.
				; CHECK: DW_TAG_subprogram
				; CHECK: DW_AT_name ("foo")
				; CHECK: DW_AT_inline (DW_INL_inlined)
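
The `0x{{0+}}[[NAME:.*]]` patterns in the checks above capture a DIE offset with its leading zeros stripped, so that one captured value can be compared against another printing of the same offset. A minimal sketch of that normalization idea follows, assuming (this is an assumption for illustration, not something the review states) that the same offset may be printed with different amounts of zero padding in different places. `normalizeOffset` and `sameOffset` are hypothetical helpers, not part of FileCheck or llvm-dwarfdump.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: drop an optional "0x" prefix and any leading zeros,
// keeping at least one digit so that an all-zero offset becomes "0".
inline std::string normalizeOffset(const std::string &Text) {
  std::size_t Pos = 0;
  if (Text.size() >= 2 && Text[0] == '0' && (Text[1] == 'x' || Text[1] == 'X'))
    Pos = 2;
  while (Pos + 1 < Text.size() && Text[Pos] == '0')
    ++Pos;
  return Text.substr(Pos);
}

// Two printed offsets refer to the same DIE if they agree once padding is
// ignored.
inline bool sameOffset(const std::string &A, const std::string &B) {
  return normalizeOffset(A) == normalizeOffset(B);
}
```

With this helper, `sameOffset("0x0000002a", "0x000000000000002a")` holds even though the two strings differ in width, which is the property the uniform check pattern relies on.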

				target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.14.0"

				define internal fastcc void @bar() unnamed_addr #0 !dbg !15 {
				entry:
				ret void, !dbg !18
				}

				define internal fastcc void @baz.2() unnamed_addr #0 !dbg !19 {
				entry:
				ret void, !dbg !20
				dblaikie: These two look out of place - shouldn't they be referencing the DIEs in A's CU rather than emitting declarations here in B? (perhaps this is a separate bug to be fixed in another change, I don't know)
				vsk (author): I think this is a separate bug, in which DwarfDebug::getOrCreateCompileUnit (over-)eagerly emits declaration DIEs. IIUC it doesn't have to emit these declarations in the LTO case, or when call site info has been disabled.
				}

				define internal fastcc void @baz() unnamed_addr #0 !dbg !21 {
				entry:
				ret void, !dbg !22
				}

				define i32 @main() local_unnamed_addr !dbg !23 {
				entry:
				tail call fastcc void @bar(), !dbg !27
				tail call fastcc void @baz.2(), !dbg !30
				tail call fastcc void @baz(), !dbg !31
				ret i32 0, !dbg !32
				}

				attributes #0 = { noinline optnone }

				!llvm.dbg.cu = !{!0, !3}
				!llvm.ident = !{!9, !9}
				!llvm.module.flags = !{!10, !11, !12, !13, !14}

				!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 10.0.0 (git@github.com:llvm/llvm-project.git 3ad8c5db6b232383f4b0553505f585ad7fd194a4)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, nameTableKind: None)
				!1 = !DIFile(filename: "a.c", directory: "/Users/vsk/tmp/lto-entry-vals")
				!2 = !{}
				!3 = distinct !DICompileUnit(language: DW_LANG_C99, file: !4, producer: "clang version 10.0.0 (git@github.com:llvm/llvm-project.git 3ad8c5db6b232383f4b0553505f585ad7fd194a4)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, retainedTypes: !5, nameTableKind: None)
				!4 = !DIFile(filename: "b.c", directory: "/Users/vsk/tmp/lto-entry-vals")
				!5 = !{!6}
				!6 = !DISubprogram(name: "foo", scope: !4, file: !4, line: 1, type: !7, spFlags: DISPFlagOptimized, retainedNodes: !2)
				!7 = !DISubroutineType(types: !8)
				!8 = !{null, null}
				!9 = !{!"clang version 10.0.0 (git@github.com:llvm/llvm-project.git 3ad8c5db6b232383f4b0553505f585ad7fd194a4)"}
				!10 = !{i32 2, !"Dwarf Version", i32 4}
				!11 = !{i32 2, !"Debug Info Version", i32 3}
				!12 = !{i32 1, !"wchar_size", i32 4}
				!13 = !{i32 7, !"PIC Level", i32 2}
				!14 = !{i32 1, !"LTOPostLink", i32 1}
				!15 = distinct !DISubprogram(name: "bar", scope: !1, file: !1, line: 1, type: !16, scopeLine: 1, flags: DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !2)
				!16 = !DISubroutineType(types: !17)
				!17 = !{null}
				!18 = !DILocation(line: 1, column: 38, scope: !15)
				!19 = distinct !DISubprogram(name: "baz", scope: !1, file: !1, line: 2, type: !16, scopeLine: 2, flags: DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !2)
				!20 = !DILocation(line: 2, column: 45, scope: !19)
				!21 = distinct !DISubprogram(name: "baz", scope: !4, file: !4, line: 2, type: !16, scopeLine: 2, flags: DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !3, retainedNodes: !2)
				!22 = !DILocation(line: 2, column: 38, scope: !21)
				!23 = distinct !DISubprogram(name: "main", scope: !4, file: !4, line: 3, type: !24, scopeLine: 3, flags: DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !3, retainedNodes: !2)
				!24 = !DISubroutineType(types: !25)
				!25 = !{!26}
				!26 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
				!27 = !DILocation(line: 4, column: 3, scope: !28, inlinedAt: !29)
				!28 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 3, type: !16, scopeLine: 3, flags: DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !2)
				!29 = distinct !DILocation(line: 4, column: 3, scope: !23)
				!30 = !DILocation(line: 5, column: 3, scope: !28, inlinedAt: !29)
				!31 = !DILocation(line: 5, column: 3, scope: !23)
				!32 = !DILocation(line: 6, column: 3, scope: !23)

Diff 229676
