This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Keep the CU consistent for operating `DISubprogram`.
AbandonedPublic

Authored by DianQK on Apr 5 2023, 8:10 AM.

Details

Summary

Fixes 61932.

The core issue I found is that DISubprogram operations usually use getUnit, while its DIE uses getScope. Once these two are associated with different CUs, associated DIEs may use another CU.

I found that even under llc version 15.0.7, DW_AT_low_pc and DW_AT_high_pc are still placed under the wrong CU. The D136039 just reflected this error to other locations. And this "Cannot represent a difference across sections" error only occurs on x86/x86-64.

$ llc -O0 -mtriple=x86_64-unknown-unknown -filetype=obj test.ll
$ llvm-dwarfdump --debug-info --verbose test.o
0x0000000b: DW_TAG_compile_unit [1] *
...
              DW_AT_low_pc [DW_FORM_addr]	(0x0000000000000070 ".text")
              DW_AT_high_pc [DW_FORM_data4]	(0x00000000)
0x00000080:     DW_TAG_namespace [5] * (0x0000004b)
...
0x00000085:       DW_TAG_structure_type [6] * (0x00000080)
...
0x0000008c:         DW_TAG_subprogram [9] * (0x00000085)
                      DW_AT_low_pc [DW_FORM_addr]	(0x0000000000000000 ".text")
                      DW_AT_high_pc [DW_FORM_data4]	(0x00000062)
...
0x000000a5:           DW_TAG_lexical_block [10] * (0x0000008c)
                        DW_AT_low_pc [DW_FORM_addr]	(0x0000000000000030 ".text")
                        DW_AT_high_pc [DW_FORM_data4]	(0x0000001e)
...

0x000000fe: DW_TAG_compile_unit [1] *
...
              DW_AT_low_pc [DW_FORM_addr]	(0x0000000000000000 ".text") <--- This should be in another CU.
              DW_AT_high_pc [DW_FORM_data4]	(0x00000062) <--- This should be in another CU.
...

Diff Detail

Event Timeline

DianQK created this revision.Apr 5 2023, 8:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 5 2023, 8:10 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
DianQK requested review of this revision.Apr 5 2023, 8:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 5 2023, 8:10 AM
DianQK edited the summary of this revision. (Show Details)Apr 5 2023, 8:18 AM
DianQK added reviewers: nikic, dblaikie, ellis, jmorse.
DianQK added a project: debug-info.
DianQK updated this revision to Diff 511409.Apr 6 2023, 7:30 AM

Eliminate duplicate code and update tests.

DianQK edited the summary of this revision. (Show Details)Apr 6 2023, 7:43 AM
DianQK edited the summary of this revision. (Show Details)Apr 10 2023, 3:35 PM
DianQK edited the summary of this revision. (Show Details)Apr 10 2023, 3:47 PM
dblaikie added inline comments.Apr 18 2023, 4:36 PM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1275–1278 ↗(On Diff #511409)

Could you explain more about when/how/why these are ever different CUs? (when would the scope have a different CU from the subprogram itself)

The use of getScope seems to be the buggy part, maybe - it should be getting the scope from the appropriate CU and it isn't, maybe?

Also the LLVM UIR in the test case seems strange to me - I guess you've put DISubprogram dedfinitions scoped within the type declaration? That isn't usually how we do things & might lead to problems with debuggers since this may be unexpected. (GCC always puts the definitions at the global scope, clang puts the definitions at namespace scope I think - though perhaps at the global scope when there's a member function declaration anyway) Maybe that's part of the problem? (the scoping of the definitions - that it's not how LLVM ddebug info IR is usually generated by clang, so you may benefit from doing something more clang-like on the rust side to avoid running into more untested/unsupported cases)

Especially when it comes to type units and things, I can imagine this sort of IR might hit a lot of untested corners.

Also the LLVM UIR in the test case seems strange to me - I guess you've put DISubprogram dedfinitions scoped within the type declaration?

I looked around a bit, and I think the answer is yes based on this comment: https://github.com/rust-lang/rust/blob/c7815840793b980d0aae7d5a2f5d9bb1fd6c0d1e/compiler/rustc_codegen_llvm/src/debuginfo/mod.rs#L502-L504 Looking through blame, this was implemented in https://github.com/rust-lang/rust/pull/33358 to fix https://github.com/rust-lang/rust/issues/33192. I don't know anything about DWARF, so can't comment on whether that was the right/wrong solution.

Also the LLVM UIR in the test case seems strange to me.

I read 2.3 Relationship of Debugging Information Entries in DWARF5. There don't seem to be any constraints on the relationship of DIEs. So I guess the structure of this debugging info is at least reasonably present?
I do not know Rust and Swift's debugging information generation. But these languages all seem to generate similar debugging message structures. Maybe having LTO could improve the structure here?

Alternatively, we can use this patch to remove the changes to D135114 and D136039.

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1275–1278 ↗(On Diff #511409)

Let's see llvm/test/DebugInfo/Generic/cross-cu.ll.
The first key point is that beginModule is executed before beginFunctionImpl.
The second key point is the creation of the global at beginModule.

Follow the creation chain of Global:

  1. !4 CU via global create !8 DIGlobalVariable
  2. !4 CU in !8 DIGlobalVariable via type create !10 DICompositeType
  3. !4 CU in !10 via vtableHolder create !11
  4. !4 CU in !11 via baseType create !12
  5. !4 CU in !12 via elements create !15=!{!16}
  6. !4 CU in !16 via baseType create !17
  7. !4 CU in !17 via templateParams create !19=!{!20}
  8. !4 CU in !20 via type create !21

Now !21 belongs to !4!
But the !23 is distinct !DISubprogram(name: "alloc_impl", scope: !21, unit: !1).

The previous llvm/test/DebugInfo/Generic/cross-cu-inlining-2.ll and llvm/test/DebugInfo/Generic/cross-cu-inlining-ranges.ll were in a similar case.

DianQK updated this revision to Diff 514960.Apr 19 2023, 8:13 AM

Remove updateSubprogramScopeDIE.

dblaikie added inline comments.Apr 24 2023, 5:42 PM
llvm/test/DebugInfo/Generic/cross-cu.ll
41–63

OK, so from the description in the other inline comment - I take it you're dealing with the situation here, where:

!23 (DISubprogram for alloc::alloc::Global::alloc_impl) is defined in CU !1
!8 that also, indirectly, references the type is in CU !4

Yeah, I think the bug here is that you shouldn't have member function definitions directly associated with DICompositeTypes - they should indirect through a declaration of the function.

Otherwise we'll inevitably end up in impossible situations - where we can't define the type in every translation unit that defines one of the member functions (or other references) to the type.

Types are intentionally not attached to a CU because they exist across the whole program - so we need to be able to reference them from anywhere/lazily create them in whatever CU we need, generally.

If you've got similar metadata from Swift, I'd like to take a look/talk to the Apple/Swift folks about this.

But, yeah, generally, I don't think this IR is OK/supportable as-is. It should have a declaration of the function - so that the function definition can appear in a CU distinct from the definition of the type. So the function definition can stay in the CU that defined it.

DianQK added inline comments.Apr 25 2023, 6:14 AM
llvm/test/DebugInfo/Generic/cross-cu.ll
41–63

Thank you for clarifying the issue.
Maybe @aprantl could come and take a look at Swift?

I'm glad to check out the Rust-related implementation. I only know a little bit about Rust. It may take some time.
Do you think it is appropriate to submit this patch as a workaround? We can revert it before LLVM 17 is released.
This change does not appear to affect C++ debug info generation.

dblaikie added inline comments.Apr 25 2023, 9:17 AM
llvm/test/DebugInfo/Generic/cross-cu.ll
41–63

I don't think it's suitable to commit as a workaround - but maybe other folks have other ideas/I wouldn't rule it out entirely.

If you're not especially familiar with Rust, what's your particular context for/interest in this bug?

DianQK added inline comments.Apr 26 2023, 8:33 PM
llvm/test/DebugInfo/Generic/cross-cu.ll
41–63

I don't think it's suitable to commit as a workaround - but maybe other folks have other ideas/I wouldn't rule it out entirely.

If we treat all such IR fixes as a workaround, I realize that D136039 is a vulnerable workaround. It corrects one CU and affects the CU of another case. But this issue does not occur with the current patch.
I have found that inconsistent CU can lead to several problems:

  • Invalid DWARF: no mapping for range, inconsistent range data, and could not find referenced DIE.
  • Unable to generate target file: Cannot represent a difference across sections.

After this patch, we will get Valid but useless DWARF. I think Valid but useless DWARF is better than Invalid DWARF, and Invalid DWARF is better than Unable to generate target file in these three bad cases. We can make a no-good IR work as well as possible. Here's what I think.

But it may also be annoying for debugging.


If you're not especially familiar with Rust, what's your particular context for/interest in this bug?

No special reason. I'm just a PL/compiler enthusiast. I improve my technical skills by solving some real problems. I didn't have the opportunity to get a comprehensive CS and PL education. So sometimes I may have weird conclusions and terminology (;△;).

DianQK added a comment.EditedApr 27 2023, 5:19 AM

I re-looked at this IR and the associated code implementation.
I have some discoveries.

From the perspective of code implementation, I think this IR maybe is OK.
The IR doesn't have member function definitions directly associated with DICompositeTypes. The elements of !21 DICompositeType are empty.
Then I found a C++ example at D58538. !26 unit is !10, and !1 indirectly references !26. It's a similar case. D58786 is probably similar (I didn't look much at it).
I think these patches are a workaround with inconsistent scope and unit.

I started debugging why adding a declaration would solve this issue.

See DwarfUnit.cpp#L1163-L1166:

if (auto *SPDecl = SP->getDeclaration()) {
  if (!Minimal) {
    // Add subprogram definitions to the CU die directly.
    ContextDIE = &getUnitDie(); // Here. If there is a declaration, the parent DIE is set to the Unit DIE.

So for this IR, it just needs to add a declaration (we don't need add elements for !21):

-!23 = distinct !DISubprogram(name: "alloc_impl", linkageName: "_ZN5alloc5alloc6Global10alloc_impl17h238d0844851c71a5E", scope: !21, file: !24, line: 170, type: !25, scopeLine: 170, flags: DIFlagPrototyped, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition, unit: !1, templateParams: !3, retainedNodes: !3)
+!23 = distinct !DISubprogram(name: "alloc_impl", linkageName: "_ZN5alloc5alloc6Global10alloc_impl17h238d0844851c71a5E", scope: !21, file: !24, line: 170, type: !25, scopeLine: 170, flags: DIFlagPrototyped, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition, unit: !1, declaration: !40, templateParams: !3, retainedNodes: !3)
...
+!40 = !DISubprogram(name: "alloc_impl", linkageName: "_ZN5alloc5alloc6Global10alloc_impl17h238d0844851c71a5E", scope: !21, file: !24, line: 170, type: !25, scopeLine: 170, flags: DIFlagPrototyped, spFlags: 0)

This has nothing to do with members, just a declaration to switch the parent DIE from scope to unit. I don't think this makes sense.
So I think this should address in LLVM.

But why is it inconsistent? I think this may be a bug in LTO. I'll continue to debug the DWARF code and take a look at the LTO later.
If I understand correctly, this is because the ODR merges the DICompositeType. In this case, it is possible that scope and unit refer to different CUs.

I have a curiosity.
Can we change DIE *ContextDIE = Minimal ? &getUnitDie() : getOrCreateContextDIE(SP->getScope()); to DIE *ContextDIE = &getUnitDie();.
I understand that this code tends to select scope. Then switch to unit if it has a declaration.

DianQK updated this revision to Diff 517560.Apr 27 2023, 8:01 AM

Improved CU selection, preferring unit when inconsistent.

But why is it inconsistent?

If I understand correctly, this is because the ODR merges the DICompositeType. In this case, it is possible that scope and unit refer to different CUs.

This has nothing to do with members, just a declaration to switch the parent DIE from scope to unit. I don't think this makes sense.

Could you explain in more detail why you don't think this makes sense?

If I understand correctly, this is because the ODR merges the DICompositeType. In this case, it is possible that scope and unit refer to different CUs.

Roughly, yes. DICompositeTypes shouldn't have a scope that refers to a unit - they should be unit-agnostic. But DISubprograms do exist concretely in specific units (though, still, I don't think their scope field indicates that/should lead to any unit - it should just be a series of namespaces that leads to null) - so, since one DICompositeType can have two DISubprograms defined in different units - those DISubprograms can't be directly scoped inside the DICompositeType - they need to be indirected through declarations.

That's the fix at the source here - producers should produce declarations for these member functions and then it'll work.

I don't think we should workaround this issue in LLVM - producers should be fixed instead. At some point/if anything, LLVM's DebugInfo verifier should reject DWARF that looks like this (with defining DISubprograms scoped directly underneath DICompositeTypes).

DianQK added a comment.May 2 2023, 5:18 AM

This has nothing to do with members, just a declaration to switch the parent DIE from scope to unit. I don't think this makes sense.

Could you explain in more detail why you don't think this makes sense?

This is my consideration only from the perspective of code implementation.
See DwarfUnit.cpp#L1157-L1166:

DIE *ContextDIE =
    Minimal ? &getUnitDie() : getOrCreateContextDIE(SP->getScope());

if (DIE *SPDie = getDIE(SP))
  return SPDie;

if (auto *SPDecl = SP->getDeclaration()) {
  if (!Minimal) {
    // Add subprogram definitions to the CU die directly.
    ContextDIE = &getUnitDie();

As above code, the ParentDIE of distinct !DISubprogram(name: "foo", scope: !21, spFlags: DISPFlagDefinition, unit: !1) is scope.
However, the ParentDIE of distinct !DISubprogram(name: "foo", scope: !21, spFlags: DISPFlagDefinition, unit: !1, declaration: !40) is unit. Just a declaration makes such a difference. So I don't think this makes sense.

Note that this has nothing to do with the members of DICompositeType.

I blame this code to Debug Info: use createAndAddDIE for newly-created Subprogram DIEs. Interestingly, I found this code to solve the DIE across CU/LTO problem.
From this patch, I am more convinced that my patch is a more suitable solution.
The declaration should not control the ContextDIE here.


That's the fix at the source here - producers should produce declarations for these member functions and then it'll work.

Based on the above explanation of declarations deciding DIE, I think the declaration and members should play some role in other debugging structures.

cuviper added a subscriber: cuviper.May 4 2023, 3:05 PM

That's the fix at the source here - producers should produce declarations for these member functions and then it'll work.

I have implemented this in rustc here: https://github.com/rust-lang/rust/pull/111167

It seems to be working -- we'll just have to make sure it doesn't break any consumers. Given that this is closer to what both Clang and GCC do for C++ methods, I'm hopeful.

That's the fix at the source here - producers should produce declarations for these member functions and then it'll work.

I have implemented this in rustc here: https://github.com/rust-lang/rust/pull/111167

It seems to be working -- we'll just have to make sure it doesn't break any consumers. Given that this is closer to what both Clang and GCC do for C++ methods, I'm hopeful.

*fingers crossed* thanks for that!

DianQK abandoned this revision.May 9 2023, 6:07 AM

We prefer to resolve the issue by adding a declaration.