This is an archive of the discontinued LLVM Phabricator instance.

[DWARF] Share across CUs only when order free
AbandonedPublic

Authored by DianQK on Oct 5 2022, 7:07 AM.

Details

Summary

This change should fix some who could not find referenced DIE issues.
For example, see https://github.com/apple/swift/issues/53940.

I created a demo.

main.swift is:

let foos: Set<Foo> = []

foo.swift is:

struct Foo: Equatable, Hashable {}

I use SPM to build these IRs:

// swift-tools-version: 5.6
// The swift-tools-version declares the minimum version of Swift required to build this package.

import PackageDescription

let package = Package(
    name: "demo",
    targets: [
        .executableTarget(
            name: "demo",
            swiftSettings: [
                .unsafeFlags(["-Xfrontend", "-lto=llvm-full", "-Xfrontend", "-emit-bc"])
            ]),
    ]
)

Then:

swift build -c release
llvm-link .build/x86_64-unknown-linux-gnu/release/demo.build/foo.swift.o .build/x86_64-unknown-linux-gnu/release/demo.build/main.swift.o -o .build/lto.bc
llc -filetype=obj .build/lto.bc -o .build/lto.o
llvm-dwarfdump .build/lto.o --verify

Outputs:

...
error: invalid DIE reference 0x00000097. Offset is in between DIEs:

0x0000008f: DW_TAG_inlined_subroutine [8] * (0x0000007c)
              DW_AT_abstract_origin [DW_FORM_ref4]      (cu + 0x0097 => {0x00000097})
              DW_AT_low_pc [DW_FORM_addr]       (0x000000000000012b ".text")
              DW_AT_high_pc [DW_FORM_data4]     (0x0000000f)
              DW_AT_call_file [DW_FORM_data1]   ("/home/dianqk/Documents/demo/Sources/demo/main.swift")
              DW_AT_call_line [DW_FORM_data1]   (0)


Verifying unit: 2 / 2, "/home/dianqk/Documents/demo/Sources/demo/foo.swift"
Verifying dwo Units...
Errors detected.

I dumped the dwarf and found DW_TAG_structure_type("Foo") is owned by DW_TAG_compile_unit(main.swift), then some invalidating cross-unit references are found. It should be owned by DW_TAG_compile_unit(foo.swift).

To better (and completely) solve this problem, I think there are some general rules to consider.

We cannot freely share DIType with every CU in all cases.

It follows these rules:

  1. When a DIType has no successor node, the DIType can belong to any CU, and other CUs reference this.
  2. If there are (recursive) successor nodes and only if these successor nodes belong to the same CU, the DIType can share in a limited way, and this DIType must belong to the corresponding CU.
  3. When there are successor nodes, which scatter in different CUs, DIType cannot share.

However, it will be a major change to implement the above rules accurately.
So to solve the current problem, we need to make trade-offs.
Merge 2 and 3 rules, When there are successor nodes, DIType cannot share.

I modified a test case that correctly places a subprogram that does not need to share in the appropriate CU.
In addition, I added the above swift test case (I edited it to simpler) as a supplement.

Diff Detail

Event Timeline

DianQK created this revision.Oct 5 2022, 7:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 5 2022, 7:07 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
DianQK requested review of this revision.Oct 5 2022, 7:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 5 2022, 7:07 AM
DianQK edited the summary of this revision. (Show Details)Oct 5 2022, 7:10 AM
DianQK added a project: debug-info.

Sorry, I'm not really understanding most of this - could you explain further what you mean by "successor" nodes and what "order free" means in this context?

Sorry, I'm not really understanding most of this - could you explain further what you mean by "successor" nodes and what "order free" means in this context?

Sorry I didn't explain clearly.

In this context,

!10 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "Foo", scope: !4, file: !1, runtimeLang: DW_LANG_Swift)
!21 = distinct !DISubprogram(name: "hashValue.get", linkageName: "$hashValueGetInFoo2", scope: !10, file: !14, type: !15, spFlags: DISPFlagDefinition, unit: !0)

!21 is a "successor" node of !10 (link by scope).

"order" is similar in D135114.
In my test case, no matter how we swap the order, it fails. It always creates first in the "main CU" because of globals in CU.

After I created this Diff, I found this D135114 and tried this patch, which seems to fix the problem nicely.
But in a larger project I get warnings "no mapping for range" and "inconsistent range data".
Maybe we have some other work to do.

Currently, I can confirm that if I disable sharing any DIType, everything works fine.

Sorry, I'm not really understanding most of this - could you explain further what you mean by "successor" nodes and what "order free" means in this context?

Sorry I didn't explain clearly.

In this context,

!10 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "Foo", scope: !4, file: !1, runtimeLang: DW_LANG_Swift)
!21 = distinct !DISubprogram(name: "hashValue.get", linkageName: "$hashValueGetInFoo2", scope: !10, file: !14, type: !15, spFlags: DISPFlagDefinition, unit: !0)

!21 is a "successor" node of !10 (link by scope).

OK - I don't think that's a term we've historically used in describing debug info metadata, so it might be a bit confusing in this review and moreso in any comments/variable names, etc.

So !21 is a member function of !10, likely - !21 probably (but doesn't have to be) listed in the members list of !10.

Oh... I think I see the problem. Perhaps Swift should produce debug info that looks more like Clang's debug info - where all member functions are declarations, and are not scoped inside the type that they're a member of.

take for example this C++:

struct t1 {
  void f1() {
  }
};
int main() {
  t1().f1();
}

it produces this IR:

17 = distinct !DISubprogram(name: "f1", linkageName: "_ZN2t12f1Ev", scope: !18, file: !1, line: 2, type: !21, scopeLine: 2, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, declaration: !20, retainedNodes: !14)
!18 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "t1", file: !1, line: 1, size: 8, flags: DIFlagTypePassByValue, elements: !19, identifier: "_ZTS2t1")
!19 = !{!20}
!20 = !DISubprogram(name: "f1", linkageName: "_ZN2t12f1Ev", scope: !18, file: !1, line: 2, type: !21, scopeLine: 2, flags: DIFlagPrototyped, spFlags: 0)

Hmm, well "f1" is scoped to !18, my mistake. But this, one way or another, still doesn't try to put the definition of "f1" inside the definition of "t1" in LLVM's DWARF generation - so maybe it's the presence of a declaration that makes this work for C++ but not for swift?

Given this is pretty common in C++, and C++, as far as I understand, has been working OK with cross-CU sharing in LTO for many years now - it'd be worth better understanding what's different about Swift's debug info that's causing problems here, and consider where best to fix the issue - whether to make Swift's debug info look more like the C++ debug info we've had for a while, or to support some new case.

DianQK added a comment.Oct 7 2022, 7:37 AM

@dblaikie
Following your case, I created one swift example.

But let's revisit the C++ example back.
I found the main.cpp CU is not use DISubprogram(name: "f1"...).
It only uses DILocation:

define dso_local noundef i32 @main() #0 !dbg !10 {
  %1 = alloca %struct.t1, align 1
  call void @_ZN2t12f1Ev(%struct.t1* noundef nonnull align 1 dereferenceable(1) %1), !dbg !15
  ret i32 0, !dbg !16
}

!10 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 7, type: !11, scopeLine: 7, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !14)
!15 = !DILocation(line: 8, column: 8, scope: !10)

Use single main.cpp(All in one file) or link t1.cpp and main.cpp for are same result.
Everything is fine. I think the declaration is irrelevant.

Hmm, in swift, it's the same as C++.

struct t1 {
    func f1() {}
}
t1().f1()
define protected i32 @main(i32 %0, i8** %1) #0 !dbg !35 {
  %3 = bitcast i8** %1 to i8*
  call swiftcc void @"$s4demo2t1VACycfC"(), !dbg !47
  call swiftcc void @"$s4demo2t1V2f1yyF"(), !dbg !49
  ret i32 0, !dbg !49
}

!35 = distinct !DISubprogram(name: "main", linkageName: "main", scope: !5, file: !1, line: 1, type: !36, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
!48 = distinct !DILexicalBlock(scope: !35, file: !1, line: 9, column: 1)
!49 = !DILocation(line: 9, column: 6, scope: !48)

I fallback to my case and remove Hashable:
t1.swift:

struct t1 {
    func f1() {}
}

main.swift

let _t1 = t1()
_t1.f1()

It created DICompositeType(tag: DW_TAG_structure_type, name: "t1" ...) in t1.ll and main.ll.

main.ll:

!0 = distinct !DICompileUnit(language: DW_LANG_Swift, file: !1, producer: "Swift version 5.6 (swift-5.6-RELEASE)", isOptimized: true, runtimeVersion: 5, emissionKind: FullDebug, enums: !2, globals: !3, imports: !10)
!3 = !{!4}
!4 = !DIGlobalVariableExpression(var: !5, expr: !DIExpression(DW_OP_constu, 0, DW_OP_stack_value))
!5 = distinct !DIGlobalVariable(name: "_t1", scope: !6, file: !1, line: 1, type: !7, isLocal: true, isDefinition: true)
!7 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !8)
!8 = !DICompositeType(tag: DW_TAG_structure_type, name: "t1", scope: !6, file: !9, elements: !2, runtimeLang: DW_LANG_Swift, identifier: "$s4demo2t1VD")

t1.ll:

!0 = distinct !DICompileUnit(language: DW_LANG_Swift, file: !1, producer: "Swift version 5.6 (swift-5.6-RELEASE)", isOptimized: true, runtimeVersion: 5, emissionKind: FullDebug, enums: !2, imports: !3)
!32 = !DICompositeType(tag: DW_TAG_structure_type, name: "t1", scope: !5, file: !1, elements: !2, runtimeLang: DW_LANG_Swift, identifier: "$s4demo2t1VD")

DwarfDebug.cpp#L1221 always puts the DW_TAG_structure_type("t1") inside the CU of main.swift, then all member functions are put inside this CU/DW_TAG_structure_type.

You are right. DW_TAG_structure_type("t1") doesn't have to be inside in the CU of t1.swift.
But if I add Hashable, it will create a function in the CU of t1.swift that needs a function inside of the DW_TAG_structure_type("t1") in the CU of main.swift.

I don't know how to create a similar IR in C++.

D135114 has fixed fine. We can use any DIE in any CU. (But this does not solve the "no mapping for range" problem mentioned earlier.)

I will check about the Rust language later. For now, I think we should support these new cases.
Because these IRs look fine and maybe many language use like this case.

Sorry, I'm new to DebugInfo (also LLVM), but I will try to improve my term standards.

DianQK added a comment.Oct 7 2022, 7:44 AM

There seems to be a lot of complexity to consider here, would adding an option to control DINode can be shareable be a good suggestion?
Pseudo-code:

bool DwarfUnit::isShareableAcrossCUs(const DINode *D) const {
  ...
  return ((isa<DIType>(D) && !option("disable-die-share-across-cus"))||
          (isa<DISubprogram>(D) && !cast<DISubprogram>(D)->isDefinition())) &&
         !DD->generateTypeUnits();
}

There seems to be a lot of complexity to consider here, would adding an option to control DINode can be shareable be a good suggestion?
Pseudo-code:

bool DwarfUnit::isShareableAcrossCUs(const DINode *D) const {
  ...
  return ((isa<DIType>(D) && !option("disable-die-share-across-cus"))||
          (isa<DISubprogram>(D) && !cast<DISubprogram>(D)->isDefinition())) &&
         !DD->generateTypeUnits();
}

I'm not ready to understand what solutions might be appropriate as I'm still trying to understand the problem and why it arises for Swift but not C++ - sorry I haven't had enough time to try to get the context clearly in my head.

I don't have a Swift compiler, and don't really know Swift (so I'm not sure what Foo: Hashable means - in C++ that would be derivation, and Hashable wouldn't have any reference to Foo (only the other way around, Foo would reference Hashable) - could you include the LLVM IR for this small example so I could take a look?

There seems to be a lot of complexity to consider here, would adding an option to control DINode can be shareable be a good suggestion?
Pseudo-code:

bool DwarfUnit::isShareableAcrossCUs(const DINode *D) const {
  ...
  return ((isa<DIType>(D) && !option("disable-die-share-across-cus"))||
          (isa<DISubprogram>(D) && !cast<DISubprogram>(D)->isDefinition())) &&
         !DD->generateTypeUnits();
}

I'm not ready to understand what solutions might be appropriate as I'm still trying to understand the problem and why it arises for Swift but not C++ - sorry I haven't had enough time to try to get the context clearly in my head.

I don't have a Swift compiler, and don't really know Swift (so I'm not sure what Foo: Hashable means - in C++ that would be derivation, and Hashable wouldn't have any reference to Foo (only the other way around, Foo would reference Hashable) - could you include the LLVM IR for this small example so I could take a look?

struct Foo: Equatable, Hashable {}

Is a definition of a struct with the name Foo that conforms to the protocols Equatable and Hashable. Protocols in Swift are somewhat similar in spirit to Concepts in C++. Defining Foo to conform to Equatable to somewhat similar to declaring bool operator==(const Foo &) = default; in C++. Because of the protocol conformance the compiler must emit a witness table (~ vtable) defining how to call the methods required by Equatable on a Foo object.

There seems to be a lot of complexity to consider here, would adding an option to control DINode can be shareable be a good suggestion?
Pseudo-code:

bool DwarfUnit::isShareableAcrossCUs(const DINode *D) const {
  ...
  return ((isa<DIType>(D) && !option("disable-die-share-across-cus"))||
          (isa<DISubprogram>(D) && !cast<DISubprogram>(D)->isDefinition())) &&
         !DD->generateTypeUnits();
}

I'm not ready to understand what solutions might be appropriate as I'm still trying to understand the problem and why it arises for Swift but not C++ - sorry I haven't had enough time to try to get the context clearly in my head.

I don't have a Swift compiler, and don't really know Swift (so I'm not sure what Foo: Hashable means - in C++ that would be derivation, and Hashable wouldn't have any reference to Foo (only the other way around, Foo would reference Hashable) - could you include the LLVM IR for this small example so I could take a look?

struct Foo: Equatable, Hashable {}

Is a definition of a struct with the name Foo that conforms to the protocols Equatable and Hashable. Protocols in Swift are somewhat similar in spirit to Concepts in C++. Defining Foo to conform to Equatable to somewhat similar to declaring bool operator==(const Foo &) = default; in C++. Because of the protocol conformance the compiler must emit a witness table (~ vtable) defining how to call the methods required by Equatable on a Foo object.

@aprantl is right. Thank you for your explanation.
It's not a good suggestion. I think I'm wrong, please ignore this.

By the way, I tried to fix the "no mapping for range" and "inconsistent range data" issues mentioned earlier in D136039.

DianQK abandoned this revision.Oct 16 2022, 8:02 AM