Page MenuHomePhabricator

[RFC][DwarfDebug][AsmPrinter] Support emitting function-local declaration for a lexical block
AcceptedPublic

Authored by krisb on Nov 12 2021, 1:21 AM.

Details

Summary

This is another attempt to make function-local declarations
(like static variables, structs/classes, and other) be correctly emitted
within a lexical (bracketed) block.

Some ideas were taken from previous attempts [0] and [1].

Limitations:

Since we skip lexical blocks if we couldn't find non-scope children for them,
we need a way to force the emission of blocks that have static locals or local
type declarations but don't have local variables. As a workaround, I
collect such scopes by collectLocalScopesWithDeclsFromCU() and
store them in a set. This set is used later to check whether we need to emit
a lexical block. It, however, lacks information about used records
and typedefs because there is no access to them from a CU.
In this case, we will put the entity to a first available parent scope
(with subprogram as a bound).

Fixes https://bugs.llvm.org/show_bug.cgi?id=19238.

clang's part is at D113743.

[0] https://reviews.llvm.org/D11180
[1] https://reviews.llvm.org/D15976
[2] https://bugs.llvm.org/show_bug.cgi?id=27579

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
thakis added a subscriber: thakis.Dec 4 2021, 8:22 AM

I see this is already reverted (thanks for watching the bots after committing!), but fyi for the relabel: this also broke tests on macOS, e.g. http://45.33.8.238/mac/39897/step_11.txt

MaskRay added inline comments.
llvm/test/DebugInfo/Generic/lexical_block_static.ll
2

Somewhat strange the patch introduced new tests with both - and _ filename separators...

In most components of llvm, - is more popular though I can see directories where _ is more popular.

FWIW this also caused some ThinLTO build failures (linker errors specifically) along the lines of:

(.debug_gnu_pubnames): name lookup table at offset 0x9a5 has a terminator at offset 0x9b3 before the expected end at 0x9f2

I don't have a simple/small reproduction at the moment. But would be good to work together/reach out to me if it's not easily discoverable and figure out this issue before recommitting.

FWIW this also caused some ThinLTO build failures (linker errors specifically) along the lines of:

(.debug_gnu_pubnames): name lookup table at offset 0x9a5 has a terminator at offset 0x9b3 before the expected end at 0x9f2

I don't have a simple/small reproduction at the moment. But would be good to work together/reach out to me if it's not easily discoverable and figure out this issue before recommitting.

Oh, I misread - that this issue was seen on the recommit & hasn't ben reverted yet.

I'll try to reduce a test case, but might be worth reverting in the interim regardless?

FWIW this also caused some ThinLTO build failures (linker errors specifically) along the lines of:

(.debug_gnu_pubnames): name lookup table at offset 0x9a5 has a terminator at offset 0x9b3 before the expected end at 0x9f2

I don't have a simple/small reproduction at the moment. But would be good to work together/reach out to me if it's not easily discoverable and figure out this issue before recommitting.

Oh, I misread - that this issue was seen on the recommit & hasn't ben reverted yet.

I'll try to reduce a test case, but might be worth reverting in the interim regardless?

If you think it'd be better to revert until the issue gets resolved then okay, let me know and I'll revert it. I can't say more without a reproducer.
But could you please share a way to reproduce the issue? Not small/simple is also okay.

llvm/test/DebugInfo/Generic/lexical_block_static.ll
2

This happened cause I borrowed some tests from ellis's patch, those ones with '-'.
Sorry, I somehow didn't notice this difference. I'll either fix it in an NFC or in the next recommit if it'll be decided it's better to revert this patch until LTO issues get resolved.

FWIW this also caused some ThinLTO build failures (linker errors specifically) along the lines of:

(.debug_gnu_pubnames): name lookup table at offset 0x9a5 has a terminator at offset 0x9b3 before the expected end at 0x9f2

I don't have a simple/small reproduction at the moment. But would be good to work together/reach out to me if it's not easily discoverable and figure out this issue before recommitting.

Oh, I misread - that this issue was seen on the recommit & hasn't ben reverted yet.

I'll try to reduce a test case, but might be worth reverting in the interim regardless?

If you think it'd be better to revert until the issue gets resolved then okay, let me know and I'll revert it. I can't say more without a reproducer.
But could you please share a way to reproduce the issue? Not small/simple is also okay.

Unfortunately due to ThinLTO it's a bit hard for me to extract (& not sure if there are bits I need to anonymize, etc - which is easier when reducing just a single file) and share at the moment :/

Some DIEs are coming through with a 0 offset, which prematurely terminates the pubnames/types list. I'm working on figuring out in what situation that happens.

Ah, seems to be related to "isDebugDirectivesOnly" - ie: we have debug info but only for middle end diagnostics, not to produce DWARF in the output...

hmm, wait, no, not that...

Hmm, think I'm honing in on it.

It looks like it's related to a unit that defines a type that's already defined by another unit - usually that first unit could be skipped entirely by https://github.com/llvm/llvm-project/blob/6683099a0d0a17fcde3576733e9c85e3b5f71de5/llvm/lib/CodeGen/AsmPrinter/DwarfFile.cpp#L47 - but since the namespace now gets created first, then the type is skipped...

Hm, no, maybe I'm chasing down a different manifestation caused by 78d15a112cbd545fbb6e1aa37c221ef5aeffb3f2 - perhaps still with this patch (D113741) as a root cause? I'm not sure yet.

Hmm, think I'm honing in on it.

It looks like it's related to a unit that defines a type that's already defined by another unit - usually that first unit could be skipped entirely by https://github.com/llvm/llvm-project/blob/6683099a0d0a17fcde3576733e9c85e3b5f71de5/llvm/lib/CodeGen/AsmPrinter/DwarfFile.cpp#L47 - but since the namespace now gets created first, then the type is skipped...

Correct link is actually: https://github.com/llvm/llvm-project/blob/6683099a0d0a17fcde3576733e9c85e3b5f71de5/llvm/lib/CodeGen/AsmPrinter/DwarfFile.cpp#L71

Hm, no, maybe I'm chasing down a different manifestation caused by 78d15a112cbd545fbb6e1aa37c221ef5aeffb3f2 - perhaps still with this patch (D113741) as a root cause? I'm not sure yet.

Alrighty, got an A/B comparison with/without this patch (but still not reduced to share) - seems we end up creating a second split unit for one of the imported units under ThinLTO, which is incorrect (because Split DWARF only really supports one CU per split unit - so having two in a single dwo file is broken - at least if you're going to package the file with dwp, which we do).

Looking into it further/trying to get a reduced example.

krisb added a comment.EditedDec 22 2021, 6:44 AM

Alrighty, got an A/B comparison with/without this patch (but still not reduced to share) - seems we end up creating a second split unit for one of the imported units under ThinLTO, which is incorrect (because Split DWARF only really supports one CU per split unit - so having two in a single dwo file is broken - at least if you're going to package the file with dwp, which we do).

Looking into it further/trying to get a reduced example.

(trying to guess the issue by keywords...)

Is the issue about a function-local import?

The only diff in imported entities handling I see is that we no longer skip emission of local imports under 'includeMinimalInlineScopes()' (which was the case before this patch, but it seems I missed this fact).
Can it be a hidden issue revealed by this mistake?

P.S. I'll fix 'includeMinimalInlineScopes()' for local import.

Alrighty, got an A/B comparison with/without this patch (but still not reduced to share) - seems we end up creating a second split unit for one of the imported units under ThinLTO, which is incorrect (because Split DWARF only really supports one CU per split unit - so having two in a single dwo file is broken - at least if you're going to package the file with dwp, which we do).

Looking into it further/trying to get a reduced example.

(trying to guess the issue by keywords...)

Is the issue about a function-local import?

The only diff in imported entities handling I see is that we no longer skip emission of local imports under 'includeMinimalInlineScopes()' (which was the case before this patch, but it seems I missed this fact).
Can it be a hidden issue revealed by this mistake?

P.S. I'll fix 'includeMinimalInlineScopes()' for local import.

Yep, I'm certainly seeing some function-local imports in the test case I'm investigating, and that sounds related to be sure. Can't guarantee it's the precise issue, but I'll keep poking around at it & having a fix for the local import under minimal-inline-scopes would be good to have in any case.

dblaikie added inline comments.Dec 22 2021, 8:07 AM
llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
1039–1059

Adding back these two includeMinimalInlineScopes checks doesn't seem to address the failure I'm investigating, at least.

But they may still represent a bug (either an independent one, or as part of the issue I'm investigating).

OK, think I've got it now.

It requires splitDebugInlining=false in the source CU, and `true' in the destination CU. Needs a local using decl in the source CU and some function import from source to destination CU.

This example, compiled at -O1 does the trick:

; ModuleID = 'ab.bc'
source_filename = "llvm-link"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%"struct.ns::t1" = type { i8 }

; Function Attrs: mustprogress uwtable
define available_externally dso_local void @_Z2f1v() #0 !dbg !4 {
entry:
  %v1 = alloca %"struct.ns::t1", align 1
  call void @llvm.dbg.declare(metadata %"struct.ns::t1"* %v1, metadata !21, metadata !DIExpression()), !dbg !22
  call void @_Z3pinv(), !dbg !23
  ret void, !dbg !24
}

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

declare dso_local void @_Z3pinv() #2

; Function Attrs: mustprogress norecurse uwtable
define dso_local i32 @main() #3 !dbg !25 {
entry:
  call void @_Z2f1v(), !dbg !29
  ret i32 0, !dbg !30
}

attributes #0 = { mustprogress uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { nofree nosync nounwind readnone speculatable willreturn }
attributes #2 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #3 = { mustprogress norecurse uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

!llvm.dbg.cu = !{!0, !10}
!llvm.ident = !{!12, !12}
!llvm.module.flags = !{!13, !14, !15, !16, !17}

!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1, producer: "clang version 14.0.0 (git@github.com:llvm/llvm-project.git 75b622a7959479ccdb758ebe0973061b7b4fcacd)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, imports: !2, splitDebugInlining: false, nameTableKind: GNU)
!1 = !DIFile(filename: "a.cpp", directory: "/usr/local/google/home/blaikie/dev/scratch")
!2 = !{!3}
!3 = !DIImportedEntity(tag: DW_TAG_imported_declaration, scope: !4, entity: !8, file: !1, line: 3)
!4 = distinct !DISubprogram(name: "f1", linkageName: "_Z2f1v", scope: !1, file: !1, line: 3, type: !5, scopeLine: 3, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !7)
!5 = !DISubroutineType(types: !6)
!6 = !{null}
!7 = !{}
!8 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "t1", scope: !9, file: !1, line: 1, size: 8, flags: DIFlagTypePassByValue, elements: !7, identifier: "_ZTSN2ns2t1E")
!9 = !DINamespace(name: "ns", scope: null)
!10 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !11, producer: "clang version 14.0.0 (git@github.com:llvm/llvm-project.git 75b622a7959479ccdb758ebe0973061b7b4fcacd)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: true, nameTableKind: GNU)
!11 = !DIFile(filename: "b.cpp", directory: "/usr/local/google/home/blaikie/dev/scratch")
!12 = !{!"clang version 14.0.0 (git@github.com:llvm/llvm-project.git 75b622a7959479ccdb758ebe0973061b7b4fcacd)"}
!13 = !{i32 7, !"Dwarf Version", i32 4}
!14 = !{i32 2, !"Debug Info Version", i32 3}
!15 = !{i32 1, !"wchar_size", i32 4}
!16 = !{i32 7, !"uwtable", i32 1}
!17 = !{i32 7, !"frame-pointer", i32 2}
!18 = distinct !DISubprogram(name: "f2", linkageName: "_Z2f2v", scope: !1, file: !1, line: 4, type: !5, scopeLine: 4, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !7)
!19 = !DILocation(line: 4, column: 13, scope: !18)
!20 = !DILocation(line: 4, column: 19, scope: !18)
!21 = !DILocalVariable(name: "v1", scope: !4, file: !1, line: 3, type: !8)
!22 = !DILocation(line: 3, column: 37, scope: !4)
!23 = !DILocation(line: 3, column: 41, scope: !4)
!24 = !DILocation(line: 3, column: 48, scope: !4)
!25 = distinct !DISubprogram(name: "main", scope: !11, file: !11, line: 2, type: !26, scopeLine: 2, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !10, retainedNodes: !7)
!26 = !DISubroutineType(types: !27)
!27 = !{!28}
!28 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!29 = !DILocation(line: 3, column: 3, scope: !25)
!30 = !DILocation(line: 4, column: 1, scope: !25)

I'll look into reverting this (& dependent patches) after lunch to unblock the main branch - but happy to work with you/continue to work on this to figure out how to address the issue. It may be a latent bug of my own creation regarding testing the splitDwarfInilning property on the wrong CU or something.

oh, and in another example I found that this patch caused LLVM to produce two dwo CUs, which is incorrect (but unlike the other example, doesn't crash). (this is the same as the first example, but with splitDedugInlining enabled on both CUs)

; ModuleID = 'ab.bc'
source_filename = "llvm-link"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%"struct.ns::t1" = type { i8 }

; Function Attrs: mustprogress uwtable
define available_externally dso_local void @_Z2f1v() #0 !dbg !4 {
entry:
  %v1 = alloca %"struct.ns::t1", align 1
  call void @llvm.dbg.declare(metadata %"struct.ns::t1"* %v1, metadata !21, metadata !DIExpression()), !dbg !22
  call void @_Z3pinv(), !dbg !23
  ret void, !dbg !24
}

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

declare dso_local void @_Z3pinv() #2

; Function Attrs: mustprogress norecurse uwtable
define dso_local i32 @main() #3 !dbg !25 {
entry:
  call void @_Z2f1v(), !dbg !29
  ret i32 0, !dbg !30
}

attributes #0 = { mustprogress uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { nofree nosync nounwind readnone speculatable willreturn }
attributes #2 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #3 = { mustprogress norecurse uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

!llvm.dbg.cu = !{!0, !10}
!llvm.ident = !{!12, !12}
!llvm.module.flags = !{!13, !14, !15, !16, !17}

!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1, producer: "clang version 14.0.0 (git@github.com:llvm/llvm-project.git 75b622a7959479ccdb758ebe0973061b7b4fcacd)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, imports: !2, splitDebugInlining: true, nameTableKind: GNU)
!1 = !DIFile(filename: "a.cpp", directory: "/usr/local/google/home/blaikie/dev/scratch")
!2 = !{!3}
!3 = !DIImportedEntity(tag: DW_TAG_imported_declaration, scope: !4, entity: !8, file: !1, line: 3)
!4 = distinct !DISubprogram(name: "f1", linkageName: "_Z2f1v", scope: !1, file: !1, line: 3, type: !5, scopeLine: 3, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !7)
!5 = !DISubroutineType(types: !6)
!6 = !{null}
!7 = !{}
!8 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "t1", scope: !9, file: !1, line: 1, size: 8, flags: DIFlagTypePassByValue, elements: !7, identifier: "_ZTSN2ns2t1E")
!9 = !DINamespace(name: "ns", scope: null)
!10 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !11, producer: "clang version 14.0.0 (git@github.com:llvm/llvm-project.git 75b622a7959479ccdb758ebe0973061b7b4fcacd)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: true, nameTableKind: GNU)
!11 = !DIFile(filename: "b.cpp", directory: "/usr/local/google/home/blaikie/dev/scratch")
!12 = !{!"clang version 14.0.0 (git@github.com:llvm/llvm-project.git 75b622a7959479ccdb758ebe0973061b7b4fcacd)"}
!13 = !{i32 7, !"Dwarf Version", i32 4}
!14 = !{i32 2, !"Debug Info Version", i32 3}
!15 = !{i32 1, !"wchar_size", i32 4}
!16 = !{i32 7, !"uwtable", i32 1}
!17 = !{i32 7, !"frame-pointer", i32 2}
!18 = distinct !DISubprogram(name: "f2", linkageName: "_Z2f2v", scope: !1, file: !1, line: 4, type: !5, scopeLine: 4, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !7)
!19 = !DILocation(line: 4, column: 13, scope: !18)
!20 = !DILocation(line: 4, column: 19, scope: !18)
!21 = !DILocalVariable(name: "v1", scope: !4, file: !1, line: 3, type: !8)
!22 = !DILocation(line: 3, column: 37, scope: !4)
!23 = !DILocation(line: 3, column: 41, scope: !4)
!24 = !DILocation(line: 3, column: 48, scope: !4)
!25 = distinct !DISubprogram(name: "main", scope: !11, file: !11, line: 2, type: !26, scopeLine: 2, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !10, retainedNodes: !7)
!26 = !DISubroutineType(types: !27)
!27 = !{!28}
!28 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!29 = !DILocation(line: 3, column: 3, scope: !25)
!30 = !DILocation(line: 4, column: 1, scope: !25)
krisb added a comment.Dec 22 2021, 2:03 PM

@dblaikie, thank you for the reproducers!

I've tried to compile&examine them, but haven't found any issues. Likely, I did something wrong.
Could you, please, give more details on how to reproduce the issue(s)? I mean, exact compilation command line and/or crash details / wrong dwarf output? Thank you!

@dblaikie, thank you for the reproducers!

I've tried to compile&examine them, but haven't found any issues. Likely, I did something wrong.
Could you, please, give more details on how to reproduce the issue(s)? I mean, exact compilation command line and/or crash details / wrong dwarf output? Thank you!

Oh, right, sorry:

For the first one, you should observe a crash (in a +Asserts build) when running this command:

$ clang++-tot ab-crash.ll -gsplit-dwarf -c -O1

The backtrace looks something like this:

$ clang++-tot ab-crash.ll -gsplit-dwarf -c -O1 -S 
clang++-tot: /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:660: llvm::DIE *llvm::DwarfCompileUnit::constructInlinedScopeDIE(llvm::LexicalScope *): Assertion `OriginDIE && "Unable to find original DIE for an inlined subprogram."' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: clang++-tot ab-crash.ll -gsplit-dwarf -c -O1 -S
1.      Code generation
2.      Running pass 'Function Pass Manager' on module 'ab-crash.ll'.
3.      Running pass 'X86 Assembly Printer' on function '@main'
 #0 0x0000000009e604ea llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/Unix/Signals.inc:565:11
 #1 0x0000000009e6069b PrintStackTraceSignalHandler(void*) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/Unix/Signals.inc:632:1
 #2 0x0000000009e5ed63 llvm::sys::RunSignalHandlers() /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/Signals.cpp:97:5
 #3 0x0000000009e5fdfe llvm::sys::CleanupOnSignal(unsigned long) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/Unix/Signals.inc:362:1
 #4 0x0000000009d6fa4e (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/CrashRecoveryContext.cpp:0:7
 #5 0x0000000009d6fe03 CrashRecoverySignalHandler(int) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/CrashRecoveryContext.cpp:390:1
 #6 0x00007f60568c08e0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x138e0)
 #7 0x00007f6056389e71 raise ./signal/../sysdeps/unix/sysv/linux/raise.c:50:1
 #8 0x00007f6056373536 abort ./stdlib/abort.c:81:7
 #9 0x00007f605637341f get_sysdep_segment_value ./intl/loadmsgcat.c:509:8
#10 0x00007f605637341f _nl_load_domain ./intl/loadmsgcat.c:970:34
#11 0x00007f60563827f2 (/lib/x86_64-linux-gnu/libc.so.6+0x357f2)
#12 0x000000000b57ce6e llvm::DwarfCompileUnit::constructInlinedScopeDIE(llvm::LexicalScope*) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:0:3
#13 0x000000000b57cb6c llvm::DwarfCompileUnit::constructScopeDIE(llvm::LexicalScope*, llvm::DIE&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:538:10
#14 0x000000000b57d3ae llvm::DwarfCompileUnit::createAndAddScopeChildren(llvm::LexicalScope*, llvm::DIE&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:1072:7
#15 0x000000000b57efa1 llvm::DwarfCompileUnit::constructSubprogramScopeDIE(llvm::DISubprogram const*, llvm::LexicalScope*) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:1023:14
#16 0x000000000b52bc0d llvm::DwarfDebug::endFunctionImpl(llvm::MachineFunction const*) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:2301:50
#17 0x000000000b505ab2 llvm::DebugHandlerBase::endFunction(llvm::MachineFunction const*) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/AsmPrinter/DebugHandlerBase.cpp:0:5
#18 0x000000000b4e1f4c llvm::AsmPrinter::emitFunctionBody() /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1505:3
#19 0x0000000007ed7113 llvm::X86AsmPrinter::runOnMachineFunction(llvm::MachineFunction&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Target/X86/X86AsmPrinter.cpp:82:3
#20 0x0000000008a96477 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/MachineFunctionPass.cpp:72:8
#21 0x0000000009165d5e llvm::FPPassManager::runOnFunction(llvm::Function&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/IR/LegacyPassManager.cpp:1439:23
#22 0x000000000916abb2 llvm::FPPassManager::runOnModule(llvm::Module&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/IR/LegacyPassManager.cpp:1485:16
#23 0x0000000009166649 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/IR/LegacyPassManager.cpp:1554:23
#24 0x00000000091661bd llvm::legacy::PassManagerImpl::run(llvm::Module&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/IR/LegacyPassManager.cpp:542:16
#25 0x000000000916ae91 llvm::legacy::PassManager::run(llvm::Module&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/IR/LegacyPassManager.cpp:1681:3

But that's actually not the original situation I was investigating, so maybe it's another bug? (possibly same root cause, though)

Though the original issue I was investigating was from invalid .debug_gnu_pubnames (DIE offsets of 0, which prematurely terminates the .debug_gnu_pubnames list and produces a warning from llvm-dwarfdump), and by adding assert(Offset) in llvm::DIE::getOffset (initially I added it in the pubnames emission code, but generally it should be true that the offset of a DIE should not be queried until it's been computed, and no DIE has offset zero, because the offset is from the start of the CU contribution - so the headers come first and mean the DIE offset is always non-zero). It seems I still don't have a small reproduction for that exact issue, though.

The second test case provided above, that produces multiple DWO CUs can be reproduced as follows:

$ clang++-tot ab-multi-cu.ll -gsplit-dwarf -c -O1 && llvm-dwarfdump-tot ab-multi-cu.o
ab-multi-cu.o:  file format elf64-x86-64

.debug_info contents:
0x00000000: Compile Unit: length = 0x00000053, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x00000057)

0x0000000b: DW_TAG_compile_unit
              DW_AT_stmt_list   (0x00000000)
              DW_AT_comp_dir    ("/usr/local/google/home/blaikie/dev/scratch")
              DW_AT_GNU_pubnames        (true)
              DW_AT_GNU_dwo_name        ("ab-multi-cu.dwo")
              DW_AT_GNU_dwo_id  (0xe9d1660e0395035d)
              DW_AT_low_pc      (0x0000000000000000)
              DW_AT_high_pc     (0x000000000000000d)
              DW_AT_GNU_addr_base       (0x00000000)

0x00000030:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0000000000000000)
                DW_AT_high_pc   (0x000000000000000d)
                DW_AT_name      ("main")

0x00000041:     DW_TAG_inlined_subroutine
                  DW_AT_abstract_origin (0x000000000000007b "f1")
                  DW_AT_low_pc  (0x0000000000000004)
                  DW_AT_high_pc (0x0000000000000009)
                  DW_AT_call_file       ("/usr/local/google/home/blaikie/dev/scratch/b.cpp")
                  DW_AT_call_line       (3)
                  DW_AT_call_column     (0x03)

0x00000055:     NULL

0x00000056:   NULL
0x00000057: Compile Unit: length = 0x00000027, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x00000082)

0x00000062: DW_TAG_compile_unit
              DW_AT_stmt_list   (0x0000004c)
              DW_AT_comp_dir    ("/usr/local/google/home/blaikie/dev/scratch")
              DW_AT_GNU_pubnames        (true)
              DW_AT_GNU_dwo_name        ("ab-multi-cu.dwo")
              DW_AT_GNU_dwo_id  (0xeb227ecffcd242c1)
              DW_AT_GNU_addr_base       (0x00000000)

0x0000007b:   DW_TAG_subprogram
                DW_AT_name      ("f1")
                DW_AT_inline    (DW_INL_inlined)

0x00000081:   NULL

In this case the second CU is OK (for the cross-CU inlining that's occurred), but it shouldn't have any DWO_* attributes, and the .dwo file shouldn't have a second CU in it (which it does after this patch) because Split DWARF doesn't have a good way to describe cross-CU references correctly once the CU is packaged in a dwp file. (this behavior is meant to be controlled by the SplitDwarfCrossCUReferences flag.

krisb added a comment.EditedDec 23 2021, 10:30 AM

@dblaikie added another attempt to revert this as a review: D116225. Will commit once pre-commit tests + local lldb tests passed.

About the issues.
First of all, thank you for all the details! I was able to reproduce both issues.
(1) The crash doesn't seem related to the patch and reverting it doesn't make things better. It seems there is an inconsistency between when/where an abstract origin gets emitted and when/where it's requested.
(2) The issue with wrong dwo attributes is indeed caused by the imported declaration that gets emitted where it shouldn't (at least comparing with the behavior before this patch).
Before this patch 'includeMinimalInlineScopes()' makes emission of imported entity skipped and the second CU assumed empty, so we did not emit dwo attributes for it.
But I'm not sure I fully understand how DwarfDebug handles split dwarf. Why do we need the 'includeMinimalInlineScopes()' check for correctness here?

krisb reopened this revision.Dec 23 2021, 10:42 AM
This revision is now accepted and ready to land.Dec 23 2021, 10:42 AM

@dblaikie added another attempt to revert this as a review: D116225. Will commit once pre-commit tests + local lldb tests passed.

About the issues.
First of all, thank you for all the details! I was able to reproduce both issues.
(1) The crash doesn't seem related to the patch and reverting it doesn't make things better. It seems there is an inconsistency between when/where an abstract origin gets emitted and when/where it's requested.

Hmm, bother - guess I failed to correctly reduce the issue. I'll try again.

(2) The issue with wrong dwo attributes is indeed caused by the imported declaration that gets emitted where it shouldn't (at least comparing with the behavior before this patch).
Before this patch 'includeMinimalInlineScopes()' makes emission of imported entity skipped and the second CU assumed empty, so we did not emit dwo attributes for it.
But I'm not sure I fully understand how DwarfDebug handles split dwarf. Why do we need the 'includeMinimalInlineScopes()' check for correctness here?

It gets a bit complicated because a DWO file can only contain a single CU, but the .o file can contain multiple CUs - so to preserve as much fidelity as possible, if "split dwarf inlining" is enabled (include inline stack frames in the CU in the .o file, so the program can be symbolized even without the dwo/dwp files) then the .o file will get multiple CUs with -g1-like debug info, and one of those CUs will also have the dwo attributes and be a skeleton for the .dwo CU (but the dwo file, instead of having multiple CUs to describe the cross-CU inlining, it'll degrade the debug info quality by moving the DIEs that should be in other CUs in the dwo into the single CU there).

Turned out we had two different patches causing the same assert (should've bisected them both to be sure - added some unnecessary confusion to this whole situation). One of my reduced test cases actually was for https://reviews.llvm.org/D115325#3209087 - working on another for this patch (D113741) now...

To continue with a more accurate fix, one example for your reference. (The behavior is changed with/without this reversion):

int main() {      // <====== function scope
  int x = 3;
  int num = 6;
  printf("%d\n", x);
  if (num) {   // <======= lexical block 1
    static int x = 45;
    printf("%d\n", x);
    if (x) {   // <======= lexical block 2
      int x = 145;
      printf("%d\n", x);
    }
  }
  return 0;
}

Without this patch:

<1><2a>: Abbrev Number: 2 (DW_TAG_subprogram)
   <2b>   DW_AT_low_pc      : 0x10010680
   <33>   DW_AT_high_pc     : 0xd4
   <37>   DW_AT_frame_base  : 1 byte block: 6f         (DW_OP_reg31 (r31))
   <39>   DW_AT_name        : (indirect string, offset: 0xcb): main
   <3f>   DW_AT_type        : <0x94>
   <43>   DW_AT_external    : 1
<2><43>: Abbrev Number: 3 (DW_TAG_variable)
   <44>   DW_AT_location    : 3 byte block: 91 f0 0    (DW_OP_fbreg: 112)
   <48>   DW_AT_name        : (indirect string, offset: 0xd4): x
   <4e>   DW_AT_type        : <0x94>
<2><52>: Abbrev Number: 3 (DW_TAG_variable)
   <53>   DW_AT_location    : 3 byte block: 91 ec 0    (DW_OP_fbreg: 108)
   <57>   DW_AT_name        : (indirect string, offset: 0xd0): num
   <5d>   DW_AT_type        : <0x94>
<2><61>: Abbrev Number: 4 (DW_TAG_lexical_block)     <<<<<<<<<<this should be for lexical block 2, so its DIE level should be 3
   <62>   DW_AT_low_pc      : 0x1001070c
   <6a>   DW_AT_high_pc     : 0x20
<3><6e>: Abbrev Number: 3 (DW_TAG_variable)             <<<<<<<<<this is for x=145 and nested in lexical block2, so its DIE level should be 4
   <6f>   DW_AT_location    : 3 byte block: 91 e8 0    (DW_OP_fbreg: 104)
   <73>   DW_AT_name        : (indirect string, offset: 0xd4): x
   <79>   DW_AT_type        : <0x94>
<3><7d>: Abbrev Number: 0
<2><7e>: Abbrev Number: 5 (DW_TAG_variable)              <<<<<<<<<<this is for static int x = 45, so there should be another lexical block DIE for lexical block 1, and its DIE level should be 2.
   <7f>   DW_AT_name        : (indirect string, offset: 0xd4): x
   <83>   DW_AT_type        : <0x94>
   <89>   DW_AT_location    : 9 byte block: 3 e8 a 3 10 0 0 0 0        (DW_OP_addr: 10030ae8)
<2><93>: Abbrev Number: 0
<1><94>: Abbrev Number: 6 (DW_TAG_base_type)
   <95>   DW_AT_name        : (indirect string, offset: 0x0): int
   <99>   DW_AT_encoding    : 5        (signed)
   <9a>   DW_AT_byte_size   : 4
<1><9b>: Abbrev Number: 0

And we get wrong result for static number x at line 7. In debugger, it is printed as 3.

With this patch:

0x0000002a:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0000000010010680)
                DW_AT_high_pc   (0x0000000010010754)
                DW_AT_frame_base        (DW_OP_reg31 X31)
                DW_AT_name      ("main")
                DW_AT_type      (0x00000094 "int")
                DW_AT_external  (true)

0x00000043:     DW_TAG_variable
                  DW_AT_name    ("x")          <<<<<<<<<<<<we need a lexical block for this definition
                  DW_AT_type    (0x00000094 "int")
                  DW_AT_location        (DW_OP_addr 0x10030ae8)

0x00000058:     DW_TAG_variable
                  DW_AT_location        (DW_OP_fbreg +112)
                  DW_AT_name    ("x")
                  DW_AT_type    (0x00000094 "int")

0x00000067:     DW_TAG_variable
                  DW_AT_location        (DW_OP_fbreg +108)
                  DW_AT_name    ("num")
                  DW_AT_type    (0x00000094 "int")

0x00000076:     DW_TAG_lexical_block
                  DW_AT_low_pc  (0x000000001001070c)
                  DW_AT_high_pc (0x000000001001072c)

0x00000083:       DW_TAG_variable
                    DW_AT_location      (DW_OP_fbreg +104)
                    DW_AT_name  ("x")
                    DW_AT_type  (0x00000094 "int")

0x00000092:       NULL

Now we get wrong result for local variable x at line 3. In debugger, it is printed as 45. (The DIE for the local variable x is overridden by the static variable x in lexical block 1).

To continue with a more accurate fix, one example for your reference. (The behavior is changed with/without this reversion):

Sorry, I'm not following this comment ^ (& the rest) - could you rephrase if there's a question/concern you have I can help clarify/describe/work with you on?

To continue with a more accurate fix, one example for your reference. (The behavior is changed with/without this reversion):

Sorry, I'm not following this comment ^ (& the rest) - could you rephrase if there's a question/concern you have I can help clarify/describe/work with you on?

I guess there will be a follow-up patch for the function-local declaration for lexical block? This patch seems like a reversion of a previous functionality change?
In our internal test, we also found an issue related to variables in the lexical block, so I pasted it here for reference : )

krisb added a comment.Dec 28 2021, 4:13 PM

To continue with a more accurate fix, one example for your reference. (The behavior is changed with/without this reversion):

Sorry, I'm not following this comment ^ (& the rest) - could you rephrase if there's a question/concern you have I can help clarify/describe/work with you on?

I guess there will be a follow-up patch for the function-local declaration for lexical block? This patch seems like a reversion of a previous functionality change?
In our internal test, we also found an issue related to variables in the lexical block, so I pasted it here for reference : )

This is the functional change itself, reverting patch is at https://reviews.llvm.org/D116225.

Without this patch:

I guess you meant without the revert, and actually _with_ this patch, right?
If so, you'd need a clang side change https://reviews.llvm.org/D113743 (which has been reverted as well) to see static locals placed correctly.
If to apply all the reverted patches (D114705, D113741, and D113743) I believe you should see a correct DWARF compiled for your example.

If to apply all the reverted patches (D114705, D113741, and D113743) I believe you should see a correct DWARF compiled for your example.

Yeah, verify with manually applying all the above patches, it works as expected. Looking forward to seeing these patches merged in again.

krisb updated this revision to Diff 411133.Feb 24 2022, 7:46 AM
  • Rebase on ToT & minor fixes.
  • Fix the issue when an local imported entity caused emission of multiple CUs in a dwo when split dwarf is enabled.
  • Add a test for the aforementioned issue.
krisb added a comment.Feb 24 2022, 8:15 AM

@dblaikie I've updated the patch with a fix for split dwarf issue (the test is included). Could I ask you to check the full build on your side?

The issue was in wrong assumptions about in which CUs we need to create those local entities. Basically, they [CUs] should match the ones where the parent subprogram was emitted.
It's not an easy thing to find the particular list of suited CUs while we are in endModule() (especially, because split-dwarf and split-dwarf inlining introduce a tricky logic around), so I chose to collect this information during subprograms emission.
The solution still looks a bit hacky, but I hope this is a step forward.

What about the second issue, unfortunately (or fortunately? :)), I'm still able to reproduce the crash on ToT of main, so I'm not sure this patch may cause the issue. I'll appreciate if could check it once again.

Thank you and sorry for the long delay.

@dblaikie I've updated the patch with a fix for split dwarf issue (the test is included). Could I ask you to check the full build on your side?

The issue was in wrong assumptions about in which CUs we need to create those local entities. Basically, they [CUs] should match the ones where the parent subprogram was emitted.
It's not an easy thing to find the particular list of suited CUs while we are in endModule() (especially, because split-dwarf and split-dwarf inlining introduce a tricky logic around), so I chose to collect this information during subprograms emission.
The solution still looks a bit hacky, but I hope this is a step forward.

What about the second issue, unfortunately (or fortunately? :)), I'm still able to reproduce the crash on ToT of main, so I'm not sure this patch may cause the issue. I'll appreciate if could check it once again.

Thank you and sorry for the long delay.

All good.

I'm not immediately following why a list of CUs is needed/why we'd create the variable in more than one CU - could you explain that?

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1406–1407

This looks buggy - it's returning a pointer to a local variable (the TheCU pointer) which means it'll be dangling once the function returns, yeah? (I'd expect msan to catch this, for instance)

krisb updated this revision to Diff 411184.Feb 24 2022, 10:51 AM

Minor fix.

@dblaikie I've updated the patch with a fix for split dwarf issue (the test is included). Could I ask you to check the full build on your side?

The issue was in wrong assumptions about in which CUs we need to create those local entities. Basically, they [CUs] should match the ones where the parent subprogram was emitted.
It's not an easy thing to find the particular list of suited CUs while we are in endModule() (especially, because split-dwarf and split-dwarf inlining introduce a tricky logic around), so I chose to collect this information during subprograms emission.
The solution still looks a bit hacky, but I hope this is a step forward.

What about the second issue, unfortunately (or fortunately? :)), I'm still able to reproduce the crash on ToT of main, so I'm not sure this patch may cause the issue. I'll appreciate if could check it once again.

Thank you and sorry for the long delay.

All good.

I'm not immediately following why a list of CUs is needed/why we'd create the variable in more than one CU - could you explain that?

Well, this is because we may have a parent subprogram emitted in more than one CU (in the case of split dwarf), and we do not know in which one.
Here are the situations I see (hope I got them right):
(1) If split dwarf disabled, we have a single list of subprogram's abstract origins and can reference them between CUs. So, we'll emit a concrete definition and its abstract origin (any existing) in the same CU. In this case, we have only one CU.
(2) If we have split dwarf enabled, and shareAcrossDWOCUs() is false (it's false by default), every CU will maintain its own abstract origin's list. In this case, we cannot have references between CUs, and if we have cross-CU inlining, we'll create:

  • concrete def -> in the CU the subprogram is defined,
  • abstract origin -> in every CU the subprogram is inlined. For all local entities we should follow the same logic.

(3) SplitDwarfInlining adds skeleton units to this list. Basically, we do not need to emit any local entities for skeleton units in this case, but I just reused already existing logic around includeMinimalInlineScopes().

The reproducer you provided is about the case (2). We tried to emit the imported entity in a CU, where its parent subprogram was defined, but the subprogram itself was emitted in the CU it was inlined.

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1406–1407

Uuups, bad experiments. Fixed.

dblaikie added inline comments.Feb 24 2022, 10:57 AM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1406–1407

This code still looks buggy, it's still returning the address of a local variable (the address of "TheU" pointer) so it'd still be UB/dangling/msan-unclean, I think?

krisb updated this revision to Diff 411198.Feb 24 2022, 11:29 AM

Attempt 2.

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1406–1407

Hurry isn't a good way to do smth good. Sorry. One more try and I'll go to check this with msan.

@dblaikie I've updated the patch with a fix for split dwarf issue (the test is included). Could I ask you to check the full build on your side?

The issue was in wrong assumptions about in which CUs we need to create those local entities. Basically, they [CUs] should match the ones where the parent subprogram was emitted.
It's not an easy thing to find the particular list of suited CUs while we are in endModule() (especially, because split-dwarf and split-dwarf inlining introduce a tricky logic around), so I chose to collect this information during subprograms emission.
The solution still looks a bit hacky, but I hope this is a step forward.

What about the second issue, unfortunately (or fortunately? :)), I'm still able to reproduce the crash on ToT of main, so I'm not sure this patch may cause the issue. I'll appreciate if could check it once again.

Thank you and sorry for the long delay.

All good.

I'm not immediately following why a list of CUs is needed/why we'd create the variable in more than one CU - could you explain that?

Well, this is because we may have a parent subprogram emitted in more than one CU (in the case of split dwarf), and we do not know in which one.
Here are the situations I see (hope I got them right):
(1) If split dwarf disabled, we have a single list of subprogram's abstract origins and can reference them between CUs. So, we'll emit a concrete definition and its abstract origin (any existing) in the same CU. In this case, we have only one CU.
(2) If we have split dwarf enabled, and shareAcrossDWOCUs() is false (it's false by default), every CU will maintain its own abstract origin's list. In this case, we cannot have references between CUs, and if we have cross-CU inlining, we'll create:

  • concrete def -> in the CU the subprogram is defined,
  • abstract origin -> in every CU the subprogram is inlined. For all local entities we should follow the same logic.

Do you have a test case for this (sorry, I realize it might be one of the test cases in the patch - I haven't looked/would appreciate some pointers)? My recollection is that without shareAcrossDWOCUs we should probably be only emitting one CU if using Split DWARF - Split DWARF doesn't really support multiple CUs in a single .dwo file (& we only support emitting a single .dwo for a given .o file) - but I guess maybe we do in full LTO mode, even if we don't do it in ThinLTO mode (ThinLTO's the only thing I've been especially concerned with/the only mode we use at Google)?

(3) SplitDwarfInlining adds skeleton units to this list. Basically, we do not need to emit any local entities for skeleton units in this case, but I just reused already existing logic around includeMinimalInlineScopes().

The reproducer you provided is about the case (2). We tried to emit the imported entity in a CU, where its parent subprogram was defined, but the subprogram itself was emitted in the CU it was inlined.

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1406–1407

That doesn't seem like a valid range either - that range would be from "TheCU" until nullptr - so a huge chunk of the address space?

I think now you need/could have "TheCU + 1" as the end of the range and that'd be correct, now that it's passed in a pointer-to-pointer.

krisb added a comment.Feb 24 2022, 1:18 PM

All good.

I'm not immediately following why a list of CUs is needed/why we'd create the variable in more than one CU - could you explain that?

Well, this is because we may have a parent subprogram emitted in more than one CU (in the case of split dwarf), and we do not know in which one.
Here are the situations I see (hope I got them right):
(1) If split dwarf disabled, we have a single list of subprogram's abstract origins and can reference them between CUs. So, we'll emit a concrete definition and its abstract origin (any existing) in the same CU. In this case, we have only one CU.
(2) If we have split dwarf enabled, and shareAcrossDWOCUs() is false (it's false by default), every CU will maintain its own abstract origin's list. In this case, we cannot have references between CUs, and if we have cross-CU inlining, we'll create:

  • concrete def -> in the CU the subprogram is defined,
  • abstract origin -> in every CU the subprogram is inlined. For all local entities we should follow the same logic.

Do you have a test case for this (sorry, I realize it might be one of the test cases in the patch - I haven't looked/would appreciate some pointers)? My recollection is that without shareAcrossDWOCUs we should probably be only emitting one CU if using Split DWARF - Split DWARF doesn't really support multiple CUs in a single .dwo file (& we only support emitting a single .dwo for a given .o file) - but I guess maybe we do in full LTO mode, even if we don't do it in ThinLTO mode (ThinLTO's the only thing I've been especially concerned with/the only mode we use at Google)?

I was inspired by these two tests:

LLVM :: DebugInfo/X86/split-dwarf-cross-cu-gmlt-g.ll
LLVM :: DebugInfo/X86/split-dwarf-cross-unit-reference.ll

They test the case I described in (2), with 2 CUs in a single .dwo.

krisb updated this revision to Diff 411797.Feb 28 2022, 5:29 AM

Apply David's comment and add 'REQUIRES: object-emission' to tests.

krisb added a comment.Feb 28 2022, 5:29 AM

I'm not immediately following why a list of CUs is needed/why we'd create the variable in more than one CU - could you explain that?

Well, this is because we may have a parent subprogram emitted in more than one CU (in the case of split dwarf), and we do not know in which one.
Here are the situations I see (hope I got them right):
(1) If split dwarf disabled, we have a single list of subprogram's abstract origins and can reference them between CUs. So, we'll emit a concrete definition and its abstract origin (any existing) in the same CU. In this case, we have only one CU.
(2) If we have split dwarf enabled, and shareAcrossDWOCUs() is false (it's false by default), every CU will maintain its own abstract origin's list. In this case, we cannot have references between CUs, and if we have cross-CU inlining, we'll create:

  • concrete def -> in the CU the subprogram is defined,
  • abstract origin -> in every CU the subprogram is inlined. For all local entities we should follow the same logic.

Do you have a test case for this (sorry, I realize it might be one of the test cases in the patch - I haven't looked/would appreciate some pointers)? My recollection is that without shareAcrossDWOCUs we should probably be only emitting one CU if using Split DWARF - Split DWARF doesn't really support multiple CUs in a single .dwo file (& we only support emitting a single .dwo for a given .o file) - but I guess maybe we do in full LTO mode, even if we don't do it in ThinLTO mode (ThinLTO's the only thing I've been especially concerned with/the only mode we use at Google)?

Okay, let's summarize this a bit.

Statement 1. The problem with the example you provided above was because this patch made subprograms and their local "global" entities emitted in different CUs in the case of split DWARF. This is a bug, for sure.
Statement 2. Currently, there is nothing in the code that restricts emitting multiple CUs in a single .dwo (whether for FullLTO or for manually linked modules (as in the tests) we will generate more than one CU per .dwo).

All my assumptions and conclusions are based on the code and existing regression tests. Unfortunately, I'm not aware of the real situation of split DWARF or split DWARF inlining support (and I couldn't find anything related in the docs either).

Having this, I think we cannot assume that a subprogram is guaranteed to be emitted only in one CU, unless we explicitly disallow the case with multiple CUs per .dwo. DWARF standard seems to assume only one CU per .dwo (but I couldn't find any explicit wording about restrictions on the number of CUs in a .dwo). However, it looks like GDB is okay and works fine with more than one CU in a .dwo. It'd be good to have this situation clarified. But it seems to be out of the scope of this patch. Do you have any thoughs on how to proceed with this?

(3) SplitDwarfInlining adds skeleton units to this list. Basically, we do not need to emit any local entities for skeleton units in this case, but I just reused already existing logic around includeMinimalInlineScopes().

The reproducer you provided is about the case (2). We tried to emit the imported entity in a CU, where its parent subprogram was defined, but the subprogram itself was emitted in the CU it was inlined.

Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 2:06 AM

Friendly ping :)

Friendly ping :)

Sorry I haven't got to this yet - it is on my list.

I'm not immediately following why a list of CUs is needed/why we'd create the variable in more than one CU - could you explain that?

Well, this is because we may have a parent subprogram emitted in more than one CU (in the case of split dwarf), and we do not know in which one.
Here are the situations I see (hope I got them right):
(1) If split dwarf disabled, we have a single list of subprogram's abstract origins and can reference them between CUs. So, we'll emit a concrete definition and its abstract origin (any existing) in the same CU. In this case, we have only one CU.
(2) If we have split dwarf enabled, and shareAcrossDWOCUs() is false (it's false by default), every CU will maintain its own abstract origin's list. In this case, we cannot have references between CUs, and if we have cross-CU inlining, we'll create:

  • concrete def -> in the CU the subprogram is defined,
  • abstract origin -> in every CU the subprogram is inlined. For all local entities we should follow the same logic.

Do you have a test case for this (sorry, I realize it might be one of the test cases in the patch - I haven't looked/would appreciate some pointers)? My recollection is that without shareAcrossDWOCUs we should probably be only emitting one CU if using Split DWARF - Split DWARF doesn't really support multiple CUs in a single .dwo file (& we only support emitting a single .dwo for a given .o file) - but I guess maybe we do in full LTO mode, even if we don't do it in ThinLTO mode (ThinLTO's the only thing I've been especially concerned with/the only mode we use at Google)?

Okay, let's summarize this a bit.

Statement 1. The problem with the example you provided above was because this patch made subprograms and their local "global" entities emitted in different CUs in the case of split DWARF. This is a bug, for sure.

Is that bug now addressed in the proposed patch & with tests to demonstrate it's fixed?

Statement 2. Currently, there is nothing in the code that restricts emitting multiple CUs in a single .dwo (whether for FullLTO or for manually linked modules (as in the tests) we will generate more than one CU per .dwo).

I believe DwarfDebug.cpp:SplitDwarfCrossCuReferences is intended to disallow this functionality, so far as I can tell/remember?

All my assumptions and conclusions are based on the code and existing regression tests. Unfortunately, I'm not aware of the real situation of split DWARF or split DWARF inlining support (and I couldn't find anything related in the docs either).

Having this, I think we cannot assume that a subprogram is guaranteed to be emitted only in one CU, unless we explicitly disallow the case with multiple CUs per .dwo. DWARF standard seems to assume only one CU per .dwo (but I couldn't find any explicit wording about restrictions on the number of CUs in a .dwo). However, it looks like GDB is okay and works fine with more than one CU in a .dwo. It'd be good to have this situation clarified. But it seems to be out of the scope of this patch. Do you have any thoughs on how to proceed with this?

One of the issues is that the dwp file format describes an index by CU hash (DWO ID) - it's unclear what it would mean to have multiple entries that overlap (because a single region contains multiple CUs) and where the region doesn't start with the CU for the lookup (because it's some region with multiple units).

krisb added a comment.Mar 30 2022, 2:43 AM

Statement 1. The problem with the example you provided above was because this patch made subprograms and their local "global" entities emitted in different CUs in the case of split DWARF. This is a bug, for sure.

Is that bug now addressed in the proposed patch & with tests to demonstrate it's fixed?

I believe, it is. I created a test from the example you provided above, see llvm/test/DebugInfo/Generic/split-dwarf-local-import.ll.
It checks that the second CU doesn't have DW_AT_GNU_dwo_name and DW_AT_GNU_dwo_id and the imported entity has a proper parent.
But, if you have a chance to check the updated patch in the same conditions this bug originally reproduced, I'll highly appreciate it.

Statement 2. Currently, there is nothing in the code that restricts emitting multiple CUs in a single .dwo (whether for FullLTO or for manually linked modules (as in the tests) we will generate more than one CU per .dwo).

I believe DwarfDebug.cpp:SplitDwarfCrossCuReferences is intended to disallow this functionality, so far as I can tell/remember?

Hmm, I'm not sure fully I understood your point. SplitDwarfCrossCuReferences seems to _allow_ references between CUs, but it isn't enabled by default.
For example, if you enable split-dwarf with FullLTO, clang would emit a single .dwo per complied&linked binary with all the CUs in this .dwo. Neither clang nor backend would emit any errors/warnings.
I'm not insisting this is correct, or this may be useful for someone (I'm simply not aware), just want to mention that this is possible.

All my assumptions and conclusions are based on the code and existing regression tests. Unfortunately, I'm not aware of the real situation of split DWARF or split DWARF inlining support (and I couldn't find anything related in the docs either).

Having this, I think we cannot assume that a subprogram is guaranteed to be emitted only in one CU, unless we explicitly disallow the case with multiple CUs per .dwo. DWARF standard seems to assume only one CU per .dwo (but I couldn't find any explicit wording about restrictions on the number of CUs in a .dwo). However, it looks like GDB is okay and works fine with more than one CU in a .dwo. It'd be good to have this situation clarified. But it seems to be out of the scope of this patch. Do you have any thoughs on how to proceed with this?

One of the issues is that the dwp file format describes an index by CU hash (DWO ID) - it's unclear what it would mean to have multiple entries that overlap (because a single region contains multiple CUs) and where the region doesn't start with the CU for the lookup (because it's some region with multiple units).

So the question is: should we explicitly disallow emitting multiple CUs in .dwo (meaning, for example, split-dwarf with FullLTO w/o split-dwarf-cross-cu-references)? By reporting an error or at least specifying somewhere in docs that this case it not correct thus not supported?

jmorse added a subscriber: jmorse.Apr 1 2022, 6:05 AM

I tried applying this internally & running the same build that tripped over something last time this was committed & I seem to be tripping over the same thing:

(.debug_gnu_pubnames): name lookup table at offset 0x38cd has a terminator at offset 0x38db before the expected end at 0x3a3d

As well as a bunch of unresolved symbols during linking... :/

I'll have to work on reducing a test case for this.

krisb added a comment.Apr 8 2022, 12:28 PM

I tried applying this internally & running the same build that tripped over something last time this was committed & I seem to be tripping over the same thing:

(.debug_gnu_pubnames): name lookup table at offset 0x38cd has a terminator at offset 0x38db before the expected end at 0x3a3d

As well as a bunch of unresolved symbols during linking... :/

I'll have to work on reducing a test case for this.

If I remember correctly, this issue with .debug_gnu_pubnames was caused by another patch (https://reviews.llvm.org/rG78d15a112cbd545fbb6e1aa37c221ef5aeffb3f2).
Did I understand you right that the same issue is reproduced with w/o that patch?

Drive-by comment on the topic of the gnu pubnames error,

If I remember correctly, this issue with .debug_gnu_pubnames was caused by another patch (https://reviews.llvm.org/rG78d15a112cbd545fbb6e1aa37c221ef5aeffb3f2).
Did I understand you right that the same issue is reproduced with w/o that patch?

I've run into that error myself when trying to alter which unit a subprogram definition goes in -- see the comment at D94976#2543257 . That patch never landed in the end because I couldn't figure out exactly what was wrong. IMHO, there's some latent weirdness that triggers that error, exactly how is unclear though. IMO, it's not specific to rG78d15a112cbd5 .

[0] "(.debug_gnu_pubnames): name lookup table at offset (...) has a terminator at offset (...) before the expected end at (...)"

I tried applying this internally & running the same build that tripped over something last time this was committed & I seem to be tripping over the same thing:

(.debug_gnu_pubnames): name lookup table at offset 0x38cd has a terminator at offset 0x38db before the expected end at 0x3a3d

As well as a bunch of unresolved symbols during linking... :/

I'll have to work on reducing a test case for this.

If I remember correctly, this issue with .debug_gnu_pubnames was caused by another patch (https://reviews.llvm.org/rG78d15a112cbd545fbb6e1aa37c221ef5aeffb3f2).
Did I understand you right that the same issue is reproduced with w/o that patch?

That's correct. Sorry I lost context on this again and don't think I have a reduced test case/something to work from right now, but I'll keep in on my radar to work on.

krisb added a comment.Mon, May 16, 8:41 AM

There is an alternative implementation that relies on addition field of DISubprogram/DILexicalScope that tracks static locals, local imports and types D125693.


Drive-by comment on the topic of the gnu pubnames error,

If I remember correctly, this issue with .debug_gnu_pubnames was caused by another patch (https://reviews.llvm.org/rG78d15a112cbd545fbb6e1aa37c221ef5aeffb3f2).
Did I understand you right that the same issue is reproduced with w/o that patch?

I've run into that error myself when trying to alter which unit a subprogram definition goes in -- see the comment at D94976#2543257 . That patch never landed in the end because I couldn't figure out exactly what was wrong. IMHO, there's some latent weirdness that triggers that error, exactly how is unclear though. IMO, it's not specific to rG78d15a112cbd5 .

[0] "(.debug_gnu_pubnames): name lookup table at offset (...) has a terminator at offset (...) before the expected end at (...)"

I've tried to reproduce the issue using the example from D94976 (and with this patch), it compiled fine and didn't show the mentioned problem with pubnames.
Another question is about the combination of FullLTO and split-dwarf which has been already discussed in this thread and which isn't (fully) supported by LLVM mainline (yet, at least in a way that doesn't violate DWARF specification).

So, I'm still looking for a reproducer for the issue with pubnames. Any help will be appreciated.

There is an alternative implementation that relies on addition field of DISubprogram/DILexicalScope that tracks static locals, local imports and types D125693.

Ah, OK - so that 5-part sequence is an alternative to this one?

Might be good to figure out the issue here before debating which direction to go in, etc. (hmm, one of the patches in that series I've already approved? So am I reviewing alternatives here? Given the complexity here that's a bit costly for me/end up feeling like I'm drowning under trying to figure out all these moving parts... - a linear sequence is generally OK/good, I can review the easy parts and focus on the difficult ones, but multiple large alternatives can be a bit daunting)


Drive-by comment on the topic of the gnu pubnames error,

If I remember correctly, this issue with .debug_gnu_pubnames was caused by another patch (https://reviews.llvm.org/rG78d15a112cbd545fbb6e1aa37c221ef5aeffb3f2).
Did I understand you right that the same issue is reproduced with w/o that patch?

I've run into that error myself when trying to alter which unit a subprogram definition goes in -- see the comment at D94976#2543257 . That patch never landed in the end because I couldn't figure out exactly what was wrong. IMHO, there's some latent weirdness that triggers that error, exactly how is unclear though. IMO, it's not specific to rG78d15a112cbd5 .

[0] "(.debug_gnu_pubnames): name lookup table at offset (...) has a terminator at offset (...) before the expected end at (...)"

I've tried to reproduce the issue using the example from D94976 (and with this patch), it compiled fine and didn't show the mentioned problem with pubnames.
Another question is about the combination of FullLTO and split-dwarf which has been already discussed in this thread and which isn't (fully) supported by LLVM mainline (yet, at least in a way that doesn't violate DWARF specification).

So, I'm still looking for a reproducer for the issue with pubnames. Any help will be appreciated.

Yeah, I still need to figure out how to do that/get you that. :/

(is this at all/possibly related to https://reviews.llvm.org/D124892 ?)

There is an alternative implementation that relies on addition field of DISubprogram/DILexicalScope that tracks static locals, local imports and types D125693.

Ah, OK - so that 5-part sequence is an alternative to this one?

Might be good to figure out the issue here before debating which direction to go in, etc. (hmm, one of the patches in that series I've already approved? So am I reviewing alternatives here? Given the complexity here that's a bit costly for me/end up feeling like I'm drowning under trying to figure out all these moving parts... - a linear sequence is generally OK/good, I can review the easy parts and focus on the difficult ones, but multiple large alternatives can be a bit daunting)


Drive-by comment on the topic of the gnu pubnames error,

If I remember correctly, this issue with .debug_gnu_pubnames was caused by another patch (https://reviews.llvm.org/rG78d15a112cbd545fbb6e1aa37c221ef5aeffb3f2).
Did I understand you right that the same issue is reproduced with w/o that patch?

I've run into that error myself when trying to alter which unit a subprogram definition goes in -- see the comment at D94976#2543257 . That patch never landed in the end because I couldn't figure out exactly what was wrong. IMHO, there's some latent weirdness that triggers that error, exactly how is unclear though. IMO, it's not specific to rG78d15a112cbd5 .

[0] "(.debug_gnu_pubnames): name lookup table at offset (...) has a terminator at offset (...) before the expected end at (...)"

I've tried to reproduce the issue using the example from D94976 (and with this patch), it compiled fine and didn't show the mentioned problem with pubnames.
Another question is about the combination of FullLTO and split-dwarf which has been already discussed in this thread and which isn't (fully) supported by LLVM mainline (yet, at least in a way that doesn't violate DWARF specification).

So, I'm still looking for a reproducer for the issue with pubnames. Any help will be appreciated.

Yeah, I still need to figure out how to do that/get you that. :/

(is this at all/possibly related to https://reviews.llvm.org/D124892 ?)

Seems like a discussion is more about multiple CUs in DWO with full LTO? Need to read all the history in more details.
Side note BOLT doesn't support multiple CUs in the dwo. There are certain assumptions about abbrev, and how it writes out new dwo files.
Does pubnames actually used for anything? My impression is that debuggers (or at least lldb) ignores it.

There is an alternative implementation that relies on addition field of DISubprogram/DILexicalScope that tracks static locals, local imports and types D125693.

Ah, OK - so that 5-part sequence is an alternative to this one?

Might be good to figure out the issue here before debating which direction to go in, etc. (hmm, one of the patches in that series I've already approved? So am I reviewing alternatives here? Given the complexity here that's a bit costly for me/end up feeling like I'm drowning under trying to figure out all these moving parts... - a linear sequence is generally OK/good, I can review the easy parts and focus on the difficult ones, but multiple large alternatives can be a bit daunting)

Ping on this ^


Drive-by comment on the topic of the gnu pubnames error,

If I remember correctly, this issue with .debug_gnu_pubnames was caused by another patch (https://reviews.llvm.org/rG78d15a112cbd545fbb6e1aa37c221ef5aeffb3f2).
Did I understand you right that the same issue is reproduced with w/o that patch?

I've run into that error myself when trying to alter which unit a subprogram definition goes in -- see the comment at D94976#2543257 . That patch never landed in the end because I couldn't figure out exactly what was wrong. IMHO, there's some latent weirdness that triggers that error, exactly how is unclear though. IMO, it's not specific to rG78d15a112cbd5 .

[0] "(.debug_gnu_pubnames): name lookup table at offset (...) has a terminator at offset (...) before the expected end at (...)"

I've tried to reproduce the issue using the example from D94976 (and with this patch), it compiled fine and didn't show the mentioned problem with pubnames.
Another question is about the combination of FullLTO and split-dwarf which has been already discussed in this thread and which isn't (fully) supported by LLVM mainline (yet, at least in a way that doesn't violate DWARF specification).

So, I'm still looking for a reproducer for the issue with pubnames. Any help will be appreciated.

Yeah, I still need to figure out how to do that/get you that. :/

(is this at all/possibly related to https://reviews.llvm.org/D124892 ?)

Seems like a discussion is more about multiple CUs in DWO with full LTO? Need to read all the history in more details.
Side note BOLT doesn't support multiple CUs in the dwo. There are certain assumptions about abbrev, and how it writes out new dwo files.
Does pubnames actually used for anything? My impression is that debuggers (or at least lldb) ignores it.

Yep, debug_gnu_pubnames are used by gold and lld to build gdb_index which improves gdb startup time.

There is an alternative implementation that relies on addition field of DISubprogram/DILexicalScope that tracks static locals, local imports and types D125693.

Ah, OK - so that 5-part sequence is an alternative to this one?

Might be good to figure out the issue here before debating which direction to go in, etc. (hmm, one of the patches in that series I've already approved? So am I reviewing alternatives here? Given the complexity here that's a bit costly for me/end up feeling like I'm drowning under trying to figure out all these moving parts... - a linear sequence is generally OK/good, I can review the easy parts and focus on the difficult ones, but multiple large alternatives can be a bit daunting)

Ping on this ^


Drive-by comment on the topic of the gnu pubnames error,

If I remember correctly, this issue with .debug_gnu_pubnames was caused by another patch (https://reviews.llvm.org/rG78d15a112cbd545fbb6e1aa37c221ef5aeffb3f2).
Did I understand you right that the same issue is reproduced with w/o that patch?

I've run into that error myself when trying to alter which unit a subprogram definition goes in -- see the comment at D94976#2543257 . That patch never landed in the end because I couldn't figure out exactly what was wrong. IMHO, there's some latent weirdness that triggers that error, exactly how is unclear though. IMO, it's not specific to rG78d15a112cbd5 .

[0] "(.debug_gnu_pubnames): name lookup table at offset (...) has a terminator at offset (...) before the expected end at (...)"

I've tried to reproduce the issue using the example from D94976 (and with this patch), it compiled fine and didn't show the mentioned problem with pubnames.
Another question is about the combination of FullLTO and split-dwarf which has been already discussed in this thread and which isn't (fully) supported by LLVM mainline (yet, at least in a way that doesn't violate DWARF specification).

So, I'm still looking for a reproducer for the issue with pubnames. Any help will be appreciated.

Yeah, I still need to figure out how to do that/get you that. :/

(is this at all/possibly related to https://reviews.llvm.org/D124892 ?)

Seems like a discussion is more about multiple CUs in DWO with full LTO? Need to read all the history in more details.
Side note BOLT doesn't support multiple CUs in the dwo. There are certain assumptions about abbrev, and how it writes out new dwo files.
Does pubnames actually used for anything? My impression is that debuggers (or at least lldb) ignores it.

Yep, debug_gnu_pubnames are used by gold and lld to build gdb_index which improves gdb startup time.

Ah right, forgot about it.

krisb added a comment.Wed, May 25, 9:43 AM

There is an alternative implementation that relies on addition field of DISubprogram/DILexicalScope that tracks static locals, local imports and types D125693.

Ah, OK - so that 5-part sequence is an alternative to this one?

Mostly. The alternative implementation reuses large part of work of this patch.
Basically, this first implementation has (had) 3 patches in sequence:

  • D114705 (moving handling of globals, types and imports from beginModule() to endModule() in DwarfDebug.cpp)
  • D113741 (this one, backend's part)
  • D113743 (clang's part)

The new one has 5 patches:

  • D125691 (adding new 'localDecls' field to DILexicalBlock and DISubprogram),
  • D114705 (taken from previous implementation w/o changes),
  • D125693 (backend's part, replacement for D113741)
  • D125694, D125695 (clang's part, the same as D113743, but split on 2 patches and with some minor improvements).

Might be good to figure out the issue here before debating which direction to go in, etc. (hmm, one of the patches in that series I've already approved? So am I reviewing alternatives here? Given the complexity here that's a bit costly for me/end up feeling like I'm drowning under trying to figure out all these moving parts... - a linear sequence is generally OK/good, I can review the easy parts and focus on the difficult ones, but multiple large alternatives can be a bit daunting)

I'm ready to continue working with this patchset to find and fix the issue with pubnames, but I have no ideas how I can reproduce it, so I decided to move forward and try another approach which looks a bit better to me.
If you have some time to help me finding the reproducer or could give some hints on this, I'll be happy to fix the aforementioned problem first.

Drive-by comment on the topic of the gnu pubnames error,

If I remember correctly, this issue with .debug_gnu_pubnames was caused by another patch (https://reviews.llvm.org/rG78d15a112cbd545fbb6e1aa37c221ef5aeffb3f2).
Did I understand you right that the same issue is reproduced with w/o that patch?

I've run into that error myself when trying to alter which unit a subprogram definition goes in -- see the comment at D94976#2543257 . That patch never landed in the end because I couldn't figure out exactly what was wrong. IMHO, there's some latent weirdness that triggers that error, exactly how is unclear though. IMO, it's not specific to rG78d15a112cbd5 .

[0] "(.debug_gnu_pubnames): name lookup table at offset (...) has a terminator at offset (...) before the expected end at (...)"

I've tried to reproduce the issue using the example from D94976 (and with this patch), it compiled fine and didn't show the mentioned problem with pubnames.
Another question is about the combination of FullLTO and split-dwarf which has been already discussed in this thread and which isn't (fully) supported by LLVM mainline (yet, at least in a way that doesn't violate DWARF specification).

So, I'm still looking for a reproducer for the issue with pubnames. Any help will be appreciated.

Yeah, I still need to figure out how to do that/get you that. :/

(is this at all/possibly related to https://reviews.llvm.org/D124892 ?)

One of the issues you mentioned here https://reviews.llvm.org/D113741#3207125 seems to relate to D124892, but as I said, it reproduces w/o this patchset and likely has different root cause.

OK, so if I want to reproduce the issue for you, I need to apply these three patches? D114705 D113741 D113743

krisb added a comment.Wed, May 25, 1:53 PM

OK, so if I want to reproduce the issue for you, I need to apply these three patches? D114705 D113741 D113743

You, likely, not need D113743 (it'd been reverted just after being committed, so it seems it couldn't be the guilty one).

OK, so if I want to reproduce the issue for you, I need to apply these three patches? D114705 D113741 D113743

OK, reapplied the patches and reproduced the issue with a simple test program using Control Flow Integrity. I'm not sure entirely how easy this will be to extract into something reproducible outside Google's internal build system (my understanding is that CFI is a whole-program (including the standard library) sort of thing, which might make satisfying those requirements outside difficult, though not impossible - the features do exist outside of Google/are usable/tested there, etc) but I'll get started.

@dblaikie, thank you for the reproducers!

I've tried to compile&examine them, but haven't found any issues. Likely, I did something wrong.
Could you, please, give more details on how to reproduce the issue(s)? I mean, exact compilation command line and/or crash details / wrong dwarf output? Thank you!

Oh, right, sorry:

For the first one, you should observe a crash (in a +Asserts build) when running this command:

$ clang++-tot ab-crash.ll -gsplit-dwarf -c -O1

The backtrace looks something like this:

$ clang++-tot ab-crash.ll -gsplit-dwarf -c -O1 -S 
clang++-tot: /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:660: llvm::DIE *llvm::DwarfCompileUnit::constructInlinedScopeDIE(llvm::LexicalScope *): Assertion `OriginDIE && "Unable to find original DIE for an inlined subprogram."' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: clang++-tot ab-crash.ll -gsplit-dwarf -c -O1 -S
1.      Code generation
2.      Running pass 'Function Pass Manager' on module 'ab-crash.ll'.
3.      Running pass 'X86 Assembly Printer' on function '@main'
 #0 0x0000000009e604ea llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/Unix/Signals.inc:565:11
 #1 0x0000000009e6069b PrintStackTraceSignalHandler(void*) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/Unix/Signals.inc:632:1
 #2 0x0000000009e5ed63 llvm::sys::RunSignalHandlers() /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/Signals.cpp:97:5
 #3 0x0000000009e5fdfe llvm::sys::CleanupOnSignal(unsigned long) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/Unix/Signals.inc:362:1
 #4 0x0000000009d6fa4e (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/CrashRecoveryContext.cpp:0:7
 #5 0x0000000009d6fe03 CrashRecoverySignalHandler(int) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Support/CrashRecoveryContext.cpp:390:1
 #6 0x00007f60568c08e0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x138e0)
 #7 0x00007f6056389e71 raise ./signal/../sysdeps/unix/sysv/linux/raise.c:50:1
 #8 0x00007f6056373536 abort ./stdlib/abort.c:81:7
 #9 0x00007f605637341f get_sysdep_segment_value ./intl/loadmsgcat.c:509:8
#10 0x00007f605637341f _nl_load_domain ./intl/loadmsgcat.c:970:34
#11 0x00007f60563827f2 (/lib/x86_64-linux-gnu/libc.so.6+0x357f2)
#12 0x000000000b57ce6e llvm::DwarfCompileUnit::constructInlinedScopeDIE(llvm::LexicalScope*) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:0:3
#13 0x000000000b57cb6c llvm::DwarfCompileUnit::constructScopeDIE(llvm::LexicalScope*, llvm::DIE&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:538:10
#14 0x000000000b57d3ae llvm::DwarfCompileUnit::createAndAddScopeChildren(llvm::LexicalScope*, llvm::DIE&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:1072:7
#15 0x000000000b57efa1 llvm::DwarfCompileUnit::constructSubprogramScopeDIE(llvm::DISubprogram const*, llvm::LexicalScope*) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:1023:14
#16 0x000000000b52bc0d llvm::DwarfDebug::endFunctionImpl(llvm::MachineFunction const*) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:2301:50
#17 0x000000000b505ab2 llvm::DebugHandlerBase::endFunction(llvm::MachineFunction const*) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/AsmPrinter/DebugHandlerBase.cpp:0:5
#18 0x000000000b4e1f4c llvm::AsmPrinter::emitFunctionBody() /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1505:3
#19 0x0000000007ed7113 llvm::X86AsmPrinter::runOnMachineFunction(llvm::MachineFunction&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/Target/X86/X86AsmPrinter.cpp:82:3
#20 0x0000000008a96477 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/CodeGen/MachineFunctionPass.cpp:72:8
#21 0x0000000009165d5e llvm::FPPassManager::runOnFunction(llvm::Function&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/IR/LegacyPassManager.cpp:1439:23
#22 0x000000000916abb2 llvm::FPPassManager::runOnModule(llvm::Module&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/IR/LegacyPassManager.cpp:1485:16
#23 0x0000000009166649 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/IR/LegacyPassManager.cpp:1554:23
#24 0x00000000091661bd llvm::legacy::PassManagerImpl::run(llvm::Module&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/IR/LegacyPassManager.cpp:542:16
#25 0x000000000916ae91 llvm::legacy::PassManager::run(llvm::Module&) /usr/local/google/home/blaikie/dev/llvm/src/llvm/lib/IR/LegacyPassManager.cpp:1681:3

But that's actually not the original situation I was investigating, so maybe it's another bug? (possibly same root cause, though)

Though the original issue I was investigating was from invalid .debug_gnu_pubnames (DIE offsets of 0, which prematurely terminates the .debug_gnu_pubnames list and produces a warning from llvm-dwarfdump), and by adding assert(Offset) in llvm::DIE::getOffset (initially I added it in the pubnames emission code, but generally it should be true that the offset of a DIE should not be queried until it's been computed, and no DIE has offset zero, because the offset is from the start of the CU contribution - so the headers come first and mean the DIE offset is always non-zero). It seems I still don't have a small reproduction for that exact issue, though.

The second test case provided above, that produces multiple DWO CUs can be reproduced as follows:

$ clang++-tot ab-multi-cu.ll -gsplit-dwarf -c -O1 && llvm-dwarfdump-tot ab-multi-cu.o
ab-multi-cu.o:  file format elf64-x86-64

.debug_info contents:
0x00000000: Compile Unit: length = 0x00000053, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x00000057)

0x0000000b: DW_TAG_compile_unit
              DW_AT_stmt_list   (0x00000000)
              DW_AT_comp_dir    ("/usr/local/google/home/blaikie/dev/scratch")
              DW_AT_GNU_pubnames        (true)
              DW_AT_GNU_dwo_name        ("ab-multi-cu.dwo")
              DW_AT_GNU_dwo_id  (0xe9d1660e0395035d)
              DW_AT_low_pc      (0x0000000000000000)
              DW_AT_high_pc     (0x000000000000000d)
              DW_AT_GNU_addr_base       (0x00000000)

0x00000030:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0000000000000000)
                DW_AT_high_pc   (0x000000000000000d)
                DW_AT_name      ("main")

0x00000041:     DW_TAG_inlined_subroutine
                  DW_AT_abstract_origin (0x000000000000007b "f1")
                  DW_AT_low_pc  (0x0000000000000004)
                  DW_AT_high_pc (0x0000000000000009)
                  DW_AT_call_file       ("/usr/local/google/home/blaikie/dev/scratch/b.cpp")
                  DW_AT_call_line       (3)
                  DW_AT_call_column     (0x03)

0x00000055:     NULL

0x00000056:   NULL
0x00000057: Compile Unit: length = 0x00000027, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x00000082)

0x00000062: DW_TAG_compile_unit
              DW_AT_stmt_list   (0x0000004c)
              DW_AT_comp_dir    ("/usr/local/google/home/blaikie/dev/scratch")
              DW_AT_GNU_pubnames        (true)
              DW_AT_GNU_dwo_name        ("ab-multi-cu.dwo")
              DW_AT_GNU_dwo_id  (0xeb227ecffcd242c1)
              DW_AT_GNU_addr_base       (0x00000000)

0x0000007b:   DW_TAG_subprogram
                DW_AT_name      ("f1")
                DW_AT_inline    (DW_INL_inlined)

0x00000081:   NULL

In this case the second CU is OK (for the cross-CU inlining that's occurred), but it shouldn't have any DWO_* attributes, and the .dwo file shouldn't have a second CU in it (which it does after this patch) because Split DWARF doesn't have a good way to describe cross-CU references correctly once the CU is packaged in a dwp file. (this behavior is meant to be controlled by the SplitDwarfCrossCUReferences flag.

This does look like a stack trace for D124892.

Right, so this does only reproduce with Split DWARF + Split DWARF inlining + ThinLTO, I think. The CUs for the imported units Split DWARF inlining are being designated as split units, but then split units aren't actually emitted (because of the single-unit limitation of Split DWARF) and so the pubnames are emitted for DIEs that were never layed out/emitted, by the looks of it, so they get an offset of zero, which looks like/creates a truncated pubnames contribution.

Trying to make a small reproducer.

Right, so this does only reproduce with Split DWARF + Split DWARF inlining + ThinLTO, I think. The CUs for the imported units Split DWARF inlining are being designated as split units, but then split units aren't actually emitted (because of the single-unit limitation of Split DWARF) and so the pubnames are emitted for DIEs that were never layed out/emitted, by the looks of it, so they get an offset of zero, which looks like/creates a truncated pubnames contribution.

Trying to make a small reproducer.

OK, think I've got something in IR at least. Looks like it could be reproduced from source + LTO probably. With an assertions-enabled build of LLVM/Clang:

define void @f1() !dbg !13 {
lbl:
  ret void, !dbg !16
}

define void @f2() !dbg !22 {
lbl:
  ret void, !dbg !23
}

!llvm.dbg.cu = !{!0, !2, !10}
!llvm.module.flags = !{!12}

!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1, emissionKind: FullDebug)
!1 = !DIFile(filename: "a.cc", directory: "")
!2 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !3, emissionKind: FullDebug, imports: !4)
!3 = !DIFile(filename: "b.cc", directory: "")
!4 = !{!5}
!5 = !DIImportedEntity(tag: DW_TAG_imported_declaration, scope: !6, entity: !7)
!6 = !DISubprogram(scope: null, spFlags: DISPFlagOptimized)
!7 = !DINamespace(scope: !2)
!8 = !DISubroutineType(types: !9)
!9 = !{}
!10 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !11, emissionKind: FullDebug)
!11 = !DIFile(filename: "c.cc", directory: "")
!12 = !{i32 2, !"Debug Info Version", i32 3}
!13 = distinct !DISubprogram(scope: null, type: !8, spFlags: DISPFlagDefinition, unit: !0)
!16 = !DILocation(line: 0, scope: !17, inlinedAt: !18)
!17 = distinct !DISubprogram(scope: null, unit: !10)
!18 = !DILocation(line: 0, scope: !21)
!21 = !DILexicalBlockFile(scope: !13, discriminator: 0)
!22 = distinct !DISubprogram(scope: null, type: !8, spFlags: DISPFlagDefinition, unit: !0)
!23 = !DILocation(line: 0, scope: !24, inlinedAt: !25)
!24 = distinct !DISubprogram(scope: null, unit: !2)
!25 = !DILocation(line: 0, scope: !22)
clang -cc1 -emit-obj  -split-dwarf-file x.dwo x.ll

Does that work for you?