This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Support types, imports and static locals declared in a lexical block (3/5)
AbandonedPublic

Authored by krisb on May 16 2022, 8:38 AM.

Details

Summary

This implements support of lexical block scopes for function-local types and static locals. Before this patch and its clang counterparts (see D125694 and D125695), LLVM ignores the fact that such entities are being defined within a lexical block and puts them directly into a parent function.
Consider the following example:

 1  int main() {
 2    int a = 0;
 3    switch(a) {
 4    case 1: {
 5      static const int a = 1;
 6    }	
 7    case 2: {
 8      static const int a = 2;
 9    }
10    }
11  }

All the 'a''s DW_TAG_variable go to subprogram's scope which confuses debuggers when 'a' variable is being inspected.

0x0000002a:   DW_TAG_subprogram
              ...

0x00000043:     DW_TAG_variable
                  DW_AT_name  ("a")
                  DW_AT_type  (0x0000007c "const int")
                  DW_AT_decl_file ("temp.cpp")
                  DW_AT_decl_line (5) 
                  DW_AT_location  (DW_OP_addr 0x402004)

0x00000058:     DW_TAG_variable
                  DW_AT_name  ("a")
                  DW_AT_type  (0x0000007c "const int")
                  DW_AT_decl_file ("temp.cpp")
                  DW_AT_decl_line (8)
                  DW_AT_location  (DW_OP_addr 0x402008)

0x0000006d:     DW_TAG_variable
                  DW_AT_location  (DW_OP_fbreg -8) 
                  DW_AT_name  ("a")
                  DW_AT_decl_file ("temp.cpp")
                  DW_AT_decl_line (2) 
                  DW_AT_type  (0x00000081 "int")

This fixes

In the same time, it also unifies implementation of local imported entities allowing all the function-local declarations to be handled the same way,
thus fixes

This is an alternative implementation for D113741 which relies on localDecls field of DISubprogram or DILexicalBlock for getting local declarations (types, imports or static variable) for a particular local scope.

It has 2 issues from D113741 resolved:

  • Previous patch may misscope type declarations belong to a lexical block. This happened because it wasn't possible to know if there are any types belonging to a particular local scope by the time we started to emit this scope. So, we may skip emission of lexical blocks even if they contain some type declarations, and later couldn't find a proper parent for those types. localDecls tracks all the types declared within a particular local scope, so we no longer skip non-empty lexical blocks.
  • Previous patch didn't have a mapping between a local declaration and its containing subprogram DIE (and CU DIE), which may cause issues like https://reviews.llvm.org/D113741#3206883. Now we are able to create such mapping basing on localDecls of DISubprogram or DILexicalBlock.

Another thing that should be noted that with this patch we will no longer emit local declarations if its parent scope doesn't have a LexicalScope calculated for it (i.e. there are no DILocations that point to this scope or any of its children scopes) unless something refers such a declaration (relevant for types), see changes in llvm/test/Instrumentation/InstrProfiling/debug-info-correlate.ll as an example. I guess this shouldn't affect debugging experience.

The patch contains a lot of changes in the tests, but most of them trivial and just reflect changes in metadata, order of metadata and DWARF entities:
Order changes only:

  • clang/test/CodeGen/debug-info-codeview-unnamed.c
  • clang/test/CodeGen/debug-info-unused-types.c
  • clang/test/CodeGen/debug-info-unused-types.cpp
  • clang/test/CodeGenCXX/debug-info-anon-union-vars.cpp
  • clang/test/CodeGenCXX/debug-info-codeview-unnamed.cpp
  • clang/test/CodeGenCXX/debug-info-gline-tables-only-codeview.cpp
  • clang/test/CodeGenCXX/debug-lambda-this.cpp
  • clang/test/CodeGenCXX/debug-info-cxx1y.cpp
  • llvm/test/DebugInfo/Generic/namespace.ll
  • llvm/test/DebugInfo/X86/gnu-public-names.ll

Empty retainedNodes no longer emitted:

  • clang/test/CodeGen/attr-btf_tag-disubprogram-callsite.c
  • clang/test/CodeGenCXX/aix-static-init-debug-info.cpp
  • clang/test/CodeGenCXX/debug-info-template.cpp
  • clang/test/CodeGenObjC/debug-info-category.m

Local entities moved from CU to SP:

  • clang/test/CodeGenCXX/debug-info-namespace.cpp
  • llvm/test/CodeGen/AArch64/arm64-2011-03-17-AsmPrinterCrash.ll
  • llvm/test/CodeGen/BPF/BTF/static-var-inited-sec.ll
  • llvm/test/CodeGen/BPF/BTF/static-var-inited.ll
  • llvm/test/CodeGen/BPF/BTF/static-var-readonly-sec.ll
  • llvm/test/CodeGen/BPF/BTF/static-var-readonly.ll
  • llvm/test/CodeGen/BPF/BTF/static-var-sec.ll
  • llvm/test/CodeGen/BPF/BTF/static-var.ll
  • llvm/test/CodeGen/BPF/dwarfdump.ll
  • llvm/test/CodeGen/PowerPC/pr24546.ll
  • llvm/test/CodeGen/X86/dbg-distringtype-uint.ll
  • llvm/test/DebugInfo/Generic/2009-11-05-DeadGlobalVariable.ll
  • llvm/test/DebugInfo/X86/DW_AT_specification.ll
  • llvm/test/DebugInfo/X86/dimodule-external-fortran.ll
  • llvm/test/DebugInfo/X86/distringtype.ll
  • llvm/test/DebugInfo/X86/dwarfdump-DIImportedEntity_elements.ll
  • llvm/test/DebugInfo/X86/fission-inline.ll
  • llvm/test/DebugInfo/X86/fission-local-import.ll
  • llvm/test/DebugInfo/X86/fission-no-inline-gsym.ll
  • llvm/test/DebugInfo/X86/lexical-block-file-inline.ll
  • llvm/test/DebugInfo/X86/namelist1.ll
  • llvm/test/DebugInfo/X86/namelist2.ll
  • llvm/test/DebugInfo/omit-empty.ll

Non-trivial changes:

  • llvm/test/DebugInfo/Generic/imported-name-inlined.ll
  • llvm/test/DebugInfo/Generic/verifier-invalid-disubprogram.ll
  • llvm/test/Instrumentation/InstrProfiling/debug-info-correlate.ll
  • llvm/test/ThinLTO/X86/debuginfo-cu-import.ll
  • llvm/unittests/Transforms/Utils/CloningTest.cpp

Diff Detail

Event Timeline

krisb created this revision.May 16 2022, 8:38 AM
Herald added a project: Restricted Project. · View Herald Transcript
krisb requested review of this revision.May 16 2022, 8:38 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 16 2022, 8:38 AM

Mark this and D125691 as WIP since I'm still testing the approach on various combinations of debug info and optimization options (O0, O3, thinlto, split-dwarf, split-dwarf-inlining, gline-tables-only, etc.).
Except of the testing purpose itself I'm also trying to catch the issue reported by David here https://reviews.llvm.org/D113741#3437182 for previous implementation (while I hope this approach wouldn't trigger the issue, it'd be good to catch and fix it anyway).

ormris removed a subscriber: ormris.May 16 2022, 11:07 AM

Thanks a lot for the patch! It would be great to get this issue finally fixed. I assume that this is the main patch, other patches in the stack seem like just preparation/adjustments needed for this one to work.

This an alternative implementation for D113741 which relies on localDecls field of DISubprogram or DILexicalBlock for getting local declarations (types, imports or static variable) for a particular local scope.

Can you include a short description of the issue? PR19238 has a good and short example, we can inline it here.

Another thing that should be noted that with this patch we will no longer emit local declarations if its parent scope doesn't have a LexicalScope calculated for it

If a local declaration (a type or a static variable) is optimized away, then with this patch we have no debug information for it, right? For types we have -debug-info-kind=unused-types as a workaround (as shown in lexical-block-retained-types.ll), so the issue is only with static variables that have no uses. Given that they were broken before this patch and there is no way to access them from a debugger (cannot set a breakpoint if a block is optimized away), I think it is fine.

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1382

So here we discard LLVM IR metadata that have local declarations tied to a CU, right? This will break compatibility with old LLVM IR. Can we do some upgrade to convert this "old style" metadata the way we expect it now? Does it make sense to support both variants?

krisb edited the summary of this revision. (Show Details)May 23 2022, 4:01 AM
krisb added a comment.May 23 2022, 5:13 AM

Thank you, @asavonic for your comments!

I've added some more details about the issues this is going to fix, hope it'll help to review the patchset.

Thanks a lot for the patch! It would be great to get this issue finally fixed. I assume that this is the main patch, other patches in the stack seem like just preparation/adjustments needed for this one to work.

Correct. The first two patches were extracted to just make this one a bit smaller and easier to review.

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1382

Good question. I haven't thought about backward compatibility, not even sure we have some requirements for this. The easiest option seems to increase debug metadata version and drop debug info if it isn't compatible with this one as it done in AutoUpgrade. This prevents producing incorrect DWARF for ill-formed metadata.

Upgrading "old style" metadata would be a bit tricky. Current implementation assumes that all the local declarations (excluding regular local vars and labels) are referenced by parent's localDecls field, including types. Before this patchset there were no ways to find types except by references from their uses. So, to find all the function-local types we may need to iterate over all instructions and parse all debug metadata, which likely increases compile time. Not sure this trade-off is a worth one.

krisb edited the summary of this revision. (Show Details)May 31 2022, 7:36 AM
krisb updated this revision to Diff 434434.Jun 6 2022, 4:49 AM

Add two tests from D113741.
Fixed crash in DwarfCompileUnit in case of PR55680 (the issue itself wasn't fixed), add a test.

Tested the whole patchset with various combinations of

  • opt levels (O0, O2, O3, ThinLTO, FullLTO),
  • debug info options (-fdebug-types-section, -gline-tables-only, -fno-eliminate-unused-debug-types -gsplit-dwarf, -fsplit-dwarf-inlining),
  • and some additional checks / asserts (like additional verification of abstract / concrete out-of-line tree to be sure local entities are not duplicated, do not go to the wrong tree)

on clang bootstrapping build. No issues so far (except PR55680, which caused not by this and related patches).

This still doesn't guarantee that the patch accounts all the corner cases and no issues would be exposed later, but seems good to start with.
Removing [WIP] status.

krisb retitled this revision from [DebugInfo][WIP] Support types, imports and static locals declared in a lexical block (3/5) to [DebugInfo] Support types, imports and static locals declared in a lexical block (3/5).Jun 13 2022, 10:26 AM
krisb updated this revision to Diff 438617.Jun 21 2022, 3:03 AM

Rebase.

krisb updated this revision to Diff 441031.Jun 29 2022, 8:45 AM

Make DwarfDebug::endModule() to skip function-local decls
CUs 'imports', 'globals' and 'enums' instead of asserting.

ellis added inline comments.Jul 5 2022, 1:57 PM
llvm/test/Instrumentation/InstrProfiling/debug-info-correlate.ll
23

I asked the same question in D113741, but why is this test changed?

krisb added inline comments.Jul 6 2022, 2:21 AM
llvm/test/Instrumentation/InstrProfiling/debug-info-correlate.ll
23

Normally, we emit function-local entities iff a parent function has location information. This is done the same way for local variables, labels, parameters, imported entities, and, now, for static locals as well.
Before this change static locals behaved more like global variables and get emitted doesn't matter its parent function. This patch makes them handled more like local variables.

I believe either the call or the 'ret' (or, likely, both) had had DILocation attached originally, but it has been removed to simplify the test.

ellis added inline comments.Jul 6 2022, 10:39 AM
llvm/test/Instrumentation/InstrProfiling/debug-info-correlate.ll
23

I just checked and the llvm.instrprof.increment intrinsic does not have debug info attached. I think you're right that I removed debug info from the ret instruction to simplify the test.

This is a special case where a global and its debug info is synthesized.
https://github.com/llvm/llvm-project/blob/23c2bedfd93cfacc62009425c464e659a34e92e6/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp#L976-L1001

So I don't understand why this added debug info is necessary. Does the test fail otherwise?

krisb added inline comments.Jul 6 2022, 3:45 PM
llvm/test/Instrumentation/InstrProfiling/debug-info-correlate.ll
23

Right, the test would fail w/o the added DILocation, because __profc_foo variable (which is function-local 'global' technically) would not be emitted.
With this patch we emit function-local entities if and only if the parent function has LexicalScope defined (see createAndAddScopeChildren() and its call from constructSubprogramScopeDIE()), otherwise we skip emitting all function's children (including function-local 'globals').

ellis added inline comments.Jul 6 2022, 4:03 PM
llvm/test/Instrumentation/InstrProfiling/debug-info-correlate.ll
23

Got it, thanks for explaining!

I'm ok with this because llvm.instrprof.increment should probably be next to some instruction with debug info.

@dblaikie, could you please take a look at this and/or D113741? Do you see any ways to proceed?

llvm/test/Instrumentation/InstrProfiling/debug-info-correlate.ll
23

Thank you for checking this is okay!

@dblaikie, could you please take a look at this and/or D113741? Do you see any ways to proceed?

My concern with this direction is that anything that adds lists to the IR metadata makes it difficult for that data to be dropped when it becomes unused (the type graph, apart from the "retained types" on the CU, is structured so that if a variable, say, gets optimized away, or a function gets optimized away and its parameters along with it, the types get dropped too - similarly with function descriptions, they aren't in a list (they used to be) and are instead referenced from the llvm::Function ensuring that if the function is optimized away entirely, the debug info for that goes away too). Admittedly function-local things are handled somewhat differently, for instance there is a list on the DISubprogram of the local variables to ensure they are retained through optimizations so name lookup does the right things at function-scope. So /maybe/ it's OK to move in that direction here, but it might look more like that, add these other function-local things to the DISubprogram-scoped list (rename the list to generalize over more than just variables), rather than adding per-scope lists?

(@aprantl @JDevlieghere @probinson - what do you folks think? Add these things to some list (be it on the scope itself, or on a/the existing list on the subprogram metadata) or go with the lazy-referenced style already in-place and proposed in https://reviews.llvm.org/D113741 though some issues are discussed with that. I'm not sure if those are solvable problems?)

@dblaikie, could you please take a look at this and/or D113741? Do you see any ways to proceed?

My concern with this direction is that anything that adds lists to the IR metadata makes it difficult for that data to be dropped when it becomes unused (the type graph, apart from the "retained types" on the CU, is structured so that if a variable, say, gets optimized away, or a function gets optimized away and its parameters along with it, the types get dropped too - similarly with function descriptions, they aren't in a list (they used to be) and are instead referenced from the llvm::Function ensuring that if the function is optimized away entirely, the debug info for that goes away too). Admittedly function-local things are handled somewhat differently, for instance there is a list on the DISubprogram of the local variables to ensure they are retained through optimizations so name lookup does the right things at function-scope. So /maybe/ it's OK to move in that direction here, but it might look more like that, add these other function-local things to the DISubprogram-scoped list (rename the list to generalize over more than just variables), rather than adding per-scope lists?

Initially, I made the dedicated per-scope list of local static vars/imports/types to make possible to remove unused entities if their parent scope gets optimized out. This doesn't fully resolve your concern about local types that might be emitted even if they are not used, but this makes the things a bit better. Moreover, such local entities as well as variables/labels are not going to be emitted to a final DWARF if their scope was optimized away. Currently, to emit any local entity we need its scope to have LexicalScope defined. If there are no DILocations in such a scope (so that it may be considered as optimized out), it will not have LexicalScope defined for it, thus no variables/labels/other local things will be emitted for it. So, if we are not going to change this design in a near future, it might be worse to consider switching local variables/labels to per-scope list as well.

What about merging variables/labels/other entities to a single list, I fully support this idea (it would require some additional checks in the backend, but would look more consistent).

@dblaikie, could you please take a look at this and/or D113741? Do you see any ways to proceed?

My concern with this direction is that anything that adds lists to the IR metadata makes it difficult for that data to be dropped when it becomes unused (the type graph, apart from the "retained types" on the CU, is structured so that if a variable, say, gets optimized away, or a function gets optimized away and its parameters along with it, the types get dropped too - similarly with function descriptions, they aren't in a list (they used to be) and are instead referenced from the llvm::Function ensuring that if the function is optimized away entirely, the debug info for that goes away too). Admittedly function-local things are handled somewhat differently, for instance there is a list on the DISubprogram of the local variables to ensure they are retained through optimizations so name lookup does the right things at function-scope. So /maybe/ it's OK to move in that direction here, but it might look more like that, add these other function-local things to the DISubprogram-scoped list (rename the list to generalize over more than just variables), rather than adding per-scope lists?

Initially, I made the dedicated per-scope list of local static vars/imports/types to make possible to remove unused entities if their parent scope gets optimized out. This doesn't fully resolve your concern about local types that might be emitted even if they are not used, but this makes the things a bit better. Moreover, such local entities as well as variables/labels are not going to be emitted to a final DWARF if their scope was optimized away. Currently, to emit any local entity we need its scope to have LexicalScope defined. If there are no DILocations in such a scope (so that it may be considered as optimized out), it will not have LexicalScope defined for it, thus no variables/labels/other local things will be emitted for it. So, if we are not going to change this design in a near future, it might be worse to consider switching local variables/labels to per-scope list as well.

What about merging variables/labels/other entities to a single list, I fully support this idea (it would require some additional checks in the backend, but would look more consistent).

Ah, fair point about implicitly dropping some debug info if it's in a dead scope. & yeah, maybe moving the existing optimized local variables list into a per-scope list might be a nice improvement regardless.

@aprantl @probinson you folks have any thoughts on this? I'm starting to lean towards this being: 1) move the current "optimized variables" subprogram list into the lexical scope 2) add other local things into that list 3) profit. Though adding a list/fiel dto every lexical scope feels a bit heavy - not sure if there's a good metric/benchmark to measure the impact of that on memory usage, bitcode size, etc. It's probably not that big in the grand scheme of things, though.

@dblaikie, could you please take a look at this and/or D113741? Do you see any ways to proceed?

My concern with this direction is that anything that adds lists to the IR metadata makes it difficult for that data to be dropped when it becomes unused (the type graph, apart from the "retained types" on the CU, is structured so that if a variable, say, gets optimized away, or a function gets optimized away and its parameters along with it, the types get dropped too - similarly with function descriptions, they aren't in a list (they used to be) and are instead referenced from the llvm::Function ensuring that if the function is optimized away entirely, the debug info for that goes away too). Admittedly function-local things are handled somewhat differently, for instance there is a list on the DISubprogram of the local variables to ensure they are retained through optimizations so name lookup does the right things at function-scope. So /maybe/ it's OK to move in that direction here, but it might look more like that, add these other function-local things to the DISubprogram-scoped list (rename the list to generalize over more than just variables), rather than adding per-scope lists?

Initially, I made the dedicated per-scope list of local static vars/imports/types to make possible to remove unused entities if their parent scope gets optimized out. This doesn't fully resolve your concern about local types that might be emitted even if they are not used, but this makes the things a bit better. Moreover, such local entities as well as variables/labels are not going to be emitted to a final DWARF if their scope was optimized away. Currently, to emit any local entity we need its scope to have LexicalScope defined. If there are no DILocations in such a scope (so that it may be considered as optimized out), it will not have LexicalScope defined for it, thus no variables/labels/other local things will be emitted for it. So, if we are not going to change this design in a near future, it might be worse to consider switching local variables/labels to per-scope list as well.

What about merging variables/labels/other entities to a single list, I fully support this idea (it would require some additional checks in the backend, but would look more consistent).

Ah, fair point about implicitly dropping some debug info if it's in a dead scope. & yeah, maybe moving the existing optimized local variables list into a per-scope list might be a nice improvement regardless.

Well, this isn't as good as I thought. We are skipping optimized scopes and entities belong to them iff the scope is concrete (either inline or out-of-line). But we never skip abstract scopes; instead we create LexicalScope's just before constructing an abstract tree (see DwarfDebug::ensureAbstractEntityIsCreated()) if there are some retained nodes in them (currently, local variables or labels).

I was a bit confused by assert in DwarfDebug::endFunctionImpl():

assert(LScopes.getAbstractScopesList().size() == NumAbstractScopes
       && "ensureAbstractEntityIsCreated inserted abstract scopes");

It checks that DwarfDebug::ensureAbstractEntityIsCreated() doesn't create any scopes, but this check is for subprogram scopes only, it doesn't guard against creation of lexical block scope. And we do create LexicalScope's for DILexicalBlock if there is a retained node belong to it. I tried to restore the same behavior with per-scope retained nodes approach, but it significantly affects compile time (22.6% geomean on CTMark).
So, I agree with your first suggestion to reuse retainedNodes field of DISuprogram to track other local entities.
I'll update this patch soon.

krisb updated this revision to Diff 447343.Jul 25 2022, 7:55 AM
  • Rebase on top of 'main'.
  • Reuse DIsubprogram's retainedNodes to track function-local types, imports and static locals.
krisb edited the summary of this revision. (Show Details)Jul 25 2022, 7:59 AM
krisb updated this revision to Diff 447349.Jul 25 2022, 8:18 AM
krisb edited the summary of this revision. (Show Details)

Fix misprint.