This is an archive of the discontinued LLVM Phabricator instance.

Cloning: Fix debug info cloning
ClosedPublic

Authored by GorNishanov on May 26 2017, 4:50 PM.
Tokens
"Mountain of Wealth" token, awarded by GorNishanov.

Details

Summary

I believe https://reviews.llvm.org/rL302576 introduced two bugs:

  1. it produces duplicate distinct variables for every: dbg.value describing the same variable. To fix the problme I switched form getDistinct() to get() in DebugLoc.cpp: auto reparentVar = [&](DILocalVariable *Var) { return DILocalVariable::getDistinct(
  1. It passes NewFunction plain name as a linkagename parameter to Subprogram constructor. Breaks assert in:

    || DeclLinkageName.empty()) || LinkageName == DeclLinkageName) && "decl has a linkage name and it is different"' failed.

#9 0x00007f5010261b75 llvm::DwarfUnit::applySubprogramDefinitionAttributes(llvm::DISubprogram const*, llvm::DIE&) /home/gor/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp:1173:3
#
(Edit: reproducer added)

Here how https://reviews.llvm.org/rL302576 broke coroutine debug info.
Coroutine body of the original function is split into several parts by cloning and removing unneeded code.
All parts describe the original function and variables present in the original function.

For a simple case, prior to Split, original function has these two blocks:

PostSpill:                                        ; preds = %AllocaSpillBB
  call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !14, metadata !15), !dbg !13
  store i32 %x, i32* %x.addr, align 4
  ...
and

sw.epilog:                                        ; preds = %sw.bb
  %x.addr.reload.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 4, !dbg !20
  %4 = load i32, i32* %x.addr.reload.addr, align 4, !dbg !20
  call void @llvm.dbg.value(metadata i32 %4, i64 0, metadata !14, metadata !15), !dbg !13

!14 = !DILocalVariable(name: "x", arg: 1, scope: !6, file: !7, line: 55, type: !11)

Note that in two blocks different expression represent the same original user variable X.

Before rL302576, for every cloned function there was exactly one cloned DILocalVariable(name: "x" as in:

define i8* @f(i32 %x) #0 !dbg !6 {
  ...
!6 = distinct !DISubprogram(name: "f", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, flags: DIFlagPrototyped, 
...
!14 = !DILocalVariable(name: "x", arg: 1, scope: !6, file: !7, line: 55, type: !11)

define internal fastcc void @f.resume(%f.Frame* %FramePtr) #0 !dbg !25 {
...
!25 = distinct !DISubprogram(name: "f", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, flags: DIFlagPrototyped, isOptimized: false, unit: !0, variables: !2)
!28 = !DILocalVariable(name: "x", arg: 1, scope: !25, file: !7, line: 55, type: !11)

After rL302576, for every cloned function there were as many DILocalVariable(name: "x" as there were "call void @llvm.dbg.value" for that variable.
This was causing asserts in VerifyDebugInfo and AssemblyPrinter.

Example:

!27 = distinct !DISubprogram(name: "f", linkageName: "f.resume", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, 
!29 = distinct !DILocalVariable(name: "x", arg: 1, scope: !27, file: !7, line: 55, type: !11)
!39 = distinct !DILocalVariable(name: "x", arg: 1, scope: !27, file: !7, line: 55, type: !11)
!41 = distinct !DILocalVariable(name: "x", arg: 1, scope: !27, file: !7, line: 55, type: !11)

Second problem:

Prior to rL302576, all clones were described by DISubprogram referring to original function.

define i8* @f(i32 %x) #0 !dbg !6 {
...
!6 = distinct !DISubprogram(name: "f", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, flags: DIFlagPrototyped, 

define internal fastcc void @f.resume(%f.Frame* %FramePtr) #0 !dbg !25 {
...
!25 = distinct !DISubprogram(name: "f", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, flags: DIFlagPrototyped,

After rL302576, DISubprogram for clones is of two minds, plain name refers to the original name, linkageName refers to plain name of the clone.

!27 = distinct !DISubprogram(name: "f", linkageName: "f.resume", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55,

I think the assumption in AsmPrinter is that both name and linkageName should refer to the same entity. It asserts here when they are not:

 || DeclLinkageName.empty()) || LinkageName == DeclLinkageName) && "decl has a linkage name and it is different"' failed.
#9 0x00007f5010261b75 llvm::DwarfUnit::applySubprogramDefinitionAttributes(llvm::DISubprogram const*, llvm::DIE&) /home/gor/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp:1173:3

After this fix, behavior (with respect to coroutines) reverts to exactly as it was before and therefore making them debuggable again, or even more importantly, compilable, with "-g"

Diff Detail

Repository
rL LLVM

Event Timeline

GorNishanov created this revision.May 26 2017, 4:50 PM
GorNishanov edited the summary of this revision. (Show Details)May 26 2017, 5:26 PM
GorNishanov edited the summary of this revision. (Show Details)
GorNishanov edited the summary of this revision. (Show Details)May 26 2017, 5:34 PM
dblaikie edited edge metadata.May 26 2017, 6:01 PM

The assertion is that, in the presence of a separate declaration (eg: a member function that is a coroutine - so the member variable gets a declaration (this is about the only time LLVM has/uses a separate declaration for a function/subprogram in the debug info) and models an out of line definition (even if the definition is in-line)) that if there is a linkage name for the declaration, then it's the same as the linkage name for the definition.

This seems pretty wrong for coroutines where the debug info looks like it's going to have 3-4 functions, with different mangled names, all that claim to be an implementation of the same declaration? Their mangled names won't match the declaration anyway...

I think maybe, it'd be better to drop the linkage name when cloning. (or use the linkage name of the clone - and change/relax the assertion) It won't be better in the end - the debug info would still have multiple definitions referring to the same declaration and if the declaration has a linkage name it'll be right for at most one of those functions and wrong for the rest. But at least the IR won't be /as/ wrong... ish.

GorNishanov added a comment.EditedMay 26 2017, 7:25 PM

The assertion is that, in the presence of a separate declaration (eg: a member function that is a coroutine - so the member variable gets a declaration (this is about the only time LLVM has/uses a separate declaration for a function/subprogram in the debug info) and models an out of line definition (even if the definition is in-line)) that if there is a linkage name for the declaration, then it's the same as the linkage name for the definition.

This seems pretty wrong for coroutines where the debug info looks like it's going to have 3-4 functions, with different mangled names, all that claim to be an implementation of the same declaration? Their mangled names won't match the declaration anyway...

I think maybe, it'd be better to drop the linkage name when cloning. (or use the linkage name of the clone - and change/relax the assertion) It won't be better in the end - the debug info would still have multiple definitions referring to the same declaration and if the declaration has a linkage name it'll be right for at most one of those functions and wrong for the rest. But at least the IR won't be /as/ wrong... ish.

Something like this?

Replace:

SP->getName(), NewFunc->getName()  (rL302576 way)

or

SP->getName(), SP->getLinkageName() (This patch way)

with

NewFunc->getName(), NewFunc->getName(),  (dblakie way?)

Just tested:

dblakie way triggers the same assert as rL302576 way.
I am keeping "this patch way" for know

Test updated to catch breakage if it happens again. Preparing to land to unblock coroutine debugging (or compiling with -g).

(once you've sent something for review, please wait until it's approved - generally the assumption/concensus is that if a developer asks for review, it's because it's needed - submitting then without review/sign off indicates that maybe the required review didn't happen, which isn't good)

lib/IR/DebugLoc.cpp
170 ↗(On Diff #100517)

Is there a reason this can't be distinct anymore?

You mentioned in the description: "I added caching, but, it is not sufficient, since dbg.value may refer to various expressions representing the same user var, thus, switching form getDistinct to get is important."

But I don't follow quite what you're saying there - could you explain it in more detail/in another way?

What happens without changing this to 'get'?

(once you've sent something for review, please wait until it's approved - generally the assumption/concensus is that if a developer asks for review, it's because it's needed - submitting then without review/sign off indicates that maybe the required review didn't happen, which isn't good)

Sure thing. I'll wait for LGTM.

GorNishanov added inline comments.May 26 2017, 8:08 PM
lib/IR/DebugLoc.cpp
170 ↗(On Diff #100517)

It can still creates duplicate DILocalVariables. I have it in the very large test case, I can see if I can reduce it to something manangable. Thank you for looking at this!

GorNishanov added inline comments.May 26 2017, 9:24 PM
lib/IR/DebugLoc.cpp
170 ↗(On Diff #100517)

Hah. Cache by itself does not work since it is cleared for every basic block.
I am not sure what are the implications of moving it outside of CloneBasicBlock.

I would keep the cache where it is, and keep get as opposed to getDistinct to collapse duplicate LocalVariables to the same one.

170 ↗(On Diff #100517)

Good to go?

GorNishanov edited the summary of this revision. (Show Details)May 26 2017, 9:33 PM
GorNishanov edited the summary of this revision. (Show Details)
GorNishanov edited the summary of this revision. (Show Details)
GorNishanov added inline comments.May 26 2017, 9:43 PM
lib/IR/DebugLoc.cpp
170 ↗(On Diff #100517)

Alternative, since the cache scope is just one basic block, I can remove the caching from auto reparentVar altogether, so the fix in lib/IR/DebugLoc.cpp will be simply:

s/getDistinct/get

in

auto reparentVar = [&](DILocalVariable *Var) {
    return DILocalVariable::getDistinct(

Simplified the fix in IR/DebugLoc.cpp to s/getDistinct/get.
Granted mountains of wealth token for those kind souls who would review this change.

GorNishanov marked 5 inline comments as done.May 26 2017, 10:05 PM
GorNishanov edited the summary of this revision. (Show Details)May 27 2017, 6:49 AM
dblaikie accepted this revision.May 27 2017, 12:27 PM

Looks good enough to unblock coroutine work.

Adrian, when you get a chance, could you look this over & make any better/longer-term fixes?

My rough thoughts are:
DILocalVariables aren't distinct when clang creates them, so probably fine not to create them distinct here. But perhaps that decision, globally, should be revisited (& in that case add a function-wide caching for this cloning step).

Linkage name - While preserving it ensures that it won't conflict with the DISubprogram representing the function declaration, if there is one, it seems wrong because this function now has a potentially different linkage name. I'd thought "if there is a declaration, remove the linkage name", but maybe that's not even right - may be that the right thing is to actually have a different linkage name from the one on the declaration? Not sure.

This revision is now accepted and ready to land.May 27 2017, 12:27 PM
This revision was automatically updated to reflect the committed changes.