This is an archive of the discontinued LLVM Phabricator instance.

[Coroutines] Look for dbg.declare for temp spills
ClosedPublic

Authored by weiwang on Mar 21 2023, 10:01 AM.

Details

Summary

A temp spill may not have direct dbg.declare attached. This can cause problem for debugger when it wants to print the value in resume/destroy/cleanup functions.

In particular, we found this happening to "this" pointer that a temp is used to store its value in entry block and spilled later.

Diff Detail

Event Timeline

weiwang created this revision.Mar 21 2023, 10:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 21 2023, 10:01 AM
weiwang requested review of this revision.Mar 21 2023, 10:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 21 2023, 10:01 AM
weiwang retitled this revision from debug info to [Coroutines] Look for dbg.declare for temp spills.Mar 21 2023, 10:09 AM
weiwang edited the summary of this revision. (Show Details)
weiwang added a reviewer: ChuanqiXu.

In fact, we did similar things in the downstream too. But we didn't contribute it since we feel it might not be so good. Since it is a little bit dirty and it is natural that the debug information is missing. If you still want to do this, I feel CoroCloner::salvageDebugInfo() may be a better place.

weiwang added a comment.EditedMar 22 2023, 1:27 PM

In fact, we did similar things in the downstream too. But we didn't contribute it since we feel it might not be so good. Since it is a little bit dirty and it is natural that the debug information is missing. If you still want to do this, I feel CoroCloner::salvageDebugInfo() may be a better place.

Ha, I was wonder why nobody reported this issue before :)

For us, we've seen that "this" pointer debug info is gone after the first suspension point. And there are quite some reports internally, so this is really hurting the debug experiences. The current proposal is very specific to the "this" pointer problem we have. For some reason the "this" pointer is stored into some temp reg at entry block which complicates the look-up of dbg.declare. For example, in https://godbolt.org/z/q3rP77919

define linkonce_odr dso_local ptr @_ZN9Container3fooEi(ptr noundef nonnull align 4 dereferenceable(4) %this, i32 noundef %arg)
entry:
  %this.addr = alloca ptr, align 8
  %arg.addr = alloca i32, align 4
  %arg2 = alloca i32, align 4
  ... ...
  store ptr %this, ptr %this.addr, align 8
  call void @llvm.dbg.declare(metadata ptr %this.addr, metadata !658, metadata !DIExpression()), !dbg !660
  store i32 %arg, ptr %arg.addr, align 4
  call void @llvm.dbg.declare(metadata ptr %arg.addr, metadata !661, metadata !DIExpression()), !dbg !662
  %this1 = load ptr, ptr %this.addr, align 8

coro.init:
  call void @llvm.lifetime.start.p0(i64 4, ptr %arg2) #5, !dbg !665
  call void @llvm.dbg.declare(metadata ptr %arg2, metadata !661, metadata !DIExpression()), !dbg !660
  %6 = load i32, ptr %arg.addr, align 4, !dbg !665
  store i32 %6, ptr %arg2, align 4, !dbg !665

There is no use of %this.addr after this, so %this or %this.addr won't be in coro frame, but %this1 is.

A side note: the other function parameter arg is stored into an alloca %args2, which has a dbg.declare. Not sure why frontend treats them differently. If this1 is an alloca instead of a temp, it could already have a dbg.declare. Maybe this is another angle.

Anyway, I feel that in order to have the same level of debug info availability as its non-async counterparts, coroutine function needs to make sure debug info of frame lived value is preserved across suspension points if that value maps to a source variable. From the examples I've seen, I think allocas are generally fine here. They usually come with needed debug info. And most temps do not map to source variable, no need of debug info. That comes down to those temps that map to source variable.

Does your downstream patch address the same issue? Maybe we can consolidate here.

The reason I put the change in coro::buildCoroutineFrame before spilt happens is that the missing dbg.declare can be created at the reload site so that after split and in CoroCloner::salvageDebugInfo, it would automatically be converted to a dbg.declare of DEREF from frame offset. And we only need to do that once in the original function. So this from the original function (dbg.declare is created in coro::buildCoroutineFrame)

%this1.reload.addr = getelementptr inbounds %_ZN9Container3fooEi.Frame, ptr %0, i32 0, i32 4, !dbg !1794
%this1.reload = load ptr, ptr %this1.reload.addr, align 8, !dbg !1794
call void @llvm.dbg.declare(metadata ptr %this1.reload, metadata !1779, metadata !DIExpression()), !dbg !1780

becomes

call void @llvm.dbg.declare(metadata ptr %.debug, metadata !1781, metadata !DIExpression(DW_OP_deref, DW_OP_plus_uconst, 32)), !dbg !1783

in the entry block of resume/destroy/cleanup function.

This seems to be the most direct way of adding the missing debug info, but if there is a better way, please advise.

Does your downstream patch address the same issue? Maybe we can consolidate here.

Oh, not exactly. It looks like you're saving the debug information under O0. While we're trying to save the debug information within optimization, which might not be so natural. And it looks like the problem occurs within opaque pointer and our downstream compiler didn't start to use opaque pointer yet. So in another word, this should be regression bug if I understand things correctly.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1850–1860
weiwang added a comment.EditedMar 23 2023, 10:42 AM

Does your downstream patch address the same issue? Maybe we can consolidate here.

Oh, not exactly. It looks like you're saving the debug information under O0. While we're trying to save the debug information within optimization, which might not be so natural. And it looks like the problem occurs within opaque pointer and our downstream compiler didn't start to use opaque pointer yet. So in another word, this should be regression bug if I understand things correctly.

Yeah, we generally use -O0 for better debugging experiences. The issue was first reported from our internal release, which is based on clang-12, so I think it exists before and after opaque pointer. My plan is to fix upstream and then backport.

Does your downstream patch address the same issue? Maybe we can consolidate here.

Oh, not exactly. It looks like you're saving the debug information under O0. While we're trying to save the debug information within optimization, which might not be so natural. And it looks like the problem occurs within opaque pointer and our downstream compiler didn't start to use opaque pointer yet. So in another word, this should be regression bug if I understand things correctly.

Yeah, we generally use -O0 for better debugging experiences. The issue was first reported from our internal release, which is based on clang-12, so I think it exists before and after opaque pointer. My plan is to fix upstream and then backport.

Got it. But let's first try to handle it in coro::salvageDebugInfo() to make the code smell better.

Does your downstream patch address the same issue? Maybe we can consolidate here.

Oh, not exactly. It looks like you're saving the debug information under O0. While we're trying to save the debug information within optimization, which might not be so natural. And it looks like the problem occurs within opaque pointer and our downstream compiler didn't start to use opaque pointer yet. So in another word, this should be regression bug if I understand things correctly.

Yeah, we generally use -O0 for better debugging experiences. The issue was first reported from our internal release, which is based on clang-12, so I think it exists before and after opaque pointer. My plan is to fix upstream and then backport.

Got it. But let's first try to handle it in coro::salvageDebugInfo() to make the code smell better.

I took a look at coro::salvageDebugInfo(), but can't find an easy way to handle my case there.

In particular, I have the following code, in which the dbg.declare is for %this.addr, and its alias %this1 is spilled.

%this.addr = alloca ptr, align 8
call void @llvm.dbg.declare(metadata ptr %this.addr, metadata !658, metadata !DIExpression()), !dbg !660
%this1 = load ptr, ptr %this.addr, align 8
... ...
; spill
%this1.spill.addr = getelementptr inbounds %_ZN9Container3fooEi.Frame, ptr %0, i32 0, i32 4, !dbg !1782
store ptr %this1, ptr %this1.spill.addr, align 8, !dbg !1782
.. ...
; reload
%this1.reload.addr = getelementptr inbounds %_ZN9Container3fooEi.Frame, ptr %0, i32 0, i32 4, !dbg !1794
%this1.reload = load ptr, ptr %this1.reload.addr, align 8, !dbg !1794

And I need to update the dbg.declare to describe the reload GEP %this1.reload.addr, something like

call void @llvm.dbg.declare(metadata ptr %this1.reload.addr, metadata !658, metadata !DIExpression(DW_OP_plus_uconst, 32)), !dbg !660

Under -O0, it would be

call void @llvm.dbg.declare(metadata ptr %frame, metadata !658, metadata !DIExpression(DW_OP_deref, DW_OP_plus_uconst, 32)), !dbg !660

And if the spill has multiple reloads, I'll need one dbg.declare for every reload.

In coro::salvageDebugInfo(), I'll need to trace from %this.addr to its reload GEP %this1.reload.addr. I feel I need to have an extra struct to record spill to its reloads (probably when the reloads are created in insertSpills()). Also since I don't know if an alloca is an alias of a spill and leads to a reload GEP (such information is easier to get in insertSpills()), I'll have to do such check on every alloca, which is extra overhead.

Any suggestions?

ChuanqiXu accepted this revision.Mar 28 2023, 1:27 AM

Oh, you're right. It is indeed easier to handle them in the early place.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1853

Let's add a fast check here to avoid fall in loop if the program is not compiled with debug information.

This revision is now accepted and ready to land.Mar 28 2023, 1:27 AM
weiwang updated this revision to Diff 509163.Mar 28 2023, 3:56 PM
weiwang edited the summary of this revision. (Show Details)
  1. Rebase
  2. Add check of function has debug info or not before going into the loop.
weiwang marked an inline comment as done.Mar 28 2023, 3:58 PM
weiwang added inline comments.
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1853

Good catch! Updated.

weiwang marked an inline comment as done.Mar 28 2023, 3:59 PM
This revision was landed with ongoing or failed builds.Mar 30 2023, 10:13 AM
This revision was automatically updated to reflect the committed changes.
fdeazeve added a subscriber: fdeazeve.EditedOct 25 2023, 3:07 PM

While debugging a crash in the compiler (see my comment about GlobalVariable) below, I found this patch and I am struggling to understand if we are doing the right thing here.

Besides the crash, I am not sure this patch is correct with respect to the debug intrinsics. Consider this:

%indirect_storage = ptr ...
%AliveAccrossAplit = load ptr %indirect_storage
call void llvm.dbg.declare(%indirect_storage, ...)

... split point ...

use %UseAliveAccrossAplit

With this patch, we rewrite the above into:

%indirect_storage = ptr ...
%AliveAccrossSplit = load ptr %indirect_storage
call void llvm.dbg.declare(%indirect_storage, !MyVar, ...)

... split point ...

%reload = ... reload of %AliveAccrossSplit inside the coroutine frame ...
call void llvm.dbg.declare(%reload, !MyVar, ...)

Note that the dbg.declare after the rewrite is *not* equivalent to the original intrinsic: we are now claiming the address of !MyVar is %reload, and %reload == %AliveAccrossSplit.

Does this make sense? In fact, the debug instruction that is being created doesn't even show up in the test added by this patch. If we run this test on a debugger and put a breakpoint in this line:

-> 1881           coro::salvageDebugInfo(ArgToAllocaMap, DDI, Shape.OptimizeFrame,
   1882                                  false /*UseEntryValue*/);

We should this:

(lldb) p CurrentBlock->dump()

resume:                                           ; preds = %AfterCoroSuspend
  %this1.reload.addr = getelementptr inbounds %foo.Frame, ptr %hdl, i32 0, i32 4
  %this1.reload = load ptr, ptr %this1.reload.addr, align 8
  call void @llvm.dbg.declare(metadata ptr %this1.reload, metadata !34, metadata !DIExpression()), !dbg !36
  call void @bar(ptr %this1)
  br label %cleanup

Note that this dbg.declare doesn't survive the end of this test (it is not in the final output).

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1852

I'm not sure alias is the right term here: if we walk up a load chain, we are not finding aliases, instead we're adding levels of pointer indirections. These pointers don't alias, they point to different addresses.

1858

What is this load trying to check? LdInst->getPointerOperandType() always returns the type ptr.
This is the same as if(!isa<PointerType<(LdInst->getType()), was this the intention?

I was a bit confused with the wording of the comment about, would appreciate your help rewording it.

1861

Note that this will crash the compiler if CurDef is a GlobalVariable.

While debugging a crash in the compiler (see my comment about GlobalVariable) below, I found this patch and I am struggling to understand if we are doing the right thing here.

Besides the crash, I am not sure this patch is correct with respect to the debug intrinsics. Consider this:

%indirect_storage = ptr ...
%AliveAccrossAplit = load ptr %indirect_storage
call void llvm.dbg.declare(%indirect_storage, ...)

... split point ...

use %UseAliveAccrossAplit

With this patch, we rewrite the above into:

%indirect_storage = ptr ...
%AliveAccrossSplit = load ptr %indirect_storage
call void llvm.dbg.declare(%indirect_storage, !MyVar, ...)

... split point ...

%reload = ... reload of %AliveAccrossSplit inside the coroutine frame ...
call void llvm.dbg.declare(%reload, !MyVar, ...)

Note that the dbg.declare after the rewrite is *not* equivalent to the original intrinsic: we are now claiming the address of !MyVar is %reload, and %reload == %AliveAccrossSplit.

Does this make sense? In fact, the debug instruction that is being created doesn't even show up in the test added by this patch. If we run this test on a debugger and put a breakpoint in this line:

-> 1881           coro::salvageDebugInfo(ArgToAllocaMap, DDI, Shape.OptimizeFrame,
   1882                                  false /*UseEntryValue*/);

We should this:

(lldb) p CurrentBlock->dump()

resume:                                           ; preds = %AfterCoroSuspend
  %this1.reload.addr = getelementptr inbounds %foo.Frame, ptr %hdl, i32 0, i32 4
  %this1.reload = load ptr, ptr %this1.reload.addr, align 8
  call void @llvm.dbg.declare(metadata ptr %this1.reload, metadata !34, metadata !DIExpression()), !dbg !36
  call void @bar(ptr %this1)
  br label %cleanup

Note that this dbg.declare doesn't survive the end of this test (it is not in the final output).

I'm fairly sure that if we revert this patch and rerun the test, it will still pass.

Thanks for the comment. Yeah, I think there are issues with the change here. And the wording of "alias" is misleading.

The issue that this change specifically tries to fix is that, we saw the debug info for "this" pointer no loner exists after coroutine resumes. i.e. https://godbolt.org/z/sWxa4f94j, we can print "this" in lldb at line 24, but not at line 26. This is because "this" is stored into a temp that does not have debug info.

define ptr @foo(ptr noundef nonnull align 1 dereferenceable(1) %this) !dbg !11 {
entry:
  %this.addr = alloca ptr, align 8
  call void @llvm.dbg.declare(metadata ptr %this.addr, metadata !34, metadata !DIExpression()), !dbg !36
  %__promise = alloca i8, align 1
  call void @llvm.dbg.declare(metadata ptr %__promise, metadata !37, metadata !DIExpression()), !dbg !36
  store ptr %this, ptr %this.addr, align 8
  %this1 = load ptr, ptr %this.addr, align 8

... ...

PostSpill:                                        ; preds = %AllocaSpillBB
  %this1.spill.addr = getelementptr inbounds %foo.Frame, ptr %hdl, i32 0, i32 4
  store ptr %this1, ptr %this1.spill.addr, align 8

So this change walks up the pointer Deref chain and tries to find the llvm.dbg.declare for %this.addr and use it for the temp %this1.

In the test included with the change.

Before:

define internal fastcc void @foo.resume(ptr noundef nonnull align 8 dereferenceable(32) %hdl) !dbg !38 {
entry.resume:
  %hdl.debug = alloca ptr, align 8
  call void @llvm.dbg.declare(metadata ptr %hdl.debug, metadata !40, metadata !DIExpression(DW_OP_deref)), !dbg !41

... ...

!40 = !DILocalVariable(name: "__coro_frame", scope: !38, file: !1, type: !22, flags: DIFlagArtificial)

After:

define internal fastcc void @foo.resume(ptr noundef nonnull align 8 dereferenceable(32) %hdl) !dbg !38 {
entry.resume:
  %hdl.debug = alloca ptr, align 8
  call void @llvm.dbg.declare(metadata ptr %hdl.debug, metadata !41, metadata !DIExpression(DW_OP_deref, DW_OP_plus_uconst, 24)), !dbg !42
  call void @llvm.dbg.declare(metadata ptr %hdl.debug, metadata !40, metadata !DIExpression(DW_OP_deref)), !dbg !42

... ...

!40 = !DILocalVariable(name: "__coro_frame", scope: !38, file: !1, type: !22, flags: DIFlagArtificial)
!41 = !DILocalVariable(name: "this", arg: 1, scope: !38, type: !35, flags: DIFlagArtificial | DIFlagObjectPointer)

Note that this will crash the compiler if CurDef is a GlobalVariable.

You are right, and we also observed the crash and fixed in https://reviews.llvm.org/D157423. Since the original change is targeting a very specific case, the fix puts more restriction on the case it can affect.

Do you have any suggestions on how can we make it more generic and applicable to more use cases?

fdeazeve added a comment.EditedOct 25 2023, 4:42 PM

Thank you for the quick response!

Note that this will crash the compiler if CurDef is a GlobalVariable.

You are right, and we also observed the crash and fixed in https://reviews.llvm.org/D157423. Since the original change is targeting a very specific case, the fix puts more restriction on the case it can affect.

Thank you for the pointers, glad to hear it was fixed in a subsequent commit. I was working on a slightly older version of the codebase.

Do you have any suggestions on how can we make it more generic and applicable to more use cases?

After:

define internal fastcc void @foo.resume(ptr noundef nonnull align 8 dereferenceable(32) %hdl) !dbg !38 {
entry.resume:
  %hdl.debug = alloca ptr, align 8
  call void @llvm.dbg.declare(metadata ptr %hdl.debug, metadata !41, metadata !DIExpression(DW_OP_deref, DW_OP_plus_uconst, 24)), !dbg !42
  call void @llvm.dbg.declare(metadata ptr %hdl.debug, metadata !40, metadata !DIExpression(DW_OP_deref)), !dbg !42

... ...

!40 = !DILocalVariable(name: "__coro_frame", scope: !38, file: !1, type: !22, flags: DIFlagArtificial)
!41 = !DILocalVariable(name: "this", arg: 1, scope: !38, type: !35, flags: DIFlagArtificial | DIFlagObjectPointer)

I understand the problem the patch is solving, but I haven't been able to figure out how we are going from that dbg.declare that I mentioned to the one that ends up in the final output of the compiler. Do you see the problem that I was trying to point to? The _first_ dbg.declare that we create -- the one shown in the output of p CurrentBlock->dump()-- is not equivalent to the original dbg.declare. And yet, somehow, this "invalid" dbg.declare disappears and the correct one shows up later. To be very specific, when we do:

DIBuilder(*CurrentBlock->getParent()->getParent(), AllowUnresolved)
              .insertDeclare(CurrentReload, DDI->getVariable(),
                             DDI->getExpression(), DDI->getDebugLoc(),
                             &*Builder.GetInsertPoint());

The CurrentReload argument seems incorrect if we stripped some of the loads along the way. Does that make sense?

weiwang added a comment.EditedOct 26 2023, 9:59 AM

I understand the problem the patch is solving, but I haven't been able to figure out how we are going from that dbg.declare that I mentioned to the one that ends up in the final output of the compiler. Do you see the problem that I was trying to point to? The _first_ dbg.declare that we create -- the one shown in the output of p CurrentBlock->dump()-- is not equivalent to the original dbg.declare. And yet, somehow, this "invalid" dbg.declare disappears and the correct one shows up later. To be very specific, when we do:

DIBuilder(*CurrentBlock->getParent()->getParent(), AllowUnresolved)
              .insertDeclare(CurrentReload, DDI->getVariable(),
                             DDI->getExpression(), DDI->getDebugLoc(),
                             &*Builder.GetInsertPoint());

The CurrentReload argument seems incorrect if we stripped some of the loads along the way. Does that make sense?

I think the reason is that the reload %this1.reload is after a suspend point, so the DbgDeclareInst (DDI) is unreachable from the ramp function and removed later. But since it is reachable in resume and destroy copies, it shows up there.