This is an archive of the discontinued LLVM Phabricator instance.

[RFC] [[Coroutine] [Debug] Salvage dbg.values
ClosedPublic

Authored by ChuanqiXu on Mar 1 2021, 3:18 AM.

Details

Summary

The previous implementation of coro-split didn't collect values used by dbg instructions into the spills which made a log debug info unavailable with optimization on.
This patch tries to collect these uses which are used by dbg.values. In this way, the debugbility of coroutine could be as powerful as normal functions with optimization on.

To avoid enlarging the coroutine frame, this patch only collects dbg.value whose value is already in the coroutine frame. This decision may make some debug info getting unavailable. But if we are with optimization on, the performance issue should be considered first. And this patch would make the debugbility of coroutine to be better only without changing the layout of the frame.

Test-plan: check-llvm

Diff Detail

Event Timeline

ChuanqiXu created this revision.Mar 1 2021, 3:18 AM
ChuanqiXu requested review of this revision.Mar 1 2021, 3:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2021, 3:18 AM
aprantl requested changes to this revision.Mar 3 2021, 4:39 PM

I know this is is tempting, but unfortunately we can't have codegen change based on a debug info intrinsic (see more detailed comment inline).

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2618

Unfortunately, this is violates a guiding principle in LLVM: Debug info may under no circumstances affect the generated code. In other words, the output of clang and clang -g followed by strip must be identical. If you want this behavior it must either be controlled by a separate flag, or it must be on all the time, even when compiling without debug info.

This revision now requires changes to proceed.Mar 3 2021, 4:39 PM
ChuanqiXu updated this revision to Diff 328350.Mar 4 2021, 6:12 PM

Add option to control the behavior.

Looks like it's calling "llvm::dbgs()" directly.

btw is this patch based off main branch or some other internal patch? I wasn't able to arc patch this one. I would like to play with it locally.

ChuanqiXu updated this revision to Diff 331810.Mar 19 2021, 3:02 AM

rebase with trunk

Looks like it's calling "llvm::dbgs()" directly.

btw is this patch based off main branch or some other internal patch? I wasn't able to arc patch this one. I would like to play with it locally.

Thanks for reminding. I just rebased it.

Curious, is %0 = bitcast %a to ... the majority of the cases we are missing? If so, could we turn all the dbg.value instructions to use their original pointer by stripping pointer cast?

Curious, is %0 = bitcast %a to ... the majority of the cases we are missing? If so, could we turn all the dbg.value instructions to use their original pointer by stripping pointer cast?

I am not sure if %0 = bit cast %a to ... is the majority redundant fields of coroutine frame if we enable this patch. My original idea is to add a rematerialization process to reduce the coroutine frame like register allocation. However, it maybe a little hard and benefit less. It sounds more easy and beneficial to use original pointer by value tracking. Then I think this problem may be not introduced by this patch. Here is my example:

%0 = bitcast %a to ...
call @llvm.coro.suspend 
; ....
use of %a
use of %0

Then both %a and %0 would be put in the frame, which is totally redundant. We could see if there are examples other than bit cast in other patch.

I am not sure if %0 = bit cast %a to ... is the majority redundant fields of coroutine frame if we enable this patch. My original idea is to add a rematerialization process to reduce the coroutine frame like register allocation. However, it maybe a little hard and benefit less. It sounds more easy and beneficial to use original pointer by value tracking. Then I think this problem may be not introduced by this patch. Here is my example:

%0 = bitcast %a to ...
call @llvm.coro.suspend 
; ....
use of %a
use of %0

Then both %a and %0 would be put in the frame, which is totally redundant. We could see if there are examples other than bit cast in other patch.

We already do materialization though. I wonder why that doesn't cover the case? Can we take advantage of that?

I am not sure if %0 = bit cast %a to ... is the majority redundant fields of coroutine frame if we enable this patch. My original idea is to add a rematerialization process to reduce the coroutine frame like register allocation. However, it maybe a little hard and benefit less. It sounds more easy and beneficial to use original pointer by value tracking. Then I think this problem may be not introduced by this patch. Here is my example:

%0 = bitcast %a to ...
call @llvm.coro.suspend 
; ....
use of %a
use of %0

Then both %a and %0 would be put in the frame, which is totally redundant. We could see if there are examples other than bit cast in other patch.

We already do materialization though. I wonder why that doesn't cover the case? Can we take advantage of that?

Yes, it makes since to cover the case. I would work on this later. Thank you!

ChuanqiXu updated this revision to Diff 332907.Mar 24 2021, 2:38 AM

Materialize bit cast for dbg.values as suggested.

jmorse added a subscriber: jmorse.Mar 24 2021, 8:34 AM

When looping over the debug users of instructions and replacing the dbg.values operands, you should now use the "replaceVariableLocationOp" helper instead -- this avoids re-creating a dbg.value intrinsic, and should handle variadic variable locations (the patches for which are 95% landed) seamlessly.

Overall the aim of the patch makes sense to me, it's storing values used by debug intrinsics in the coroutine frame for later retrieval right? Is there a risk that some later coroutine optimisation will try to optimise the frame and overwrite the stored values-for-debug-users -- this can happen with stack slot colouring at the other end of the compiler.

I'm not sure how the rest of the community will feel about the approach; IIRC there's a "fake use" patch floating around that does this for all (non-coroutine) variables, which wasn't landed in the end.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
54

IMO: the description should explicitly refer to the fact that codegen will change as a result of -g, to avoid any users experiencing unexpected behaviour.

llvm/test/Transforms/Coroutines/coro-debug-dbg.values-O2.ll
6–8 ↗(On Diff #332907)

nit: IMO you should check that the dbg.values operand is a specific LLVM-IR value (i.e., the frame pointer), rather than just checking the type. That protects against some future optimisation or error producing a pointer-typed undef as the dbg.value operand.

jmorse added inline comments.Mar 24 2021, 8:39 AM
llvm/test/Transforms/Coroutines/coro-debug-dbg.values-O2.ll
6–8 ↗(On Diff #332907)

Scratch that, I didn't notice you were already doing that.

lxfind added inline comments.Mar 24 2021, 8:43 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
52

Do we still need this if we can materialize them?

ChuanqiXu updated this revision to Diff 333200.Mar 24 2021, 8:53 PM
ChuanqiXu added a reviewer: jmorse.

Use replaceVariableLocationOp and remove EnhanceDebugability option.

When looping over the debug users of instructions and replacing the dbg.values operands, you should now use the "replaceVariableLocationOp" helper instead -- this avoids re-creating a dbg.value intrinsic, and should handle variadic variable locations (the patches for which are 95% landed) seamlessly.

Overall the aim of the patch makes sense to me, it's storing values used by debug intrinsics in the coroutine frame for later retrieval right? Is there a risk that some later coroutine optimisation will try to optimise the frame and overwrite the stored values-for-debug-users -- this can happen with stack slot colouring at the other end of the compiler.

To my knowledge, if other passes try to optimize the frame, other passes are responsible to maintain the debug information.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
52

Yeah, I think so. If there is anything we are missing, we should improve on the materialization part.

ChuanqiXu updated this revision to Diff 335764.Apr 7 2021, 2:47 AM

Rebase with trunk and add option to control whether we would collect variables used only by dbg.vaues.
Although we can use materializing process to reduce the extra space, we can't prove we could get the same result with -g or not.
Previously, I think if we can't get the same result, the materialization part should take responsibility. However, I find that a pattern recently:
State before:

%a = alloca  ...
; ... some uses who wouldn't cross suspend points
call to coro.suspend()
; ... alternative path
store %v to %a
dbg.value(metadata %v, dbg variable for %a, ...

Then after some optimization, the store in some path would be eliminated:

%a = alloca  ...
; ... some uses who wouldn't cross suspend points
call to coro.suspend()
; ... alternative path
dbg.value(metadata %v, dbg variable for %a, ...

Then %a wouldn't be put into the frame before this patch. And it would be in the frame after this patch if we don't offer an option to control this. And materialization couldn't do much about this.

So I think the assumption before isn't solid, we should add option to control it indeed.

I would recommend to pick a name for -enhance-debug-with-coroutine that does not contain debug. The name should make clear that this changes codegen, so nobody gets the idea of turning it as part of a -g option. For example, something like -coroutine-spill-all-locals.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2654

Can you add a comment explaining what is being done and why?

llvm/lib/Transforms/Coroutines/CoroSplit.cpp
650

The changes in this file are NFC and look good.

I would recommend to pick a name for -enhance-debug-with-coroutine that does not contain debug. The name should make clear that this changes codegen, so nobody gets the idea of turning it as part of a -g option. For example, something like -coroutine-spill-all-locals.

The accurate semantic for this option is collect values used by dbg.values, which would make coroutine frame get larger in some cases. This option would work only if user turns -g on. The name -coroutine-spill-all-locals is not correct and it is hard for me to get a name which shows it would change the codegen and enhance the debugbility. I am wondering if we can remove this option and make collecting values used by dbg.values a default behavior. From our analysis above, it should be unusual to make the coroutine frame larger. @lxfind what's your opinion?

I would recommend to pick a name for -enhance-debug-with-coroutine that does not contain debug. The name should make clear that this changes codegen, so nobody gets the idea of turning it as part of a -g option. For example, something like -coroutine-spill-all-locals.

The accurate semantic for this option is collect values used by dbg.values, which would make coroutine frame get larger in some cases. This option would work only if user turns -g on. The name -coroutine-spill-all-locals is not correct and it is hard for me to get a name which shows it would change the codegen and enhance the debugbility. I am wondering if we can remove this option and make collecting values used by dbg.values a default behavior. From our analysis above, it should be unusual to make the coroutine frame larger. @lxfind what's your opinion?

That's just not something allowed by LLVM policy. The presence of -g may not affect the generated code. Similarly, the presence or absence of debug intrinsics may not affect the generated code either. Is there a way to control this behavior without relying on debug info? For, example, could we make sure all allocas are spilled, regardless of whether they are referred to by a dbg.declare?

lxfind added a comment.Apr 9 2021, 2:28 PM

I am also a bit skeptical about this patch.
Specifically I agree that dbg info should not affect the coroutine frame.
From what I can tell, dbg intrinsics are location insensitive, i.e. they can be put at any location. (correct me if I am wrong)
So a use of any value by dbg intrinsics should not cause the value to be put on the frame.
Perhaps we could first move all dbg intrinsics to a dedicated location (e.g. right after corobegin) before creating the frame, and copy them during function cloning?

I am also a bit skeptical about this patch.
Specifically I agree that dbg info should not affect the coroutine frame.
From what I can tell, dbg intrinsics are location insensitive, i.e. they can be put at any location. (correct me if I am wrong)
So a use of any value by dbg intrinsics should not cause the value to be put on the frame.
Perhaps we could first move all dbg intrinsics to a dedicated location (e.g. right after corobegin) before creating the frame, and copy them during function cloning?

It looks better to collect the dbg.values who wouldn't change the layout of the frame. I would try to make it.

ChuanqiXu updated this revision to Diff 336803.Apr 12 2021, 4:31 AM

Address the comments. Now we would only collect dbg.values who would't change the layout of the frame.

Could you update the description of this patch as well?
It's still not fully clear to me what problem this patch is aiming at resolving.

Also, it seems that the newly added test "coro-debug-dbg.values-O2-nouse.ll" passes even without this patch

ChuanqiXu edited the summary of this revision. (Show Details)Apr 15 2021, 6:53 PM

Also, it seems that the newly added test "coro-debug-dbg.values-O2-nouse.ll" passes even without this patch

It is intended behavior. This patch tries to test whether we would collect dbg.values whose value isn't in the coroutine frame. This test can't pass for previous revision. If someone in the future wants to enhance the debugbility for coroutine frame, this test would tell him that we should think more to change the layout of the coroutine frame for debugability.

Could you update the description of this patch as well?
It's still not fully clear to me what problem this patch is aiming at resolving.

Previously all dbg.values would be lost after CoroSplit pass. dbg.value is debug info compiler produced under optimization. This patch wants to remain as many dbg.value as possible while it don't want to change the layout of frame for debug info.

Previously all dbg.values would be lost after CoroSplit pass. dbg.value is debug info compiler produced under optimization. This patch wants to remain as many dbg.value as possible while it don't want to change the layout of frame for debug info.

Why would all dbg.values be lost after CoroSplit pass? As long as they are in the resume code path, they would be kept in the .resume function, right?
I also tried to run coro-split on coro-debug-dbg.values-O2.ll, I do see a ton of dbg.value in the generate functions even without this patch.
Could you elaborate what is the exact problem, perhaps ideally with an example?

Previously all dbg.values would be lost after CoroSplit pass. dbg.value is debug info compiler produced under optimization. This patch wants to remain as many dbg.value as possible while it don't want to change the layout of frame for debug info.

Why would all dbg.values be lost after CoroSplit pass? As long as they are in the resume code path, they would be kept in the .resume function, right?
I also tried to run coro-split on coro-debug-dbg.values-O2.ll, I do see a ton of dbg.value in the generate functions even without this patch.
Could you elaborate what is the exact problem, perhaps ideally with an example?

The problem here is when we collect values to be put on the frame and insert values to the frame, we didn't care about the dbg.values.
Here is the example:

define void @f(i32 %i, i32 %j) {
; ...
; coro.suspend
call void @llvm.dbg.value(metadata i32 %i, metadata !1, metadata !DIExpression()) ; tell the value !1 from %i
call void @llvm.dbg.value(metadata i32 %j, metadata !2, metadata !DIExpression()) ; tell the value !2 from %j
}

Then in the .resume function:

define internal fastcc void @f.resume(%f.Frame* noalias nonnull align 16 dereferenceable(80) %FramePtr)
; ...
call void @llvm.dbg.value(metadata i32 %i, metadata !1, metadata !DIExpression())
call void @llvm.dbg.value(metadata i32 %j, metadata !2, metadata !DIExpression())

These two dbg.values are trying to tell values of !1 and !2 from values %i and %j. But wait, there is no definition for %I and %j in the resume function. So finally, the resume function would become:

define internal fastcc void @f.resume(%f.Frame* noalias nonnull align 16 dereferenceable(80) %FramePtr)
; ...
call void @llvm.dbg.value(metadata i32 undef, metadata !1, metadata !DIExpression())
call void @llvm.dbg.value(metadata i32 undef, metadata !2, metadata !DIExpression())

This patch wants to salvage the dbg.values whose first operand is in the frame already. The reason why you can find some dbg.values in resume function is that their first operand is either constant or defined in the resume function (like alloca who wouldn't across suspend points).

Previously all dbg.values would be lost after CoroSplit pass. dbg.value is debug info compiler produced under optimization. This patch wants to remain as many dbg.value as possible while it don't want to change the layout of frame for debug info.

Thanks for explaining. It makes sense.
Does it only happen to parameters, or does it also happen to allocas? (I would be more interested to see how this affects allocas, since parameters are somewhat special)
And how is this related to O2?

Previously all dbg.values would be lost after CoroSplit pass. dbg.value is debug info compiler produced under optimization. This patch wants to remain as many dbg.value as possible while it don't want to change the layout of frame for debug info.

Thanks for explaining. It makes sense.
Does it only happen to parameters, or does it also happen to allocas? (I would be more interested to see how this affects allocas, since parameters are somewhat special)
And how is this related to O2?

It happens to all the value possible in the frame including parameters, allocas and other values possible (if any). Since we only collect values who are already in the frame, this patch shouldn't affect any thing.

And how is this related to O2?

Since dbg.value would be generated with optimization enabled. If optimization isn't enabled, compiler would generates dbg.declare as the debug info, which is already handled by previous patches. It is confusing to use O2 here, my bad. I would edit it.

ChuanqiXu updated this revision to Diff 339083.Apr 20 2021, 7:51 PM
ChuanqiXu retitled this revision from [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2 to [RFC] [[Coroutine] [Debug] Salvage dbg.values.
ChuanqiXu edited the summary of this revision. (Show Details)

Address the comments. Removing confusing describes about O2.

aprantl added inline comments.Apr 21 2021, 9:50 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1636

// Also update metadata uses in dbg.value intrinsics.

2584

// Manually add dbg.value metadata uses of I.

aprantl added inline comments.Apr 21 2021, 9:51 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
172

We should either handle unhandled cases gracefully (by ignoring them) or make sure they are impossible and put an llvm_unreachable here.

The test is insufficient. I would like to see that we cover dbg.value used on different cases: parameters, allocas, spills.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1638

Why doesn't replaceUsesOfWith work? Could you double check that replaceUsesOfWith doesn't work for DbgValueInst?

1694–1699

If replaceUsesOfWith works for DbgValueInst, then I think you can just append all DIs into UsersToUpdate above.

1943

Similarly, if replaceUsesOfWith works for DbgValueInst, this isn't necessary

2664–2676

I think it's cleaner to move this code into the loop above.
After the check of all users, you can check if FrameData.Spills contains &I, and if so, do this DbgValues thing.

@lxfind wrote:

Why doesn't replaceUsesOfWith work? Could you double check that replaceUsesOfWith doesn't work for DbgValueInst?

This is by design. Metadata-uses of llvm:Value may not affect code generation, thus they also aren't found by any operation working on llvm::Uses.

@lxfind wrote:

Why doesn't replaceUsesOfWith work? Could you double check that replaceUsesOfWith doesn't work for DbgValueInst?

This is by design. Metadata-uses of llvm:Value may not affect code generation, thus they also aren't found by any operation working on llvm::Uses.

I don't think so.
Take a look at this: https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/User.cpp#L34-L37
I think replaceUsesOfWith works for DbgValueInst.

It's true though that .users() does not include DbgValueInst, but that's a separate question than whether replaceUsesOfWith works.

ChuanqiXu updated this revision to Diff 339451.Apr 21 2021, 7:53 PM

Address the comments.

@lxfind wrote:

Why doesn't replaceUsesOfWith work? Could you double check that replaceUsesOfWith doesn't work for DbgValueInst?

This is by design. Metadata-uses of llvm:Value may not affect code generation, thus they also aren't found by any operation working on llvm::Uses.

I don't think so.
Take a look at this: https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/User.cpp#L34-L37
I think replaceUsesOfWith works for DbgValueInst.

It's true though that .users() does not include DbgValueInst, but that's a separate question than whether replaceUsesOfWith works.

Good Catcha. It looks like you are right. I have thought DbgVariablesInst uses different def-use architecture since users() didn't count DbgValueInst. I should have looked into the details.

Also please add tests to cover more types of variables (allocas, spills..)

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2682–2690

I think it's cleaner to move this code into the loop above.
After the check of all users, you can check if FrameData.Spills contains &I, and if so, do this DbgValues thing.

ChuanqiXu updated this revision to Diff 339550.Apr 22 2021, 3:59 AM

Address the comments:

  • Add test case for alloca and spills.
ChuanqiXu updated this revision to Diff 339558.Apr 22 2021, 4:28 AM

Address the comments.

Also please add tests to cover more types of variables (allocas, spills..)

Sorry for forgetting addressing the inline comments.

Harbormaster completed remote builds in B100221: Diff 339558.
lxfind added inline comments.Apr 22 2021, 9:28 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2665

This might not work. The iteration of instructions is not guaranteed to be in dominator order.
This part of the code should be:

if (FrameData.Spills.count(&I)) {
  SmallVector<DbgValueInst *, 16> DVIs;
  findDbgValues(DVIs, &I);
  ...
}
lxfind added inline comments.Apr 22 2021, 9:36 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1686–1691

I don't know enough about this, but I wonder if these code (handling of dbgdeclare and dbgvalues) could be merged by simply traversing thorough all DVIs in findDbgUsers and do a replaceUsesOfWith on each of them?

Address the comments.

ChuanqiXu updated this revision to Diff 339937.Apr 23 2021, 1:43 AM

Address the comments.

ChuanqiXu added inline comments.Apr 23 2021, 1:51 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2665

Sorry for not considering the dominator order. The example in the inline comment don't cover the Argument. I prefer to move these codes out of the loop instead of add it here and in the loop traversing arguments. I think it may not be much confusing if we move the codes out of the current loop.

lxfind added inline comments.Apr 23 2021, 7:28 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1668–1669

Comments need to be updated

ChuanqiXu updated this revision to Diff 340413.Apr 25 2021, 6:47 PM

Address the comments.

ChuanqiXu updated this revision to Diff 340430.Apr 25 2021, 9:49 PM

Address the comments.

lxfind added inline comments.Apr 27 2021, 1:32 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2667–2670

This is basically iterating over FrameData.Spills and obtain .first on the iterator

ChuanqiXu added inline comments.Apr 27 2021, 6:47 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2667–2670

Yes, I am just wondering if the code now makes the semantics more clear. If we write

for (auto *Iter : FrameData.Spills) {
     auto *V = Iter.first;
     // ...
}

we need to comment that we would handle alloca specially. And the style tells it in the level of code.

lxfind added inline comments.Apr 27 2021, 8:40 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2667–2670

Well for one, we process Spills and Allocas separately everywhere (that's the intention of separating them as two fields anyway). And another reason is getAllDefs() reconstruct a new vector to hold all elements, which is costly.

ChuanqiXu updated this revision to Diff 341067.Apr 27 2021, 8:55 PM

Address the comments.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2667–2670

Done, your suggestion makes sense.

lxfind accepted this revision.Apr 27 2021, 9:16 PM

LGTM

aprantl accepted this revision.May 12 2021, 7:05 PM
aprantl added inline comments.
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1669

gdb -> dbg (2x)

This revision is now accepted and ready to land.May 12 2021, 7:05 PM
This revision was automatically updated to reflect the committed changes.
hoy added a subscriber: hoy.Dec 2 2021, 8:49 AM
hoy added inline comments.
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2515

I've seen this caused issue when Storage is a block terminator, e.g, an invoke instruction returning a piece of allocated memory. I don't quite get the full context of how we are salvaging debug info, but do you think we can just skip such case here?

StephenTozer added inline comments.
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2515

Not my patch, but do you have a reproducer for this? I'm not sure what circumstances are causing Storage to be a block terminator, but it doesn't sound like something that should happen.

hoy added inline comments.Dec 2 2021, 9:03 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2515

I've been trying to get a small repro with llvm-reduce which has already run 12 hours on our internal giant functions resulted by a combination of AutoFDO and aggressive inlining. The block terminator I'm seeing is an invoke instruction calling new and the code here result in the following pattern

%3516 = invoke noalias nonnull i8* @_Znwm(i64 120) #79 to label %3517 unwind label %3585, !dbg !332197 
call void @llvm.dbg.declare(metadata i8* %3516, metadata !332198, metadata !DIExpression(DW_OP_plus_uconst, 16, DW_OP_plus_uconst, 56)), !dbg !332217 
call void @llvm.dbg.declare(metadata i8* %3516, metadata !332278, metadata !DIExpression(DW_OP_plus_uconst, 16, DW_OP_plus_uconst, 8)), !dbg !332298
hoy added inline comments.Dec 2 2021, 1:38 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2515

So I managed to get a repro case that crashed the compiler: https://reviews.llvm.org/D114994 . Unfortunately it is still not small though with a single function only.

@hoy The rationale here is that the dbg.declare intrinsics might be lost due to coroutine split. So we move the dbg.declare intrinsics to the place where the Storage is just created. So if it caused crash, I think we could skip the kind of case. Or it would be better to insert dbg.declare to the start of successor blocks in the case of invoke. (Would any other terminator return a value?)
BTW, there is nothing in your repro link.

hoy added a comment.Dec 2 2021, 6:14 PM

@hoy The rationale here is that the dbg.declare intrinsics might be lost due to coroutine split. So we move the dbg.declare intrinsics to the place where the Storage is just created. So if it caused crash, I think we could skip the kind of case. Or it would be better to insert dbg.declare to the start of successor blocks in the case of invoke. (Would any other terminator return a value?)
BTW, there is nothing in your repro link.

An invoke instruction is a call instruction that can throw exceptions. It is a block terminator and can return a value. Looks like dbg.declare can only appear once so maybe we could only move it to one of the successors, or just skip this case.

There is a test file in my repro link. The file is probably too large (40MB) to be rendered. Sorry for not being able to get a smaller repro case.

In D97673#3168737, @hoy wrote:

@hoy The rationale here is that the dbg.declare intrinsics might be lost due to coroutine split. So we move the dbg.declare intrinsics to the place where the Storage is just created. So if it caused crash, I think we could skip the kind of case. Or it would be better to insert dbg.declare to the start of successor blocks in the case of invoke. (Would any other terminator return a value?)
BTW, there is nothing in your repro link.

An invoke instruction is a call instruction that can throw exceptions. It is a block terminator and can return a value. Looks like dbg.declare can only appear once so maybe we could only move it to one of the successors, or just skip this case.

There is a test file in my repro link. The file is probably too large (40MB) to be rendered. Sorry for not being able to get a smaller repro case.

Got it. I would try to fix this. Thanks for reporting this.

hoy added a comment.Dec 2 2021, 6:47 PM
In D97673#3168737, @hoy wrote:

@hoy The rationale here is that the dbg.declare intrinsics might be lost due to coroutine split. So we move the dbg.declare intrinsics to the place where the Storage is just created. So if it caused crash, I think we could skip the kind of case. Or it would be better to insert dbg.declare to the start of successor blocks in the case of invoke. (Would any other terminator return a value?)
BTW, there is nothing in your repro link.

An invoke instruction is a call instruction that can throw exceptions. It is a block terminator and can return a value. Looks like dbg.declare can only appear once so maybe we could only move it to one of the successors, or just skip this case.

There is a test file in my repro link. The file is probably too large (40MB) to be rendered. Sorry for not being able to get a smaller repro case.

Got it. I would try to fix this. Thanks for reporting this.

Thanks. I'm pasting the problematic code snippet in case my link doesn't work for you, but please let me know so that I'll find another way to share with you.

%660 = invoke noalias nonnull i8* @_Znwm(i64 120) #20
          to label %.noexc unwind label %702, !dbg !304801

.noexc:                                           ; preds = %653
  call void @llvm.dbg.value(metadata %"class.std::allocator.6883"* undef, metadata !304802, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 64)), !dbg !304803
  call void @llvm.dbg.value(metadata i8* %660, metadata !304802, metadata !DIExpression(DW_OP_LLVM_fragment, 64, 64)), !dbg !304803
  call void @llvm.dbg.value(metadata i8* %660, metadata !304804, metadata !DIExpression()), !dbg !304803
  call void @llvm.dbg.value(metadata i8* %660, metadata !303143, metadata !DIExpression()), !dbg !304805
  %661 = bitcast i8* %660 to %"class.std::_Sp_counted_base"*, !dbg !304808
  call void @_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EEC2Ev(%"class.std::_Sp_counted_base"* nonnull dereferenceable(16) %661) #15, !dbg !304809
  %662 = bitcast i8* %660 to i32 (...)***, !dbg !304808
  store i32 (...)** bitcast (i8** getelementptr inbounds ({ [7 x i8*] }, { [7 x i8*] }* @_ZTVSt23_Sp_counted_ptr_inplaceIN7scribex17ScribeReadFlavourESaIS1_ELN9__gnu_cxx12_Lock_policyE2EE, i64 0, inrange i32 0, i64 2) to i32 (...)**), i32 (...)*** %662, align 8, !dbg !304808, !tbaa !302581
  call void @llvm.dbg.value(metadata i8* %660, metadata !303156, metadata !DIExpression()), !dbg !304810
  ....
  call void @llvm.dbg.declare(metadata i8* %660, metadata !303236, metadata !DIExpression(DW_OP_plus_uconst, 16, DW_OP_plus_uconst, 8)), !dbg !304825

The last dbg.declare will be moved to right after the first invoke instruction.

@hoy Hi, the problem should be addressed within 84980761a.

hoy added a comment.Dec 2 2021, 10:13 PM

@hoy Hi, the problem should be addressed within 84980761a.

Thanks for the quick turnaround! I have verified that it fixes our build break.

It's a relatively obscure instruction I think, but we should also handle callbr in the same manner as invoke, as that is also a terminator instruction that defines an SSA value. A similar solution should suffice, as it also has a single destination where the value is expected to be valid (getDefaultDest()) that the dbg.declare can be inserted at the start of.