Page MenuHomePhabricator

[RFC] [[Coroutine] [Debug] Salvage dbg.values
Needs ReviewPublic

Authored by ChuanqiXu on Mar 1 2021, 3:18 AM.

Details

Summary

The previous implementation of coro-split didn't collect values used by dbg instructions into the spills which made a log debug info unavailable with optimization on.
This patch tries to collect these uses which are used by dbg.values. In this way, the debugbility of coroutine could be as powerful as normal functions with optimization on.

To avoid enlarging the coroutine frame, this patch only collects dbg.value whose value is already in the coroutine frame. This decision may make some debug info getting unavailable. But if we are with optimization on, the performance issue should be considered first. And this patch would make the debugbility of coroutine to be better only without changing the layout of the frame.

Test-plan: check-llvm

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I am not sure if %0 = bit cast %a to ... is the majority redundant fields of coroutine frame if we enable this patch. My original idea is to add a rematerialization process to reduce the coroutine frame like register allocation. However, it maybe a little hard and benefit less. It sounds more easy and beneficial to use original pointer by value tracking. Then I think this problem may be not introduced by this patch. Here is my example:

%0 = bitcast %a to ...
call @llvm.coro.suspend 
; ....
use of %a
use of %0

Then both %a and %0 would be put in the frame, which is totally redundant. We could see if there are examples other than bit cast in other patch.

We already do materialization though. I wonder why that doesn't cover the case? Can we take advantage of that?

Yes, it makes since to cover the case. I would work on this later. Thank you!

ChuanqiXu updated this revision to Diff 332907.Mar 24 2021, 2:38 AM

Materialize bit cast for dbg.values as suggested.

jmorse added a subscriber: jmorse.Mar 24 2021, 8:34 AM

When looping over the debug users of instructions and replacing the dbg.values operands, you should now use the "replaceVariableLocationOp" helper instead -- this avoids re-creating a dbg.value intrinsic, and should handle variadic variable locations (the patches for which are 95% landed) seamlessly.

Overall the aim of the patch makes sense to me, it's storing values used by debug intrinsics in the coroutine frame for later retrieval right? Is there a risk that some later coroutine optimisation will try to optimise the frame and overwrite the stored values-for-debug-users -- this can happen with stack slot colouring at the other end of the compiler.

I'm not sure how the rest of the community will feel about the approach; IIRC there's a "fake use" patch floating around that does this for all (non-coroutine) variables, which wasn't landed in the end.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
53

IMO: the description should explicitly refer to the fact that codegen will change as a result of -g, to avoid any users experiencing unexpected behaviour.

llvm/test/Transforms/Coroutines/coro-debug-dbg.values-O2.ll
6–8 ↗(On Diff #332907)

nit: IMO you should check that the dbg.values operand is a specific LLVM-IR value (i.e., the frame pointer), rather than just checking the type. That protects against some future optimisation or error producing a pointer-typed undef as the dbg.value operand.

jmorse added inline comments.Mar 24 2021, 8:39 AM
llvm/test/Transforms/Coroutines/coro-debug-dbg.values-O2.ll
6–8 ↗(On Diff #332907)

Scratch that, I didn't notice you were already doing that.

lxfind added inline comments.Mar 24 2021, 8:43 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
51

Do we still need this if we can materialize them?

ChuanqiXu updated this revision to Diff 333200.Mar 24 2021, 8:53 PM
ChuanqiXu added a reviewer: jmorse.

Use replaceVariableLocationOp and remove EnhanceDebugability option.

When looping over the debug users of instructions and replacing the dbg.values operands, you should now use the "replaceVariableLocationOp" helper instead -- this avoids re-creating a dbg.value intrinsic, and should handle variadic variable locations (the patches for which are 95% landed) seamlessly.

Overall the aim of the patch makes sense to me, it's storing values used by debug intrinsics in the coroutine frame for later retrieval right? Is there a risk that some later coroutine optimisation will try to optimise the frame and overwrite the stored values-for-debug-users -- this can happen with stack slot colouring at the other end of the compiler.

To my knowledge, if other passes try to optimize the frame, other passes are responsible to maintain the debug information.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
51

Yeah, I think so. If there is anything we are missing, we should improve on the materialization part.

ChuanqiXu updated this revision to Diff 335764.Apr 7 2021, 2:47 AM

Rebase with trunk and add option to control whether we would collect variables used only by dbg.vaues.
Although we can use materializing process to reduce the extra space, we can't prove we could get the same result with -g or not.
Previously, I think if we can't get the same result, the materialization part should take responsibility. However, I find that a pattern recently:
State before:

%a = alloca  ...
; ... some uses who wouldn't cross suspend points
call to coro.suspend()
; ... alternative path
store %v to %a
dbg.value(metadata %v, dbg variable for %a, ...

Then after some optimization, the store in some path would be eliminated:

%a = alloca  ...
; ... some uses who wouldn't cross suspend points
call to coro.suspend()
; ... alternative path
dbg.value(metadata %v, dbg variable for %a, ...

Then %a wouldn't be put into the frame before this patch. And it would be in the frame after this patch if we don't offer an option to control this. And materialization couldn't do much about this.

So I think the assumption before isn't solid, we should add option to control it indeed.

I would recommend to pick a name for -enhance-debug-with-coroutine that does not contain debug. The name should make clear that this changes codegen, so nobody gets the idea of turning it as part of a -g option. For example, something like -coroutine-spill-all-locals.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2292

Can you add a comment explaining what is being done and why?

llvm/lib/Transforms/Coroutines/CoroSplit.cpp
650

The changes in this file are NFC and look good.

I would recommend to pick a name for -enhance-debug-with-coroutine that does not contain debug. The name should make clear that this changes codegen, so nobody gets the idea of turning it as part of a -g option. For example, something like -coroutine-spill-all-locals.

The accurate semantic for this option is collect values used by dbg.values, which would make coroutine frame get larger in some cases. This option would work only if user turns -g on. The name -coroutine-spill-all-locals is not correct and it is hard for me to get a name which shows it would change the codegen and enhance the debugbility. I am wondering if we can remove this option and make collecting values used by dbg.values a default behavior. From our analysis above, it should be unusual to make the coroutine frame larger. @lxfind what's your opinion?

I would recommend to pick a name for -enhance-debug-with-coroutine that does not contain debug. The name should make clear that this changes codegen, so nobody gets the idea of turning it as part of a -g option. For example, something like -coroutine-spill-all-locals.

The accurate semantic for this option is collect values used by dbg.values, which would make coroutine frame get larger in some cases. This option would work only if user turns -g on. The name -coroutine-spill-all-locals is not correct and it is hard for me to get a name which shows it would change the codegen and enhance the debugbility. I am wondering if we can remove this option and make collecting values used by dbg.values a default behavior. From our analysis above, it should be unusual to make the coroutine frame larger. @lxfind what's your opinion?

That's just not something allowed by LLVM policy. The presence of -g may not affect the generated code. Similarly, the presence or absence of debug intrinsics may not affect the generated code either. Is there a way to control this behavior without relying on debug info? For, example, could we make sure all allocas are spilled, regardless of whether they are referred to by a dbg.declare?

lxfind added a comment.Apr 9 2021, 2:28 PM

I am also a bit skeptical about this patch.
Specifically I agree that dbg info should not affect the coroutine frame.
From what I can tell, dbg intrinsics are location insensitive, i.e. they can be put at any location. (correct me if I am wrong)
So a use of any value by dbg intrinsics should not cause the value to be put on the frame.
Perhaps we could first move all dbg intrinsics to a dedicated location (e.g. right after corobegin) before creating the frame, and copy them during function cloning?

I am also a bit skeptical about this patch.
Specifically I agree that dbg info should not affect the coroutine frame.
From what I can tell, dbg intrinsics are location insensitive, i.e. they can be put at any location. (correct me if I am wrong)
So a use of any value by dbg intrinsics should not cause the value to be put on the frame.
Perhaps we could first move all dbg intrinsics to a dedicated location (e.g. right after corobegin) before creating the frame, and copy them during function cloning?

It looks better to collect the dbg.values who wouldn't change the layout of the frame. I would try to make it.

ChuanqiXu updated this revision to Diff 336803.Mon, Apr 12, 4:31 AM

Address the comments. Now we would only collect dbg.values who would't change the layout of the frame.

Could you update the description of this patch as well?
It's still not fully clear to me what problem this patch is aiming at resolving.

Also, it seems that the newly added test "coro-debug-dbg.values-O2-nouse.ll" passes even without this patch

ChuanqiXu edited the summary of this revision. (Show Details)Thu, Apr 15, 6:53 PM

Also, it seems that the newly added test "coro-debug-dbg.values-O2-nouse.ll" passes even without this patch

It is intended behavior. This patch tries to test whether we would collect dbg.values whose value isn't in the coroutine frame. This test can't pass for previous revision. If someone in the future wants to enhance the debugbility for coroutine frame, this test would tell him that we should think more to change the layout of the coroutine frame for debugability.

Could you update the description of this patch as well?
It's still not fully clear to me what problem this patch is aiming at resolving.

Previously all dbg.values would be lost after CoroSplit pass. dbg.value is debug info compiler produced under optimization. This patch wants to remain as many dbg.value as possible while it don't want to change the layout of frame for debug info.

Previously all dbg.values would be lost after CoroSplit pass. dbg.value is debug info compiler produced under optimization. This patch wants to remain as many dbg.value as possible while it don't want to change the layout of frame for debug info.

Why would all dbg.values be lost after CoroSplit pass? As long as they are in the resume code path, they would be kept in the .resume function, right?
I also tried to run coro-split on coro-debug-dbg.values-O2.ll, I do see a ton of dbg.value in the generate functions even without this patch.
Could you elaborate what is the exact problem, perhaps ideally with an example?

Previously all dbg.values would be lost after CoroSplit pass. dbg.value is debug info compiler produced under optimization. This patch wants to remain as many dbg.value as possible while it don't want to change the layout of frame for debug info.

Why would all dbg.values be lost after CoroSplit pass? As long as they are in the resume code path, they would be kept in the .resume function, right?
I also tried to run coro-split on coro-debug-dbg.values-O2.ll, I do see a ton of dbg.value in the generate functions even without this patch.
Could you elaborate what is the exact problem, perhaps ideally with an example?

The problem here is when we collect values to be put on the frame and insert values to the frame, we didn't care about the dbg.values.
Here is the example:

define void @f(i32 %i, i32 %j) {
; ...
; coro.suspend
call void @llvm.dbg.value(metadata i32 %i, metadata !1, metadata !DIExpression()) ; tell the value !1 from %i
call void @llvm.dbg.value(metadata i32 %j, metadata !2, metadata !DIExpression()) ; tell the value !2 from %j
}

Then in the .resume function:

define internal fastcc void @f.resume(%f.Frame* noalias nonnull align 16 dereferenceable(80) %FramePtr)
; ...
call void @llvm.dbg.value(metadata i32 %i, metadata !1, metadata !DIExpression())
call void @llvm.dbg.value(metadata i32 %j, metadata !2, metadata !DIExpression())

These two dbg.values are trying to tell values of !1 and !2 from values %i and %j. But wait, there is no definition for %I and %j in the resume function. So finally, the resume function would become:

define internal fastcc void @f.resume(%f.Frame* noalias nonnull align 16 dereferenceable(80) %FramePtr)
; ...
call void @llvm.dbg.value(metadata i32 undef, metadata !1, metadata !DIExpression())
call void @llvm.dbg.value(metadata i32 undef, metadata !2, metadata !DIExpression())

This patch wants to salvage the dbg.values whose first operand is in the frame already. The reason why you can find some dbg.values in resume function is that their first operand is either constant or defined in the resume function (like alloca who wouldn't across suspend points).

Previously all dbg.values would be lost after CoroSplit pass. dbg.value is debug info compiler produced under optimization. This patch wants to remain as many dbg.value as possible while it don't want to change the layout of frame for debug info.

Thanks for explaining. It makes sense.
Does it only happen to parameters, or does it also happen to allocas? (I would be more interested to see how this affects allocas, since parameters are somewhat special)
And how is this related to O2?

Previously all dbg.values would be lost after CoroSplit pass. dbg.value is debug info compiler produced under optimization. This patch wants to remain as many dbg.value as possible while it don't want to change the layout of frame for debug info.

Thanks for explaining. It makes sense.
Does it only happen to parameters, or does it also happen to allocas? (I would be more interested to see how this affects allocas, since parameters are somewhat special)
And how is this related to O2?

It happens to all the value possible in the frame including parameters, allocas and other values possible (if any). Since we only collect values who are already in the frame, this patch shouldn't affect any thing.

And how is this related to O2?

Since dbg.value would be generated with optimization enabled. If optimization isn't enabled, compiler would generates dbg.declare as the debug info, which is already handled by previous patches. It is confusing to use O2 here, my bad. I would edit it.

ChuanqiXu updated this revision to Diff 339083.Tue, Apr 20, 7:51 PM
ChuanqiXu retitled this revision from [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2 to [RFC] [[Coroutine] [Debug] Salvage dbg.values.
ChuanqiXu edited the summary of this revision. (Show Details)

Address the comments. Removing confusing describes about O2.

aprantl added inline comments.Wed, Apr 21, 9:50 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1274

// Also update metadata uses in dbg.value intrinsics.

2222

// Manually add dbg.value metadata uses of I.

aprantl added inline comments.Wed, Apr 21, 9:51 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
171

We should either handle unhandled cases gracefully (by ignoring them) or make sure they are impossible and put an llvm_unreachable here.

The test is insufficient. I would like to see that we cover dbg.value used on different cases: parameters, allocas, spills.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1276

Why doesn't replaceUsesOfWith work? Could you double check that replaceUsesOfWith doesn't work for DbgValueInst?

1332–1337

If replaceUsesOfWith works for DbgValueInst, then I think you can just append all DIs into UsersToUpdate above.

1581

Similarly, if replaceUsesOfWith works for DbgValueInst, this isn't necessary

2302–2314

I think it's cleaner to move this code into the loop above.
After the check of all users, you can check if FrameData.Spills contains &I, and if so, do this DbgValues thing.

@lxfind wrote:

Why doesn't replaceUsesOfWith work? Could you double check that replaceUsesOfWith doesn't work for DbgValueInst?

This is by design. Metadata-uses of llvm:Value may not affect code generation, thus they also aren't found by any operation working on llvm::Uses.

@lxfind wrote:

Why doesn't replaceUsesOfWith work? Could you double check that replaceUsesOfWith doesn't work for DbgValueInst?

This is by design. Metadata-uses of llvm:Value may not affect code generation, thus they also aren't found by any operation working on llvm::Uses.

I don't think so.
Take a look at this: https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/User.cpp#L34-L37
I think replaceUsesOfWith works for DbgValueInst.

It's true though that .users() does not include DbgValueInst, but that's a separate question than whether replaceUsesOfWith works.

ChuanqiXu updated this revision to Diff 339451.Wed, Apr 21, 7:53 PM

Address the comments.

@lxfind wrote:

Why doesn't replaceUsesOfWith work? Could you double check that replaceUsesOfWith doesn't work for DbgValueInst?

This is by design. Metadata-uses of llvm:Value may not affect code generation, thus they also aren't found by any operation working on llvm::Uses.

I don't think so.
Take a look at this: https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/User.cpp#L34-L37
I think replaceUsesOfWith works for DbgValueInst.

It's true though that .users() does not include DbgValueInst, but that's a separate question than whether replaceUsesOfWith works.

Good Catcha. It looks like you are right. I have thought DbgVariablesInst uses different def-use architecture since users() didn't count DbgValueInst. I should have looked into the details.

Also please add tests to cover more types of variables (allocas, spills..)

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2320–2328

I think it's cleaner to move this code into the loop above.
After the check of all users, you can check if FrameData.Spills contains &I, and if so, do this DbgValues thing.

ChuanqiXu updated this revision to Diff 339550.Thu, Apr 22, 3:59 AM

Address the comments:

  • Add test case for alloca and spills.
ChuanqiXu updated this revision to Diff 339558.Thu, Apr 22, 4:28 AM

Address the comments.

Also please add tests to cover more types of variables (allocas, spills..)

Sorry for forgetting addressing the inline comments.

Harbormaster completed remote builds in B100221: Diff 339558.
lxfind added inline comments.Thu, Apr 22, 9:28 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2303

This might not work. The iteration of instructions is not guaranteed to be in dominator order.
This part of the code should be:

if (FrameData.Spills.count(&I)) {
  SmallVector<DbgValueInst *, 16> DVIs;
  findDbgValues(DVIs, &I);
  ...
}
lxfind added inline comments.Thu, Apr 22, 9:36 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1324–1329

I don't know enough about this, but I wonder if these code (handling of dbgdeclare and dbgvalues) could be merged by simply traversing thorough all DVIs in findDbgUsers and do a replaceUsesOfWith on each of them?

Address the comments.

ChuanqiXu updated this revision to Diff 339937.Fri, Apr 23, 1:43 AM

Address the comments.

ChuanqiXu added inline comments.Fri, Apr 23, 1:51 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2303

Sorry for not considering the dominator order. The example in the inline comment don't cover the Argument. I prefer to move these codes out of the loop instead of add it here and in the loop traversing arguments. I think it may not be much confusing if we move the codes out of the current loop.

lxfind added inline comments.Fri, Apr 23, 7:28 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1306–1307

Comments need to be updated

ChuanqiXu updated this revision to Diff 340413.Sun, Apr 25, 6:47 PM

Address the comments.

ChuanqiXu updated this revision to Diff 340430.Sun, Apr 25, 9:49 PM

Address the comments.

lxfind added inline comments.Tue, Apr 27, 1:32 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2305–2308

This is basically iterating over FrameData.Spills and obtain .first on the iterator

ChuanqiXu added inline comments.Tue, Apr 27, 6:47 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2305–2308

Yes, I am just wondering if the code now makes the semantics more clear. If we write

for (auto *Iter : FrameData.Spills) {
     auto *V = Iter.first;
     // ...
}

we need to comment that we would handle alloca specially. And the style tells it in the level of code.

lxfind added inline comments.Tue, Apr 27, 8:40 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2305–2308

Well for one, we process Spills and Allocas separately everywhere (that's the intention of separating them as two fields anyway). And another reason is getAllDefs() reconstruct a new vector to hold all elements, which is costly.

ChuanqiXu updated this revision to Diff 341067.Tue, Apr 27, 8:55 PM

Address the comments.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2305–2308

Done, your suggestion makes sense.

lxfind accepted this revision.Tue, Apr 27, 9:16 PM

LGTM