This is an archive of the discontinued LLVM Phabricator instance.

[LICM][Coroutine] Don't sink stores to coroutine suspend point.
AbandonedPublic

Authored by junparser on Sep 17 2020, 1:44 AM.

Details

Summary

This patch is the refactor of D86190.
See pr46990(https://bugs.llvm.org/show_bug.cgi?id=46990). Instead of using idea (not sinking store instructions to loop exit blocks which cross coro.suspend intrinsics) in D86190. This patch split exit block of the suspend point alone and ignore it when do promotion.

Test Plan: check-llvm, pr46990

Diff Detail

Event Timeline

junparser created this revision.Sep 17 2020, 1:44 AM
junparser requested review of this revision.Sep 17 2020, 1:44 AM

We try to find an general solution to handle some conservative & mismatch analysis in transformations caused by the change of cfg with coro.suspend intrinsic. However afaik, it seems quiet difficult to deal this without change framework fo coroutine.
This patch just fixes the known issue, it may happens in other passes, FYI.

We try to find an general solution to handle some conservative & mismatch analysis in transformations caused by the change of cfg with coro.suspend intrinsic. However afaik, it seems quiet difficult to deal this without change framework fo coroutine.
This patch just fixes the known issue, it may happens in other passes, FYI.

one of the idea is that we do not generate default case for switch instruction of coro.suspend intrinsic. Instead, we find fall through coro.end for coro.suspend directly in corosplit pass. The only uncertain thing is whether fall through coro.end is moved by passes which may change the semantic.

If there are transformations that are specifically illegal with corouties, I'd like to see a section in https://llvm.org/docs/Coroutines.html describing what transforms are and are not legal. Just saying that the current version of the coroutine lowering code can't handle it isn't sufficient.

If there are transformations that are specifically illegal with corouties, I'd like to see a section in https://llvm.org/docs/Coroutines.html describing what transforms are and are not legal. Just saying that the current version of the coroutine lowering code can't handle it isn't sufficient.

Thanks for the suggestion, I'll add it later.
Let's say for this case, we do not wanna move any instructions (maybe this is too restrictive, using !mayReadOrWriteMemory && !mayHaveSideEffects) to path between coroutine suspend point (coro.suspend intrinsic) and return to caller (coro.end intrinsic).

efriedma requested changes to this revision.Sep 18 2020, 1:57 PM

Okay, I'll wait for the update.

llvm/lib/Transforms/Scalar/LICM.cpp
400

Is there any particular reason to expect the edge from the coro_suspend to the coro_end is a switch?

This revision now requires changes to proceed.Sep 18 2020, 1:57 PM

sorry for the late reply.

llvm/lib/Transforms/Scalar/LICM.cpp
400

The result of this coro.suspend lead to basic blocks where coroutine should proceed when suspended (-1), resumed (0) or destroyed (1). we also generate switch from front end.

Although sometimes it would be optimized to conditional branch because of that both of resumed and destroyed case lead to the same basic block. since we are inside loop, the resumed basic block and destroyed basic block should not be same.

junparser updated this revision to Diff 293319.Sep 21 2020, 7:30 PM

Update the doc. @efriedma is this ok?

junparser updated this revision to Diff 293324.Sep 21 2020, 7:45 PM
efriedma added inline comments.Sep 22 2020, 12:31 PM
llvm/docs/Coroutines.rst
1561

This isn't what I was looking for.

Say I'm someone who doesn't know anything about coroutines, and I'm writing a new optimization. What do I need to know about coroutines so I don't break them?

llvm/lib/Transforms/Scalar/LICM.cpp
400

the resumed basic block and destroyed basic block should not be same

"should not be the same" here seems to mean "usually won't be the same"; is there any rule that actually precludes it?

Can you please extend the testing to cover at least some of these cases: multiple predecesssors, multiple exit blocks, multiple predecessors with a coroutine switch?

llvm/test/Transforms/LICM/sink-with-coroutine.ll
2

Add new pass manager testing using -passes.

3

Add memoryssa verification with -verify-memoryssa

11

Nit: update test to use the variables generated.
For example here, replace bb0: with [[BB0]], generated one line above.

18

Following up on the question of "is it always a switch", this looks like it could be have optimized to:

 %cond = icmp eq i8  [[SUSPEND]], 0
br i8 [[AWAIT_READY]], label [[BB2_SPLIT_LOOP_EXIT1:%.*]]

For example, simple loop unswitch may do such a change, and licm is in the same LPM, so perhaps this is possible.

junparser added inline comments.Sep 22 2020, 8:08 PM
llvm/docs/Coroutines.rst
1561

The coroutine framework is designed to not interfere other transformation, Ideally, we do not need know anything about coroutine when write optimization. so this is still temporary fix.

llvm/lib/Transforms/Scalar/LICM.cpp
400

the resumed basic block and destroyed basic block should not be same

"should not be the same" here seems to mean "usually won't be the same"; is there any rule that actually precludes it?

Except we write llvm ir directly with two branches, it is always a switch inside loop.

Can you please extend the testing to cover at least some of these cases: multiple predecesssors, multiple exit blocks, multiple predecessors with a coroutine switch?

ok, I'll add later.

llvm/test/Transforms/LICM/sink-with-coroutine.ll
18

I missed the cleanup label. the switch is

switch i8 %suspend, label %bb2 [
i8 0, label %await.ready
i8 1, label %cleanup
]

We try to find an general solution to handle some conservative & mismatch analysis in transformations caused by the change of cfg with coro.suspend intrinsic. However afaik, it seems quiet difficult to deal this without change framework fo coroutine.
This patch just fixes the known issue, it may happens in other passes, FYI.

one of the idea is that we do not generate default case for switch instruction of coro.suspend intrinsic. Instead, we find fall through coro.end for coro.suspend directly in corosplit pass. The only uncertain thing is whether fall through coro.end is moved by passes which may change the semantic.

@lxfind, what do you think about this idea?

efriedma added inline comments.Sep 23 2020, 12:25 PM
llvm/docs/Coroutines.rst
1561

If you're not sure how it's supposed to work, I'm not sure we want to add temporary hacks before it's settled.

If you want to push forward with this anyway, please add comments to LICM to make it clear that this is a temporary state.

We try to find an general solution to handle some conservative & mismatch analysis in transformations caused by the change of cfg with coro.suspend intrinsic. However afaik, it seems quiet difficult to deal this without change framework fo coroutine.
This patch just fixes the known issue, it may happens in other passes, FYI.

one of the idea is that we do not generate default case for switch instruction of coro.suspend intrinsic. Instead, we find fall through coro.end for coro.suspend directly in corosplit pass. The only uncertain thing is whether fall through coro.end is moved by passes which may change the semantic.

@lxfind, what do you think about this idea?

Are you suggesting to not have an edge from the coro.suspend switch to the coro.end block? Would that make the coro.end BB a dead BB and could potentially get removed?

This is likely the right solution, but we do need to make it more general. In order for this to always work, I wonder if the proper place to insert this transformation is in LoopSimplify, which is a pass always executed before any Loop pass. That might guarantee that any loop pass will be able to handle this properly. I am not too familiar with loop passes, but maybe someone else could comment on that.

We try to find an general solution to handle some conservative & mismatch analysis in transformations caused by the change of cfg with coro.suspend intrinsic. However afaik, it seems quiet difficult to deal this without change framework fo coroutine.
This patch just fixes the known issue, it may happens in other passes, FYI.

one of the idea is that we do not generate default case for switch instruction of coro.suspend intrinsic. Instead, we find fall through coro.end for coro.suspend directly in corosplit pass. The only uncertain thing is whether fall through coro.end is moved by passes which may change the semantic.

@lxfind, what do you think about this idea?

Are you suggesting to not have an edge from the coro.suspend switch to the coro.end block? Would that make the coro.end BB a dead BB and could potentially get removed?

No, There is always path from entry to coro.end BB.

This is likely the right solution, but we do need to make it more general. In order for this to always work, I wonder if the proper place to insert this transformation is in LoopSimplify, which is a pass always executed before any Loop pass. That might guarantee that any loop pass will be able to handle this properly. I am not too familiar with loop passes, but maybe someone else could comment on that.

Loop exit-block insertion in LoopSimplify pass guarantees that all exit blocks from the loop only have predecessors from inside of the loop. It will break the dedicated form of exit blocks when we consider ignore coro.suspend path. Maybe there is other idea i haven't thought about.

We try to find an general solution to handle some conservative & mismatch analysis in transformations caused by the change of cfg with coro.suspend intrinsic. However afaik, it seems quiet difficult to deal this without change framework fo coroutine.
This patch just fixes the known issue, it may happens in other passes, FYI.

one of the idea is that we do not generate default case for switch instruction of coro.suspend intrinsic. Instead, we find fall through coro.end for coro.suspend directly in corosplit pass. The only uncertain thing is whether fall through coro.end is moved by passes which may change the semantic.

@lxfind, what do you think about this idea?

Are you suggesting to not have an edge from the coro.suspend switch to the coro.end block? Would that make the coro.end BB a dead BB and could potentially get removed?

No, There is always path from entry to coro.end BB.

This is likely the right solution, but we do need to make it more general. In order for this to always work, I wonder if the proper place to insert this transformation is in LoopSimplify, which is a pass always executed before any Loop pass. That might guarantee that any loop pass will be able to handle this properly. I am not too familiar with loop passes, but maybe someone else could comment on that.

Loop exit-block insertion in LoopSimplify pass guarantees that all exit blocks from the loop only have predecessors from inside of the loop. It will break the dedicated form of exit blocks when we consider ignore coro.suspend path. Maybe there is other idea i haven't thought about.

I wasn't referring to ignoring the coro.suspend path, but to move this code in this patch into the LoopSimplify pass, so that all loop passes can benefit, not sure if that makes sense. Anyway, I don't have a better idea.. So don't let that block you here.

We try to find an general solution to handle some conservative & mismatch analysis in transformations caused by the change of cfg with coro.suspend intrinsic. However afaik, it seems quiet difficult to deal this without change framework fo coroutine.
This patch just fixes the known issue, it may happens in other passes, FYI.

one of the idea is that we do not generate default case for switch instruction of coro.suspend intrinsic. Instead, we find fall through coro.end for coro.suspend directly in corosplit pass. The only uncertain thing is whether fall through coro.end is moved by passes which may change the semantic.

@lxfind, what do you think about this idea?

Are you suggesting to not have an edge from the coro.suspend switch to the coro.end block? Would that make the coro.end BB a dead BB and could potentially get removed?

No, There is always path from entry to coro.end BB.

This is likely the right solution, but we do need to make it more general. In order for this to always work, I wonder if the proper place to insert this transformation is in LoopSimplify, which is a pass always executed before any Loop pass. That might guarantee that any loop pass will be able to handle this properly. I am not too familiar with loop passes, but maybe someone else could comment on that.

Loop exit-block insertion in LoopSimplify pass guarantees that all exit blocks from the loop only have predecessors from inside of the loop. It will break the dedicated form of exit blocks when we consider ignore coro.suspend path. Maybe there is other idea i haven't thought about.

I wasn't referring to ignoring the coro.suspend path, but to move this code in this patch into the LoopSimplify pass, so that all loop passes can benefit, not sure if that makes sense. Anyway, I don't have a better idea.. So don't let that block you here.

Thanks for the suggestion.

junparser planned changes to this revision.Sep 28 2020, 5:21 PM

@efriedma Consider that temporary fix maybe not the best solution for such issue. We would try to find other general solution. Before that, I'll keep this patch as plan changes.

junparser abandoned this revision.Mar 18 2021, 6:16 PM