Page MenuHomePhabricator

[LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions.
AbandonedPublic

Authored by junparser on Aug 18 2020, 8:00 PM.

Details

Summary

See pr46990(https://bugs.llvm.org/show_bug.cgi?id=46990). LICM should not sink store instructions to loop exit blocks which cross coro.suspend intrinsics. This breaks semantic of coro.suspend intrinsic which return to caller directly. Also this leads to use-after-free if the coroutine is freed before control returns to the caller in multithread environment.

This patch disable promotion by check whether loop contains coro.suspend intrinsics.
Also Promotion optimization will run in CGSCC pipeline after coroutine function been splitted.

Test Plan: check-llvm, pr46990

Diff Detail

Unit TestsFailed

TimeTest
60 mslinux > flang-Unit.Lower/_/FlangLoweringOpenMPTests::OpenMPLoweringTest.Barrier
Note: Google Test filter = OpenMPLoweringTest.Barrier [==========] Running 1 test from 1 test case. [----------] Global test environment set-up.
70 mslinux > flang-Unit.Lower/_/FlangLoweringOpenMPTests::OpenMPLoweringTest.EmptyParallel
Note: Google Test filter = OpenMPLoweringTest.EmptyParallel [==========] Running 1 test from 1 test case. [----------] Global test environment set-up.
70 mslinux > flang-Unit.Lower/_/FlangLoweringOpenMPTests::OpenMPLoweringTest.TaskWait
Note: Google Test filter = OpenMPLoweringTest.TaskWait [==========] Running 1 test from 1 test case. [----------] Global test environment set-up.
80 mslinux > flang-Unit.Lower/_/FlangLoweringOpenMPTests::OpenMPLoweringTest.TaskYield
Note: Google Test filter = OpenMPLoweringTest.TaskYield [==========] Running 1 test from 1 test case. [----------] Global test environment set-up.

Event Timeline

junparser created this revision.Aug 18 2020, 8:00 PM
junparser requested review of this revision.Aug 18 2020, 8:00 PM
junparser edited the summary of this revision. (Show Details)

Are you sure this is the right fix? At first glance, ensuring %p lives long enough seems like the sort of thing coroutine lowering should be able to deal with. If it isn't, at the very least we need documentation describing what optimizations are legal.

lxfind added inline comments.Aug 19 2020, 4:36 PM
llvm/test/Transforms/LICM/sink-with-coroutine.ll
37

It seems to me the problem is that the LICM generated code is mixing the loop exit basic block vs the return basic block. It just happens that both the suspend switch and loop exit condition jumps to the same block %bb2, but they mean different things.
I think we should still be able to do LICM when there is coroutine, but it's just that we need to make sure the instructions are moved to the loop exit block, while the suspend switch should remain jumping to the return block?

Are you sure this is the right fix? At first glance, ensuring %p lives long enough seems like the sort of thing coroutine lowering should be able to deal with. If it isn't, at the very least we need documentation describing what optimizations are legal.

I just disable promotion before coroutine lowering and count on promotion after lowering which might not be the right fix. @lxfind's comment seems to be the right solution. Need some investigation

junparser added inline comments.Aug 19 2020, 8:57 PM
llvm/test/Transforms/LICM/sink-with-coroutine.ll
37

Thanks for the suggestion. This might be the right fix, I'll check it later.

junparser abandoned this revision.Aug 23 2020, 6:36 PM

@junparser I have been re-thinking about this patch, and now I actually think this is the right patch to go.
Other than the correctness reasons, I realize that it is almost always harmful to do LICM for a loop that contains a coroutine suspension. The reason is that in most cases LICM moves memory operations out of the loop. (LICM does move constant calls as well but it is relatively rare to have constant calls that's not inlined, furthermore there are other complications such as TLS access and etc.).
For memory operations, moving them out of the loop does not make the loop faster if there is a coroutine suspend in the loop, because the value moved out needs to be put on the coroutine frame anyway. So LICM not only does not eliminate memory access but increases the frame size.
I propose to bring this patch back. (I can help send it too)
What do you think?