This is an archive of the discontinued LLVM Phabricator instance.

[Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened
ClosedPublic

Authored by lxfind on Feb 1 2021, 11:28 AM.

Details

Summary

Relevant discussion can be found at: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148197.html
In the existing design, An SCC that contains a coroutine will go through the folloing passes:
Inliner -> CoroSplitPass (fake) -> FunctionSimplificationPipeline -> Inliner -> CoroSplitPass (real) -> FunctionSimplificationPipeline

The first CoroSplitPass doesn't do anything other than putting the SCC back to the queue so that the entire pipeline can repeat.
As you can see, we run Inliner twice on the SCC consecutively without doing any real split, which is unnecessary and likely unintended.
What we really wanted is this:
Inliner -> FunctionSimplificationPipeline -> CoroSplitPass -> FunctionSimplificationPipeline
(note that we don't really need to run Inliner again on the ramp function after split).

Hence the way we do it here is to move CoroSplitPass to the end of the CGSCC pipeline, make it once for real, insert the newly generated SCCs (the clones) back to the pipeline so that they can be optimized, and also add a function simplification pipeline after CoroSplit to optimize the post-split ramp function.

This approach also conforms to how the new pass manager works instead of relying on an adhoc post split cleanup, making it ready for full switch to new pass manager eventually.

By looking at some of the changes to the tests, we can already observe that this changes allows for more optimizations applied to coroutines.

Diff Detail

Event Timeline

lxfind created this revision.Feb 1 2021, 11:28 AM
lxfind requested review of this revision.Feb 1 2021, 11:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2021, 11:28 AM
aeubanks added inline comments.Feb 1 2021, 12:33 PM
llvm/lib/Passes/PassBuilder.cpp
649

we can still keep this here right? since we'll run the function simplification pipeline on the split coroutine
also, this should be duplicated in buildFunctionSimplificationPipeline()

llvm/lib/Transforms/Coroutines/CoroSplit.cpp
1826

this could regress coroutines under the legacy PM behavior since we don't run postSplitCleanup() here? but maybe we don't care about the legacy PM so much anymore.

2124

are all Clones guaranteed to be in different SCC?

lxfind added inline comments.Feb 1 2021, 12:46 PM
llvm/lib/Transforms/Coroutines/CoroSplit.cpp
1826

Not really. CoroSplit for legacy PM works in a similar way. Function simplifications happen after CoroSplit (if not O0) as well.

2124

No. But CWorklist is a map so redundant insertions won't happen.

I can't really review the subtle pass-management stuff here, but if you're going to be able to eliminate the hacky post-pass cleanup pipeline, I'm overjoyed.

lxfind added inline comments.Feb 1 2021, 1:41 PM
llvm/lib/Passes/PassBuilder.cpp
649

We could. It does create a logically strange situation, that CoroElide will run once before CoroSplit.
How to decide whether CoroElide should go into simplification pipeline or optimization pipeline?

ChuanqiXu added inline comments.Feb 1 2021, 6:21 PM
llvm/lib/Passes/PassBuilder.cpp
649

I agree with that it is strange that CoroElide would run once before CoroSplit. But if we only run CoroElide after CoroSplit, we may miss many optimization point. Given that a coroutine A was inlined into a coroutine B, and CoroElide run on B before the CoroSplit for B.

define B(...) {
    entry:
         %A.handle = ...; The handle for A
         ;... ; Some operations with A
        call @llvm.coro.suspend(%B.handle, false); which is possible if the implementation of initial_suspend of the promise type of B isn't std::experimental::suspend_always
        ;... Some other operations with A
        call @llvm.subfn.addr(%A.handle, 1) ; means the end for the lifetime of A
}

In this example, CoroElide would find that every path from the definition of %A.handle to the exit point of B would pass llvm.subfn.addr(%A.handle, 1) which means the end for the lifetime of %A.handle so that CoroElide may find that %A.handle isn't escaped in B. So it is possible that A maybe elided in B.

However if we run CoroElide only if B have been split, we may miss the opportunity to elide A in B. I would give the example if needed.

I agree with that the design of CoroElide seems to be a little strange and need to be refactored likely. But I don't get a clear and feasible solution now.

llvm/lib/Transforms/Coroutines/CoroSplit.cpp
2121–2125

I am not familiar with the Shape.ABI other than coro::ABI:switch. But the diff line seems strange, it looks like that condition gets weaker.

ChuanqiXu added inline comments.Feb 1 2021, 6:26 PM
llvm/lib/Transforms/Coroutines/CoroSplit.cpp
1135

Is it necessary to remain the verify process?

rjmccall added inline comments.Feb 1 2021, 9:51 PM
llvm/lib/Transforms/Coroutines/CoroSplit.cpp
1135

It's probably okay for it to go away now.

ChuanqiXu added inline comments.Feb 2 2021, 1:27 AM
llvm/lib/Transforms/Coroutines/CoroSplit.cpp
1135

I prefer to remain the verification *personally* since it had reported many errors for me :).

By the way, is everyone comfortable with dropping the test lines using the legacy pass manager for Coroutine?
It's really painful to update the tests to keep both legacy pass manager and new pass manager, now that the pass sequence becomes much different. (CoroSplit runs late in CGSCC pipeline for NewPM, but early in legacy pm).
If we want to keep the legacy PM tests for a while for coroutine, I could wait until everyone is comfortable before updating this patch again.

By the way, is everyone comfortable with dropping the test lines using the legacy pass manager for Coroutine?
It's really painful to update the tests to keep both legacy pass manager and new pass manager, now that the pass sequence becomes much different. (CoroSplit runs late in CGSCC pipeline for NewPM, but early in legacy pm).
If we want to keep the legacy PM tests for a while for coroutine, I could wait until everyone is comfortable before updating this patch again.

What's most important is to test the pass. If you want to port most of the tests to unconditionally test under the new PM, that's fine with me. We should keep some tests that validate basic full-coroutine-pipeline behavior under the old PM until we actually retire the old PM, though.

By the way, is everyone comfortable with dropping the test lines using the legacy pass manager for Coroutine?
It's really painful to update the tests to keep both legacy pass manager and new pass manager, now that the pass sequence becomes much different. (CoroSplit runs late in CGSCC pipeline for NewPM, but early in legacy pm).
If we want to keep the legacy PM tests for a while for coroutine, I could wait until everyone is comfortable before updating this patch again.

I think testing is more important than the develop experience. It is more suitable to remove the tests for legacy pass manager when the old pass manager got replaced entirely.

aeubanks added inline comments.Feb 3 2021, 11:37 AM
llvm/lib/Passes/PassBuilder.cpp
649

The simplification pipeline is anything that simplifies/canonicalizes IR. Passes in the optimization may add complexity. If CoroElide mostly ends up cleaning things, it should definitely be in the simplification pipeline.

I'm not super familiar with the coroutine passes, but given that we now unconditionally re-add the current SCC to the queue, CoroElide should run before and after CoroSplit on any given function that is split. If that's good enough then I'd say keep this here.

llvm/lib/Transforms/Coroutines/CoroSplit.cpp
2121–2125

I believe that's intentional, and a big part of this patch. We want to re-add the current SCC (and the split SCCs) any time we split an SCC. Before we weren't properly doing that.

2124

Oh I didn't know that, thanks!

ChuanqiXu added inline comments.Feb 3 2021, 6:30 PM
llvm/lib/Passes/PassBuilder.cpp
649

CoroElide pass would try to transform heap allocation to stack allocation, which is a major optimization in Coroutine Passes. To my understanding, it isn't equal to simplification. I am not pretty sure about this.

I prefer to run CoroElide before and after CoroSplit for now even it is strange.

aeubanks added inline comments.Feb 3 2021, 7:30 PM
llvm/lib/Passes/PassBuilder.cpp
649

Your description of the pass sounds like simplification/canonicalization to me.

By the way, is everyone comfortable with dropping the test lines using the legacy pass manager for Coroutine?
It's really painful to update the tests to keep both legacy pass manager and new pass manager, now that the pass sequence becomes much different. (CoroSplit runs late in CGSCC pipeline for NewPM, but early in legacy pm).
If we want to keep the legacy PM tests for a while for coroutine, I could wait until everyone is comfortable before updating this patch again.

I think testing is more important than the develop experience. It is more suitable to remove the tests for legacy pass manager when the old pass manager got replaced entirely.

By the way, since the new PM has been enabled by default, all of the tests are no longer testing Legacy PM behavior anymore (unless explicitly specified --enable-new-pm=0)

By the way, is everyone comfortable with dropping the test lines using the legacy pass manager for Coroutine?
It's really painful to update the tests to keep both legacy pass manager and new pass manager, now that the pass sequence becomes much different. (CoroSplit runs late in CGSCC pipeline for NewPM, but early in legacy pm).
If we want to keep the legacy PM tests for a while for coroutine, I could wait until everyone is comfortable before updating this patch again.

I think testing is more important than the develop experience. It is more suitable to remove the tests for legacy pass manager when the old pass manager got replaced entirely.

By the way, since the new PM has been enabled by default, all of the tests are no longer testing Legacy PM behavior anymore (unless explicitly specified --enable-new-pm=0)

To my understand, this means that it is OK to remain the Legacy PM test lines? Since the legacy PM test line would be ignored?

lxfind planned changes to this revision.Feb 18 2021, 8:57 AM

Since we seem to be nearly done with the legacy PM, I think it's acceptable to no longer test it specifically for the coro passes.

lxfind updated this revision to Diff 355110.Jun 28 2021, 9:22 PM

After removing the legacy test command, I was finally able to update this patch. It's now ready for review. I will update the decription to reflect to the latest changes

Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2021, 9:22 PM
lxfind retitled this revision from [RFC][Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened to [Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened.Jun 28 2021, 9:25 PM
lxfind edited the summary of this revision. (Show Details)

note that we don't really need to run Inliner again on the ramp function after split

This isn't accurate. The inline may run again for ramp function after split and it's required by coro elide.

It seems like that we don't need the attribute CORO_PRESPLIT_ATTR any more, do we? If yes, I think we should remove them.

note that we don't really need to run Inliner again on the ramp function after split

This isn't accurate. The inline may run again for ramp function after split and it's required by coro elide.

If there is an inlining opportunity, it should have happened pre-split, right? Is there any reason it didn't happen pre-split but only post-split?

It seems like that we don't need the attribute CORO_PRESPLIT_ATTR any more, do we? If yes, I think we should remove them.

It's still needed by the legacy pass manager. I don't want to break that yet.

ChuanqiXu accepted this revision.Jun 29 2021, 6:55 PM

note that we don't really need to run Inliner again on the ramp function after split

This isn't accurate. The inline may run again for ramp function after split and it's required by coro elide.

If there is an inlining opportunity, it should have happened pre-split, right? Is there any reason it didn't happen pre-split but only post-split?

Since we had prevent inlining coroutine before split, coroutine can't be inlined pre-split. After splitting, due to the SCC changes, the inliner would run again for ramp.

LGTM. But I am not familiar with the pipeline, please wait for 1~2 days in case there are more comments.

llvm/lib/Transforms/Coroutines/CoroSplit.cpp
2110–2114

These are not needed.

This revision is now accepted and ready to land.Jun 29 2021, 6:55 PM

this will run the function simplification pipeline twice on every single function when coroutines are enabled, I don't think that's the intention

I thought the intention was to do all the the re-adding of SCCs inside CoroSplit.cpp, including the SCC with the function that was split

this will run the function simplification pipeline twice on every single function when coroutines are enabled, I don't think that's the intention

I thought the intention was to do all the the re-adding of SCCs inside CoroSplit.cpp, including the SCC with the function that was split

Good point. I was trying to avoid the second inliner on the coroutine ramp function. But I guess the cost will be bigger than the win.

this will run the function simplification pipeline twice on every single function when coroutines are enabled, I don't think that's the intention

I thought the intention was to do all the the re-adding of SCCs inside CoroSplit.cpp, including the SCC with the function that was split

Good point. I was trying to avoid the second inliner on the coroutine ramp function. But I guess the cost will be bigger than the win.

If coroutine ramp function couldn't get inlined, it would disable coroutine elide optimization. Could you elaborate more on why do you want to do that?

If coroutine ramp function couldn't get inlined, it would disable coroutine elide optimization. Could you elaborate more on why do you want to do that?

Ramp function will eventually be inlined, but not when you run Inliner on the inlinee.
Let's say coroutine A calls coroutine B, and eventually we want to inline B into A so that we could perform CoroElide on A.
After B is split, we don't need to run inliner again on B. When we run inliner on A, A will inline B.

If coroutine ramp function couldn't get inlined, it would disable coroutine elide optimization. Could you elaborate more on why do you want to do that?

Ramp function will eventually be inlined, but not when you run Inliner on the inlinee.
Let's say coroutine A calls coroutine B, and eventually we want to inline B into A so that we could perform CoroElide on A.
After B is split, we don't need to run inliner again on B. When we run inliner on A, A will inline B.

Thanks for clarifying.

lxfind updated this revision to Diff 355431.Jun 29 2021, 8:25 PM

Put the post-split ramp function back to the CGSCC worklist

ChuanqiXu added inline comments.Jun 29 2021, 11:00 PM
llvm/lib/Transforms/Coroutines/CoroSplit.cpp
2110–2114

Refactor this into:

LLVM_DEBUG(dbgs() << "CoroSplit: Processing coroutine '" << F.getName()
                      << "' state: " << Attr.getValueAsString() << "\n");

could erase an warning in release build.

aeubanks accepted this revision.Jun 30 2021, 8:56 AM

The CoroSplit/PassBuilder changes lgtm

lxfind added inline comments.Jun 30 2021, 9:52 AM
llvm/lib/Transforms/Coroutines/CoroSplit.cpp
2110–2114

Good catch. How did you catch this? It seems like I don't see warning on my Mac by default.

lxfind updated this revision to Diff 355607.Jun 30 2021, 9:52 AM

fix warning

ychen added a subscriber: ychen.Jun 30 2021, 10:10 AM
ychen added inline comments.
llvm/lib/Transforms/Coroutines/CoroSplit.cpp
2115

drive-by nit: would it be better to move it up near Coroutines.push_back(&N);?

2121–2125

I got your point. So "// All clones will be in the same RefSCC ...." : this is not accurate I think?

lxfind added inline comments.Jun 30 2021, 10:17 AM
llvm/lib/Transforms/Coroutines/CoroSplit.cpp
2121–2125

Note that previously this is done only for Async, Retcon and RetconOnce ABIs, not for the Switch ABI.
I guess that's accurate for those ABIs? But for Switch ABI this is not true.
And before we were not adding back the split functions to the pipeline to be properly optimized. Now we are dong that. This should help improve the performance of the post-split functions.

This revision was landed with ongoing or failed builds.Jun 30 2021, 11:38 AM
This revision was automatically updated to reflect the committed changes.
ychen added inline comments.Jun 30 2021, 11:56 AM
llvm/lib/Transforms/Coroutines/CoroSplit.cpp
2121–2125

I missed the ABI condition. Thanks for the explanation.