Page MenuHomePhabricator

[Coroutines][2/6] New pass manager: coro-split
ClosedPublic

Authored by modocache on Dec 26 2019, 6:24 AM.

Details

Summary

This patch has four dependencies:

  1. The first in this series of patches that implement coroutine passes in the new pass manager: https://reviews.llvm.org/D71898.
  2. A patch that introduces an API for CGSCC passes to add new reference edges to a LazyCallGraph, updateCGAndAnalysisManagerForCGSCCPass: https://reviews.llvm.org/D72025.
  3. A patch that introduces a CallGraphUpdater helper class that is capable of mutating internal LazyCallGraph state in order to insert new function nodes into a specific SCC: https://reviews.llvm.org/D70927.
  4. And finally, a small edge case fix for updating LazyCallGraph that patch 3 above happens to run into: https://reviews.llvm.org/D72226.

This is the second in a series of patches that ports the LLVM coroutines
passes to the new pass manager infrastructure. This patch implements
'coro-split'.

Some notes:

  • Using the new CGSCC pass manager resulted in IR being printed in the reverse order in some tests. To prevent FileCheck checks from failing due to these reversed orders, this patch splits up test files that test multiple different coroutine functions: specifically coro-alloc-with-param.ll, coro-split-eh.ll, and coro-eh-aware-edge-split.ll.
  • CoroSplit.cpp contained 2 overloads of splitCoroutine, one of which dispatched to the other based on the coroutine ABI being used (C++20 switch-based versus Swift returned-continuation-based). I found this confusing, especially with the additional branching based on CallGraph vs. LazyCallGraph, so I removed the ABI-checking overload of splitCoroutine.

Event Timeline

modocache created this revision.Dec 26 2019, 6:24 AM
Herald added a project: Restricted Project. · View Herald Transcript
wenlei added a subscriber: wenlei.Dec 26 2019, 8:00 AM
jdoerfert added a comment.EditedDec 26 2019, 8:40 AM

As a generic comment: Changes like the addition of const or renaming of the pass to be Legacy could be done in NFC commits prior to this one without pre-commit review (IMHO). That makes this patch smaller. The same holds for splitting tests and functions, if there is no functional change.

Edit: Also consider using the child/parent revision feature to ease the navigation between dependent patches.

Also consider using the child/parent revision feature to ease the navigation between dependent patches.

Oh, thanks! I hadn't realized LLVM's Phabricator instance had this feature. I just edited the parent/child revisions as per your suggestion, they're now displayed as a stack.

Changes like the addition of const or renaming of the pass to be Legacy could be done in NFC commits prior to this one without pre-commit review (IMHO)

Good point! I considered doing so with const, but renaming the passes to Legacy sounds good, too. I'll split out these changes and commit them without review.

My understanding is devirt trigger is only a trick used with Legacy PM to run optimizations on coroutine funclets after coro-split. With NewPM's CGSCCUpdateResult infra, can we communicate that change and request passes directly with ModuleToPostOrderCGSCCPassAdaptor, without artificially introducing the indirect call and devirt?

I would imaging any form of outlining would require this communication if optimization is needed for the outlined region. How is that handled for outlining in general with New PM and can we follow that without the devirt trick?

llvm/lib/Transforms/Coroutines/CoroSplit.cpp
1393

Seems LazyCallGraph wanted every possible (direct or indirect) references to be modeled upfront, and there's check/asserts for that. This change effectively bypasses that. Is slightly less optimal CGSCC order the only consequence of not following that rule?

With coro-split, we do alter the underlying CG, is it safe to populate CG Node without other bookkeeping like informing CGSCCUpdateResult? I'm not familiar with the assumptions and constraints about how CG should be updated, but asking because this populate() is never called outside of call graph construction, and I'm guessing it's a public function only because LazyCallGraphTest needed it..

1466

This doesn't seem necessary as DevirtFn doesn't contain any calls.

My understanding is devirt trigger is only a trick used with Legacy PM to run optimizations on coroutine funclets after coro-split. With NewPM's CGSCCUpdateResult infra, can we communicate that change and request passes directly with ModuleToPostOrderCGSCCPassAdaptor, without artificially introducing the indirect call and devirt?

The devirt trigger here is to restart CGSCC pipeline to run coro-split again to split the coroutine function. There is no need to introduce devirt trigger in NewPM since we call coro-split pass manually.

I would imaging any form of outlining would require this communication if optimization is needed for the outlined region. How is that handled for outlining in general with New PM and can we follow that without the devirt trick?

However, I do think the SCC should be updated using CGSCCUpdateResult in second run of coro-split for outlined functions

The devirt trigger here is to restart CGSCC pipeline to run coro-split again to split the coroutine function. There is no need to introduce devirt trigger in NewPM since we call coro-split pass manually.

Manually schedule the 2nd coro-split pass is only a workaround before we can trigger CGSCC passes on the split funclet like we did for legacy PM. It does the split without restarting CGSCC passes so it works, but it also leaves performance on the table because the split funclets won't go through many opt passes of CGSCC pipeline. Yes, I agree don't need to introduce devirt trigger with new PM, but that's because I think we can request CGSCC passes on split funclet via other mechanism like CGSCCUpdateResult, not just because 2nd coro-split pass is manually scheduled.

That said, for new PM, this patch implemented devirt trigger insertion only, but not the devirt detection part. I would suggest we have all or nothing for a working mechanism of rerunning CGSCC passes for split funclet. Funclets like resume contains actual code, not just stubs, so IMHO fully optimizing these funclets is an essential part of coroutine support.

The devirt trigger here is to restart CGSCC pipeline to run coro-split again to split the coroutine function. There is no need to introduce devirt trigger in NewPM since we call coro-split pass manually.

Manually schedule the 2nd coro-split pass is only a workaround before we can trigger CGSCC passes on the split funclet like we did for legacy PM. It does the split without restarting CGSCC passes so it works, but it also leaves performance on the table because the split funclets won't go through many opt passes of CGSCC pipeline. Yes, I agree don't need to introduce devirt trigger with new PM, but that's because I think we can request CGSCC passes on split funclet via other mechanism like CGSCCUpdateResult, not just because 2nd coro-split pass is manually scheduled.

There are two issues here: 1) coro-split needs run at least twice, we do not need CGSCC pipeline at pre-split stage which coro-split pass just works as a function pass 2) request CGSCC passes on split funclet after 2nd running of coro-split, and coroutine optimization such as coro-elide pass also depends on these optimization.
Manually schedule coro-split twice is a workaround for 1), as for 2) the current pipeline of coroutine in this patch set need be changed.

That said, for new PM, this patch implemented devirt trigger insertion only, but not the devirt detection part. I would suggest we have all or nothing for a working mechanism of rerunning CGSCC passes for split funclet. Funclets like resume contains actual code, not just stubs, so IMHO fully optimizing these funclets is an essential part of coroutine support.

I agree.

modocache updated this revision to Diff 235811.Jan 1 2020, 7:07 PM

As per one of the first review comments, I split out the trivial parts of this patch, such as the 'legacy' pass renaming, and committed them separately. Here's an updated version of this patch without those 'NFC' changes. Next I'll work on the other comments, both the ones left here and on D71903. I'll set the status of this diff to 'Changes Planned' to reflect that. Thanks for the reviews so far!

modocache planned changes to this revision.Jan 1 2020, 7:07 PM
modocache updated this revision to Diff 236241.Jan 5 2020, 5:41 AM
modocache edited the summary of this revision. (Show Details)
modocache removed a subscriber: wenlei.

Thanks for the reviews! This latest revision makes use of the private LazyCallGraph API exposed via the CallGraphUpdater class from D70927, and the helper function updateCGAndAnalysisManagerForCGSCCPass added in D72025, in order to properly update the call graph. It also uses CGSCCUpdateResult in order to enqueue the second phase of coro-split, instead of relying on function devirtualization. This patch exposed what I think is a bug, so it now also relies on a fix for that behavior, D72226.

wenlei added a comment.Jan 5 2020, 3:25 PM

Thanks for making the changes to schedule 2nd coro-split so funclets get fully optimized. A few comments inline, looks good otherwise.

llvm/lib/Transforms/Coroutines/CoroSplit.cpp
1410

Now that we have CallGraphUpdater, can we use CGUpdater.reanalyzeFunction(N.getFunction()) instead?

1573

nit: I would say "new pass manager" instead of "experimental pass manager".. it's not that experimental at this moment.

1595

Is it necessary to request RefSCC to be reprocessed? I thought UR.CWorklist.insert(&C) should be enough..

modocache marked 3 inline comments as done.Jan 5 2020, 3:50 PM

Thanks for the quick review! I'll send an update in a sec.

llvm/lib/Transforms/Coroutines/CoroSplit.cpp
1410

Yeah, I considered that as well. I opted not to because it's just extra steps:

  1. We grab node's function via LazyCallGraph::Node::getFunction, just to have CallGraphUpdater::reanalyzeFunction call LazyCallGraph::get to lookup the node in the graph, or create one if it's not in the graph. But we know this node in the graph, and we already have a reference to it, so why look it up?
  2. CallGraphUpdater::reanalyzeFunction then looks up the SCC, which we also already have a reference to. As far as I can tell there's no way CallGraphUpdater::registerOutlinedFunction above would have split the SCC, so there's no reason to re-lookup an SCC we already have a reference to.
  3. And finally, the call to updateCGAndAnalysisManagerForCGSCCPass.

I figured, why not just do (3)?

That being said, I do think in future patches there's room to consolidate the legacy pass and the new pass's interactions with the call graph. My personal preference would be to wait until D70927 is merged to do that, though.

1573

Oh, good catch! I agree, will do.

1595

Will do! I was trying to do whatever seemed most similar to the legacy pass manager's repeater, I wasn't sure whether that was the outer repeating loop over RefSCCs, or the inner SCC loop.

wenlei added inline comments.Jan 5 2020, 4:06 PM
llvm/lib/Transforms/Coroutines/CoroSplit.cpp
1410

yeah, there's a bit of abstraction overhead for the CallGraphUpdater layer over calling updateCGAndAnalysisManagerForCGSCCPass. Technically CallGraphUpdater::reanalyzeFunction is not really needed anywhere as we can almost always call updateCGAndAnalysisManagerForCGSCCPass directly, but I thought communicating through CallGraphUpdater is cleaner. I don't have a strong opinion on this though - @jdoerfert may want to weigh in on this.

modocache updated this revision to Diff 236272.Jan 5 2020, 4:18 PM

Addressed review comments.

modocache marked 4 inline comments as done and an inline comment as not done.Jan 5 2020, 4:19 PM
modocache updated this revision to Diff 236511.Jan 6 2020, 9:21 PM

To avoid an assert when compiling recursive coroutine functions, use the new API I added in D72226: CallGraphUpdater::registerReferredToOutlinedFunction. This is also a more valid method of updating the call graph. Previously, we were inserting coroutine funclets into the same SCC, despite the fact that they did not form a strongly-connected cycle with the original coroutine function. Now, we insert them as referred-to-by the original coroutine.

modocache updated this revision to Diff 236670.Jan 7 2020, 12:43 PM

Rebase onto my update of D72226.

modocache updated this revision to Diff 236676.Jan 7 2020, 1:26 PM

Apply clang-format.

modocache marked an inline comment as done.Jan 7 2020, 2:19 PM
modocache added inline comments.
llvm/lib/Transforms/Coroutines/CoroSplit.cpp
1595

@wenlei Now that I've updated D72226 to outline the funclets into the same RefSCC (so, *not* the same SCC) as the coroutine, I think we now may actually want to re-enqueue the entire RefSCC. Thoughts?

wenlei added inline comments.Jan 7 2020, 4:48 PM
llvm/lib/Transforms/Coroutines/CoroSplit.cpp
1595

Re-enqueue is for rerun the 2nd coro-split pass, which I though is orthogonal to the actual outlining where funclet is now created as new SCC in the same RefSCC. So as long as we get to rerun 2nd coro-split on those functions I think we should be fine, and re-enqueue SCC does that. But if we re-enqueue the RefSCC, it's functionally correct as well, just we might rerun pipeline for some extra SCCs in the same RefSCC while it's not really needed..

modocache updated this revision to Diff 242246.Feb 3 2020, 8:14 PM

Use the latest outlining interface from D72226, which was just updated based on the latest version of D70927.

modocache updated this revision to Diff 244565.Feb 13 2020, 8:09 PM

Rebase past D69930 so the patch applies cleanly to trunk.

modocache updated this revision to Diff 244697.Feb 14 2020, 9:57 AM

I found a test that tested the legacy PM opt -coro-split, but not the new PM opt -passes=coro-split. Now it tests both.

I'm not sure I'm the right person to review this so do not wait for my OK. I left some drive-by comments below though.

llvm/lib/Transforms/Coroutines/CoroSplit.cpp
19

Drive by: Isn't this accurate anymore? Should we replace it instead of removing it? I like file comments that explain what is going on.

1332

Drive-by: Changes to these functions splitCoroutine seem NFC to me. If that is the case they can/should be committed beforehand as such (w/o review).

1410

If you are in a new PM only code region, and the LazyCall graph helpers are sufficient, I don't see any reason not to call them directly. The CallGraphUpdater comes in handy if you are in generic code that needs to work in both PMs, or if it has abstractions that make your live easier.

wenlei accepted this revision.Feb 17 2020, 8:15 PM

Sorry for the delay, LGTM except the file comment as Johannes pointed out. We've tested this stack of coroutine patches with multiple large services internally, all working as expected, so I'm going to accept this stack as review feedbacks are addressed too.

This revision is now accepted and ready to land.Feb 17 2020, 8:15 PM
modocache updated this revision to Diff 245076.Feb 17 2020, 8:27 PM

Thanks for the reviews! You're right, @jdoerfert, the file header comment is still totally applicable. I put it back in.

This revision was automatically updated to reflect the committed changes.