Page MenuHomePhabricator

[CGSCC][Coroutine][NewPM] Properly support function splitting/outlining

Authored by aeubanks on Dec 26 2020, 1:47 PM.



Previously when trying to support CoroSplit's function splitting, we
added in a hack that simply added the new function's node into the
original function's SCC ( This is
incorrect since it might be in its own SCC.

Now, more similar to the previous design, we have callers explicitly
notify the LazyCallGraph that a function has been split out from another

In order to properly support CoroSplit, there are two ways functions can
be split out.

One is the normal expected "outlining" of one function into a new one.
The new function may only contain references to other functions that the
original did. The original function must reference the new function. The
new function may reference the original function, which can result in
the new function being in the same SCC as the original function. The
weird case is when the original function indirectly references the new
function, but the new function directly calls the original function,
resulting in the new SCC being a parent of the original function's SCC.
This form of function splitting works with CoroSplit's Switch ABI.

The second way of splitting is more specific to CoroSplit. CoroSplit's
Retcon and Async ABIs split the original function into multiple
functions that all reference each other and are referenced by the
original function. In order to keep the LazyCallGraph in a valid state,
all new functions must be processed together, else some nodes won't be
populated. To keep things simple, this only supports the case where all
new edges are ref edges, and every new function references every other
new function. There can be a reference back from any new function to the
original function, putting all functions in the same RefSCC.

This also adds asserts that all nodes in a (Ref)SCC can reach all other
nodes to prevent future incorrect hacks.

The original hacks in are no longer
necessary since all new functions should have been registered before
calling updateCGAndAnalysisManagerForPass.

This fixes all coroutine tests when opt's -enable-new-pm is true by
default. This also fixes PR48190, which was likely due to the previous
hack breaking SCC invariants.

Diff Detail

Event Timeline

aeubanks created this revision.Dec 26 2020, 1:47 PM
aeubanks requested review of this revision.Dec 26 2020, 1:47 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 26 2020, 1:47 PM
lxfind added inline comments.Dec 29 2020, 12:02 PM

Could NewN be able to reach OriginalN by calling other functions?
For example, before CoroSplit, say A is the coroutine. A calls B, and B calls A.
After split, A references A.resume, A.resume calls B, and B calls A. In this case, they will still be in the same RefSCC?

aeubanks updated this revision to Diff 314047.Dec 29 2020, 6:39 PM

handle cases where split function references any function in original function's RefSCC (and add tests)

aeubanks added inline comments.Dec 29 2020, 6:40 PM

You're completely right, I totally missed that case. Updated the code to handle that (and it's a bit simpler now too which is nice).

Thank you for working on this! LGTM. I will let others who are more familiar with SCC to accept it.
A few nits commented inline.

On a side note, I was somewhat surprised how little interface RefSCC provides, and how much heavy-weight lifting needs to be done by accessing data directly. Not necessarily in this patch, but adding a few more helper methods to RefSCC seems useful.
For example, I imagine having a method in RefSCC like this could be useful:

SCC *RefSCC::addNewSCCWithSingleNode(Node* N, int Index);

which creates a new SCC with a single Node N and insert it into index Index in the SCC list of the RefSCC. I can see this method used 3 times by this patch.
Another is to add a version of LazyCallGraph::initNode that only takes one parameter which is Node (and having the original one calling it).


Would it be better to implement getEdgeKind() by iterating through the list of edges from the Node that corresponds to OriginalFunction? (that way you can also assert on the Ref Edge in release mode without extra overhead)


NewC = OriginalC;


RefSCC *NewRC = OriginalRC

aeubanks updated this revision to Diff 314120.Dec 30 2020, 7:11 AM

address review comments
clang-tidy style

aeubanks marked 2 inline comments as done.Dec 30 2020, 7:11 AM
aeubanks added inline comments.

This is used in cases where we don't have the edges populated yet. For example, the original function's node hasn't been updated to add the new edge when the split function is called. It's also useful for making sure that edges actually correspond to the instructions.
Added a comment.

I initially wanted the user of addSplitFunction() to specify the edge kind so we wouldn't have to calculate it here, but found that it was very easy to choose the wrong edge kind, and it's harder to use for little benefit.

lxfind added inline comments.Dec 30 2020, 8:43 AM

From what I see, this seems a common pattern in code:

LazyCallGraph::Edge &E : N.populate() {...}
aeubanks added inline comments.Dec 30 2020, 12:06 PM

N.populate() will only look through the function once to populate the edges once. If the edges have already been populated, it won't look through the function body again.
For example, updateCGAndAnalysisManagerForPass() has to look through the function body itself in order to find new/removed edges for an existing node.

aeubanks updated this revision to Diff 314144.Dec 30 2020, 1:00 PM

repurpose initNode for use in function splitting (since it's no longer used elsewhere)

ychen added a subscriber: ychen.Dec 30 2020, 7:59 PM
rnk accepted this revision.Jan 5 2021, 4:09 PM


I'm not an SCC expert, but Arthur explained to me how this works, and I trust the explanation.


Woohoo, tests!

This revision is now accepted and ready to land.Jan 5 2021, 4:09 PM