Page MenuHomePhabricator

[PM] Provide an initial, minimal port of the inliner to the new pass manager.
ClosedPublic

Authored by chandlerc on Sep 5 2016, 1:08 AM.

Details

Summary

This doesn't implement *every* feature of the existing inliner, but
tries to implement the most important ones for building a functional
optimization pipeline and beginning to sort out bugs, regressions, and
other problems.

Notable, but intentional omissions:

  • No alloca merging support. Why? Because it isn't clear we want to do this at all. Active discussion and investigation is going on to remove it, so for simplicity I omitted it.
  • No support for trying to iterate on "internally" devirtualized calls. Why? Because it adds what I suspect is inappropriate coupling for little or no benefit. We will have an outer iteration system that tracks devirtualization including that from function passes and iterates already. We should improve that rather than approximate it here.
  • Optimization remarks. Why? Purely to make the patch smaller, no other reason at all.

Note that of these, the last two are the ones I expect to need to
implement eventually. The last one I'll probably do almost immediately.
But I wanted to skip it in the initial patch to try to focus the change
as much as possible as there is already a lot of code moving around and
both of these *could* be skipped without really disrupting the core
logic.

A summary of the different things happening here:

  1. Adding the usual new PM class and rigging.
  1. Fixing minor underlying assumptions in the inline cost analysis or inline logic that don't generally hold in the new PM world.
  1. Adding the core pass logic which is in essence a loop over the calls in the nodes in the call graph. This is a bit duplicated from the old inliner, but only a handful of lines could realistically be shared. (I tried at first, and it really didn't help anything.) All told, this is only about 100 lines of code, and most of that is the mechanics of wiring up analyses from the new PM world.
  1. Updating the LazyCallGraph (in the new PM) based on the *newly inlined* calls and references. This is very minimal because we cannot form cycles.
  1. When inlining removes the last use of a function, eagerly updating the call graph and deleting the function so that any "one use remaining" inline cost heuristics are immediately refined. This is pretty minor in the inliner at 15 or so lines.
  1. After all the inlining for a particular function, updating the LazyCallGraph and the CGSCC pass manager to reflect the removed call edges and function references. Both of these can happen just be removing the call instruction that is inlined, but I've implemented this as a generalized "DCE" update because we run InstSimplify as we process inlined instructions and I don't want to constrain how far we constant fold. Ultimately, it adds no real complexity to handle the full "DCE" cases. The logic here is similar but not precisely the same as the fully general function-pass update. The reason for the difference is for efficiency. We can build the necessary sets up-front and early exit in the cases where all called functions or referenced functions are accounted for. All told this is just over 100 lines. I'd still like to find a way to simplify this, as it feels like more code than should be necessary but I've not found anything that is really an improvement. (Mostly found things that make it shorter but harder to read and understand.)
  1. Refactoring the existing CGSCC update logic to share as much code as possible when implementing #6.

While the patch delta is somewhat large, much of this is moving code out
to helper functions. I can separate these into NFC refactoring patches
that I land ahead of time but it didn't make sense as I would have no
way to explain the *particular* refactoring without adding the second
caller to that API here.

The really substantial delta is 100 lines of inliner and under 300 lines
of CGSCC update, which seems fairly reasonable for the core of all this
madness. =] The rest are comments, APIs in headers, and boiler plate fro
the new pass.

Depends on D24225

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Prazek added inline comments.Sep 12 2016, 2:30 PM
lib/Analysis/CGSCCPassManager.cpp
140–141 ↗(On Diff #70307)

why not using?

199–200 ↗(On Diff #70307)

same.

269 ↗(On Diff #70307)

I know that it is not your change, but auto

388–391 ↗(On Diff #70307)

usings?

432 ↗(On Diff #70307)

auto?

434 ↗(On Diff #70307)

auto

sanjoy requested changes to this revision.Sep 29 2016, 11:35 PM
sanjoy edited edge metadata.

Most of the comments down below are stylistic issues, except: InlinerPass::run in this patch does not handle the full generality of InlineFunction. That is the most important thing that needs to be addressed.

include/llvm/Analysis/CGSCCPassManager.h
527 ↗(On Diff #70307)

Please document the RC and C parameters-- it isn't obvious what they represent. Are they SourceRC and SourceC?

Also, the return value needs to be documented.

lib/Analysis/CGSCCPassManager.cpp
144 ↗(On Diff #70307)

If RC and C are the source RefSCC and SCC, then please add an assert to make that obvious.

163 ↗(On Diff #70307)

This should be mentioned on the declaration of RefSCC itself.

182 ↗(On Diff #70307)

I'd just assert that the first element of NewRefSCCs is RC and then skip the first element in the loop.

Checking NewRC != RC in every iteration may let a bug slip through undetected.

393 ↗(On Diff #70307)

Add a Node::empty()?

421 ↗(On Diff #70307)

What is the extra check on Visited.insert(Callee).second getting you here? Why not just try to DeadCallTargets.erase(Callee) directly?

429 ↗(On Diff #70307)

I don't buy this reason for doing two walks. :)

I think you should be able to do a combined walk and break out if DeadRefTargets.empty() && DeadCallTargets.empty(), and predicate the DeadCallTargets on !DeadCallTargets.empty() and corresponding for DeadRefTargets. We'll have the overhead of repeated calls to SmallPtrSet::empty() but that sounds cheaper than walking all instructions twice.

However, walking twice looks simpler. If you'd rather keep it this way to make things readable, I can buy that. :)

454 ↗(On Diff #70307)

I'd s/DeadTargets/DeadEdges/

455 ↗(On Diff #70307)

I'd s/DemotedRefTargets/DemotedCallTargets (i.e. these were call targets that were demoted). Alternatively TargetsDemotedToRef (actually, this one sounds better).

lib/Transforms/IPO/Inliner.cpp
848 ↗(On Diff #70307)

Looks like this one wasn't addressed.

861 ↗(On Diff #70307)

As mentioned in D24226, InlineFunction can insert non-trivial call edges (by promoting ref edges), causing SCCs to be formed.

This revision now requires changes to proceed.Sep 29 2016, 11:35 PM
chandlerc updated this revision to Diff 80851.Dec 8 2016, 5:22 PM
chandlerc edited edge metadata.
chandlerc marked 5 inline comments as done.

Significant improvement of update logic to address core issues pointed out by
Sanjoy and Easwaran and revealed with further testing. New test cases added as
well. Extraneous changes to other code removed leaving this patch even more
focused on the inliner port.

With the latest update I've now addressed essentially all of the comments (that remained relevant). The patch is now, if anything, smaller and more focused too. =] Some responses to specific comments below.

include/llvm/Analysis/CGSCCPassManager.h
527 ↗(On Diff #70307)

This code is gone now.

lib/Analysis/CGSCCPassManager.cpp
140–141 ↗(On Diff #70307)

Consistency with other code, but all of this is gone now.

163 ↗(On Diff #70307)

I'll fold this into the larger LCG documentation update ongoing. Also note that this existing comment just moved around (and is no longer moved after the updates to this patch).

182 ↗(On Diff #70307)

Done but in a separate cleanup CL.

269 ↗(On Diff #70307)

I will do a separate cleanup CL to add some use of auto in relevant places, but this doesn't belong in this change.

393 ↗(On Diff #70307)

This code is gone so I've not added this. I can though if it comes back up.

421 ↗(On Diff #70307)

All this code is gone now.

Happy to revisit the walk approach in the other code though to either improve it or to change it to look like this walk looked. But not in the inliner patch. =]

429 ↗(On Diff #70307)

See above, all this is gone now. We can talk about whether the existing code should be simplified in a separate patch.

432 ↗(On Diff #70307)

I think the type helps readability here.

433 ↗(On Diff #70307)

And here.

434 ↗(On Diff #70307)

But will upt auto here in some follow-up patc.h

lib/Analysis/LazyCallGraph.cpp
28 ↗(On Diff #70307)

The insert was already there. If there is some value to emplace (not sure there is) then we can go through and systematically use it, but I'd rather not blindly mix and match as code happens to be touched.

lib/Transforms/IPO/Inliner.cpp
842 ↗(On Diff #70307)

It's not really documented at all, and it only actually holds for the *clone* API, not the InlineFunction API. See the discussion weath Easwaran below for details, but I'm no longer dealing with any of this.

848 ↗(On Diff #70307)

This is a great comment Easwaran, and in fact there are more issues. I was thinking of the clone API when I wrote this, InlineFunction does *way* too much to let this work.

There were actually other issues here as well, and I've switched entirely to a simpler approach where we add all of the edges in the call graph for all of the inlined functions, and then prune out the ones based on simplifications at the end of this. For iterating in the inliner itself, I've added a tiny bit of code *inside* InlineFunction where we were already walking the cloned instructions to compute the list of inlined calls even without a call graph, and used that here. Hopefully this works better -- I even already had a test case that was hitting this and just hadn't looked at it carefully enough.

851 ↗(On Diff #70307)

I've actually switched to getting the newly exposed callsite inside InlineFunction in the latest version (see above for why). It still walks the newly inlined instructions, but there really isn't any other way and there shouldn't be much cost here (we walk the inlined instructions several times inside InlineFunction from what I can see).

861 ↗(On Diff #70307)

Yep. I've added the test cases for this and it is now handled much better.

863 ↗(On Diff #70307)

I'm not sure what the concern is here? This is even simpler now, but either way I think the Inliner is somewhat unusual in that it is *mutating* the call graph. Passes which do that will have to do work to correctly update SCCs.

894 ↗(On Diff #70307)

Uh, quite. =D Fixed!

sanjoy requested changes to this revision.Dec 11 2016, 7:27 PM
sanjoy edited edge metadata.

Overall this looks great! Minor comments inline.

include/llvm/Transforms/IPO/Inliner.h
96 ↗(On Diff #80851)

The explicit llvm:: qualification is not needed.

lib/Transforms/IPO/Inliner.cpp
667 ↗(On Diff #80851)

Line looks too long, clang-format?

781 ↗(On Diff #80851)

I'd mildly prefer spelling out ProfileSummaryInfo here, since the type is not obvious from context.

799 ↗(On Diff #80851)

s/llvm::getInlineCost/getInlineCost/

857 ↗(On Diff #80851)

I'm not confident that the calls sites you've put in Calls here and above will stay valid across the future iterations of this loop. That is, say we're looking at the main SCC in:

void f() { return false; }
void g() { }
void main() {
  if (f() /* CS0 */)
    g() /* CS1 */;
}

Before entering the loop, we'll put CS0 and CS1 in Calls. Once we inline through CS0, it is reasonable (though I don't know if it does this today) for InlineFunction to want to simplify away and delete CS1, leaving a dangling pointer in Calls.

Is there a reason why that ^ won't happen?

865 ↗(On Diff #80851)

This bit needs a unit test -- nothing in make check failed when I commented this out.

895 ↗(On Diff #80851)

Two periods intentional?

897 ↗(On Diff #80851)

Why is DebugLogging true by default?

This revision now requires changes to proceed.Dec 11 2016, 7:27 PM

Thanks! Mostly minor comments below.

include/llvm/Transforms/IPO/Inliner.h
1 ↗(On Diff #70307)

The header above should read Inliner.h

include/llvm/Transforms/Utils/Cloning.h
200 ↗(On Diff #80851)

typo: of -> if

lib/Transforms/IPO/Inliner.cpp
857 ↗(On Diff #80851)

When Calls is initially populated, only callees for which isDeclaration is false is added. That is not the case with the callees in IFI.InlinedCallSites. We check and bail out if Callee->isDeclaration() is true in many places (getInlineCost for example), so this probably doesn't break anything now, but it's preferable to filter out callees without their body when we augment Calls above.

881 ↗(On Diff #80851)

This might add edges that don't exist anymore in the IR: assuming a->b->c originally, and b first gets inlined and then c gets inlined into 'a' this would still add a->c edge.

lib/Transforms/Utils/InlineFunction.cpp
1645 ↗(On Diff #80851)

Partial inlining implementation creates IFI with an empty CG and this code unnecessarily adds the inlined callsites when invoked in the context of partial inlining. Not a big deal, but may be you can change InlinedCallSites to a pointer and guard the below code based on that?

sanjoy added inline comments.Dec 12 2016, 5:23 PM
lib/Transforms/IPO/Inliner.cpp
857 ↗(On Diff #80851)

If this is a reply to what I said above, then I'm not sure how isDeclaration is relevant to the problem I'm trying to point out.

I'm trying to sketch a scenario where InlineFunction ends up excising a call site to a function (that has a body) after we've put a pointer to the CallInst or InvokeInst for the said call site in Calls therefore ending up with a dangling pointer.

It is possible that InlineFunction does not do such simplifications (simplifications in blocks that logically belonged to the caller, that is); but if that is the case then that invariant should be documented and tested.

eraman added inline comments.Dec 12 2016, 5:31 PM
lib/Transforms/IPO/Inliner.cpp
857 ↗(On Diff #80851)

Sorry, that wasn't a reply to your comment. Mine was a separate unrelated comment on line 857 above.

chandlerc updated this revision to Diff 81552.Dec 15 2016, 2:11 AM
chandlerc edited edge metadata.
chandlerc marked 11 inline comments as done.

Update to address review comments (and rebase).

Thanks so much for the review! Updated patch and some responses below.

lib/Transforms/IPO/Inliner.cpp
857 ↗(On Diff #80851)

Nope, no reason other than "it doesn't do that that at the moment". At least, that I'm aware of...

I may be missing something, but a casual look through the old inliner's code makes me think it has the same fundamental assumption. Perhaps as a consequence, the current *inliner* does DCE on call instructions rather than InlineFunction doing this....

I think we should just document this (in a separate patch probably)...

857 ↗(On Diff #80851)

Sanjoy, thanks for the thought. Sadly we rely on this already and just don't have any testing for it. I've added a test case based on your idea with lots of dead code to try and make sure we don't get this wrong at some point in the future. It also showcases exactly why we *should* do some DCE in the caller when inlining because we leave trivially dead code around.

I've also added a comment while I'm here so we don't forget. It's a small change.

Good thought Easwaran, totally agree about inserting the minimal number of calls. I've adjusted the code accordingly.

865 ↗(On Diff #80851)

Yea, deleting the analogous line above also doesn't cause anything to fail.

:: sigh ::

865 ↗(On Diff #80851)

I've had to add numerous tests to cover all of the "last callsite" heuristics under this inliner. They include crafty test cases that cover this. I'm not really trying to make them work with the original inliner though as the approach used there is fundamentally harder to test.

881 ↗(On Diff #80851)

Yes, this is an over approximation. But we then clean all of these (dead) edges up with the below CG update routine.

897 ↗(On Diff #80851)

.... Because I didn't thread an explicit DebugLogging flag through this code (and I don't want to), and I haven't taught the CGSCC update logic code to support a side channel like DEBUG_PASS yet, so I hard coded it.

Anyways, removed. The right way to make this stuff debuggable is to support DEBUG_PASS, but that's a separate issue and nothing to do with this patch.

lib/Transforms/Utils/InlineFunction.cpp
1645 ↗(On Diff #80851)

I don't think this is really a big deal, and I'd rather not deal with ownership issues of a pointer....

The partial inlining code isn't really heavily used, and it remains correct. In fact, I suspect that if we actually wanted to keep the partial inlining code (but I don't think we do long-term), we'd want to use this exact data structure to iterate the same way we do in the inliner.

So if its OK, I'd rather keep this code simple and pay a (very small I think) overhead of building this when it isn't used in that context.

sanjoy accepted this revision.Dec 15 2016, 10:22 PM
sanjoy edited edge metadata.

lgtm with minor nits

lib/Transforms/IPO/Inliner.cpp
783 ↗(On Diff #81552)

Super minor, but it seems natural to have a SCC::getModule(). Maybe in a later change?

808 ↗(On Diff #81552)

Why not SmallVector<LazyCallGraph::Node *, 16> Nodes(InitialC.begin(), InitialC.end());?

853 ↗(On Diff #81552)

How about s/Calls/CallsInCurrentNode? Given the nested loops, it is a little hard to keep track of whether Calls has all the call sites in the SCC or just the ones in the current node.

868 ↗(On Diff #81552)

(This is minor, please don't hold this patch over resolving this. We can add this (or not) in a later change.)

I'd rather add to Calls if either CS is indirect OR is direct to a function with a body, since a later inline can make that call direct.

That way, we get this case:

fnptr_t f() { return printf; }
void g(fnptr_t val) {
  val("");
}
void main() {
  fnptr_t val = f();
  g(val);
}

If we're lucky enough to first inline g and then f, we want to consider inlining the call to val that came in from g (that is now directly, after inlining f).

Of course, we have to be somewhat smarter to always get this case, but the probability is higher if we reconsider indirect calls.

915 ↗(On Diff #81552)

We'll only ever try to insert a function once into DeadFunctions, right? Since if we're trying to insert InlinedCallee into DeadFunctions then it has no uses, and we could not possible try to inline through a call to it again later, so it should never appear again in InlinedCallees.

If that ^ is correct, perhaps we can add an assert here (and we could even use a SmallVector instead of SmallPtrSet, but I'd especially like to have the assert in that case).

This revision is now accepted and ready to land.Dec 15 2016, 10:22 PM
chandlerc updated this revision to Diff 81725.Dec 16 2016, 2:21 AM
chandlerc edited edge metadata.

Updates from latest review by Sanjoy.

Patch updated and responses below. Some discussion here so will let you give a final OK before I land. Also would like to get an OK from Easwaran if possible although he seemed pretty happy on the last iteration.

lib/Transforms/IPO/Inliner.cpp
865 ↗(On Diff #80851)

Yea, deleting the analogous line above also doesn't cause anything to fail.

:: sigh ::

783 ↗(On Diff #81552)

Will do.

808 ↗(On Diff #81552)

Because it doesn't compile -- we need pointers not references.

Eventially with a range constructor and an adaptor this will work, but today the current code seemed like the least typing. =[

853 ↗(On Diff #81552)

I'm not sure about this... Seems a long name when we don't have two of them. I've cleaned up the comment a bit.

Also, the per-node loop is the outer loop. The only reason this isn't scoped to the per-node region is to avoid re-allocation.

868 ↗(On Diff #81552)

Yeah, this is really similar to the devirtualization cases already. I think I'll do this in a follow-up to make it more focused and get more testing (and because I have a few other things I'd like to get moving).

Generally, I want to pursue a strategy of optimistic iteration here rather than exhaustive. Because we're going to end up with an iteration construct around all of this to handle cross-pass cases anyways, and we can rely on that handling hard to spot cases. The focus here should be to quickly find as my opportunities as we can without too much waste.

For example, the current inliner does another scan of the entire function every time anything gets inlined. This basically doubles the cost as we do a full run of computing inline cost and not inlining anything. Then we pop out to the devirt iteration and in some cases re-run it ... again....

I have a good idea of how to handle the majority of easy cases here though by looking at the users of inlined calls when they are inlined. But it'll require a bit of work and threading things through as well as more test cases so i'd like to circle back to it.

915 ↗(On Diff #81552)

All of this stemmed from an intermediate state where I tried to actually delete the function here.

Now that it is fully deferred, I can directly put it in the queue above when we discover it. This avoids the duplicated predicate and this extra loop, etc.

I've switched to a vector and added the assert because yea, we can't inline the same call twice and have it become dead both times. =] At least, not unless there is some really cool reincarnation going on here...

LGTM!

lib/Transforms/IPO/Inliner.cpp
808 ↗(On Diff #81552)

Ah, okay; I missed the &.

This revision was automatically updated to reflect the committed changes.