This is an archive of the discontinued LLVM Phabricator instance.

[PM] Introduce a devirtualization iteration layer for the new PM.
ClosedPublic

Authored by chandlerc on Aug 3 2016, 3:57 AM.

Details

Summary

This is an orthogonal and separated layer instead of being embedded
inside the pass manager. While it adds a small amount of complexity, it
is fairly minimal and the composability and control seems worth the
cost.

The logic for this ends up being nicely isolated and targeted. It should
be easy to experiment with different iteration strategies wrapped around
the CGSCC bottom-up walk using this kind of facility.

The mechanism used to track devirtualization is the simplest one I came
up with. I think it handles most of the cases the existing iteration
machinery handles, but I haven't done a *very* in depth analysis. It
does however match the basic intended semantics, and we can tweak or
tune its exact behavior incrementally as necessary. One thing that we
may want to revisit is freshly building the value handle set on each
iteration. While I don't think this will be a significant cost (it is
strictly fewer value handles but more churn of value handes than the old
call graph), it is conceivable that we'll want a somewhat more clever
tracking mechanism. My hope is to layer that on as a follow up patch
with data supporting any implementation complexity it adds.

Note, I haven't written a lot of tests for this yet, but it already fixes the
FIXME in the existing test that motivates it. I'll write more detailed tests
for it, but wanted to go ahead and post the core idea behind this.

Depends on D21462

Diff Detail

Repository
rL LLVM

Event Timeline

chandlerc updated this revision to Diff 66639.Aug 3 2016, 3:57 AM
chandlerc retitled this revision from to [PM] Introduce a devirtualization iteration layer for the new PM..
chandlerc updated this object.
chandlerc added a subscriber: llvm-commits.
silvas added a subscriber: silvas.Aug 3 2016, 1:12 PM
silvas added inline comments.
include/llvm/Analysis/CGSCCPassManager.h
528 ↗(On Diff #66639)

If you update C in this assignment, where do you call AM.invalidate(...) on the original C? (this is needed to downward invalidate any function analyses that might depend on SCC analyses on the original C). Is this something you expect the SCC transformation being run by DevirtIteratingWrapper itself to do? It would be nice to see a unittest exhibiting this.

chandlerc added inline comments.Aug 5 2016, 9:29 PM
include/llvm/Analysis/CGSCCPassManager.h
528 ↗(On Diff #66639)

The pass is required to preserve the call graph (that's part of passing it by argument) and if it updates it to reflect that in the UpdateResult and invalidate the analysis manager.

But all of this should really be spelled out in the prior patch that switches to this API, I'll update that patch with better documentation of this fact.

I certainly want to add verification of these properties as well, but I'd like to do that in a follow-up patch rather than put more into the existing patches out for review. (There is some verification already, but with some API changes and work I think we can get more verification into place.)

chandlerc updated this revision to Diff 67060.Aug 5 2016, 9:32 PM

Update to use a new name (stemming from a conversation with Mehdi on IRC) and
include a more interesting and thorough test case that exhausts the iteration
space.

Also simplify some naming and other bits of the code. Nothing substantive
changed here though.

Should be in a good state at this point.

silvas added inline comments.Aug 5 2016, 9:34 PM
include/llvm/Analysis/CGSCCPassManager.h
528 ↗(On Diff #66639)

But all of this should really be spelled out in the prior patch that switches to this API, I'll update that patch with better documentation of this fact.

Please add a test case as well.

silvas added inline comments.Aug 5 2016, 9:53 PM
include/llvm/Analysis/CGSCCPassManager.h
475 ↗(On Diff #67060)

Typo: MaxRepeatitions -> MaxRepetitions

557 ↗(On Diff #67060)

Can you just avoid keeping WeakVH's to intrinsics? The langref says you can't take their address. Therefore, a devirtualized call will never be to an intrinsic.
http://llvm.org/docs/LangRef.html#intrinsic-functions

chandlerc added inline comments.Aug 5 2016, 10:20 PM
include/llvm/Analysis/CGSCCPassManager.h
528 ↗(On Diff #67060)

I'm not sure what you mean...

I think there are already test cases for invalidation, but I can double check. But that shouldn't be in *this* patch, that should be in the big CG update patch. I haven't gone back to that one to update it substantially (this iteration utility was requested and I wanted to get it out for review) but I'm happy to add test cases there, or add more testing in subsequent patches for that stack as problems are discovered.

557 ↗(On Diff #67060)

It's not that we could create a VH to a call to an instrinsic.

Above, we create a VH to an indirect call. That indirect call gets devirtualized into a direct call. At some point, some pass may turn that direct call into a call to an intrinsic, and if it does that this code needs to defensively prepare for it.

Sadly, there is no pass in tree that does this, but without this logic the pass would crash if anything ever conspired to trigger this.

silvas added inline comments.Aug 6 2016, 5:28 PM
include/llvm/Analysis/CGSCCPassManager.h
528 ↗(On Diff #67060)

I think there are already test cases for invalidation, but I can double check. But that shouldn't be in *this* patch, that should be in the big CG update patch.

Totally agreed. Sorry if it wan't clear from context, but that's what I meant.

I haven't gone back to that one to update it substantially (this iteration utility was requested and I wanted to get it out for review) but I'm happy to add test cases there, or add more testing in subsequent patches for that stack as problems are discovered.

That sounds reasonable. I'll add any comments there once the discussion reopens there.

557 ↗(On Diff #67060)

I'm not sure I understand. In the situation you described regarding the intrinsic, we did in fact devirtualize something. So why are we avoiding setting Devirt = true in that case?
Is it because that doesn't actually add extra call edges to the lazy call graph? If that's the case, you need the same treatment for calls to external function declarations and calls to inlineasm too, right?

You may want to add some more comments about this (and maybe a test case (unittest?) if it isn't too verbose).

chandlerc marked 3 inline comments as done.Sep 4 2016, 5:04 PM
chandlerc added inline comments.
include/llvm/Analysis/CGSCCPassManager.h
557 ↗(On Diff #67060)

Sorry, I explained this really poorly.

The idea I had was that if we end up with a call to an intrinsic, there is no longer any call graph or IPO refinement to motivate iteration: scalar passes can already fully reason about the intrinsic.

I've added a bunch of comments and restructured the code to try to make this more clear.

I'm not sure I can craft a test case that does this, but I'll take a stab at it...

chandlerc updated this revision to Diff 70301.Sep 4 2016, 5:04 PM

Rebase and address review comments.

chandlerc marked 4 inline comments as done.Sep 4 2016, 7:06 PM
chandlerc added inline comments.
include/llvm/Analysis/CGSCCPassManager.h
557 ↗(On Diff #67060)

Actually, got a test for this and it convinced me we shouldn't be special casing intrinsics. We can actually do very useful function attrs work here even when "devirtualizing" to an intrinsic that won't be inlined.

Even better, that test pointed out that this was missing a heuristic from the legacy PM's version: we don't track the change of count and have that also trigger iteration.

I've added the heuristic, removed any special casing of intrinsics, and have a nice test case that covers this logic.

chandlerc updated this revision to Diff 70302.Sep 4 2016, 7:07 PM
chandlerc marked an inline comment as done.

Add a test case that covers devirtualization that actually turns into an
intrinsic.

Also improved the testing to be more precise and robust.

Added support for iterating even if we turn an indirect call into a direct call
to an intrinsic and added support for the heuristic of changing the call count
that the old PM used.

chandlerc updated this revision to Diff 82529.Dec 27 2016, 1:27 AM

Rebase and ping.

I've also now updated one of the devirtualization test cases for the old
inliner to run with this combined with the inliner in the new pass manager and
it passes now.

There is another devirt test case for the old pass manager that this can't
handle though, and I'm pondering the best way to handle that scenario.

mehdi_amini accepted this revision.Dec 27 2016, 12:13 PM
mehdi_amini edited edge metadata.

LGTM.

include/llvm/Analysis/CGSCCPassManager.h
637 ↗(On Diff #82529)

Wasn't it a limitation from MSVC2013 that is lifted now that we don't support it anymore?

703 ↗(On Diff #82529)

What about something more explicit than i? IterationNum or something like that.
(Two reasons: 1) this is a long loop body and 2) i is used in the body (and also there is another nested loop also using an index i)

733 ↗(On Diff #82529)

We could have some debug logging about "Found a devirtualized call from CS->getParent()->getParent()->getName() to F->getName()".

736 ↗(On Diff #82529)

On the first iteration, it is weird to check all the value handles to find an impossible devirtualized call, but I don't have a better suggestion to reorganize the code that does not make it equally annoying somewhere else...

test/Other/cgscc-devirt-iteration.ll
36 ↗(On Diff #82529)

s/repeatitions/repetitions/ (same in many other places in this patch)

43 ↗(On Diff #82529)

readonce? Do you mean readnone?

60 ↗(On Diff #82529)

If I follow correctly: this is the devirtualization that will trigger the first restart.

82 ↗(On Diff #82529)

And this is devirtualized during the second iteration, enabling the third one to deduce the readnone.

100 ↗(On Diff #82529)

Should you check that BEFORE contains the call to llvm.memcpy and not just devirtualize regularly to memcpy?

test/Other/cgscc-observe-devirt.ll
5 ↗(On Diff #82529)

The comment here suggested that this RUN line was useful outside of the repetition?

This revision is now accepted and ready to land.Dec 27 2016, 12:13 PM
chandlerc marked 8 inline comments as done.Dec 28 2016, 3:16 AM

Thanks! Submitting with fixes for essentially all of this. There is one question below that I'm happy to continue discussing and if any of the fixes didn't match what you were expecting just lemme know!

include/llvm/Analysis/CGSCCPassManager.h
736 ↗(On Diff #82529)

It doesn't look impossible to me? We have run the pass once at this point, and had handles held across it. Let me know if I'm missing something here though, happy to look for a better way to structure the code.

test/Other/cgscc-devirt-iteration.ll
43 ↗(On Diff #82529)

Er, readonly. I've added a sentence to clarify. We first deduce readonly, and then readnone.

No idea where 'readonce' came from. Fixed that.

60 ↗(On Diff #82529)

Yes, and commented.

82 ↗(On Diff #82529)

Correct, and again, comments added for a future reader.

100 ↗(On Diff #82529)

Not just before, but all of them. Added.

test/Other/cgscc-observe-devirt.ll
5 ↗(On Diff #82529)

Good point. I've restored it.

This revision was automatically updated to reflect the committed changes.
chandlerc marked 4 inline comments as done.
mehdi_amini added inline comments.Dec 28 2016, 4:30 PM
include/llvm/Analysis/CGSCCPassManager.h
736 ↗(On Diff #82529)

Sorry, I figured after writing this comment but somehow didn't delete it (likely clicked the X instead, which just hide it).

It's all good :)