This is an archive of the discontinued LLVM Phabricator instance.

[PM/LoopUnswitch] Support partial trivial unswitching.
ClosedPublic

Authored by chandlerc on May 10 2018, 10:38 AM.

Details

Summary

The idea of partial unswitching is to take a *part* of a branch's
condition that is loop invariant and just unswitching that part. This
primarily makes sense with i1 conditions of branches as opposed to
switches. When dealing with i1 conditions, we can easily extract loop
invariant inputs to a a branch and unswitch them to test them entirely
outside the loop.

As part of this, we now create much more significant cruft in the loop
body, so this relies on adding cleanup passes to the loop pipeline and
revisiting unswitched loops to do that cleanup before continuing to
process them.

This already appears to be more powerful at unswitching than the old
loop unswitch pass, and so I'd appreciate pretty careful review in case
I'm just missing some correctness checks. The LIV-loop-condition test
case is not unswitched by the old unswitch pass, but is with this pass.

Depends on D47408.

Diff Detail

Event Timeline

chandlerc created this revision.May 10 2018, 10:38 AM

What about finally renaming SimpleLoopUnswitch into something "less simple"? :)

OMG yes I want to rename it. I mostly have been avoiding this because i
have too many patches stacked on top of one another.

That said, if you have a better name, I'm happy to prioritize fixnig the
name. No arguments about it needing to happen. I also just don't really
know what to call it....

sanjoy added inline comments.May 10 2018, 4:32 PM
llvm/include/llvm/Analysis/Utils/Local.h
144

Why not s/provided vector/DeadInsts?

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
76

Nit: wrap?

322

Stray line?

333

Would be nice to be a bit more specific "For partial trivial unswitching of a condition".

398

Might be nice to assert here that if LoopExitSuccIdx == 0 then cast<Instruction>(BI.getCondition())->getOpcode() is Or etc.

llvm/test/Transforms/SimpleLoopUnswitch/trivial-unswitch.ll
511

I think we need to test a few more cases here:

  • Branch on chain of ands
  • Branch on mix of ands and ors and possibly other operations

Actually now that I think of it, I wonder if looking back through the condition expression tree to find all loop invariant values is necessary -- if we see these kinds of cases (loop_varying & loop_invariant0) & loop_invariant1) then perhaps we should teach LICM et. al. to reassociate and hoist the loop_invariant0 & loop_invariant1 bit instead of worrying about them in loop unswitch?

chandlerc updated this revision to Diff 148363.May 24 2018, 2:29 AM
chandlerc marked 5 inline comments as done.

Update with fixes from code review.

Thanks for the review!

llvm/test/Transforms/SimpleLoopUnswitch/trivial-unswitch.ll
511

So, we test branching on an and above. There is no different logic in handling N and instructions vs. N or instructions, so I didn't add that test as it didn't seem to add value compared to these two.

I can definitely mix some non or operations into this chain. Would that be enough coverage? Just trying to understand what the goal is of the added tests.

While I was adding these tests, it actually exposed a weakness in the instruction simplification we do here. Fixing it proved... a bit tricky. I've added somewhat complex logic so that when we need to simplify multiple different invariants we can correctly simplify defs before uses. However, now I need to add a better test to cover the nasty case for simplifying -- when we need to simplify around a diamond in the CFG. I'm going ahead and uploading the patch while I craft a test case for that so you can yell if I'm making this way harder than it needs to be.

sanjoy added inline comments.May 24 2018, 11:47 AM
llvm/include/llvm/Analysis/Utils/Local.h
144

Wasn't done?

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
249

"just walk"

269

Maybe this can go in a different "loop instruction simplify" pass? Just above you say "the only real goal of this is to do very basic cleanup of unswitched conditions. We don't need powerful tools here. A proper pass can be scheduled to do more comprehensive cleanup." :)

I'm worried that doing all of this in a single pass makes it less granular and harder to test -- we only get to the see the IR after all of this cross-block simplification has happened, which means it is harder to figure out what exactly loop unswitching did.

llvm/lib/Transforms/Utils/Local.cpp
446

Can we end up doing a double free here? Like

%a = add %p, 1
%b = add %a, 1

and DeadInsts is [%a,%b] -- we'll add %a back to the worklist after visiting %b and then eraseFromParent %a twice.

We could add a precondition here to avoid this, but it seems better to just handle this case.

452

I think OpU.set(nullptr) is more readable here. Otherwise this reads like we're just modifying a local variable.

llvm/test/Transforms/SimpleLoopUnswitch/trivial-unswitch.ll
511

Just trying to understand what the goal is of the added tests.

Mostly just making sure that the logic to "gather" the various loop invariant (transitive) operands keeps working as intended.

fedor.sergeev added inline comments.May 24 2018, 3:43 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
269

Second that.
With our current LoopUnswitch use we tend to run massive cleanup after it, and I'm sure we will continue doing that with the new LoopUnswitch. Having a small cleanup in LoopUnswitch itself will not do a full job, and as Sanjoy rightfully points out it might make IR investigations a bit more complicated.

chandlerc marked 8 inline comments as done.May 24 2018, 4:17 PM

Ok, after some discussion, I'm pretty convinced we just need to stop simplifying here.

I was hoping to do at least *some* simplification, but there really is no good middle ground and so I think its better to *completely* nuke this.

Unfortunately that is going to require ... some more work.

  1. I think the current pass pipeline doesn't have a good cleanup pass scheduled. I don't think we even have the *right* cleanup pass here, but I'll poke around at some of the nascent ones. So first step will be that I need to go build / finish / polish a good instruction cleanup loop pass. And then I'll need to add it to the main pipelines. Prepare for patches to that effect.
  1. I'll then need to do the removal of all simplifying here in a separate patch and update testing to reflect it. Specifically, when we *only* do trivial unswitching, we were falling through to try non-trivial unswitching immediately. We will have to not do that, and instead re-add the now mutated loop to the pass manager to re-visit (much like we do with non-trivial unswitching) and rely on it then iterating for us. This will ... not be a somewhat surprisingly significant behavior change. I think its good, but its worth noting. This will essentially make loop-unswitch a fixed-point pass in the pass pipeline, but it will do so using the pass manager. I'm not aware of any other passes that currently do this. Anyways, brave new world. The pipeline will now be ... truly dynamic.
  1. Then I can return to this patch and land it w/o added simplification.

Make sense?

I've updated the code here to reflect what I think it will end up looking like. I can't update the tests yet, will do that when I get back to #3.

I'll also probably pull the recursive deletion API into one of the patches in #1 -- would you be OK with that going in on its own? I don't have a good way of testing it that way though. Otherwise, I'll let it float around and try to figure out when/if I need it.

-Chandler

llvm/include/llvm/Analysis/Utils/Local.h
144

Oh, sorry. I changed it in the last paragraph, but not the top one.

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
269

After a bunch of discussion, there really isn't even a good middle ground here.

I think we have to rip *all* of the cleanup out. This is going to make things harder to do as it changes how the current pass behaves. See my top-level response.

llvm/lib/Transforms/Utils/Local.cpp
446

This is already in the precondition documentation... Specifically that the instructions in the vector must have no uses.

I've added an assert to ensure this holds. I'm happy to widen the contract if you think users would benefit from it though.

chandlerc updated this revision to Diff 148507.May 24 2018, 4:59 PM
chandlerc marked 2 inline comments as done.

Updated patch just to show where the code ended up.

and instead re-add the now mutated loop to the pass manager to re-visit (much like we do with non-trivial unswitching) and rely on it then iterating for us. This will ... not be a somewhat surprisingly significant behavior change. I think its good, but its worth noting. This will essentially make loop-unswitch a fixed-point pass in the pass pipeline, but it will do so using the pass manager. I'm not aware of any other passes that currently do this. Anyways, brave new world. The pipeline will now be ... truly dynamic.

That's really nice :-)

The pipeline will now be ... truly dynamic.

Hoorray! :)

Make sense?

sounds good to me

chandlerc updated this revision to Diff 148741.May 26 2018, 4:55 PM
chandlerc edited the summary of this revision. (Show Details)
chandlerc removed subscribers: hfinkel, brzycki, junbuml and 2 others.

Update to rebase on the patch that adds loop-instsimplify and loop-simplifycfg
to the loop pass pipeline as cleanup passes to minimize the cleanup necessary
in the unswitch pass itself.

Also includes a bugfix found when testing this on the llvm test-suite, and new
test cases to cover this area of code.

Not exactly related to this set of changes...
I just have discovered that old LoopUnswitch performs SCEV cache invalidation when it does nontrivial updates to CFG
(forgetLoop in LoopUnswitch::unswitchNontrivialCondition) .

Perhaps makes sense to do the same in new loop unswitch as well?

Not exactly related to this set of changes...
I just have discovered that old LoopUnswitch performs SCEV cache invalidation when it does nontrivial updates to CFG
(forgetLoop in LoopUnswitch::unswitchNontrivialCondition) .

Perhaps makes sense to do the same in new loop unswitch as well?

Yes, will attack that in a separate patch unless someone gets there first.

chandlerc updated this revision to Diff 149026.May 29 2018, 7:53 PM

Rebase now that all the loop-instsimplify stuff has landed.

This should be ready for another round of review?

chandlerc updated this revision to Diff 149599.Jun 1 2018, 7:02 PM

Rebase and ping.

chandlerc updated this revision to Diff 150040.Jun 5 2018, 2:17 PM

Rebase and ping yet again. Would really like to get a review here...

Crickets.....

sanjoy accepted this revision.Jun 10 2018, 8:43 PM

This lgtm.

Have you considered making full unswitching a special case for partial unswitching? It seems like we could "partially unswitch" br %loop_invariant, label %t, label %f into br true/false, label %t, label %f and have a beefed up LoopSimplifyCFG clean this up?

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
84

Assert that Root is loop invariant? That justifies why we're not adding it to Invariants.

This revision is now accepted and ready to land.Jun 10 2018, 8:43 PM
chandlerc marked an inline comment as done.Jun 20 2018, 11:59 AM

This lgtm.

Have you considered making full unswitching a special case for partial unswitching? It seems like we could "partially unswitch" br %loop_invariant, label %t, label %f into br true/false, label %t, label %f and have a beefed up LoopSimplifyCFG clean this up?

We'd have to make LoopSimplifyCFG substantially more powerful, and all of the complex CFG-fixing logic here would need to be ported to it.

We can do that, but I'm not sure how valuable it is in practice. The really tricky thing is that we'd have to have LoopSimplifyCFG trigger the iteration when it re-shapes the loop nest. That seems a bit more clear here at the moment... but happy to chat about it further and if it makes sense, we can move all the CFG rewriting logic to LoopSimplifyCFG and make that change.

This revision was automatically updated to reflect the committed changes.
reames added a subscriber: reames.Jun 20 2018, 3:53 PM

Sorry for being late to the party, but a couple of optional post commit style comments for possible follow up. Nothing major, just ideas on how to share code and reduce a possible ordering sensitivity.

llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
81 ↗(On Diff #152128)

Hm, with some slight generalization, this could be a generally useful utility elsewhere. This is basically handling reduction operations spelled with multiple instructions. Seems like that's a common enough pattern (say, instcombine?) to be worth drawing out into a generic visitReductionOperands(Root, FilterFunc)?

248 ↗(On Diff #152128)

I believe you've got an pass ordering sensitivity here which isn't present in the old pass. The old pass deliberately uses "makeLoopInvariant" which is like isLoopInvariant, but it will hoist trivially hoistable instructions as well. By not doing this, the new pass will be very sensitive to having ideal (i.e. fully LICMed) IR. This may be problematic if any other loop pass is run between LICM and the new unswitch.

349 ↗(On Diff #152128)

Hm, possibly out of scope, but I'll mention it anyway.

This ends up producing a reduce chain rather than a tree. We're going to end up trying to reassociate that later. Is it worthwhile having an interface which produces an appropriate tree to start with? Maybe this is an API that make sense on IR Builder? Something like CreateOr(ArrayRef<Value> Ops)?

377 ↗(On Diff #152128)

Hm, we have this pattern all over. Might be time for a getBoolean(bool) on ConstantInt?

Sorry for being late to the party, but a couple of optional post commit style comments for possible follow up. Nothing major, just ideas on how to share code and reduce a possible ordering sensitivity.

No problem. Mostly following up here on the more discussion-y points.

llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
248 ↗(On Diff #152128)

Yes, this has been true throughout the new pass. IMO, it makes the pass quite a bit simpler as we get the invariant that if we didn't unswitch something, we didn't mutate the IR. And that matters a lot more in the new PM with caching of analyses.

I'd rather fix the pass pipelines to be much more careful about running LICM in the appropriate places. So far we haven't hit any issues, but it is definitely something that may come up in the future.

Does that seem reasonable?

349 ↗(On Diff #152128)

I think we canonicalize to chains pretty commonly to make analysis easier anyways, and rely on later passes forming trees where beneficial (most places).