This is an archive of the discontinued LLVM Phabricator instance.

[PM/unswitch] Teach SimpleLoopUnswitch to do non-trivial unswitching, making it no longer even remotely simple.
ClosedPublic

Authored by chandlerc on Jun 14 2017, 3:25 AM.

Details

Summary

The pass will now be more of a "full loop unswitching" pass rather than
anything substantively simpler than any other approach. I plan to rename
it accordingly once the dust settles.

The key ideas of the new loop unswitcher are carried over for
non-trivial unswitching:

  1. Fully unswitch a branch or switch instruction from inside of a loop to outside of it.
  2. Update the CFG and IR. This avoids needing to "remember" the unswitched branches as well as avoiding excessively cloning and reliance on complex parts of simplify-cfg to cleanup the cfg.
  3. Update the analyses rather than just blowing them away or relying on something else updating them.

Sadly, #3 is somewhat compromised here as the dominator tree updates
were too complex for me to want to reason about. I may take another stab
at it, but it may be best to wait for something like the newly proposed
dynamic dominators. However, we do adhere to #3 w.r.t. LoopInfo.

This approach also an some important principles specific to non-trivial
unswitching: not *all* of the loop will be duplicated when unswitching.
This fact allows us to compute the cost in terms of how much *duplicate*
code is inserted rather than just on raw size. Unswitching conditions
which essentialy partition loops will work regardless of how large the
loop is in reality.

Unfortunately, there is a *lot* of code to implement all of this. And
I'm still not really happy with all of it. I think there is more
factoring and cleanup to be done here, but I wanted to at least get it
out where others can see and comment on it.

Some high level outstanding things that I'd like to defer to subsequent
patches:

  • We could be much more clever about not cloning things that will be deleted. In fact, we should be able to delete *nothing* and do a minimal number of clones.
  • There are many more interesting selection criteria for which branch to unswitch that we might want to look at. One that I'm interested in particularly are a set of conditions which all exit the loop and which can be merged into a single unswitched test of them.

Anyways, even with somewhat rough code, hopefuly folks can start chewing
on this and giving feedback.

Depends on D34049.

Diff Detail

Repository
rL LLVM

Event Timeline

chandlerc created this revision.Jun 14 2017, 3:25 AM

Drive-by review of part of the first file. This is... a lot of code.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
91 ↗(On Diff #102518)

s/dod/do/ ?

771–773 ↗(On Diff #102518)

What is the typical depth for these things? i.e. is there a risk that a sufficiently complex loop within a function will cause this to fail spectacularly? Will it be more complex to actually do a DFS using a stack and iteratively going through the blocks?

1004 ↗(On Diff #102518)

The name seems to be inconsistent with the behaviour -- i.e. it's not really recursively calling itself. Maybe remove the 'recursively' in the name? Also, why isn't this function static unlike the others?

sanjoy edited edge metadata.Jun 25 2017, 4:37 PM

I did a quick scan and have a couple of low-level comments inline.

Do you think it is possible to split out the LoopInfo updating logic to a later patch, perhaps by using something more brute force initially? That will make later rounds of review a lot easier.

Finally, maybe I missed something, but I did not see the code handle something like this:

loop:
  br i1 %us_cond, label %left, label %right

left:
  br i1 %c, label %right, label %be

right:
  br i1 %c, label %left, label %be

be:
  br label %loop

where unswitching on %us_cond will create two subloops "out of thin air" by destroying unstructured control flow.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
837 ↗(On Diff #102518)

I'd call this ClonedLoopBlocksSet to make it clear that this holds BasicBlock *s not Loop *s. Otherwise later ClonedLoopSet.count(ClonedBB) reads weird. Ditto for PossibleClonedLoopSet.

1110 ↗(On Diff #102518)

Can this be common'ed up with the logic in buildClonedLoops?

1165 ↗(On Diff #102518)

I'd suggest calling this LoopBlocksSet.

1398 ↗(On Diff #102518)

The commit message says you don't update the dom tree?

1461 ↗(On Diff #102518)

Again, I think you can just directly ask the domtree for this.

1606 ↗(On Diff #102518)

s/spurrious/spurious/

1740 ↗(On Diff #102518)

static?

1828 ↗(On Diff #102518)

I'd be a bit more comfortable if you did int Const = 0, and later BBCostMap[BB] = Cost;. Right now it isn't obvious that Cost += TTI.getUserCost(&I); is modifying a densemap value.

1851 ↗(On Diff #102518)

I may be missing something, but I only see (1). Also, a one liner on what "unswitching cost" and "cost after unswitching" mean will be helpful.

1871 ↗(On Diff #102518)

Right now this is always true, right (since you're only unswitching br instructions that have different successors)?

1879 ↗(On Diff #102518)

Isn't this the same as edge domination?

1901 ↗(On Diff #102518)

Aren't you reading uninitialized memory here (for UnswitchCost)?

I'd also name UnswitchCost in a way that indicates it is the "best so far".

1907 ↗(On Diff #102518)

How about putting all of the logic up to this point into a TerminatorInst *findNonTrivialUnswitchCandidate function?

chandlerc updated this revision to Diff 104372.Jun 28 2017, 3:11 AM
chandlerc marked 8 inline comments as done.

Updates based on first round of review. Still more to come.

Thanks for initial comments. I think I have most of these (except for a couple of the "extract this chunk" comments and one issue w/ ranking) addressed so maybe useful for another look. I'll work on the remaining bits in parallel.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
771–773 ↗(On Diff #102518)

In practice, we can often get away with this. But really there is no reason.

There were lots of other subtle and annoying aspects of how I was cloning loops and blocks into loops. For example, this did a really poor job of ordering the blocks within the loop data structure. While this doesn't matter for correctness or any of the test cases, it would make debugging subsequent transformation passes really, really annoying so it seems worth doing in a better way.

What's more, that is tied up in a gross ineffeciency in the code that I even had a FIXME for, so I decided to re-work how this is done.

The result is a bit longer in terms of lines-of-code, but I think it is quite a bit cleaner. We now *directly* clone each loop in the loop nest, without recursion, and with easier to reason about logic for inserting blocks into their block lists.

Below, we don't map blocks into loops all over the place, and instead use a more rich intermediate data structure and then do a single pass to map the blocks into the exit loops.

I know this was just a small comment on yoru part, but thanks, I think the approach is quite a bit better now.

1004 ↗(On Diff #102518)

It's recursive in the dominator tree, even though it isn't implemented that way. The non-recursive implementation is (I think) just a detail.

Great catch on the missing static, fixed.

1110 ↗(On Diff #102518)

I really, really wanted to do that.

I couldn't figure out how to do it.

This *only* has to delete things and hoist things. It doesn't ever create a new loop. So while the overall structure is clearly very similar, at each step we end up doing a subtly different thing.

The biggest thing that seems common between them to me is the reverse walk to re-establish the loop body. However, *within* that walk we have quite different code. Still, could possibly try to extract some of that logic to a helper. Other ideas?

1398 ↗(On Diff #102518)

I don't *incrementally* update it, but it does get updated.

1461 ↗(On Diff #102518)

There is a BasicBlockEdge query to the domtree, but I'm not sure its a good idea...

For a branch instruction, it would be fine. But when this becomes a switch, I think it'll do the wrong thing. There, we'll be fine if there are multiple "continue" edges from the switch's parent BB to the continue successor BB, but because there are multiple edges, the edge-based dominates query will return false.

A different way of looking at this is that this is actually materially different from the dominates query -- this doesn't check if there is *a* dominating edge, it checks if there are any edges entering this successor from a block other than the parent of the unswitched terminator. Which is also what the code is somewhat obviously doing.

Anyways, happy to rewrite this if you still prefer a method query, but curious whaht exact query you would prefer.

1851 ↗(On Diff #102518)

No, I wrote this comment in part as notes to myself, and I never returned to it. I also think I need to think more about the ranking here. This will take me a bit of time though, and hopefully the rest we can make review progress on in the interim...

1871 ↗(On Diff #102518)

Correct, I've just tried to write the code with as few assumptions about the terminator as possible because unlike the trivial case, the logic is complex enough that I imagine almost all of it will end up shared.

1879 ↗(On Diff #102518)

See my response above. Again, I'm happy to rewrite this whichever way seems most clear.

1901 ↗(On Diff #102518)

No, because we wait until we have *some* unswitch terminator first, and only then compare the cost.

Updated names.

1907 ↗(On Diff #102518)

I like factoring something like this... Question though, is it useful to keep the call to actually do the unswitch separate? Happy to look at the code and try some splits / arrangements to make it a bit more manageable, I'm definitely not (yet) happy w/ the factoring.

In this episode I've reviewed buildClonedLoops and cloneLoopNest.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
1461 ↗(On Diff #102518)

I see -- this sounds fine assuming you're going to generalize this code to switches (as opposed to having a different code path for them altogether). However, I'd suggest pulling out a helper function.

759 ↗(On Diff #104372)

Can VMap be a const reference?

760 ↗(On Diff #104372)

I'd call this AddClonedBlocksToLoop, CloneLoopBlocks sounds like you're cloning the blocks themselves.

767 ↗(On Diff #104372)

Add an assert that ClonedBB does not have a containing loop yet.

edit: I think it is better to write this as:

auto *ClonedBB = cast<BasicBlock>(VMap.lookup(BB));
if (LI.getLoopFor(BB) == &OrigL)
  ClonedL.addBasicBlockToLoop(ClonedBB, LI);

which has the assert, and avoids the slightly odd looking (since it is unconditional) ClonedL.addBlockEntry(ClonedBB);, and to remove the

for (Loop *PL = ClonedL; PL; PL = PL->getParentLoop())
  PL->addBlockEntry(ClonedBB);

in buildClonedLoops.

806 ↗(On Diff #104372)

(Minor nit) How about s/Due to unswitching simplifying the CFG of the loop,/Because unswitching simplifies the CFG of the loop,/

818 ↗(On Diff #104372)

Can VMap be a const reference?

844 ↗(On Diff #104372)

I'd add an assert here that ParentL contains or is equal to the previous parent of the original loop.

849 ↗(On Diff #104372)

How about using a SmallSetVector here?

862 ↗(On Diff #104372)

(Minor) I'd call this BlocksInClonedLoop.

867 ↗(On Diff #104372)

When will PossibleClonedLoopBlockSet.count(Pred) be false (assuming Pred is not ClonedPH)? I thought you don't clone the dead blocks (dominated by the edge not cloned into the cloned loop) at all?

888 ↗(On Diff #104372)

s/posible/possible/

891 ↗(On Diff #104372)

Like above, I don't understand the use of PossibleClonedLoopBlockSet here -- since you don't clone blocks dead due to the edge removal in unswitchInvariantBranch, you shouldn't really have a predecessor of a cloned block that hasn't been cloned.

If I'm wrong here, I think the rationale should be documented.

896 ↗(On Diff #104372)

Can you move the declaration of ClonedL here?

898 ↗(On Diff #104372)

Why not remove the ParentL->addBasicBlockToLoop(ClonedPH, LI); here and add the preheader using the generic UnloopedBlockSet logic at the end?

907 ↗(On Diff #104372)

Not sure what you mean by "stable w.r.t. predecessor ordering".

924 ↗(On Diff #104372)

I've added a comment regarding this in cloneLoopNest.

956 ↗(On Diff #104372)

By "the loop nest" do you mean "the loop nest containing the loop being unswitched"? If so, changing the comment to say that may make things clearer.

989 ↗(On Diff #104372)

When will we hit this? Won't all of the paths "die out" once they hit UnloopedBlockSet.erase(PredBB)?

1002 ↗(On Diff #104372)

I thought predecessors was deterministic too (since the use-list has deterministic order)?

sanjoy requested changes to this revision.Jul 12 2017, 10:16 PM

Second and last part of the review.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
1103 ↗(On Diff #104372)

I'd add an assert here that if ChildL 's header is dead, so are all other blocks in ChildL.

1263 ↗(On Diff #104372)

Why do you care about a stable partition here?

1280 ↗(On Diff #104372)

Why do we care about the sort being stable?

1284 ↗(On Diff #104372)

s/ExitLoopSet/NewExitLoopBlocks/

1285 ↗(On Diff #104372)

I'd s/first/deepest/ in the comment.

1287 ↗(On Diff #104372)

There already is a variable named L in scope, so please rename the argument to something else.

Perhaps clang should warn about this? :)

1298 ↗(On Diff #104372)

Can you add a comment stating something like "get the next exit block, in decreasing loop depth order"? You already have mentioned this in the std::sort call, but there are enough moving parts here that some redundancy can help.

1303 ↗(On Diff #104372)

Add a comment here that this only works because we sorted ExitsInLoops to be in increasing order of loop depth.

1314 ↗(On Diff #104372)

When would we ever get to the preheader?

1343 ↗(On Diff #104372)

I don't see when the !L.contains(BBL) bit will kick in -- won't all of the blocks in ExitLoopSet either be exit blocks of the original L, or previously interior blocks that are no longer interior?

sanjoy added inline comments.Jul 12 2017, 10:16 PM
lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
95 ↗(On Diff #104372)

This bit looks separable, all of the call sites to this function is in pre-existing code.

186 ↗(On Diff #104372)

This bit looks separable to me -- no new call sites were introduced in this patch.

1133 ↗(On Diff #104372)

Document the return value.

1267 ↗(On Diff #104372)

I'd s/UnloopedSet/UnloopedBlocks/ (I think the Blocks suffix adds more information than the Set suffix).

1357 ↗(On Diff #104372)

Again, I can't see when !L.contains(BBL) will kick in.

1589 ↗(On Diff #104372)

Define the direction of "conservative" here -- does it mean "A conservative-dominates B implies A actually-dominates B" or the converse?

1640 ↗(On Diff #104372)

Why is DontDeleteUselessPHIs false?

1696 ↗(On Diff #104372)

When you say "cloned sibling loop", do you include all of the loops that used to be subloops of L and have now been reparented? I can't think of a case in which the exit block set for those will change.

1719 ↗(On Diff #104372)

As I said above, I'm not convinced that we need this bit.

1726 ↗(On Diff #104372)

Instead of commenting on what's at the top of the stack, how about phrasing the comment as on the order in which we will process the loops?

s/ALso/Also/

1740 ↗(On Diff #104372)

s/enclusing/enclosing/

1761 ↗(On Diff #104372)

s/entiry/entire/

1907 ↗(On Diff #102518)

Question though, is it useful to keep the call to actually do the unswitch separate?

No strong opinion on that.

This revision now requires changes to proceed.Jul 12 2017, 10:16 PM
chandlerc updated this revision to Diff 107260.Jul 19 2017, 1:28 AM
chandlerc edited edge metadata.
chandlerc marked 18 inline comments as done.

Updated based on most of the code review comments from Sanjoy. Still some pending.

Most of the comments addressed or responded to here... Still working on the actual refactoring ,sorry for the delay there.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
767 ↗(On Diff #104372)

I used to use addBasicBlockToLoop but it proved very error prone. It added the blocks to too many parent loops and changed the *order* in which the blocks were added to parent loops. I'd much rather keep this unless there is a functional issue here.

867 ↗(On Diff #104372)

Unreachable code within the loop that gets cloned, but when we do our backwards walk is not found and associated with the loop.

This is a somewhat hypothetical (I don't have a test case) but it seemed principled.

891 ↗(On Diff #104372)

Same answer as above, tried to add rationale. Happy to update this if I'm missing something, I've not thought deeply here.

896 ↗(On Diff #104372)

Not trivially. We don't always clone a loop and we return the cloned loop if any. Suggestions?

898 ↗(On Diff #104372)

Because this seemed like a more precise and definitive result?

Essentially, *if* we establish a loop here, we can reason about the header and preheader using that information alone. This in turn should prune the UnloopedBlockSet logic below. I'm not sure why sinking this would be more clear?

907 ↗(On Diff #104372)

I tried to clarify this comment, not sure how successful I was.

924 ↗(On Diff #104372)

Continuing discussion there.

989 ↗(On Diff #104372)

No?

The UnloopedBlockSet might have the ClonedPH in it, and if it does, we could erase it once, and put it into the worklist. We'd then visit it and not want to walk the predecessors of that cloned PH.

1002 ↗(On Diff #104372)

So, LoopInfo goes to some trouble to avoid its block order relying on use-list order via the predecessors order. Regardless of whether that's important or not, I think it is reasonable to try and continue here as unswitch isn't the thing that should walk that back.

I also personally agree with that call. I don't like use-list order dependencies anywhere we can avoid them. I understand that they're nearly impossible to avoid today, but I think we would benefit greatly from moving away from that any and everywhere we can.

1263 ↗(On Diff #104372)

Because the whole idea is that the loop blocks list should have a stable ordering, and so I don't want to permute it while partitioning? Lots of transforms' order depends on the loop block order.

1280 ↗(On Diff #104372)

Same answer as the rest of these... My default is stable. If there is a reason why the order of this *cannot* impact the order of IR data structures that impact the order of transforms downstream, that's worth optimizing, but until then we should preserve incoming order of things IMO.

1287 ↗(On Diff #104372)

There really is no good name here. But the capture is just confusing. I think this can and should be a function w/o any capture that just operates on its arguments, removing the shadowing. Does taht work for you?

1314 ↗(On Diff #104372)

See above about preheader. I think the same logic applies to both routines.

1343 ↗(On Diff #104372)

I'm not following ... exit blocks of the original L are exactly what would be identified by this check? My brain is fuzzy on this, so maybe I'm just misunderstanding the point you're making.

1357 ↗(On Diff #104372)

Hoping that these two issues are the same.

1589 ↗(On Diff #104372)

This whole thing is kinda garbage.

Anyways, tried to rephrase it. But not sure this really addresses the concern. Maybe I should nuke all usage of domtree blewo this to avoid any concerns.

1640 ↗(On Diff #104372)

Because they could be LCSSA PHIs?

1696 ↗(On Diff #104372)

Yes, I do include those.

We can change which loop an exit block fo a cloned sibling belongs to. Not sure if we can actually change the exit set. But is there a meaningful distinction here?

1719 ↗(On Diff #104372)

Which bit? i'm not sure I'm following you here.

1726 ↗(On Diff #104372)

I don't think there is an order that I can reasonably document other than the first one?

sanjoy added inline comments.Jul 28 2017, 2:19 PM
lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
767 ↗(On Diff #104372)

SGTM

867 ↗(On Diff #104372)

It looks like LoopInfo filters out unreachable blocks (and does not insert them into llvm::Loops): https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Analysis/LoopInfoImpl.h#L360 so IIUC that situation exactly won't happen. Are there any other such cases?

896 ↗(On Diff #104372)

Never mind then. :)

898 ↗(On Diff #104372)

I was thinking about this as -- the preheader is just another block that did not get included into ClonedL so it should be handled by the generic logic that handles such blocks.

907 ↗(On Diff #104372)

I might use "not arbitrary" instead of "stable". "stable" seems to imply "deterministic" which is slightly confusing since predecesor ordering should be deterministic as well.

989 ↗(On Diff #104372)

Sorry, I forgot the case where the loop disappears because we delete all of the backedges. However, I think we should be able to assert BlocksInClonedLoop.empty() under if (BB == ClonedPH) (since the header of the cloned loop should still dominate the exit, and at least that header won't be in UnloopedBlockSet?

1002 ↗(On Diff #104372)

In that case, the problem is that not that the use list order is not deterministic but that it is nonsensical, so I'd change the comment to reflect that.

1280 ↗(On Diff #104372)

That ("prefer stable unless you can prove it does not matter") SGTM.

1287 ↗(On Diff #104372)

SGTM.

1314 ↗(On Diff #104372)

I've replied above on this.

1343 ↗(On Diff #104372)

I meant to say:

  • ExitLoopSet contains ExitBB and a set of blocks that were internal to L.
  • The loop for ExitBB is already ExitL so we don't need to touch it.
  • For the blocks that were internal to L, L.contains(BBL) will never never be false except for the case when BBL == &L.
1357 ↗(On Diff #104372)

Replied above.

1589 ↗(On Diff #104372)

Maybe I should nuke all usage of domtree blewo this to avoid any concerns.

If that's possible, I'd certainly prefer that.

I'd phrase the comment as "we know that the split point definitely dominates the merge block" or something like that.

1640 ↗(On Diff #104372)

I see -- can DontDeleteUselessPHIs be !L.contains(ContinueSuccBB) in that case?

1719 ↗(On Diff #104372)

Updating LCSSA for NonChildClonedLoops and HoistedLoops. Since we didn't break apart these loops, we should not have new uses of values defined inside these loops outside these loops (as opposed to the L, ClonedL and their parents, which we did split up and thus uses for interior values that were interior before may be exterior now).

1726 ↗(On Diff #104372)

I meant that you start with L and ClonedL (if they still exist) and move up the loop tree till OuterExitL.

Rebase onto the new LoopInfo API and get everything working. Also rebase into
a monorepo checkout.

Things are now working again and this patch can be experimented with, but
I still need to do some refactoring before it is ready for further review (that
is the last significant outstanding review comment AFAIK).

chandlerc marked 38 inline comments as done.Oct 16 2017, 12:52 AM

Updates and/or responses. Largely addressed all the comments with the next patch upload.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
867 ↗(On Diff #104372)

I've turned this into an assert.

907 ↗(On Diff #104372)

Reworded to be even less confusing I think by just directly stating the constraint: don't rely on predecessor order.

1343 ↗(On Diff #104372)

Sorry, yes, this is trivially true. Sorry for not seeing this the first time around.

1589 ↗(On Diff #104372)

I think the comment is better now, but let me know if it still doesn't make you happy.

I've checked, and we *only* use the dominator tree on blocks in the original loop, so these nodes aren't important. I've also documented what is going on there.

I think the only realistic solution in the end will be to teach this to accurately update the dominator tree, but that should be a separate patch.

1640 ↗(On Diff #104372)

Couldn't there still be PHIs that are necessary for LCSSA form of some nested loop?

Generally, this seems risky and low value. Why waste time proving that it could be false?

1719 ↗(On Diff #104372)

Maybe I'm being overly conservative here, but I'm having a very hard time convincing myself that we have a constructive reason why LCSSA will be correct here.

We are fundamentally changing the CFG when cloning. I feel like we will keep finding devilish cases where that CFG change subtly changes the LCSSA solution set. Maybe not, but I'm quite paranoid here and not able to really convince myself through either a proof or testing that this is correct without this. Maybe you can? Maybe this should be a follow-up patch?

chandlerc marked 4 inline comments as done.Oct 16 2017, 1:02 AM
chandlerc added inline comments.
lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
1343 ↗(On Diff #104372)

Err, I actually have a test case that checks this. ;] Without this, we orphan basic blocks. Added this and the check below back.

chandlerc updated this revision to Diff 119114.Oct 16 2017, 1:03 AM
chandlerc marked an inline comment as done.

Update addressing the remaining comments from Sanjoy.

Notably, this includes the majority of work to address the factoring comments.
I've now factored several parts of this into separate functions to make things
a bit more consumable.

I think this is all ready for review again.

sanjoy accepted this revision.Nov 13 2017, 9:59 AM

I skimmed through the changes, and they lgtm. I didn't try to follow the logic closely since I assumed most of the actual mechanics stay the same (but please let me know if that's not true and there are parts of the change that I should review again).

Also, looks like the issue of unstructured control flow was not handled (unswitching a loop with unstructured control flow can create child loops that weren't present before)? That's fine to address in a separate change.

This revision is now accepted and ready to land.Nov 13 2017, 9:59 AM
davide edited edge metadata.Nov 13 2017, 3:47 PM

I have few comments. As Sanjoy already approved, feel free to update this and check in without another round trip.
Most of them can be addressed in follow-ups, but I thought they're worth mentioning regardless. Thank you for pushing forward this.
It's been a long time coming :)

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
1139 ↗(On Diff #119114)

Can you add a comment here explaining with a non-stable sort is not enough?

1245 ↗(On Diff #119114)

This doesn't (yet) use the new dominator incremental update API, from what I can tell.
I think it can be done in a separate change, but I think it's worth doing instead of rolling its own update logic.

1789 ↗(On Diff #119114)

same here.

1809 ↗(On Diff #119114)

JFYI, rebuilding LCSSA in LLVM is currently a sledgehammer for some use cases.
It's of course, not your fault, as the algorithm is O(N^3) [I can provide cases that are tight :)], but let's be prepared to fix it in case this turns out to be a compile time sink.

1858 ↗(On Diff #119114)

Want to add also verifyisRecursivelyLCSSA here? or whatever is called?

chandlerc marked 3 inline comments as done.Nov 15 2017, 11:06 PM

Thanks Davide, I think I've got these covered either by adjustments or future work. Give a shout about anything you'd still like addresed in follow-up patches.

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
1139 ↗(On Diff #119114)

At this point, I don't think it actually matters because I ended up using a stable order below, so I've changed this. Thanks!

1245 ↗(On Diff #119114)

Yep. I've got that in the change description as one of the obvious follow-ups.

1809 ↗(On Diff #119114)

Yep. Sadly, I wasn't able to easily come up with a better way, but I can believe there is one.

This at least doesn't do the recursive formulation...

1858 ↗(On Diff #119114)

We already verify all child loops of updated loops in UpdateLCSSA. So the only way this would help is if the LCSSA formation itself is busted.

davide accepted this revision.Nov 16 2017, 4:40 PM

LGTM. Thanks Chandler.

This revision was automatically updated to reflect the committed changes.
chandlerc marked 2 inline comments as done.
wuzish added inline comments.
llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
61

I have a question about the default value false of "enable-nontrivial-unswitch". Could it be changed to true because it seems that it can bring much improvement of bmk when enable it. Or is there any reason to stop it?

Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2019, 1:16 AM
Herald added a subscriber: mgrang. · View Herald Transcript
wuzish added inline comments.Jul 26 2019, 12:16 AM
llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
61

@chandlerc @sanjoy @davide Could you please have a look at the comment?