This is an archive of the discontinued LLVM Phabricator instance.

[LICM] Make LICM able to hoist phis
ClosedPublic

Authored by john.brawn on Oct 3 2018, 5:06 AM.

Details

Summary

The general approach taken is to make note of loop invariant branches, then when we see something conditional on that branch, such as a phi, we create a copy of the branch and (empty versions of) its successors and hoist using that.

This has no impact by itself that I've been able to see, as LICM typically doesn't see such phis as they will have been converted into selects by the time LICM is run, but once we start doing phi-to-select conversion later it will be important.

Diff Detail

Repository
rL LLVM

Event Timeline

john.brawn created this revision.Oct 3 2018, 5:06 AM

This has no impact by itself that I've been able to see, as LICM typically doesn't see such phis as they will have been converted into selects by the time LICM is run

That's surprising... I would have expected this to show up in some cases. Some branches can't eliminated due to side-effects.

lib/Transforms/Scalar/LICM.cpp
791

Would it be better to insert a PHI node, rather than re-hoist?

This has no impact by itself that I've been able to see, as LICM typically doesn't see such phis as they will have been converted into selects by the time LICM is run

That's surprising... I would have expected this to show up in some cases. Some branches can't eliminated due to side-effects.

It's quite likely that the various TODOs are preventing anything anything significant from being hoisted.

lib/Transforms/Scalar/LICM.cpp
791

I considered that, but went with this as it gives more similar results to the current behaviour in the cases where we end up not hoisting any phis. However I'm currently working on fixing the various TODOs here and it looks like maybe I will need to go for the approach of inserting phis. I'm currently working on it.

reames added a comment.Oct 7 2018, 8:14 PM

Starting with only high level design comments....

I think this patch is going about things in slightly the wrong way. As far as I can tell, there are two key components to this: 1) hoisting PHIs w/loop invariant selectors and operands, and 2) hoisting instructions from below conditional branches into conditional blocks inserted in preheader. I think these two can and should be separate into distinct patches.

I'm concerned about (2). This really seems like a form of loop peeling, and a currently unrestricted one without a cost model. I'm not sure I'm convinced this belongs in LICM at all.

(1) could be formulated w/o (2). The easiest way would be to form a select chain using the LIV conditions and LIV operands. This also works for any instruction for which we have a predicated form.

lib/Transforms/Scalar/LICM.cpp
443

Basic question on design here. Once we've hoisted the branch on the loop invariant condition, shouldn't we be able to remove the condition within the loop? Or is this more analogous to loop peeling where we leave a copy of the control flow within the loop?

473

loop invariant operands

486

I don't think the approach you're using here works. Consider the following:
A -> B, C
B-> D, E
C-> D, F

B & C share a common successor, but it's not a unique successor. As such, there are paths that don't converge in D.

547

What about indirectly dependent blocks? (e.g. D in A->B, C, B->D)

762

Minor: I'd suggest sinking the invariant operand check into the helper. Reading the helper in isolation is currently confusing.

792

I think the "re-hoisting" here (which is really re-sinking right?) is probably a non-starter. We should instead prevent hoisting.

Starting with only high level design comments....

I think this patch is going about things in slightly the wrong way. As far as I can tell, there are two key components to this: 1) hoisting PHIs w/loop invariant selectors and operands, and 2) hoisting instructions from below conditional branches into conditional blocks inserted in preheader. I think these two can and should be separate into distinct patches.

I'm concerned about (2). This really seems like a form of loop peeling, and a currently unrestricted one without a cost model. I'm not sure I'm convinced this belongs in LICM at all.

(1) could be formulated w/o (2). The easiest way would be to form a select chain using the LIV conditions and LIV operands. This also works for any instruction for which we have a predicated form.

I'd not considered going direct from phi to select when hoisting. It seems like it will restrict the kinds of phis that can be hoisted, but we're probably not hoisting them anyway due to the various TODOs here. I'll look into it.

lib/Transforms/Scalar/LICM.cpp
443

Only if we also hoist everything from the if and else blocks in the loop. Currently I'm leaving it up to simplifycfg to figur out if that's the case and clean it up if so.

486

I'm pretty sure that only matters if E uses values defined in B (or F uses values defined in C) which are then hoisted to conditional blocks in the preheader. But in that case we'd see that the uses aren't dominated and rehoist. But this does cause problems if we insert phis instead of rehoisting, so I do need to look at what I'm doing here some more.

792

Current behaviour of LICM is to hoist everything to the same block, and what this re-hoisting does is essentially get us that behaviour in those cases where hoisting to a conditional block doesn't work. Preventing hoisting would therefore prevent hoisting of things that are currently hoisted. We _could_ do something like delaying the decision of where to hoist things until we know if everything will end up being dominated, but re-hoisting seems much easier.

Update based on review comments, and add some more tests.

john.brawn marked 2 inline comments as done.Oct 11 2018, 8:20 AM

Starting with only high level design comments....

I think this patch is going about things in slightly the wrong way. As far as I can tell, there are two key components to this: 1) hoisting PHIs w/loop invariant selectors and operands, and 2) hoisting instructions from below conditional branches into conditional blocks inserted in preheader. I think these two can and should be separate into distinct patches.

I'm concerned about (2). This really seems like a form of loop peeling, and a currently unrestricted one without a cost model. I'm not sure I'm convinced this belongs in LICM at all.

(1) could be formulated w/o (2). The easiest way would be to form a select chain using the LIV conditions and LIV operands. This also works for any instruction for which we have a predicated form.

I'd not considered going direct from phi to select when hoisting. It seems like it will restrict the kinds of phis that can be hoisted, but we're probably not hoisting them anyway due to the various TODOs here. I'll look into it.

After trying this out and running lnt/spec2000/spec2006 it gives no improvements (either by itself or combined with D50723). By comparison the current approach gives no improvement by itself, and 2% improvement in spec2000/254.gap combined with D50723 (on Cortex-A57).

lib/Transforms/Scalar/LICM.cpp
486

It turns out that this is taken care of by the check that BT dominates CommonSucc. I've restructured the code here a little (as in some cases we weren't picking TrueDest/FalseDest when it was a triangle destination) and adjusted the comments.

547

Then we don't do anything. In the case that we're hoisting a phi in D we make sure to call getHoistedBlock on B and C beforehand which takes care of duplicating the control flow.

791

Inserting a phi, combined with relaxing the dominated check in registerPossiblyHoistableBranch, can lead to cases where we get the phi value from the 'wrong' predecessor (e.g. in the diamond_with_extra_in_edge test). It's probably not the case that this can happen without that relaxation but I'd rather not risk it.

mkazantsev requested changes to this revision.Oct 29 2018, 11:13 PM
mkazantsev added inline comments.
lib/Transforms/Scalar/LICM.cpp
486

Bail if TrueDest == FalseDest.

498

If the size of set is not 1, you are introducing non-deterministic optimizer behavior here. Please assert that the size is 1.

551

BB != Pair.second is checked 2 times, you could instead write if (BB != Pair.second && (succ1 == BB || succ2 == BB)).

576

Do you ever add nullptr to this map? I guess it should be .count(Orig).

593

Dot missing.

627

Dot missing.

680

It's O(N^2) in the worst case, which is not super-cool. If I understand this part correctly, you need to make sure that there is an element of WorkList that is a predecessor of another element of this WorkList. It can be done faster if you accumulate all elements of the WorkList into a set and do like

for (el : Worklist)
  for (p : predecessors(el))
    if (set.contains(p)) we are done.

It will be O(N).

691

How do you ensure that you have found something? Is it correct to do it if nothing was found and idx has a default value?

This revision now requires changes to proceed.Oct 29 2018, 11:13 PM

Does this lose debug information? I do very little with optimization passes so I don't know, but it would be nice to keep it in mind.

john.brawn marked 8 inline comments as done.Oct 31 2018, 8:54 AM
john.brawn added inline comments.
lib/Transforms/Scalar/LICM.cpp
498

It is actually possible to have more than 1 here. I'll adjust it to have deterministic behaviour for that case (by picking whichever happens to be first in the function's block list).

691

It's actually possible to not find anything, in which case we fall back to picking the first element of the worklist. I've updated the comment to make this clear.

john.brawn marked 2 inline comments as done.Oct 31 2018, 9:15 AM

Does this lose debug information? I do very little with optimization passes so I don't know, but it would be nice to keep it in mind.

The hoisting of phis goes through the hoist function, so it's no worse than anything else i.e. it loses the debug information, see line 1404.

Update according to review comments.

mkazantsev requested changes to this revision.Oct 31 2018, 11:01 PM
mkazantsev added inline comments.
lib/Transforms/Scalar/LICM.cpp
502

The general algorithm for non-empty TrueDestSucc deals well with case of size == 1; do we really need separate processing for this case?

623

Maybe I am missing something, but where do we notify DomTree of existence of these new edges?

665

I guess you could use insert(Worklist.begin(), Worklist.end()), it's tad faster.

691

I think I now understand the algorithm you're trying to apply.

Imagine the graph A -> B -> C -> D -> E, and the blocks happened to be arranged as E, D, C, B, A in the Worklist. Maybe I am missing something, but it seems that the algorithm inside collectChildrenInLoop is BFS-based and cannot protect us from this situation. It could, for example, happen if all these blocks have a common parent and arranged in the order E, D, C, B, A in this parent.

The downside if your algorithm is that it will traverse over all worklist to find A, then over all remaining worklist to find B and so on. Each any_of query will work O(N), and you do N such queries, so overall algorithmic complexity is O(N^2).

The correct way to efficiently process all blocks before their successors is processing them in Reverse Post-Order. This arrangement by definition gives you what you want: it sorts the blocks in order that all successors go after their predecessors. If the elements of Worklist are arranged in RPO, you can just process them in order without extra checks, and it will be O(N).

You can see example how to build RPO in class LoopBlocksDFS, the methods you need are beginRPO(), endRPO().

And aside from all that above, erasing the last element from the Worklist is O(1) and of the first one is O(N). When you don't care what element to erase, you should do so to the last one. Erasing the first element is expensive. However once you process them in RPO order, you don't need to erase anything from the Worklist at all because you know that you've processed all predecessors before all successors.

This revision now requires changes to proceed.Oct 31 2018, 11:01 PM

I generally like what this patch is doing, but the size of code and its complexity overwhelms me. Is it possible to separate the patch into smaller pieces so that it would be easier to review them in isolation? If yes, I'd appreciate that a lot.

john.brawn added inline comments.Nov 6 2018, 8:39 AM
lib/Transforms/Scalar/LICM.cpp
502

We could have one case for size >= 1 but that would mean iterating through potentially all of the blocks in the function to find the one element of the set, which seems rather wasteful when we can just take that one element directly.

623

Inserting the edge is never needed. insertEdge only has an effect when the source and destination blocks have different immediate dominators (see InsertReachable in GenericDomTreeConstruction.h) and that never happens - HoistTrueDest and HoistFalseDest always have the same immediate dominator as HoistCommonSucc, and for the edge from HoistCommonSucc to TargetSucc it may be that HoistCommonSucc is now the immediate dominator of TargetSucc but that's handled in the loop at line 632.

691

Yes it does look like reverse post-order gives me what I want (though I have to adjust some of the tests as it picks a different but equally good order). I'll modify this to do that.

I generally like what this patch is doing, but the size of code and its complexity overwhelms me. Is it possible to separate the patch into smaller pieces so that it would be easier to review them in isolation? If yes, I'd appreciate that a lot.

I don't really think so. The only way I can think of slicing it up would be to add the control flow hoisting (which does nothing useful by itself) as one patch, and then the phi hoisting as a seprate patch, but the phi hoisting part is very small and simple so I don't think it would usefully reduce the size of the first part.

Use LoopBlocksRPO to generate the worklist.

mkazantsev added inline comments.Nov 6 2018, 7:55 PM
lib/Transforms/Scalar/LICM.cpp
469

Preheader by definition has only one successor which is the loop header. Therefore, OriginalPreheaderChildren always has one element which is header's DTNode. So it doesn't look like you even need this field, just take loop header (it doesn't change, right?)

Or am I missing something here?

And btw, if it needs to be a field and a vector, use SmallVector instead of std::vector.

558

This name is misleading, it doesn't look like something that can change the IR. How about getOrCreateHoistedBlock?

649

Please add verification of LI in debug mode just to make sure that we didn't mess up anything (maybe not here, but in some reasonable place).

664

Why not SmallVector and push_back? That wouldn't make any difference in terms of effect but less memory-consuming.

786

This is the part I don't understand. If the hoist destination doesn't dominate instruction's users then it also doesn't dominate the original instruction. Is LICM able to hoist to such locations? Is there an example of that?

797

Set Changed here.

798

Do we really need to change the hoist point? Why not just hoist of them before preheader's terminator? That would make this code simpler.

1391–1396

Braces not needed (here and in some other places like this).

test/Transforms/LICM/hoist-phi.ll
127

Please add TODO: to the tests you think we should cover in the future, that will make it easier to track if we decide to expand this transform to something more general.

mkazantsev requested changes to this revision.Nov 6 2018, 8:25 PM

Missing Changed update and LI sanitizing should be added.

This revision now requires changes to proceed.Nov 6 2018, 8:25 PM
john.brawn marked 6 inline comments as done.Nov 7 2018, 8:45 AM
john.brawn added inline comments.
lib/Transforms/Scalar/LICM.cpp
469

Yes, I think you're right. I'll do some testing to double-check then change this.

664

The iteration order when rehoisting will have to be changed (because we have to visit instructions before the instructions they use, i.e. later instructions before earlier instructions), but yes that will work.

786

We get this when we hoist an instruction but don't hoist all of its uses. An easy example of this is a phi where one operand is loop invariant and the other is not. The hoisted loop invariant operand will not dominate its use. In the tests I've added @conditional_use, @rehoist, and @diamond_with_extra_in_edge are all examples of where rehoisting happens.

798

If we have something like

%a = add i32 %loop_invariant, 1
%b = mul i32 %a, 2

then if %b does not dominate its uses we have to rehoist %b, at which point %a no longer dominates %b and has to be rehoisted to before %a. That's what HoistPoint is doing, it's making sure we rehoist instructions before rehoisted instructions that use them.

mkazantsev added inline comments.Nov 8 2018, 2:13 AM
lib/Transforms/Scalar/LICM.cpp
664

You can preserve the order of processing if you just iterate in reverse order.

mkazantsev added inline comments.Nov 8 2018, 2:17 AM
lib/Transforms/Scalar/LICM.cpp
786

Right, it makes sense when uses are Phis. Maybe it should be explicitly stated in the comment. :)

798

But if we process %a and then %b and insert each of them before the terminator, we don't have this problem. It seems that using a vector instead of a list will spare you of this problem.

john.brawn updated this revision to Diff 173148.Nov 8 2018, 5:21 AM
john.brawn marked an inline comment as done.

Use SmallVector for HoistedInstructions, rename getHoistedBlock, add LI and DT verification, other small changes.

I have some minor comment, only the one that is related to SafetyInfo worries me (yet it just needs using utility function instead of just moveBefore). All other things are non-functional nits.
I'll take some time running fuzz tests with this patch because it's big. :) Please wait few hours, I will either give you approval (under condition that my comments will be addressed) or give you a failing test example.

lib/Transforms/Scalar/LICM.cpp
504

Minor: please add explanatory messages to the assertions you make.

562

If you only expect BI to be the only element with the property you need, it would make sense to save some compile time by inserting break here. But on the other hand, the assertion you make also makes sense. How about this: BI = find_if(<cond>), and then assert that find_if(<cond> && el != BI) == HoistableBranches.end()? Just a suggestion.

797

Removal/insertion of instructions to blocks may also need SafetyInfo updates. We have a helper function eraseInstruction for removal that handles it properly, but there was no helper for moveBefore because of only 1 occurrence. I've just added it in the patch rL346472

I am pretty sure that the current code is accidentally correct because rehoisting happens outside of loop (and SafetyInfo doesn't really care), but please rebase on top of this patch and use this utility whenever you move instructions to keep things general.

805

It makes more sense to verify DT before LI since LI uses it.

1369

After you've changed the semantics, Preheader is no longer the preheader of CurLoop (or is it)? If not, please rename it. If yes, why do you need it as parameter? You can just take loop's preheader as it was before.

In any case, it should be const BasicBlock *.

mkazantsev added inline comments.Nov 8 2018, 10:06 PM
lib/Transforms/Scalar/LICM.cpp
636

It can be useful to have staitstics on hoisted branches. I'm OK if it goes as a follow-up patch.

mkazantsev requested changes to this revision.Nov 9 2018, 3:30 AM

One of my fuzz tests has failed with the following assertion:

PHINode should have one entry for each predecessor of its parent basic block!
%.us-phi = phi i32 [ %res.i.peel, %cHeapLvb.exit181 ], [ %83, %bci_229.licm.split.us.licm ], [ %83, %bci_229.licm.split.us.licm ]

It will take me time to reduce the test to something reasonable, but maybe this info will help you find the bug in the code. So far I can say that the problematic situation happens when FalseDest == CommonSucc. Hope if helps you to diagnose the bug and construct a test. If not, I'll be able to provide the reduced test case early next week.

This revision now requires changes to proceed.Nov 9 2018, 3:30 AM

The patch fails the following test:

opt -licm -S test.ll

; ModuleID = './bugpoint-reduced-simplified.ll'
source_filename = "./bugpoint-reduced-simplified.ll"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:1"

define void @bar() {
bb:
  %tmp = call i32 @llvm.x86.sse2.cvttsd2si(<2 x double> undef)
  br label %bb3

bb1:                                              ; preds = %bb4
  %tmp2 = phi i32 [ %tmp5, %bb4 ]
  ret void

bb3:                                              ; preds = %bb4, %bb
  br i1 undef, label %bb6, label %bb4

bb4:                                              ; preds = %bb6, %bb6, %bb3
  %tmp5 = phi i32 [ %tmp, %bb3 ], [ undef, %bb6 ], [ undef, %bb6 ]
  br i1 undef, label %bb1, label %bb3

bb6:                                              ; preds = %bb3
  br i1 undef, label %bb4, label %bb4
}

; Function Attrs: nounwind readnone
declare i32 @llvm.x86.sse2.cvttsd2si(<2 x double>) #0

attributes #0 = { nounwind readnone }

The failure looks like

PHINode should have one entry for each predecessor of its parent basic block!
  %tmp5 = phi i32 [ %tmp, %bb ], [ undef, %bb6.licm ], [ undef, %bb6.licm ]
in function bar
LLVM ERROR: Broken function found, compilation aborted!

Likely some corner case related to conditional branch to the same block for true and false is not handled. Please make sure that it passes before we can proceed with the patch.

john.brawn marked 6 inline comments as done.Nov 14 2018, 5:47 AM

The patch fails the following test:

...

Likely some corner case related to conditional branch to the same block for true and false is not handled. Please make sure that it passes before we can proceed with the patch.

Yes, it's specifically that in such cases the phi in the destination block has the same incoming block appear more than once which I didn't expect. I think the easiest thing to do is to have canHoistPHI return false for such phis.

lib/Transforms/Scalar/LICM.cpp
636

I'll add this and also one for the number of created blocks, which seems like it may also be useful.

1369

I left it as Preheader just because it was simpler not to change it, but yes it makes sense to rename it so I'll do so. It can't be const though, as we're hoisting I into it (i.e. modifying it).

john.brawn marked 2 inline comments as done.

Update according to review comments.

I don't see any obvious problems in code. Let me run one more round of fuzz testing, I'll give my approval if it passes.

This revision is now accepted and ready to land.Nov 18 2018, 8:25 PM
This revision was automatically updated to reflect the committed changes.

Looks like this was reverted in r347225.

The thing causing the crash was incorrect handling of instructions that need to be rehoisted. When we have hoisted a phi, then hoisted a use of that phi, then if have to rehoist that use we need to make sure that we rehoist it to _after_ the hoisted phi. We can do this by rehoisting to the immediate dominator instead of just rehoisting everything to the original preheader.

john.brawn reopened this revision.Nov 20 2018, 9:12 AM
This revision is now accepted and ready to land.Nov 20 2018, 9:12 AM
john.brawn requested review of this revision.Nov 20 2018, 9:13 AM

This rehoisting stuff has been wary for me before, and now I'm completely hesitant about what is going on there. Why do we end up hoisting to some block that does not dominate its users? Do we have a strong reason of doing so rather than finding the correct hoisting destination?

This rehoisting stuff has been wary for me before, and now I'm completely hesitant about what is going on there. Why do we end up hoisting to some block that does not dominate its users? Do we have a strong reason of doing so rather than finding the correct hoisting destination?

Let's look at the @phi_conditional_use test in hoist-phi.ll as an example:

define i64 @phi_conditional_use(i32 %f, i32* %g) {
entry:
  %cmp1 = icmp eq i32 %f, 1
  %cmp2 = icmp eq i32 %f, 0
  br label %loop

loop:
  br i1 %cmp1, label %if.end, label %if.then

if.then:
  br label %if.end

if.end:
  %phi = phi i64 [ 0, %loop ], [ 1, %if.then ]
  br i1 %cmp2, label %loop.backedge, label %if.then2

if.then2:
  %d = getelementptr inbounds i32, i32* %g, i64 %phi
  store i32 1, i32* %d, align 4
  br label %loop.backedge

loop.backedge:
  br label %loop
}

When looking at %loop we note that the branch uses a loop invariant condition. Then when looking at %if.end we see that %phi has loop invariant operands, and is the merge point of the branch we noted, so we copy the control flow and hoist the phi:

define i64 @phi_conditional_use(i32 %f, i32* %g) {
entry:
  %cmp1 = icmp eq i32 %f, 1
  %cmp2 = icmp eq i32 %f, 0
  br i1 %cmp1, label %if.end.licm, label %if.then.licm

if.then.licm:
  br label %if.end.licm

if.end.licm:
  %phi = phi i64 [ 0, %entry ], [ 1, %if.then.licm ]
  br label %loop

loop:
  br i1 %cmp1, label %if.end, label %if.then

if.then:
  br label %if.end

if.end:
  br i1 %cmp2, label %loop.backedge, label %if.then2

if.then2:
  %d = getelementptr inbounds i32, i32* %g, i64 %phi
  store i32 1, i32* %d, align 4
  br label %loop.backedge

loop.backedge:
  br label %loop
}

We then note that the branch at the end if %if.end has a loop invariant condition. We then look at %if.then2 and see that %d has loop invariant operands, and that it's in a block that's the target of a noted branch, so we duplicate control flow and hoist:

define i64 @phi_conditional_use(i32 %f, i32* %g) {
entry:
  %cmp1 = icmp eq i32 %f, 1
  %cmp2 = icmp eq i32 %f, 0
  br i1 %cmp1, label %if.end.licm, label %if.then.licm

if.then.licm:
  br label %if.end.licm

if.end.licm:
  %phi = phi i64 [ 0, %entry ], [ 1, %if.then.licm ]
  br i1 %cmp2, label %loop.backedge.licm, label %if.then2.licm

if.then2.licm:
  %d = getelementptr inbounds i32, i32* %g, i64 %phi
  br label %loop.backedge.licm

loop.backedge.licm:
  br label %loop

loop:
  br i1 %cmp1, label %if.end, label %if.then

if.then:
  br label %if.end

if.end:
  br i1 %cmp2, label %loop.backedge, label %if.then2

if.then2:
  store i32 1, i32* %d, align 4
  br label %loop.backedge

loop.backedge:
  br label %loop
}

Now we look at the store in %if.then2, which has loop invariant operands but can't be hoisted as it's not safe to hoist the store.

We're now done hoisting things, but we have the problem that %d does not dominate the store in %if.then2. So we have to move it somewhere where it does dominate %if.then2, but we also have to make sure it's after %phi, as it uses that as an operand. So it has to go in %if.end.licm after %phi.

Some comments on all of this:

At the time that we hoist %d we don't know if its uses are also going to get hoisted. We _could_ do something like rewrite LICM to be a two-step process where it first makes note of which instructions are going to get hoisted, then hoists them using knowledge of if their uses are going to get hoisted to figure out where they get hoisted. But that would be a lot more disruption than the current patch.

We could change things to never hoist things into conditional blocks i.e. %d would be hoisted directly to %if.end.licm. There's two reasons I didn't do this:

  • As explained in the comment at line 715, hoisting to conditional blocks could in the future be used as away to be able to hoist instructions that are unsafe to execute unconditionally, e.g. the store could actually be hoisted to %if.then2.licm as it would only get executed in the cases where the store in the loop would have been executed.
  • It seems a little strange to just speculatively execute instructions without limit because you're hoisting them outside of the loop. It's what LICM currently does, but it's at odds with the current behaviour elsewhere e.g. in SimplifyCFG which does the same thing (except hoisting instructions to their immediate predecessor block instead of outside a loop) but has a lot of (inconsistent) heuristics for deciding if it's a good idea or not. Doing it like this means we can leave it up to later passes to figure out if the instructions should be unconditional.
  • Additionally: doing this would means that "the hoisted version of a block X for the purpose of control flow and phis" and "which block to hoist instructions from block X into" could be different blocks whereas they're currently always the same block, which would need some extra code to handle (though of course at the same time the code to do rehoisting would get removed so it could turn out to be about the same complexity-wise).
mkazantsev requested changes to this revision.Nov 25 2018, 9:33 PM

I don't really have strong objections against that. Thanks for your explanation. It could've been a Phi node insertion instead of rehoisting in this case, but I don't really see why rehoisting should be bad (other than we may end up executing code that never executes, but it's a general LICM flaw and not specific to this patch). I also see no bug in what we have now. So I'm ok with the idea.

I am not comfortable with reverting it back and forth, so I will request some re-design of your patch. Please make an option to switch off/on CFG hoisting. We will need it if this transform will expose some unexpected bugs in the future. Reverting patch that big may become hard in time. Then, I'm OK if you check in the current patch with this option switch by default (in tests, it can be set to true).

Then, you can check in a patch that turns it on by default.

This revision now requires changes to proceed.Nov 25 2018, 9:33 PM

Added an option to enable control flow hoisting, turned off by default.

This revision is now accepted and ready to land.Nov 27 2018, 5:27 PM
This revision was automatically updated to reflect the committed changes.

This caused a significant regression in compile time for some sources, see https://bugs.llvm.org/show_bug.cgi?id=39836 for details. Building one source file went from 45 seconds to 140 seconds.

Reverted in SVN r347867.