This is an archive of the discontinued LLVM Phabricator instance.

Differential D52827

[LICM] Make LICM able to hoist phis
Closed · Public

Authored by john.brawn on Oct 3 2018, 5:06 AM.

Details

Reviewers

reames
hfinkel
mkazantsev
skatkov

Commits

rGa7eb2c863fa9: [LICM] Reapply r347776 "Make LICM able to hoist phis" with fix
rG31c9769580ea: [LICM] Reapply r347190 "Make LICM able to hoist phis" with fix
rG12c046fba0cb: [LICM] Make LICM able to hoist phis
rL347889: [LICM] Reapply r347776 "Make LICM able to hoist phis" with fix
rL347776: [LICM] Reapply r347190 "Make LICM able to hoist phis" with fix
rL347190: [LICM] Make LICM able to hoist phis

Summary

The general approach taken is to make note of loop invariant branches, then when we see something conditional on that branch, such as a phi, we create a copy of the branch and (empty versions of) its successors and hoist using that.

This has no impact by itself that I've been able to see, as LICM typically doesn't see such phis as they will have been converted into selects by the time LICM is run, but once we start doing phi-to-select conversion later it will be important.

Diff Detail

Repository: rL LLVM

Event Timeline

john.brawn created this revision. Oct 3 2018, 5:06 AM

This has no impact by itself that I've been able to see, as LICM typically doesn't see such phis as they will have been converted into selects by the time LICM is run

That's surprising... I would have expected this to show up in some cases. Some branches can't be eliminated due to side-effects.

lib/Transforms/Scalar/LICM.cpp
796	Would it be better to insert a PHI node, rather than re-hoist?

In D52827#1254146, @efriedma wrote:

This has no impact by itself that I've been able to see, as LICM typically doesn't see such phis as they will have been converted into selects by the time LICM is run

That's surprising... I would have expected this to show up in some cases. Some branches can't be eliminated due to side-effects.

It's quite likely that the various TODOs are preventing anything significant from being hoisted.

lib/Transforms/Scalar/LICM.cpp
796	I considered that, but went with this as it gives results more similar to the current behaviour in the cases where we end up not hoisting any phis. However, I'm currently working on fixing the various TODOs here, and it looks like I may need to switch to the approach of inserting phis.

Starting with only high level design comments....

I think this patch is going about things in slightly the wrong way. As far as I can tell, there are two key components to this: 1) hoisting PHIs w/loop invariant selectors and operands, and 2) hoisting instructions from below conditional branches into conditional blocks inserted in preheader. I think these two can and should be separated into distinct patches.

I'm concerned about (2). This really seems like a form of loop peeling, and a currently unrestricted one without a cost model. I'm not sure I'm convinced this belongs in LICM at all.

(1) could be formulated w/o (2). The easiest way would be to form a select chain using the LIV conditions and LIV operands. This also works for any instruction for which we have a predicated form.
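To make the select-chain idea concrete, here is a minimal sketch that folds a phi's incoming values into nested selects. It assumes, hypothetically, that every incoming value except a final default is guarded by exactly one loop-invariant condition and that the first true condition wins; the `GuardedValue` struct and the textual `select(...)` output are invented for illustration and are not LLVM's API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical representation: an incoming value that flows into the phi
// when the loop-invariant condition Cond holds.
struct GuardedValue {
  std::string Cond;  // loop-invariant condition guarding this incoming edge
  std::string Value; // loop-invariant incoming value
};

// Builds "select(c1, v1, select(c2, v2, ... Default))" by folding from the
// last guarded value back to the first, so earlier conditions take priority.
std::string buildSelectChain(const std::vector<GuardedValue> &Guarded,
                             const std::string &Default) {
  std::string Result = Default;
  for (auto It = Guarded.rbegin(); It != Guarded.rend(); ++It)
    Result = "select(" + It->Cond + ", " + It->Value + ", " + Result + ")";
  return Result;
}
```

For a simple diamond such as `br i1 %cmp1, label %if.end, label %if.then` feeding `phi i64 [ 0, %loop ], [ 1, %if.then ]`, this yields `select(%cmp1, 0, 1)`. As noted in the reply below, this formulation restricts which phis can be handled compared to duplicating the control flow.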

lib/Transforms/Scalar/LICM.cpp
448	Basic question on design here. Once we've hoisted the branch on the loop invariant condition, shouldn't we be able to remove the condition within the loop? Or is this more analogous to loop peeling where we leave a copy of the control flow within the loop?
478	loop invariant operands
491	I don't think the approach you're using here works. Consider the following CFG: A -> B, C; B -> D, E; C -> D, F. B & C share a common successor, but it's not a unique successor. As such, there are paths that don't converge in D.
552	What about indirectly dependent blocks? (e.g. D in A -> B, C; B -> D)
769	Minor: I'd suggest sinking the invariant operand check into the helper. Reading the helper in isolation is currently confusing.
797	I think the "re-hoisting" here (which is really re-sinking right?) is probably a non-starter. We should instead prevent hoisting.

In D52827#1257485, @reames wrote:

Starting with only high level design comments....

I think this patch is going about things in slightly the wrong way. As far as I can tell, there are two key components to this: 1) hoisting PHIs w/loop invariant selectors and operands, and 2) hoisting instructions from below conditional branches into conditional blocks inserted in preheader. I think these two can and should be separated into distinct patches.

I'm concerned about (2). This really seems like a form of loop peeling, and a currently unrestricted one without a cost model. I'm not sure I'm convinced this belongs in LICM at all.

(1) could be formulated w/o (2). The easiest way would be to form a select chain using the LIV conditions and LIV operands. This also works for any instruction for which we have a predicated form.

I'd not considered going direct from phi to select when hoisting. It seems like it will restrict the kinds of phis that can be hoisted, but we're probably not hoisting them anyway due to the various TODOs here. I'll look into it.

lib/Transforms/Scalar/LICM.cpp
448	Only if we also hoist everything from the if and else blocks in the loop. Currently I'm leaving it up to SimplifyCFG to figure out if that's the case and clean it up if so.
491	I'm pretty sure that only matters if E uses values defined in B (or F uses values defined in C) which are then hoisted to conditional blocks in the preheader. But in that case we'd see that the uses aren't dominated and rehoist. But this does cause problems if we insert phis instead of rehoisting, so I do need to look at what I'm doing here some more.
797	Current behaviour of LICM is to hoist everything to the same block, and what this re-hoisting does is essentially get us that behaviour in those cases where hoisting to a conditional block doesn't work. Preventing hoisting would therefore prevent hoisting of things that are currently hoisted. We _could_ do something like delaying the decision of where to hoist things until we know if everything will end up being dominated, but re-hoisting seems much easier.

dmgreen added a subscriber: dmgreen. Oct 11 2018, 8:05 AM

Update based on review comments, and add some more tests.

In D52827#1257963, @john.brawn wrote:

In D52827#1257485, @reames wrote:

Starting with only high level design comments....

I think this patch is going about things in slightly the wrong way. As far as I can tell, there are two key components to this: 1) hoisting PHIs w/loop invariant selectors and operands, and 2) hoisting instructions from below conditional branches into conditional blocks inserted in preheader. I think these two can and should be separated into distinct patches.

I'm concerned about (2). This really seems like a form of loop peeling, and a currently unrestricted one without a cost model. I'm not sure I'm convinced this belongs in LICM at all.

(1) could be formulated w/o (2). The easiest way would be to form a select chain using the LIV conditions and LIV operands. This also works for any instruction for which we have a predicated form.

I'd not considered going direct from phi to select when hoisting. It seems like it will restrict the kinds of phis that can be hoisted, but we're probably not hoisting them anyway due to the various TODOs here. I'll look into it.

After trying this out and running lnt, spec2000 and spec2006, it gives no improvements (either by itself or combined with D50723). By comparison, the current approach gives no improvement by itself, and a 2% improvement in spec2000/254.gap when combined with D50723 (on Cortex-A57).

lib/Transforms/Scalar/LICM.cpp
491	It turns out that this is taken care of by the check that BT dominates CommonSucc. I've restructured the code here a little (as in some cases we weren't picking TrueDest/FalseDest when it was a triangle destination) and adjusted the comments.
552	Then we don't do anything. In the case that we're hoisting a phi in D we make sure to call getHoistedBlock on B and C beforehand which takes care of duplicating the control flow.
796	Inserting a phi, combined with relaxing the dominated check in registerPossiblyHoistableBranch, can lead to cases where we get the phi value from the 'wrong' predecessor (e.g. in the diamond_with_extra_in_edge test). It's probably not the case that this can happen without that relaxation but I'd rather not risk it.

Ping.

mkazantsev requested changes to this revision. Oct 29 2018, 11:13 PM

mkazantsev added inline comments.

lib/Transforms/Scalar/LICM.cpp
491	Bail if `TrueDest == FalseDest`.
503	If the size of set is not 1, you are introducing non-deterministic optimizer behavior here. Please assert that the size is 1.
556	`BB != Pair.second` is checked 2 times, you could instead write `if (BB != Pair.second && (succ1 == BB || succ2 == BB))`.
581	Do you ever add nullptr to this map? I guess it should be `.count(Orig)`.
598	Dot missing.
632	Dot missing.
693	It's `O(N^2)` in the worst case, which is not super-cool. If I understand this part correctly, you need to make sure that there is an element of `WorkList` that is a predecessor of another element of this WorkList. It can be done faster if you accumulate all elements of the WorkList into a set and do something like `for (el : Worklist) for (p : predecessors(el)) if (set.contains(p))`, at which point we are done. It will be `O(N)`.
704	How do you ensure that you have found something? Is it correct to do it if nothing was found and `idx` has a default value?
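The set-based check suggested for line 693 can be sketched as follows. This is a toy model, not the LICM code: blocks are plain strings and the hypothetical `PredMap` stands in for `predecessors()`. The cost is one pass to build the set plus one pass over the predecessor lists, i.e. linear in the number of worklist blocks plus their predecessor edges.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using Block = std::string;
// Hypothetical CFG model: each block maps to the list of its predecessors.
using PredMap = std::unordered_map<Block, std::vector<Block>>;

// Returns true if some block in Worklist has a predecessor that is also in
// Worklist, using hash-set membership instead of a nested scan.
bool hasPredecessorInWorklist(const std::vector<Block> &Worklist,
                              const PredMap &Preds) {
  std::unordered_set<Block> InList(Worklist.begin(), Worklist.end());
  for (const Block &BB : Worklist) {
    auto It = Preds.find(BB);
    if (It == Preds.end())
      continue; // no recorded predecessors for this block
    for (const Block &P : It->second)
      if (InList.count(P))
        return true;
  }
  return false;
}
```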

This revision now requires changes to proceed. Oct 29 2018, 11:13 PM

Does this lose debug information? I do very little with optimization passes so I don't know, but it would be nice to keep it in mind.

john.brawn marked 8 inline comments as done. Oct 31 2018, 8:54 AM

john.brawn added inline comments.

lib/Transforms/Scalar/LICM.cpp
503	It is actually possible to have more than 1 here. I'll adjust it to have deterministic behaviour for that case (by picking whichever happens to be first in the function's block list).
704	It's actually possible to not find anything, in which case we fall back to picking the first element of the worklist. I've updated the comment to make this clear.
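The branch-shape checks discussed so far (bail when `TrueDest == FalseDest`, accept a triangle or a diamond, and pick deterministically when several common successors exist) can be modelled roughly as below. This is a simplified sketch over a hypothetical string-based CFG, not the patch's code; in particular it does not model the requirement that the branch dominates the common successor, which is what rules out the reviewer's A -> B, C; B -> D, E; C -> D, F case in the real implementation.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

using Block = std::string;
// Hypothetical CFG model: each block maps to the list of its successors.
using SuccMap = std::unordered_map<Block, std::vector<Block>>;

// Finds where the two arms of "br %c, TrueDest, FalseDest" rejoin.
// Returns "" when no suitable merge block exists.
Block findCommonSucc(const Block &TrueDest, const Block &FalseDest,
                     const SuccMap &Succs,
                     const std::vector<Block> &FunctionOrder) {
  if (TrueDest == FalseDest)
    return ""; // both edges go to the same block: bail
  auto SuccsOf = [&](const Block &BB) {
    auto It = Succs.find(BB);
    return It == Succs.end()
               ? std::set<Block>()
               : std::set<Block>(It->second.begin(), It->second.end());
  };
  std::set<Block> TrueSucc = SuccsOf(TrueDest);
  std::set<Block> FalseSucc = SuccsOf(FalseDest);
  if (TrueSucc.count(FalseDest))
    return FalseDest; // triangle: TrueDest falls through to FalseDest
  if (FalseSucc.count(TrueDest))
    return TrueDest;  // triangle the other way round
  // Diamond: the first block in function order that succeeds both arms, so
  // the choice does not depend on set iteration order.
  for (const Block &BB : FunctionOrder)
    if (TrueSucc.count(BB) && FalseSucc.count(BB))
      return BB;
  return "";
}
```

For brevity this sketch always scans the function order in the diamond case; as discussed below for line 507, taking the single element directly avoids that scan when there is exactly one common successor.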

In D52827#1280717, @probinson wrote:

Does this lose debug information? I do very little with optimization passes so I don't know, but it would be nice to keep it in mind.

The hoisting of phis goes through the hoist function, so it's no worse than anything else, i.e. it loses the debug information (see line 1404).

Update according to review comments.

mkazantsev requested changes to this revision. Oct 31 2018, 11:01 PM

mkazantsev added inline comments.

lib/Transforms/Scalar/LICM.cpp
507	The general algorithm for non-empty `TrueDestSucc` deals well with the case of `size == 1`; do we really need separate processing for this case?
628	Maybe I am missing something, but where do we notify DomTree of existence of these new edges?
686	I guess you could use `insert(Worklist.begin(), Worklist.end())`, it's a tad faster.
704	I think I now understand the algorithm you're trying to apply. Imagine the graph `A -> B -> C -> D -> E`, and the blocks happened to be arranged as `E, D, C, B, A` in the Worklist. Maybe I am missing something, but it seems that the algorithm inside `collectChildrenInLoop` is BFS-based and cannot protect us from this situation. It could, for example, happen if all these blocks have a common parent and are arranged in the order `E, D, C, B, A` in this parent. The downside of your algorithm is that it will traverse the whole worklist to find `A`, then all of the remaining worklist to find `B`, and so on. Each `any_of` query is `O(N)`, and you do N such queries, so the overall algorithmic complexity is `O(N^2)`. The correct way to efficiently process all blocks before their successors is to process them in Reverse Post-Order. This arrangement by definition gives you what you want: it sorts the blocks so that all successors go after their predecessors. If the elements of Worklist are arranged in RPO, you can just process them in order without extra checks, and it will be `O(N)`. You can see an example of how to build RPO in class `LoopBlocksDFS`; the methods you need are `beginRPO(), endRPO()`. And aside from all that, erasing the last element from the Worklist is `O(1)` and erasing the first one is `O(N)`. When you don't care which element to erase, you should erase the last one. Erasing the first element is expensive. However, once you process them in RPO order, you don't need to erase anything from the Worklist at all, because you know that you've processed all predecessors before all successors.
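A reverse post-order can be computed with a depth-first search that records each block after all of its successors and then reverses the result. The sketch below uses a hypothetical string-based CFG rather than `LoopBlocksDFS`/`LoopBlocksRPO`; it illustrates the property the reviewer relies on: for every edge that is not a back edge, the source block comes before the target block.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using Block = std::string;
// Hypothetical CFG model: each block maps to the list of its successors.
using SuccMap = std::unordered_map<Block, std::vector<Block>>;

// Appends BB to Out after all blocks reachable from it (post-order).
static void postOrder(const Block &BB, const SuccMap &Succs,
                      std::unordered_set<Block> &Visited,
                      std::vector<Block> &Out) {
  if (!Visited.insert(BB).second)
    return; // already visited (also stops at back edges)
  auto It = Succs.find(BB);
  if (It != Succs.end())
    for (const Block &S : It->second)
      postOrder(S, Succs, Visited, Out);
  Out.push_back(BB);
}

// Reverse post-order from Entry: predecessors before successors.
std::vector<Block> reversePostOrder(const Block &Entry, const SuccMap &Succs) {
  std::unordered_set<Block> Visited;
  std::vector<Block> Order;
  postOrder(Entry, Succs, Visited, Order);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

However the chain `A -> B -> C -> D -> E` happens to be stored, its RPO from `A` is `A, B, C, D, E`, so a single forward pass replaces the repeated `any_of` scans.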

This revision now requires changes to proceed. Oct 31 2018, 11:01 PM

I generally like what this patch is doing, but the size of code and its complexity overwhelms me. Is it possible to separate the patch into smaller pieces so that it would be easier to review them in isolation? If yes, I'd appreciate that a lot.

john.brawn added inline comments. Nov 6 2018, 8:39 AM

lib/Transforms/Scalar/LICM.cpp
507	We could have one case for size >= 1 but that would mean iterating through potentially all of the blocks in the function to find the one element of the set, which seems rather wasteful when we can just take that one element directly.
628	Inserting the edge is never needed. insertEdge only has an effect when the source and destination blocks have different immediate dominators (see InsertReachable in GenericDomTreeConstruction.h) and that never happens - HoistTrueDest and HoistFalseDest always have the same immediate dominator as HoistCommonSucc, and for the edge from HoistCommonSucc to TargetSucc it may be that HoistCommonSucc is now the immediate dominator of TargetSucc but that's handled in the loop at line 632.
704	Yes it does look like reverse post-order gives me what I want (though I have to adjust some of the tests as it picks a different but equally good order). I'll modify this to do that.

In D52827#1283474, @mkazantsev wrote:

I generally like what this patch is doing, but the size of code and its complexity overwhelms me. Is it possible to separate the patch into smaller pieces so that it would be easier to review them in isolation? If yes, I'd appreciate that a lot.

I don't really think so. The only way I can think of slicing it up would be to add the control flow hoisting (which does nothing useful by itself) as one patch, and then the phi hoisting as a separate patch, but the phi hoisting part is very small and simple so I don't think it would usefully reduce the size of the first part.

Use LoopBlocksRPO to generate the worklist.

mkazantsev added inline comments. Nov 6 2018, 7:55 PM

lib/Transforms/Scalar/LICM.cpp
474	Preheader by definition has only one successor which is the loop header. Therefore, `OriginalPreheaderChildren` always has one element which is header's DTNode. So it doesn't look like you even need this field, just take loop header (it doesn't change, right?) Or am I missing something here? And btw, if it needs to be a field and a vector, use `SmallVector` instead of `std::vector`.
563	This name is misleading, it doesn't look like something that can change the IR. How about `getOrCreateHoistedBlock`?
654	Please add verification of `LI` in debug mode just to make sure that we didn't mess up anything (maybe not here, but in some reasonable place).
685	Why not `SmallVector` and `push_back`? That wouldn't make any difference in terms of effect but less memory-consuming.
791	This is the part I don't understand. If the hoist destination doesn't dominate instruction's users then it also doesn't dominate the original instruction. Is LICM able to hoist to such locations? Is there an example of that?
802	Set `Changed` here.
803	Do we really need to change the hoist point? Why not just hoist all of them before the preheader's terminator? That would make this code simpler.
1411–1416	Braces not needed (here and in some other places like this).
test/Transforms/LICM/hoist-phi.ll
127	Please add `TODO:` to the tests you think we should cover in the future, that will make it easier to track if we decide to expand this transform to something more general.

Missing Changed update and LI sanitizing should be added.

This revision now requires changes to proceed. Nov 6 2018, 8:25 PM

john.brawn marked 6 inline comments as done. Nov 7 2018, 8:45 AM

john.brawn added inline comments.

lib/Transforms/Scalar/LICM.cpp
474	Yes, I think you're right. I'll do some testing to double-check then change this.
685	The iteration order when rehoisting will have to be changed (because we have to visit instructions before the instructions they use, i.e. later instructions before earlier instructions), but yes that will work.
791	We get this when we hoist an instruction but don't hoist all of its uses. An easy example of this is a phi where one operand is loop invariant and the other is not. The hoisted loop invariant operand will not dominate its use. In the tests I've added @conditional_use, @rehoist, and @diamond_with_extra_in_edge are all examples of where rehoisting happens.
803	If we have something like `%a = add i32 %loop_invariant, 1` followed by `%b = mul i32 %a, 2`, then if %b does not dominate its uses we have to rehoist %b, at which point %a no longer dominates %b and has to be rehoisted to before %b. That's what HoistPoint is doing: it's making sure we rehoist instructions before the rehoisted instructions that use them.

mkazantsev added inline comments. Nov 8 2018, 2:13 AM

lib/Transforms/Scalar/LICM.cpp
685	You can preserve the order of processing if you just iterate in reverse order.

mkazantsev added inline comments. Nov 8 2018, 2:17 AM

lib/Transforms/Scalar/LICM.cpp
791	Right, it makes sense when uses are Phis. Maybe it should be explicitly stated in the comment. :)
803	But if we process `%a` and then `%b` and insert each of them before the terminator, we don't have this problem. It seems that using a vector instead of a list will spare you this problem.
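The two orderings discussed for line 803 can be compared on a toy model in which a block is a list of instruction names ending in its terminator. Both functions below are hypothetical illustrations, not the patch's code: one walks the rehoisted instructions in program order and inserts each before the terminator (the reviewer's suggestion); the other walks them in reverse and inserts each before the previously rehoisted one (the HoistPoint approach). With `%a` feeding `%b`, both leave `%a` before `%b`. Neither models the later fix of rehoisting to the immediate dominator.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: a block is a list of instruction names whose last
// element is the terminator.
using BlockInsts = std::vector<std::string>;

// Program order: insert each instruction immediately before the terminator.
void rehoistBeforeTerminator(BlockInsts &BB,
                             const std::vector<std::string> &Insts) {
  assert(!BB.empty() && "block must end in a terminator");
  for (const std::string &I : Insts)
    BB.insert(BB.end() - 1, I);
}

// Reverse order: users are visited before their operands, and each
// instruction is inserted before the previously rehoisted one. vector::insert
// returns an iterator to the new element, which becomes the next hoist point.
void rehoistBeforeHoistPoint(BlockInsts &BB,
                             const std::vector<std::string> &Insts) {
  assert(!BB.empty() && "block must end in a terminator");
  auto HoistPoint = BB.end() - 1; // start at the terminator
  for (auto It = Insts.rbegin(); It != Insts.rend(); ++It)
    HoistPoint = BB.insert(HoistPoint, *It);
}
```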

Use SmallVector for HoistedInstructions, rename getHoistedBlock, add LI and DT verification, other small changes.

I have some minor comments; only the one related to SafetyInfo worries me (and even that just needs a utility function instead of a plain moveBefore). All other things are non-functional nits.
I'll take some time running fuzz tests with this patch because it's big. :) Please wait a few hours; I will either give you approval (on the condition that my comments are addressed) or give you a failing test example.

lib/Transforms/Scalar/LICM.cpp
509	Minor: please add explanatory messages to the assertions you make.
567	If you expect BI to be the only element with the property you need, it would make sense to save some compile time by inserting `break` here. But on the other hand, the assertion you make also makes sense. How about this: `BI = find_if(<cond>)`, and then assert that `find_if(<cond> && el != BI) == HoistableBranches.end()`? Just a suggestion.
802	Removal/insertion of instructions to blocks may also need SafetyInfo updates. We have a helper function `eraseInstruction` for removal that handles it properly, but there was no helper for `moveBefore` because there was only 1 occurrence. I've just added one in the patch rL346472. I am pretty sure that the current code is accidentally correct because rehoisting happens outside of the loop (and `SafetyInfo` doesn't really care), but please rebase on top of this patch and use this utility whenever you move instructions, to keep things general.
810	It makes more sense to verify `DT` before `LI` since `LI` uses it.
1390–1391	After you've changed the semantics, `Preheader` is no longer the preheader of `CurLoop` (or is it)? If not, please rename it. If yes, why do you need it as parameter? You can just take loop's preheader as it was before. In any case, it should be `const BasicBlock *`.
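The `find_if`-then-assert suggestion for line 567 can be written generically with the standard library, as in the sketch below. `findUnique` is a hypothetical helper name, not an existing LLVM utility. Because every element before the first match fails the predicate by construction, only the tail after it needs checking, which is equivalent to the reviewer's `<cond> && el != BI` search. In release builds (`NDEBUG`) the second search disappears, so the lookup stops at the first match, like the suggested `break`.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Returns the first element satisfying P, asserting (in debug builds) that
// no later element satisfies it too.
template <typename Container, typename Pred>
typename Container::const_iterator findUnique(const Container &C, Pred P) {
  auto It = std::find_if(C.begin(), C.end(), P);
  assert((It == C.end() ||
          std::find_if(std::next(It), C.end(), P) == C.end()) &&
         "more than one element satisfies the predicate");
  return It;
}
```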

mkazantsev added inline comments. Nov 8 2018, 10:06 PM

lib/Transforms/Scalar/LICM.cpp
641	It can be useful to have statistics on hoisted branches. I'm OK if it goes in as a follow-up patch.

One of my fuzz tests has failed with the following assertion:

PHINode should have one entry for each predecessor of its parent basic block!
%.us-phi = phi i32 [ %res.i.peel, %cHeapLvb.exit181 ], [ %83, %bci_229.licm.split.us.licm ], [ %83, %bci_229.licm.split.us.licm ]

It will take me time to reduce the test to something reasonable, but maybe this info will help you find the bug in the code. So far I can say that the problematic situation happens when FalseDest == CommonSucc. Hope it helps you to diagnose the bug and construct a test. If not, I'll be able to provide the reduced test case early next week.

This revision now requires changes to proceed. Nov 9 2018, 3:30 AM

The patch fails the following test:

opt -licm -S test.ll

; ModuleID = './bugpoint-reduced-simplified.ll'
source_filename = "./bugpoint-reduced-simplified.ll"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:1"

define void @bar() {
bb:
  %tmp = call i32 @llvm.x86.sse2.cvttsd2si(<2 x double> undef)
  br label %bb3

bb1:                                              ; preds = %bb4
  %tmp2 = phi i32 [ %tmp5, %bb4 ]
  ret void

bb3:                                              ; preds = %bb4, %bb
  br i1 undef, label %bb6, label %bb4

bb4:                                              ; preds = %bb6, %bb6, %bb3
  %tmp5 = phi i32 [ %tmp, %bb3 ], [ undef, %bb6 ], [ undef, %bb6 ]
  br i1 undef, label %bb1, label %bb3

bb6:                                              ; preds = %bb3
  br i1 undef, label %bb4, label %bb4
}

; Function Attrs: nounwind readnone
declare i32 @llvm.x86.sse2.cvttsd2si(<2 x double>) #0

attributes #0 = { nounwind readnone }

The failure looks like

PHINode should have one entry for each predecessor of its parent basic block!
  %tmp5 = phi i32 [ %tmp, %bb ], [ undef, %bb6.licm ], [ undef, %bb6.licm ]
in function bar
LLVM ERROR: Broken function found, compilation aborted!

Likely some corner case related to conditional branch to the same block for true and false is not handled. Please make sure that it passes before we can proceed with the patch.

In D52827#1295052, @mkazantsev wrote:

The patch fails the following test:

...

Likely some corner case related to conditional branch to the same block for true and false is not handled. Please make sure that it passes before we can proceed with the patch.

Yes, it's specifically that in such cases the phi in the destination block has the same incoming block appear more than once which I didn't expect. I think the easiest thing to do is to have canHoistPHI return false for such phis.
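A check of the kind described (reject a phi whose incoming-block list names the same block more than once) might look like the sketch below. The phi is modelled, hypothetically, as a list of (incoming value, incoming block) pairs, and `hasUniqueIncomingBlocks` is an illustrative name, not the function in the patch.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical phi model: (incoming value, incoming block) pairs.
using Incoming = std::pair<std::string, std::string>;

// Returns false if any incoming block appears more than once, as happens
// when a conditional branch names the same block as both destinations,
// e.g. "br i1 undef, label %bb4, label %bb4".
bool hasUniqueIncomingBlocks(const std::vector<Incoming> &Phi) {
  std::unordered_set<std::string> Seen;
  for (const Incoming &In : Phi)
    if (!Seen.insert(In.second).second)
      return false; // block already seen: duplicate incoming edge
  return true;
}
```

Applied to `%tmp5` from the reduced test above, whose incoming blocks are `%bb3, %bb6, %bb6`, this returns false, so such a phi would not be hoisted.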

lib/Transforms/Scalar/LICM.cpp
641	I'll add this and also one for the number of created blocks, which seems like it may also be useful.
1390–1391	I left it as `Preheader` just because it was simpler not to change it, but yes it makes sense to rename it so I'll do so. It can't be const though, as we're hoisting I into it (i.e. modifying it).

Update according to review comments.

I don't see any obvious problems in code. Let me run one more round of fuzz testing, I'll give my approval if it passes.

LGTM.

This revision is now accepted and ready to land. Nov 18 2018, 8:25 PM

Closed by commit rL347190: [LICM] Make LICM able to hoist phis (authored by john.brawn). Nov 19 2018, 3:34 AM

This revision was automatically updated to reflect the committed changes.

Hi John,

This crashes on AArch64 kernel build:

https://ci.linaro.org/job/tcwg_kernel-bisect-llvm-master-aarch64-stable-allnoconfig/4/artifact/artifacts/build-d9902c5262f22d97eda691f00daee1ef7f5623b8/3-count_linux_objs/console.log/*view*/

Let me know if you need help reproducing this.

Maxim Kuvyrkov
www.linaro.org

Looks like this was reverted in r347225.

The thing causing the crash was incorrect handling of instructions that need to be rehoisted. When we have hoisted a phi and then hoisted a use of that phi, if we have to rehoist that use we need to make sure that we rehoist it to _after_ the hoisted phi. We can do this by rehoisting to the immediate dominator instead of just rehoisting everything to the original preheader.

john.brawn reopened this revision. Nov 20 2018, 9:12 AM

This revision is now accepted and ready to land. Nov 20 2018, 9:12 AM

john.brawn requested review of this revision. Nov 20 2018, 9:13 AM

This rehoisting stuff has made me wary before, and now I'm completely hesitant about what is going on there. Why do we end up hoisting to some block that does not dominate its users? Do we have a strong reason for doing so rather than finding the correct hoisting destination?

In D52827#1306034, @mkazantsev wrote:

This rehoisting stuff has made me wary before, and now I'm completely hesitant about what is going on there. Why do we end up hoisting to some block that does not dominate its users? Do we have a strong reason for doing so rather than finding the correct hoisting destination?

Let's look at the @phi_conditional_use test in hoist-phi.ll as an example:

define i64 @phi_conditional_use(i32 %f, i32* %g) {
entry:
  %cmp1 = icmp eq i32 %f, 1
  %cmp2 = icmp eq i32 %f, 0
  br label %loop

loop:
  br i1 %cmp1, label %if.end, label %if.then

if.then:
  br label %if.end

if.end:
  %phi = phi i64 [ 0, %loop ], [ 1, %if.then ]
  br i1 %cmp2, label %loop.backedge, label %if.then2

if.then2:
  %d = getelementptr inbounds i32, i32* %g, i64 %phi
  store i32 1, i32* %d, align 4
  br label %loop.backedge

loop.backedge:
  br label %loop
}

When looking at %loop we note that the branch uses a loop invariant condition. Then when looking at %if.end we see that %phi has loop invariant operands, and is the merge point of the branch we noted, so we copy the control flow and hoist the phi:

define i64 @phi_conditional_use(i32 %f, i32* %g) {
entry:
  %cmp1 = icmp eq i32 %f, 1
  %cmp2 = icmp eq i32 %f, 0
  br i1 %cmp1, label %if.end.licm, label %if.then.licm

if.then.licm:
  br label %if.end.licm

if.end.licm:
  %phi = phi i64 [ 0, %entry ], [ 1, %if.then.licm ]
  br label %loop

loop:
  br i1 %cmp1, label %if.end, label %if.then

if.then:
  br label %if.end

if.end:
  br i1 %cmp2, label %loop.backedge, label %if.then2

if.then2:
  %d = getelementptr inbounds i32, i32* %g, i64 %phi
  store i32 1, i32* %d, align 4
  br label %loop.backedge

loop.backedge:
  br label %loop
}

We then note that the branch at the end of %if.end has a loop invariant condition. We then look at %if.then2 and see that %d has loop invariant operands, and that it's in a block that's the target of a noted branch, so we duplicate the control flow and hoist:

define i64 @phi_conditional_use(i32 %f, i32* %g) {
entry:
  %cmp1 = icmp eq i32 %f, 1
  %cmp2 = icmp eq i32 %f, 0
  br i1 %cmp1, label %if.end.licm, label %if.then.licm

if.then.licm:
  br label %if.end.licm

if.end.licm:
  %phi = phi i64 [ 0, %entry ], [ 1, %if.then.licm ]
  br i1 %cmp2, label %loop.backedge.licm, label %if.then2.licm

if.then2.licm:
  %d = getelementptr inbounds i32, i32* %g, i64 %phi
  br label %loop.backedge.licm

loop.backedge.licm:
  br label %loop

loop:
  br i1 %cmp1, label %if.end, label %if.then

if.then:
  br label %if.end

if.end:
  br i1 %cmp2, label %loop.backedge, label %if.then2

if.then2:
  store i32 1, i32* %d, align 4
  br label %loop.backedge

loop.backedge:
  br label %loop
}

Now we look at the store in %if.then2, which has loop invariant operands but can't be hoisted as it's not safe to hoist the store.

We're now done hoisting things, but we have the problem that %d does not dominate the store in %if.then2. So we have to move it somewhere where it does dominate %if.then2, but we also have to make sure it's after %phi, as it uses that as an operand. So it has to go in %if.end.licm after %phi.
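The dominance problem can be checked by hand on the final listing above. In the sketch below, the immediate-dominator map was worked out manually from that IR, and the string-based `dominates` helper is a hypothetical stand-in for `DominatorTree::dominates`: block A dominates block B if A is B or lies on B's chain of immediate dominators. It confirms that `%if.then2.licm` does not dominate `%if.then2` while `%if.end.licm` does.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using Block = std::string;
// Hypothetical immediate-dominator map; the entry block has no entry.
using IDomMap = std::unordered_map<Block, Block>;

// A dominates B iff A is B or appears on B's immediate-dominator chain.
// Assumes IDom describes a tree (no cycles).
bool dominates(const Block &A, Block B, const IDomMap &IDom) {
  while (true) {
    if (A == B)
      return true;
    auto It = IDom.find(B);
    if (It == IDom.end())
      return false; // reached the entry block without meeting A
    B = It->second;
  }
}
```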

Some comments on all of this:

At the time that we hoist %d we don't know if its uses are also going to get hoisted. We _could_ do something like rewrite LICM to be a two-step process where it first makes note of which instructions are going to get hoisted, then hoists them using knowledge of if their uses are going to get hoisted to figure out where they get hoisted. But that would be a lot more disruption than the current patch.

We could change things to never hoist things into conditional blocks, i.e. %d would be hoisted directly to %if.end.licm. There are two reasons I didn't do this:

As explained in the comment at line 715, hoisting to conditional blocks could in the future be used as a way to hoist instructions that are unsafe to execute unconditionally; e.g. the store could actually be hoisted to %if.then2.licm, as it would only get executed in the cases where the store in the loop would have been executed.
It seems a little strange to just speculatively execute instructions without limit because you're hoisting them outside of the loop. It's what LICM currently does, but it's at odds with the current behaviour elsewhere e.g. in SimplifyCFG which does the same thing (except hoisting instructions to their immediate predecessor block instead of outside a loop) but has a lot of (inconsistent) heuristics for deciding if it's a good idea or not. Doing it like this means we can leave it up to later passes to figure out if the instructions should be unconditional.
Additionally, doing this would mean that "the hoisted version of a block X for the purpose of control flow and phis" and "which block to hoist instructions from block X into" could be different blocks, whereas they're currently always the same block. That would need some extra code to handle (though at the same time the rehoisting code would get removed, so it could turn out to be about the same complexity-wise).

I don't really have strong objections against that. Thanks for your explanation. It could've been a Phi node insertion instead of rehoisting in this case, but I don't really see why rehoisting should be bad (other than we may end up executing code that never executes, but it's a general LICM flaw and not specific to this patch). I also see no bug in what we have now. So I'm ok with the idea.

I am not comfortable with reverting it back and forth, so I will request some re-design of your patch. Please make an option to switch CFG hoisting on/off. We will need it if this transform exposes some unexpected bugs in the future. Reverting a patch that big may become hard over time. Then, I'm OK if you check in the current patch with this option switched off by default (in tests, it can be set to true).

Then, you can check in a patch that turns it on by default.

This revision now requires changes to proceed. Nov 25 2018, 9:33 PM

Added an option to enable control flow hoisting, turned off by default.

john.brawn added a child revision: D54949: [LICM] Enable control flow hoisting by default. Nov 27 2018, 6:31 AM

LGTM

This revision is now accepted and ready to land. Nov 27 2018, 5:27 PM

Closed by commit rL347776: [LICM] Reapply r347190 "Make LICM able to hoist phis" with fix (authored by john.brawn). Nov 28 2018, 9:24 AM

This revision was automatically updated to reflect the committed changes.

This caused a significant regression in compile time for some sources, see https://bugs.llvm.org/show_bug.cgi?id=39836 for details. Building one source file went from 45 seconds to 140 seconds.

Reverted in SVN r347867.

Revision Contents

Path                                                              Size
lib/Transforms/Scalar/LICM.cpp                                    322 lines
test/Transforms/LICM/hoist-phi.ll                                 1274 lines
test/Transforms/LoopVectorize/invariant-store-vectorization.ll    20 lines

Diff 174796

lib/Transforms/Scalar/LICM.cpp

 // pointer. There are no calls in the loop which mod/ref the pointer.
 // If these conditions are true, we can promote the loads and stores in the
 // loop of the pointer to use a temporary alloca'd variable. We then use
 // the SSAUpdater to construct the appropriate SSA form for the value.
 //
 //===----------------------------------------------------------------------===//

 #include "llvm/Transforms/Scalar/LICM.h"
+#include "llvm/ADT/SetOperations.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/AliasSetTracker.h"
 #include "llvm/Analysis/BasicAliasAnalysis.h"
 #include "llvm/Analysis/CaptureTracking.h"
 #include "llvm/Analysis/ConstantFolding.h"
 #include "llvm/Analysis/GlobalsModRef.h"
 #include "llvm/Analysis/GuardUtils.h"
 #include "llvm/Analysis/Loads.h"
 #include "llvm/Analysis/LoopInfo.h"
+#include "llvm/Analysis/LoopIterator.h"
 #include "llvm/Analysis/LoopPass.h"
 #include "llvm/Analysis/MemoryBuiltins.h"
 #include "llvm/Analysis/MemorySSA.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
 #include "llvm/Analysis/ScalarEvolution.h"
 #include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
[18 unchanged lines not shown]
#include "llvm/Transforms/Utils/LoopUtils.h"		#include "llvm/Transforms/Utils/LoopUtils.h"
#include "llvm/Transforms/Utils/SSAUpdater.h"		#include "llvm/Transforms/Utils/SSAUpdater.h"
#include <algorithm>		#include <algorithm>
#include <utility>		#include <utility>
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "licm"		#define DEBUG_TYPE "licm"

		STATISTIC(NumCreatedBlocks, "Number of blocks created");
		STATISTIC(NumClonedBranches, "Number of branches cloned");
STATISTIC(NumSunk, "Number of instructions sunk out of loop");		STATISTIC(NumSunk, "Number of instructions sunk out of loop");
STATISTIC(NumHoisted, "Number of instructions hoisted out of loop");		STATISTIC(NumHoisted, "Number of instructions hoisted out of loop");
STATISTIC(NumMovedLoads, "Number of load insts hoisted or sunk");		STATISTIC(NumMovedLoads, "Number of load insts hoisted or sunk");
STATISTIC(NumMovedCalls, "Number of call insts hoisted or sunk");		STATISTIC(NumMovedCalls, "Number of call insts hoisted or sunk");
STATISTIC(NumPromoted, "Number of memory locations promoted to registers");		STATISTIC(NumPromoted, "Number of memory locations promoted to registers");

/// Memory promotion is enabled by default.		/// Memory promotion is enabled by default.
static cl::opt<bool>		static cl::opt<bool>
[12 unchanged lines not shown]
LICMN2Theshold("licm-n2-threshold", cl::Hidden, cl::init(0),		LICMN2Theshold("licm-n2-threshold", cl::Hidden, cl::init(0),
cl::desc("How many instruction to cross product using AA"));		cl::desc("How many instruction to cross product using AA"));

static bool inSubLoop(BasicBlock *BB, Loop *CurLoop, LoopInfo *LI);		static bool inSubLoop(BasicBlock *BB, Loop *CurLoop, LoopInfo *LI);
static bool isNotUsedOrFreeInLoop(const Instruction &I, const Loop *CurLoop,		static bool isNotUsedOrFreeInLoop(const Instruction &I, const Loop *CurLoop,
const LoopSafetyInfo *SafetyInfo,		const LoopSafetyInfo *SafetyInfo,
TargetTransformInfo *TTI, bool &FreeInLoop);		TargetTransformInfo *TTI, bool &FreeInLoop);
static void hoist(Instruction &I, const DominatorTree *DT, const Loop *CurLoop,		static void hoist(Instruction &I, const DominatorTree *DT, const Loop *CurLoop,
ICFLoopSafetyInfo *SafetyInfo,		BasicBlock *Dest, ICFLoopSafetyInfo *SafetyInfo,
OptimizationRemarkEmitter *ORE);		OptimizationRemarkEmitter *ORE);
static bool sink(Instruction &I, LoopInfo *LI, DominatorTree *DT,		static bool sink(Instruction &I, LoopInfo *LI, DominatorTree *DT,
const Loop *CurLoop, ICFLoopSafetyInfo *SafetyInfo,		const Loop *CurLoop, ICFLoopSafetyInfo *SafetyInfo,
OptimizationRemarkEmitter *ORE, bool FreeInLoop);		OptimizationRemarkEmitter *ORE, bool FreeInLoop);
static bool isSafeToExecuteUnconditionally(Instruction &Inst,		static bool isSafeToExecuteUnconditionally(Instruction &Inst,
const DominatorTree *DT,		const DominatorTree *DT,
const Loop *CurLoop,		const Loop *CurLoop,
const LoopSafetyInfo *SafetyInfo,		const LoopSafetyInfo *SafetyInfo,
[317 unchanged lines not shown]	for (BasicBlock::iterator II = BB->end(); II != BB->begin();) {
Changed = true;		Changed = true;
}		}
}		}
}		}
}		}
return Changed;		return Changed;
}		}

		// This is a helper class for hoistRegion to make it able to hoist control flow
		// in order to be able to hoist phis. The way this works is that we initially
		// start hoisting to the loop preheader, and when we see a loop invariant branch
		// we make note of this. When we then come to hoist an instruction that's
		// conditional on such a branch we duplicate the branch and the relevant control
		// flow, then hoist the instruction into the block corresponding to its original
		// block in the duplicated control flow.
		reames (inline, Not Done): Basic question on design here. Once we've hoisted the branch on the loop invariant condition, shouldn't we be able to remove the condition within the loop? Or is this more analogous to loop peeling where we leave a copy of the control flow within the loop?
		john.brawn (author, inline, Not Done): Only if we also hoist everything from the if and else blocks in the loop. Currently I'm leaving it up to simplifycfg to figure out if that's the case and clean it up if so.
		class ControlFlowHoister {
		private:
		// Information about the loop we are hoisting from
		LoopInfo *LI;
		DominatorTree *DT;
		Loop *CurLoop;

		// A map of blocks in the loop to the block their instructions will be hoisted
		// to.
		DenseMap<BasicBlock *, BasicBlock *> HoistDestinationMap;

		// The branches that we can hoist, mapped to the block that marks a
		// convergence point of their control flow.
		DenseMap<BranchInst *, BasicBlock *> HoistableBranches;

		public:
		ControlFlowHoister(LoopInfo *LI, DominatorTree *DT, Loop *CurLoop)
		: LI(LI), DT(DT), CurLoop(CurLoop) {}

		void registerPossiblyHoistableBranch(BranchInst *BI) {
		// We can only hoist conditional branches with loop invariant operands.
		if (!BI->isConditional() || !CurLoop->hasLoopInvariantOperands(BI))
		return;

		mkazantsev (inline, Not Done): Preheader by definition has only one successor which is the loop header. Therefore, `OriginalPreheaderChildren` always has one element which is header's DTNode. So it doesn't look like you even need this field, just take loop header (it doesn't change, right?) Or am I missing something here? And btw, if it needs to be a field and a vector, use `SmallVector` instead of `std::vector`.
		john.brawn (author, inline, Not Done): Yes, I think you're right. I'll do some testing to double-check then change this.
		// The branch destinations need to be in the loop, and we don't gain
		// anything by duplicating conditional branches with duplicate successors,
		// as it's essentially the same as an unconditional branch.
		BasicBlock *TrueDest = BI->getSuccessor(0);
		reames (inline, Done): loop invariant operands
		BasicBlock *FalseDest = BI->getSuccessor(1);
		if (!CurLoop->contains(TrueDest) || !CurLoop->contains(FalseDest) ||
		TrueDest == FalseDest)
		return;

		// We can hoist BI if one branch destination is the successor of the other,
		// or both have a common successor, which we check by seeing if the
		// intersection of their successors is non-empty.
		// TODO: This could be expanded to allowing branches where both ends
		// eventually converge to a single block.
		SmallPtrSet<BasicBlock *, 4> TrueDestSucc, FalseDestSucc;
		TrueDestSucc.insert(succ_begin(TrueDest), succ_end(TrueDest));
		FalseDestSucc.insert(succ_begin(FalseDest), succ_end(FalseDest));
		reames (inline, Not Done): I don't think the approach you're using here works. Consider the following: A -> B, C; B -> D, E; C -> D, F. B and C share a common successor, but it's not a unique successor. As such, there are paths that don't converge in D.
		john.brawn (author, inline, Not Done): I'm pretty sure that only matters if E uses values defined in B (or F uses values defined in C) which are then hoisted to conditional blocks in the preheader. But in that case we'd see that the uses aren't dominated and rehoist. But this does cause problems if we insert phis instead of rehoisting, so I do need to look at what I'm doing here some more.
		john.brawn (author, inline, Not Done): It turns out that this is taken care of by the check that BI dominates CommonSucc. I've restructured the code here a little (as in some cases we weren't picking TrueDest/FalseDest when it was a triangle destination) and adjusted the comments.
		mkazantsev (inline, Done): Bail if `TrueDest == FalseDest`.
		BasicBlock *CommonSucc = nullptr;
		if (TrueDestSucc.count(FalseDest)) {
		CommonSucc = FalseDest;
		} else if (FalseDestSucc.count(TrueDest)) {
		CommonSucc = TrueDest;
		} else {
		set_intersect(TrueDestSucc, FalseDestSucc);
		// If there's one common successor use that.
		if (TrueDestSucc.size() == 1)
		CommonSucc = *TrueDestSucc.begin();
		// If there's more than one, pick whichever appears first in the block list
		// (we can't use the value returned by TrueDestSucc.begin() as it's
		// unpredictable which element gets returned).
		mkazantsev (inline, Done): If the size of set is not 1, you are introducing non-deterministic optimizer behavior here. Please assert that the size is 1.
		john.brawn (author, inline, Not Done): It is actually possible to have more than 1 here. I'll adjust it to have deterministic behaviour for that case (by picking whichever happens to be first in the function's block list).
		else if (!TrueDestSucc.empty()) {
		Function *F = TrueDest->getParent();
		auto IsSucc = [&](BasicBlock &BB) { return TrueDestSucc.count(&BB); };
		mkazantsev (inline, Not Done): The general algorithm for non-empty `TrueDestSucc` deals well with case of `size == 1`; do we really need separate processing for this case?
		john.brawn (author, inline, Not Done): We could have one case for size >= 1 but that would mean iterating through potentially all of the blocks in the function to find the one element of the set, which seems rather wasteful when we can just take that one element directly.
		auto It = std::find_if(F->begin(), F->end(), IsSucc);
		assert(It != F->end() && "Could not find successor in function");
		mkazantsev (inline, Done): Minor: please add explanatory messages to the assertions you make.
		CommonSucc = &*It;
		}
		}
		// The common successor has to be dominated by the branch, as otherwise
		// there will be some other path to the successor that will not be
		// controlled by this branch so any phi we hoist would be controlled by the
		// wrong condition. This also takes care of avoiding hoisting of loop back
		// edges.
		// TODO: In some cases this could be relaxed if the successor is dominated
		// by another block that's been hoisted and we can guarantee that the
		// control flow has been replicated exactly.
		if (CommonSucc && DT->dominates(BI, CommonSucc))
		HoistableBranches[BI] = CommonSucc;
		}
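The convergence-block selection in registerPossiblyHoistableBranch can be modelled outside LLVM on a toy CFG. This is a hypothetical standalone analogue, not LLVM code: blocks are integers equal to their position in the function's block list (so "first in the block list" means smallest index), `CFG` and `findCommonSucc` are illustrative names, and the `DT->dominates(BI, CommonSucc)` check that the patch applies afterwards is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

// Toy CFG (hypothetical): CFG[B] lists the successors of block B. A block's
// index doubles as its position in the function's block list.
using CFG = std::vector<std::vector<int>>;

// Hypothetical analogue of the CommonSucc selection in
// registerPossiblyHoistableBranch. Returns the block where the two arms of a
// conditional branch converge, or -1 if no candidate is found.
int findCommonSucc(const CFG &G, int TrueDest, int FalseDest) {
  assert(TrueDest >= 0 && TrueDest < static_cast<int>(G.size()));
  assert(FalseDest >= 0 && FalseDest < static_cast<int>(G.size()));
  // Identical destinations behave like an unconditional branch.
  if (TrueDest == FalseDest)
    return -1;
  std::set<int> TrueSucc(G[TrueDest].begin(), G[TrueDest].end());
  std::set<int> FalseSucc(G[FalseDest].begin(), G[FalseDest].end());
  // Triangle: one destination is a direct successor of the other.
  if (TrueSucc.count(FalseDest))
    return FalseDest;
  if (FalseSucc.count(TrueDest))
    return TrueDest;
  // Diamond: look for successors shared by both destinations.
  std::vector<int> Common;
  std::set_intersection(TrueSucc.begin(), TrueSucc.end(), FalseSucc.begin(),
                        FalseSucc.end(), std::back_inserter(Common));
  if (Common.empty())
    return -1;
  // std::set iterates in ascending order, so Common is sorted and front() is
  // the candidate that appears first in the block list: a deterministic
  // tie-break, unlike taking begin() of a pointer-keyed set.
  return Common.front();
}
```

With reames' example (A -> B, C; B -> D, E; C -> D, F) numbered A = 0 through F = 5, the sketch picks D for the branch in A; whether hoisting is then legal is left to the separate dominance check.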

		bool canHoistPHI(PHINode *PN) {
		// The phi must have loop invariant operands.
		if (!CurLoop->hasLoopInvariantOperands(PN))
		return false;
		// We can hoist phis if the block they are in is the target of hoistable
		// branches which cover all of the predecessors of the block.
		SmallPtrSet<BasicBlock *, 8> PredecessorBlocks;
		BasicBlock *BB = PN->getParent();
		for (BasicBlock *PredBB : predecessors(BB))
		PredecessorBlocks.insert(PredBB);
		// If we have fewer predecessor blocks than predecessors then the phi will
		// have more than one incoming value for the same block which we can't
		// handle.
		// TODO: This could be handled by erasing some of the duplicate incoming
		// values.
		if (PredecessorBlocks.size() != pred_size(BB))
		return false;
		for (auto &Pair : HoistableBranches) {
		if (Pair.second == BB) {
		// Which blocks are predecessors via this branch depends on if the
		// branch is triangle-like or diamond-like.
		if (Pair.first->getSuccessor(0) == BB) {
		PredecessorBlocks.erase(Pair.first->getParent());
		PredecessorBlocks.erase(Pair.first->getSuccessor(1));
		} else if (Pair.first->getSuccessor(1) == BB) {
		PredecessorBlocks.erase(Pair.first->getParent());
		PredecessorBlocks.erase(Pair.first->getSuccessor(0));
		} else {
		reames (inline, Not Done): What about indirectly dependent blocks? (e.g. D in A->B, C, B->D)
		john.brawn (author, inline, Not Done): Then we don't do anything. In the case that we're hoisting a phi in D we make sure to call getHoistedBlock on B and C beforehand which takes care of duplicating the control flow.
		PredecessorBlocks.erase(Pair.first->getSuccessor(0));
		PredecessorBlocks.erase(Pair.first->getSuccessor(1));
		}
		}
		mkazantsev (inline, Done): `BB != Pair.second` is checked 2 times, you could instead write `if (BB != Pair.second && (succ1 == BB || succ2 == BB))`.
		}
		// PredecessorBlocks will now be empty if for every predecessor of BB we
		// found a hoistable branch source.
		return PredecessorBlocks.empty();
		}
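The predecessor-coverage rule in canHoistPHI can likewise be sketched on a toy CFG. Again this is a hypothetical standalone model, not LLVM code: `HoistableBranch`, `predecessorsOf` and `coversAllPredecessors` are illustrative names, and the loop-invariant-operands test on the phi itself is omitted.

```cpp
#include <cassert>
#include <set>
#include <vector>

using CFG = std::vector<std::vector<int>>;

// One entry of a hypothetical hoistable-branch table: the block the branch
// terminates, its two destinations, and the block where its arms converge.
struct HoistableBranch {
  int Parent, TrueDest, FalseDest, CommonSucc;
};

// One entry per incoming edge, so a block with two edges into BB is listed
// twice (mirroring how pred_size counts edges rather than blocks).
std::vector<int> predecessorsOf(const CFG &G, int BB) {
  std::vector<int> Preds;
  for (int B = 0; B < static_cast<int>(G.size()); ++B)
    for (int S : G[B])
      if (S == BB)
        Preds.push_back(B);
  return Preds;
}

// True if every predecessor of BB is accounted for by a hoistable branch
// that converges at BB.
bool coversAllPredecessors(const CFG &G, int BB,
                           const std::vector<HoistableBranch> &Branches) {
  std::vector<int> Preds = predecessorsOf(G, BB);
  std::set<int> Remaining(Preds.begin(), Preds.end());
  // Duplicate incoming edges would need duplicate phi entries; give up.
  if (Remaining.size() != Preds.size())
    return false;
  for (const HoistableBranch &Br : Branches) {
    if (Br.CommonSucc != BB)
      continue;
    if (Br.TrueDest == BB) { // Triangle: Parent branches straight to BB.
      Remaining.erase(Br.Parent);
      Remaining.erase(Br.FalseDest);
    } else if (Br.FalseDest == BB) { // Mirrored triangle.
      Remaining.erase(Br.Parent);
      Remaining.erase(Br.TrueDest);
    } else { // Diamond: both arms flow into BB.
      Remaining.erase(Br.TrueDest);
      Remaining.erase(Br.FalseDest);
    }
  }
  return Remaining.empty();
}
```

An extra predecessor that no registered branch explains (block 4 in the third test case) is enough to make the phi unhoistable.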

		BasicBlock *getOrCreateHoistedBlock(BasicBlock *BB) {
		mkazantsev (inline, Done): This name is misleading, it doesn't look like something that can change the IR. How about `getOrCreateHoistedBlock`?
		// If BB has already been hoisted, return that
		if (HoistDestinationMap.count(BB))
		return HoistDestinationMap[BB];

		mkazantsev (inline, Done): If you only expect BI to be the only element with the property you need, it would make sense to save some compile time by inserting `break` here. But on the other hand, the assertion you make also makes sense. How about this: `BI = find_if(<cond>)`, and then assert that `find_if(<cond> && el != BI) == HoistableBranches.end()`? Just a suggestion.
		// Check if this block is conditional based on a pending branch
		auto HasBBAsSuccessor =
		[&](DenseMap<BranchInst *, BasicBlock *>::value_type &Pair) {
		return BB != Pair.second && (Pair.first->getSuccessor(0) == BB ||
		Pair.first->getSuccessor(1) == BB);
		};
		auto It = std::find_if(HoistableBranches.begin(), HoistableBranches.end(),
		HasBBAsSuccessor);

		// If not involved in a pending branch, hoist to preheader
		BasicBlock *InitialPreheader = CurLoop->getLoopPreheader();
		if (It == HoistableBranches.end()) {
		LLVM_DEBUG(dbgs() << "LICM using " << InitialPreheader->getName()
		<< " as hoist destination for " << BB->getName()
		<< "\n");
		mkazantsev (inline, Done): Do you ever add nullptr to this map? I guess it should be `.count(Orig)`.
		HoistDestinationMap[BB] = InitialPreheader;
		return InitialPreheader;
		}
		BranchInst *BI = It->first;
		assert(std::find_if(++It, HoistableBranches.end(), HasBBAsSuccessor) ==
		HoistableBranches.end() &&
		"BB is expected to be the target of at most one branch");

		LLVMContext &C = BB->getContext();
		BasicBlock *TrueDest = BI->getSuccessor(0);
		BasicBlock *FalseDest = BI->getSuccessor(1);
		BasicBlock *CommonSucc = HoistableBranches[BI];
		BasicBlock *HoistTarget = getOrCreateHoistedBlock(BI->getParent());

		// Create hoisted versions of blocks that currently don't have them
		auto CreateHoistedBlock = [&](BasicBlock *Orig) {
		mkazantsev (inline, Done): Dot missing.
		if (HoistDestinationMap.count(Orig))
		return HoistDestinationMap[Orig];
		BasicBlock *New =
		BasicBlock::Create(C, Orig->getName() + ".licm", Orig->getParent());
		HoistDestinationMap[Orig] = New;
		DT->addNewBlock(New, HoistTarget);
		if (CurLoop->getParentLoop())
		CurLoop->getParentLoop()->addBasicBlockToLoop(New, *LI);
		++NumCreatedBlocks;
		LLVM_DEBUG(dbgs() << "LICM created " << New->getName()
		<< " as hoist destination for " << Orig->getName()
		<< "\n");
		return New;
		};
		BasicBlock *HoistTrueDest = CreateHoistedBlock(TrueDest);
		BasicBlock *HoistFalseDest = CreateHoistedBlock(FalseDest);
		BasicBlock *HoistCommonSucc = CreateHoistedBlock(CommonSucc);

		// Link up these blocks with branches.
		if (!HoistCommonSucc->getTerminator()) {
		// The new common successor we've generated will branch to whatever that
		// hoist target branched to.
		BasicBlock *TargetSucc = HoistTarget->getSingleSuccessor();
		assert(TargetSucc && "Expected hoist target to have a single successor");
		HoistCommonSucc->moveBefore(TargetSucc);
		BranchInst::Create(TargetSucc, HoistCommonSucc);
		}
		if (!HoistTrueDest->getTerminator()) {
		HoistTrueDest->moveBefore(HoistCommonSucc);
		BranchInst::Create(HoistCommonSucc, HoistTrueDest);
		mkazantsev (inline, Not Done): Maybe I am missing something, but where do we notify DomTree of existence of these new edges?
		john.brawn (author, inline, Not Done): Inserting the edge is never needed. insertEdge only has an effect when the source and destination blocks have different immediate dominators (see InsertReachable in GenericDomTreeConstruction.h) and that never happens - HoistTrueDest and HoistFalseDest always have the same immediate dominator as HoistCommonSucc, and for the edge from HoistCommonSucc to TargetSucc it may be that HoistCommonSucc is now the immediate dominator of TargetSucc but that's handled in the loop at line 632.
		}
		if (!HoistFalseDest->getTerminator()) {
		HoistFalseDest->moveBefore(HoistCommonSucc);
		BranchInst::Create(HoistCommonSucc, HoistFalseDest);
		mkazantsev (inline, Done): Dot missing.
		}

		// If BI is being cloned to what was originally the preheader then
		// HoistCommonSucc will now be the new preheader.
		if (HoistTarget == InitialPreheader) {
		// Phis in the loop header now need to use the new preheader.
		InitialPreheader->replaceSuccessorsPhiUsesWith(HoistCommonSucc);
		// The new preheader dominates the loop header.
		DomTreeNode *PreheaderNode = DT->getNode(HoistCommonSucc);
		mkazantsev (inline, Done): It can be useful to have statistics on hoisted branches. I'm OK if it goes as a follow-up patch.
		john.brawn (author, inline, Not Done): I'll add this and also one for the number of created blocks, which seems like it may also be useful.
		DomTreeNode *HeaderNode = DT->getNode(CurLoop->getHeader());
		DT->changeImmediateDominator(HeaderNode, PreheaderNode);
		// The preheader hoist destination is now the new preheader, with the
		// exception of the hoist destination of this branch.
		for (auto &Pair : HoistDestinationMap)
		if (Pair.second == InitialPreheader && Pair.first != BI->getParent())
		Pair.second = HoistCommonSucc;
		}

		// Now finally clone BI.
		ReplaceInstWithInst(
		HoistTarget->getTerminator(),
		BranchInst::Create(HoistTrueDest, HoistFalseDest, BI->getCondition()));
		mkazantsev (inline, Done): Please add verification of `LI` in debug mode just to make sure that we didn't mess up anything (maybe not here, but in some reasonable place).
		++NumClonedBranches;

		assert(CurLoop->getLoopPreheader() &&
		"Hoisting blocks should not have destroyed preheader");
		return HoistDestinationMap[BB];
		}
		};
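john.brawn's argument that no explicit `DT->insertEdge` call is needed rests on the immediate dominators of the duplicated blocks: in a hoisted diamond, the true block, false block and common successor are all immediately dominated by the hoist target (what `DT->addNewBlock(New, HoistTarget)` records), and only the loop header's immediate dominator moves to the new preheader. The standalone sketch below checks this on a toy CFG using the textbook iterative data-flow formulation of dominators. That technique is swapped in for brevity and is not how LLVM's DominatorTree is built; all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

using CFG = std::vector<std::vector<int>>;

// Iterative data-flow dominators (assumes every block is reachable):
//   Dom(Entry) = {Entry}
//   Dom(B)     = {B} union (intersection of Dom(P) over predecessors P of B)
std::vector<std::set<int>> dominatorSets(const CFG &G, int Entry) {
  const int N = static_cast<int>(G.size());
  std::vector<std::vector<int>> Preds(N);
  for (int B = 0; B < N; ++B)
    for (int S : G[B])
      Preds[S].push_back(B);

  std::set<int> All;
  for (int B = 0; B < N; ++B)
    All.insert(B);
  std::vector<std::set<int>> Dom(N, All);
  Dom[Entry] = {Entry};

  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (int B = 0; B < N; ++B) {
      if (B == Entry)
        continue;
      std::set<int> New = All;
      for (int P : Preds[B]) {
        std::set<int> Tmp;
        std::set_intersection(New.begin(), New.end(), Dom[P].begin(),
                              Dom[P].end(), std::inserter(Tmp, Tmp.begin()));
        New.swap(Tmp);
      }
      New.insert(B);
      if (New != Dom[B]) {
        Dom[B] = New;
        Changed = true;
      }
    }
  }
  return Dom;
}

// Strict dominators of B form a chain; the immediate dominator is the one
// closest to B, i.e. the one whose own dominator set is largest.
int immediateDominator(const std::vector<std::set<int>> &Dom, int B) {
  int IDom = -1;
  std::size_t Best = 0;
  for (int D : Dom[B])
    if (D != B && Dom[D].size() > Best) {
      Best = Dom[D].size();
      IDom = D;
    }
  return IDom;
}
```

In the test graph, block 0 plays the old preheader, 1 and 2 the duplicated arms, 3 the new preheader (HoistCommonSucc), 4 the loop header and 5 a latch with a back edge to the header.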

/// Walk the specified region of the CFG (defined by all blocks dominated by		/// Walk the specified region of the CFG (defined by all blocks dominated by
/// the specified block, and that are in the current loop) in depth first		/// the specified block, and that are in the current loop) in depth first
/// order w.r.t the DominatorTree. This allows us to visit definitions before		/// order w.r.t the DominatorTree. This allows us to visit definitions before
/// uses, allowing us to hoist a loop body in one pass without iteration.		/// uses, allowing us to hoist a loop body in one pass without iteration.
///		///
bool llvm::hoistRegion(DomTreeNode *N, AliasAnalysis *AA, LoopInfo *LI,		bool llvm::hoistRegion(DomTreeNode *N, AliasAnalysis *AA, LoopInfo *LI,
DominatorTree *DT, TargetLibraryInfo *TLI, Loop *CurLoop,		DominatorTree *DT, TargetLibraryInfo *TLI, Loop *CurLoop,
AliasSetTracker *CurAST, ICFLoopSafetyInfo *SafetyInfo,		AliasSetTracker *CurAST, ICFLoopSafetyInfo *SafetyInfo,
OptimizationRemarkEmitter *ORE) {		OptimizationRemarkEmitter *ORE) {
// Verify inputs.		// Verify inputs.
assert(N != nullptr && AA != nullptr && LI != nullptr && DT != nullptr &&		assert(N != nullptr && AA != nullptr && LI != nullptr && DT != nullptr &&
CurLoop != nullptr && CurAST != nullptr && SafetyInfo != nullptr &&		CurLoop != nullptr && CurAST != nullptr && SafetyInfo != nullptr &&
"Unexpected input to hoistRegion");		"Unexpected input to hoistRegion");

// We want to visit parents before children. We will enqueue all the parents		ControlFlowHoister CFH(LI, DT, CurLoop);
// before their children in the worklist and process the worklist in order.
SmallVector<DomTreeNode *, 16> Worklist = collectChildrenInLoop(N, CurLoop);

		// Keep track of instructions that have been hoisted, as they may need to be
		// re-hoisted if they end up not dominating all of their uses.
		SmallVector<Instruction *, 16> HoistedInstructions;

		// For PHI hoisting to work we need to hoist blocks before their successors.
		// We can do this by iterating through the blocks in the loop in reverse
		// post-order.
		mkazantsev (inline, Done): Why not `SmallVector` and `push_back`? That wouldn't make any difference in terms of effect but less memory-consuming.
		john.brawn (author, inline, Not Done): The iteration order when rehoisting will have to be changed (because we have to visit instructions before the instructions they use, i.e. later instructions before earlier instructions), but yes that will work.
		mkazantsev (inline, Not Done): You can preserve the order of processing if you just iterate in reverse order.
		LoopBlocksRPO Worklist(CurLoop);
		mkazantsev (inline, Not Done): I guess you could use `insert(Worklist.begin(), Worklist.end())`, it's a tad faster.
		Worklist.perform(LI);
bool Changed = false;		bool Changed = false;
for (DomTreeNode *DTN : Worklist) {		for (BasicBlock *BB : Worklist) {
BasicBlock *BB = DTN->getBlock();
// Only need to process the contents of this block if it is not part of a		// Only need to process the contents of this block if it is not part of a
// subloop (which would already have been processed).		// subloop (which would already have been processed).
if (inSubLoop(BB, CurLoop, LI))		if (inSubLoop(BB, CurLoop, LI))
continue;		continue;
		mkazantsev (inline, Done): It's `O(N^2)` in the worst case, which is not super-cool. If I understand this part correctly, you need to make sure that there is an element of `WorkList` that is a predecessor of another element of this WorkList. It can be done faster if you accumulate all elements of the WorkList into a set and do like for (el : Worklist) for (p : predecessors(el)) if (set.contains(p)) we are done. It will be `O(N)`.

for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E;) {		for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E;) {
Instruction &I = *II++;		Instruction &I = *II++;
// Try constant folding this instruction. If all the operands are		// Try constant folding this instruction. If all the operands are
// constants, it is technically hoistable, but it would be better to		// constants, it is technically hoistable, but it would be better to
// just fold it.		// just fold it.
if (Constant *C = ConstantFoldInstruction(		if (Constant *C = ConstantFoldInstruction(
&I, I.getModule()->getDataLayout(), TLI)) {		&I, I.getModule()->getDataLayout(), TLI)) {
LLVM_DEBUG(dbgs() << "LICM folding inst: " << I << " --> " << *C		LLVM_DEBUG(dbgs() << "LICM folding inst: " << I << " --> " << *C
<< '\n');		<< '\n');
CurAST->copyValue(&I, C);		CurAST->copyValue(&I, C);
		mkazantsev (inline, Done): How do you ensure that you have found something? Is it correct to do it if nothing was found and `idx` has a default value?
		john.brawn (author, inline, Not Done): It's actually possible to not find anything, in which case we fall back to picking the first element of the worklist. I've updated the comment to make this clear.
		mkazantsev (inline, Not Done): I think I now understand the algorithm you're trying to apply. Imagine the graph `A -> B -> C -> D -> E`, and the blocks happened to be arranged as `E, D, C, B, A` in the Worklist. Maybe I am missing something, but it seems that the algorithm inside `collectChildrenInLoop` is BFS-based and cannot protect us from this situation. It could, for example, happen if all these blocks have a common parent and are arranged in the order `E, D, C, B, A` in this parent. The downside of your algorithm is that it will traverse over all of the worklist to find `A`, then over all of the remaining worklist to find `B`, and so on. Each `any_of` query will work in `O(N)`, and you do N such queries, so overall algorithmic complexity is `O(N^2)`. The correct way to efficiently process all blocks before their successors is processing them in Reverse Post-Order. This arrangement by definition gives you what you want: it sorts the blocks in an order where all successors go after their predecessors. If the elements of Worklist are arranged in RPO, you can just process them in order without extra checks, and it will be `O(N)`. You can see an example of how to build RPO in class `LoopBlocksDFS`; the methods you need are `beginRPO(), endRPO()`. And aside from all that above, erasing the last element from the Worklist is `O(1)` and erasing the first one is `O(N)`. When you don't care which element to erase, you should erase the last one; erasing the first element is expensive. However, once you process them in RPO order, you don't need to erase anything from the Worklist at all because you know that you've processed all predecessors before all successors.
		john.brawn (author, inline, Not Done): Yes it does look like reverse post-order gives me what I want (though I have to adjust some of the tests as it picks a different but equally good order). I'll modify this to do that.
I.replaceAllUsesWith(C);		I.replaceAllUsesWith(C);
if (isInstructionTriviallyDead(&I, TLI))		if (isInstructionTriviallyDead(&I, TLI))
eraseInstruction(I, *SafetyInfo, CurAST);		eraseInstruction(I, *SafetyInfo, CurAST);
Changed = true;		Changed = true;
continue;		continue;
}		}
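mkazantsev's suggestion in the thread above is to visit blocks in reverse post-order (RPO), which places every block after its predecessors (ignoring back edges) without repeatedly searching the worklist. In the patch this order comes from LoopBlocksRPO; the sketch below is a hypothetical standalone iterative DFS on a toy CFG, not LLVM's implementation, and `reversePostOrder` is an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using CFG = std::vector<std::vector<int>>;

// Returns the blocks reachable from Entry in reverse post-order. Each block is
// pushed once and each edge examined once, so this is O(blocks + edges). In
// the result, for every edge U -> V that the DFS does not classify as a back
// (retreating) edge, U comes before V.
std::vector<int> reversePostOrder(const CFG &G, int Entry) {
  assert(Entry >= 0 && Entry < static_cast<int>(G.size()));
  std::vector<bool> Visited(G.size(), false);
  std::vector<int> PostOrder;
  // Explicit DFS stack of (block, index of the next successor to try).
  std::vector<std::pair<int, std::size_t>> Stack;
  Stack.push_back({Entry, 0});
  Visited[Entry] = true;
  while (!Stack.empty()) {
    int BB = Stack.back().first;
    std::size_t &Next = Stack.back().second;
    if (Next < G[BB].size()) {
      int Succ = G[BB][Next++];
      // Next is not touched after this push_back, which may reallocate Stack.
      if (!Visited[Succ]) {
        Visited[Succ] = true;
        Stack.push_back({Succ, 0});
      }
    } else {
      // All successors finished, so BB is complete in post-order.
      PostOrder.push_back(BB);
      Stack.pop_back();
    }
  }
  std::reverse(PostOrder.begin(), PostOrder.end());
  return PostOrder;
}
```

For mkazantsev's chain `A -> B -> C -> D -> E` laid out as `E, D, C, B, A` (so A is index 4), the result is A, B, C, D, E in a single pass, regardless of the layout order.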

// Try hoisting the instruction out to the preheader. We can only do		// Try hoisting the instruction out to the preheader. We can only do
// this if all of the operands of the instruction are loop invariant and		// this if all of the operands of the instruction are loop invariant and
// if it is safe to hoist the instruction.		// if it is safe to hoist the instruction.
//		// TODO: It may be safe to hoist if we are hoisting to a conditional block
		// and we have accurately duplicated the control flow from the loop header
		// to that block.
if (CurLoop->hasLoopInvariantOperands(&I) &&		if (CurLoop->hasLoopInvariantOperands(&I) &&
canSinkOrHoistInst(I, AA, DT, CurLoop, CurAST, true, ORE) &&		canSinkOrHoistInst(I, AA, DT, CurLoop, CurAST, true, ORE) &&
isSafeToExecuteUnconditionally(		isSafeToExecuteUnconditionally(
I, DT, CurLoop, SafetyInfo, ORE,		I, DT, CurLoop, SafetyInfo, ORE,
CurLoop->getLoopPreheader()->getTerminator())) {		CurLoop->getLoopPreheader()->getTerminator())) {
hoist(I, DT, CurLoop, SafetyInfo, ORE);		hoist(I, DT, CurLoop, CFH.getOrCreateHoistedBlock(BB), SafetyInfo, ORE);
		HoistedInstructions.push_back(&I);
Changed = true;		Changed = true;
continue;		continue;
}		}

// Attempt to remove floating point division out of the loop by		// Attempt to remove floating point division out of the loop by
// converting it to a reciprocal multiplication.		// converting it to a reciprocal multiplication.
if (I.getOpcode() == Instruction::FDiv &&		if (I.getOpcode() == Instruction::FDiv &&
CurLoop->isLoopInvariant(I.getOperand(1)) &&		CurLoop->isLoopInvariant(I.getOperand(1)) &&
I.hasAllowReciprocal()) {		I.hasAllowReciprocal()) {
auto Divisor = I.getOperand(1);		auto Divisor = I.getOperand(1);
auto One = llvm::ConstantFP::get(Divisor->getType(), 1.0);		auto One = llvm::ConstantFP::get(Divisor->getType(), 1.0);
auto ReciprocalDivisor = BinaryOperator::CreateFDiv(One, Divisor);		auto ReciprocalDivisor = BinaryOperator::CreateFDiv(One, Divisor);
ReciprocalDivisor->setFastMathFlags(I.getFastMathFlags());		ReciprocalDivisor->setFastMathFlags(I.getFastMathFlags());
SafetyInfo->insertInstructionTo(I.getParent());		SafetyInfo->insertInstructionTo(I.getParent());
ReciprocalDivisor->insertBefore(&I);		ReciprocalDivisor->insertBefore(&I);

auto Product =		auto Product =
BinaryOperator::CreateFMul(I.getOperand(0), ReciprocalDivisor);		BinaryOperator::CreateFMul(I.getOperand(0), ReciprocalDivisor);
Product->setFastMathFlags(I.getFastMathFlags());		Product->setFastMathFlags(I.getFastMathFlags());
SafetyInfo->insertInstructionTo(I.getParent());		SafetyInfo->insertInstructionTo(I.getParent());
Product->insertAfter(&I);		Product->insertAfter(&I);
I.replaceAllUsesWith(Product);		I.replaceAllUsesWith(Product);
eraseInstruction(I, *SafetyInfo, CurAST);		eraseInstruction(I, *SafetyInfo, CurAST);

hoist(*ReciprocalDivisor, DT, CurLoop, SafetyInfo, ORE);		hoist(*ReciprocalDivisor, DT, CurLoop, CFH.getOrCreateHoistedBlock(BB),
		SafetyInfo, ORE);
		HoistedInstructions.push_back(ReciprocalDivisor);
Changed = true;		Changed = true;
continue;		continue;
}		}

using namespace PatternMatch;		using namespace PatternMatch;
if (((I.use_empty() &&		if (((I.use_empty() &&
match(&I, m_Intrinsic<Intrinsic::invariant_start>())) ||		match(&I, m_Intrinsic<Intrinsic::invariant_start>())) ||
isGuard(&I)) &&		isGuard(&I)) &&
CurLoop->hasLoopInvariantOperands(&I) &&		CurLoop->hasLoopInvariantOperands(&I) &&
SafetyInfo->isGuaranteedToExecute(I, DT, CurLoop) &&		SafetyInfo->isGuaranteedToExecute(I, DT, CurLoop) &&
SafetyInfo->doesNotWriteMemoryBefore(I, CurLoop)) {		SafetyInfo->doesNotWriteMemoryBefore(I, CurLoop)) {
hoist(I, DT, CurLoop, SafetyInfo, ORE);		hoist(I, DT, CurLoop, CFH.getOrCreateHoistedBlock(BB), SafetyInfo, ORE);
		HoistedInstructions.push_back(&I);
Changed = true;		Changed = true;
continue;		continue;
}		}

		if (PHINode *PN = dyn_cast<PHINode>(&I)) {
		reames: Minor: I'd suggest sinking the invariant operand check into the helper. Reading the helper in isolation is currently confusing.
		if (CFH.canHoistPHI(PN)) {
		// Redirect incoming blocks first to ensure that we create hoisted
		// versions of those blocks before we hoist the phi.
		for (unsigned int i = 0; i < PN->getNumIncomingValues(); ++i)
		PN->setIncomingBlock(
		i, CFH.getOrCreateHoistedBlock(PN->getIncomingBlock(i)));
		hoist(*PN, DT, CurLoop, CFH.getOrCreateHoistedBlock(BB), SafetyInfo,
		ORE);
		assert(DT->dominates(PN, BB) && "Conditional PHIs not expected");
		Changed = true;
		continue;
}		}
}		}

		// Remember possibly hoistable branches so we can actually hoist them
		// later if needed.
		if (BranchInst *BI = dyn_cast<BranchInst>(&I))
		CFH.registerPossiblyHoistableBranch(BI);
		}
		}
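A rough sketch of how the loop above uses `CFH` for a phi, on the simple triangle from `@triangle_phi` in the test file below. Everything here is a hypothetical toy (`ToyControlFlowHoister`, string block names, and the `.licm` suffix borrowed from the test's FileCheck variable names); it is not the patch's implementation and skips the dominance and merge-block checks the real code must make. It only shows the mapping the tests expect: the block holding the invariant branch maps to the preheader, each successor of that branch gets an empty hoisted copy (created once, then reused), and the phi's incoming blocks are redirected before the phi itself moves.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model of control-flow hoisting; strings stand in for
// BasicBlocks. Only the simple triangle/diamond case is modelled.
class ToyControlFlowHoister {
public:
  explicit ToyControlFlowHoister(std::string Preheader)
      : Preheader(std::move(Preheader)) {}

  // Remember that Block ends in a loop-invariant "br cond, TrueSucc, FalseSucc".
  void registerBranch(const std::string &Block, const std::string &TrueSucc,
                      const std::string &FalseSucc) {
    Branches[Block] = {TrueSucc, FalseSucc};
  }

  // Successors of a registered branch get an empty copy (created once, then
  // reused); every other block, including the one holding the branch, maps
  // to the preheader.
  std::string getOrCreateHoistedBlock(const std::string &BB) {
    auto It = Hoisted.find(BB);
    if (It != Hoisted.end())
      return It->second;
    for (const auto &Entry : Branches) {
      const auto &Succs = Entry.second;
      if (Succs.first == BB || Succs.second == BB)
        return Hoisted[BB] = BB + ".licm";
    }
    return Preheader;
  }

private:
  std::string Preheader;
  std::map<std::string, std::pair<std::string, std::string>> Branches;
  std::map<std::string, std::string> Hoisted;
};

// A toy phi: (incoming value, incoming block) pairs. As in the patch, the
// incoming blocks are redirected to their hoisted counterparts first.
using ToyPhi = std::vector<std::pair<std::string, std::string>>;

ToyPhi hoistPhi(ToyControlFlowHoister &CFH, ToyPhi Phi) {
  for (auto &Incoming : Phi)
    Incoming.second = CFH.getOrCreateHoistedBlock(Incoming.second);
  return Phi;
}
```

For `%phi = phi i32 [ %add, %if ], [ %x, %loop ]` this yields incoming blocks `if.licm` and `entry`, matching the shape of the `@triangle_phi` checks.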

		// If we hoisted instructions to a conditional block they may not dominate
		mkazantsev: This is the part I don't understand. If the hoist destination doesn't dominate the instruction's users, then it also doesn't dominate the original instruction. Is LICM able to hoist to such locations? Is there an example of that?
		john.brawn: We get this when we hoist an instruction but don't hoist all of its uses. An easy example of this is a phi where one operand is loop invariant and the other is not: the hoisted loop-invariant operand will not dominate its use. In the tests I've added, @conditional_use, @rehoist, and @diamond_with_extra_in_edge are all examples of where rehoisting happens.
		mkazantsev: Right, it makes sense when the uses are phis. Maybe it should be explicitly stated in the comment. :)
		// their uses that weren't hoisted (such as phis where some operands are not
		// loop invariant). If so make them unconditional by moving them to their
		// immediate dominator. We iterate through the instructions in reverse order
		// which ensures that when we rehoist an instruction we rehoist its operands,
		// and also keep track of where in the block we are rehoisting to to make sure
		efriedma: Would it be better to insert a PHI node, rather than re-hoist?
		john.brawn: I considered that, but went with this as it gives results closer to the current behaviour in the cases where we end up not hoisting any phis. However, I'm currently working on fixing the various TODOs here, and it looks like I may need to switch to the approach of inserting phis.
		john.brawn: Inserting a phi, combined with relaxing the dominance check in registerPossiblyHoistableBranch, can lead to cases where we get the phi value from the 'wrong' predecessor (e.g. in the diamond_with_extra_in_edge test). It's probably not the case that this can happen without that relaxation, but I'd rather not risk it.
		// that we rehoist instructions before the instructions that use them.
		reames: I think the "re-hoisting" here (which is really re-sinking, right?) is probably a non-starter. We should instead prevent hoisting.
		john.brawn: The current behaviour of LICM is to hoist everything to the same block, and what this re-hoisting does is essentially get us that behaviour in those cases where hoisting to a conditional block doesn't work. Preventing hoisting would therefore prevent hoisting of things that are currently hoisted. We _could_ do something like delaying the decision of where to hoist things until we know whether everything will end up being dominated, but re-hoisting seems much easier.
		Instruction *HoistPoint = nullptr;
		for (Instruction *I : reverse(HoistedInstructions)) {
		if (!llvm::all_of(I->uses(), [&](Use &U) { return DT->dominates(I, U); })) {
		BasicBlock *Dominator =
		DT->getNode(I->getParent())->getIDom()->getBlock();
		mkazantsev: Set `Changed` here.
		mkazantsev: Removal/insertion of instructions to blocks may also need SafetyInfo updates. We have a helper function `eraseInstruction` for removal that handles it properly, but there was no helper for `moveBefore` because there was only one occurrence. I've just added it in patch rL346472. I am pretty sure that the current code is accidentally correct because rehoisting happens outside of the loop (and `SafetyInfo` doesn't really care), but please rebase on top of this patch and use this utility whenever you move instructions, to keep things general.
		LLVM_DEBUG(dbgs() << "LICM rehoisting to " << Dominator->getName() << ": "
		mkazantsev: Do we really need to change the hoist point? Why not just hoist all of them before the preheader's terminator? That would make this code simpler.
		john.brawn: If we have something like `%a = add i32 %loop_invariant, 1` followed by `%b = mul i32 %a, 2`, then if %b does not dominate its uses we have to rehoist %b, at which point %a no longer dominates %b and has to be rehoisted to before %b. That's what HoistPoint is doing: it makes sure we rehoist instructions before the rehoisted instructions that use them.
		mkazantsev: But if we process `%a` and then `%b` and insert each of them before the terminator, we don't have this problem. It seems that using a vector instead of a list would spare you this problem.
		<< *I << "\n");
		if (!HoistPoint || HoistPoint->getParent() != Dominator) {
		if (HoistPoint)
		assert(DT->dominates(Dominator, HoistPoint->getParent()) &&
		"New hoist point expected to dominate old hoist point");
		HoistPoint = Dominator->getTerminator();
		}
		mkazantsev: It makes more sense to verify `DT` before `LI` since `LI` uses it.
		moveInstructionBefore(*I, *HoistPoint, *SafetyInfo);
		HoistPoint = I;
		Changed = true;
		}
		}
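The rehoisting loop above can be modelled on the `%a`/`%b` example from the review thread. This is a hypothetical toy (`ToyFunc`, string instruction names, a hand-written immediate-dominator map), not LLVM's `DominatorTree` or the patch's code: `%a` and `%b` start in a conditional block `cond`, and `%b` has a user in a block that `cond` does not dominate. Walking the hoisted list in reverse catches `%b` first; moving it then exposes `%a`, and `HoistPoint` keeps `%a` ahead of `%b` in the new block.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model, not LLVM's API: Blocks holds instruction names in
// order (last entry = terminator), IDom gives each block's immediate
// dominator ("" for the root), Users maps a def to the names that use it.
struct ToyFunc {
  std::map<std::string, std::string> IDom;
  std::map<std::string, std::vector<std::string>> Blocks;
  std::map<std::string, std::vector<std::string>> Users;

  std::string blockOf(const std::string &I) const {
    for (const auto &B : Blocks)
      if (std::find(B.second.begin(), B.second.end(), I) != B.second.end())
        return B.first;
    return "";
  }
  // Walk up the dominator tree from B looking for A.
  bool blockDominates(const std::string &A, std::string B) const {
    for (; !B.empty(); B = IDom.at(B))
      if (A == B)
        return true;
    return false;
  }
  // Def dominates User if it is earlier in the same block, or if its block
  // dominates the user's block.
  bool dominates(const std::string &Def, const std::string &User) const {
    std::string DB = blockOf(Def), UB = blockOf(User);
    if (DB != UB)
      return blockDominates(DB, UB);
    const auto &Insts = Blocks.at(DB);
    return std::find(Insts.begin(), Insts.end(), Def) <
           std::find(Insts.begin(), Insts.end(), User);
  }
  // Move I to sit immediately before Pos (possibly in another block).
  void moveBefore(const std::string &I, const std::string &Pos) {
    auto &From = Blocks[blockOf(I)];
    From.erase(std::find(From.begin(), From.end(), I));
    auto &To = Blocks[blockOf(Pos)];
    To.insert(std::find(To.begin(), To.end(), Pos), I);
  }
};

// Same shape as the loop above: visit hoisted instructions in reverse; any
// that no longer dominates all of its users moves to the immediate dominator
// of its block, in front of HoistPoint, which then becomes that instruction.
void rehoist(ToyFunc &F, const std::vector<std::string> &Hoisted) {
  std::string HoistPoint; // empty means "not chosen yet"
  for (auto It = Hoisted.rbegin(); It != Hoisted.rend(); ++It) {
    const std::string &I = *It;
    const std::vector<std::string> &Us = F.Users[I];
    if (std::all_of(Us.begin(), Us.end(),
                    [&](const std::string &U) { return F.dominates(I, U); }))
      continue;
    std::string Dominator = F.IDom.at(F.blockOf(I));
    if (HoistPoint.empty() || F.blockOf(HoistPoint) != Dominator)
      HoistPoint = F.Blocks.at(Dominator).back(); // the terminator
    F.moveBefore(I, HoistPoint);
    HoistPoint = I;
  }
}
```

In this toy model, walking forward instead would check `%a` while it still dominates `%b` inside `cond`, leave it there, and then move only `%b`, breaking def-before-use. The reverse walk is what catches the cascade, and `HoistPoint` is what keeps the rehoisted instructions in order.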

		// Now that we've finished hoisting make sure that LI and DT are still valid.
		#ifndef NDEBUG
		assert(DT->verify(DominatorTree::VerificationLevel::Fast) &&
		"Dominator tree verification failed");
		LI->verify(*DT);
		#endif

return Changed;		return Changed;
}		}

// Return true if LI is invariant within scope of the loop. LI is invariant if		// Return true if LI is invariant within scope of the loop. LI is invariant if
// CurLoop is dominated by an invariant.start representing the same memory		// CurLoop is dominated by an invariant.start representing the same memory
// location and size as the memory location LI loads from, and also the		// location and size as the memory location LI loads from, and also the
// invariant.start has no uses.		// invariant.start has no uses.
static bool isLoadInvariantInLoop(LoadInst *LI, DominatorTree *DT,		static bool isLoadInvariantInLoop(LoadInst *LI, DominatorTree *DT,
… (550 lines not shown) …	for (auto *UI : Users) {
Changed = true;		Changed = true;
}		}
return Changed;		return Changed;
}		}

/// When an instruction is found to only use loop invariant operands that		/// When an instruction is found to only use loop invariant operands that
/// is safe to hoist, this instruction is called to do the dirty work.		/// is safe to hoist, this instruction is called to do the dirty work.
///		///
static void hoist(Instruction &I, const DominatorTree *DT, const Loop *CurLoop,		static void hoist(Instruction &I, const DominatorTree *DT, const Loop *CurLoop,
ICFLoopSafetyInfo *SafetyInfo, OptimizationRemarkEmitter *ORE) {		BasicBlock *Dest, ICFLoopSafetyInfo *SafetyInfo,
		mkazantsev: After you've changed the semantics, `Preheader` is no longer the preheader of `CurLoop` (or is it?). If not, please rename it. If so, why do you need it as a parameter? You can just take the loop's preheader as before. In any case, it should be `const BasicBlock *`.
		john.brawn: I left it as `Preheader` just because it was simpler not to change it, but yes, it makes sense to rename it, so I'll do so. It can't be const though, as we're hoisting I into it (i.e. modifying it).
auto *Preheader = CurLoop->getLoopPreheader();		OptimizationRemarkEmitter *ORE) {
LLVM_DEBUG(dbgs() << "LICM hoisting to " << Preheader->getName() << ": " << I		LLVM_DEBUG(dbgs() << "LICM hoisting to " << Dest->getName() << ": " << I
<< "\n");		<< "\n");
ORE->emit([&]() {		ORE->emit([&]() {
return OptimizationRemark(DEBUG_TYPE, "Hoisted", &I) << "hoisting "		return OptimizationRemark(DEBUG_TYPE, "Hoisted", &I) << "hoisting "
<< ore::NV("Inst", &I);		<< ore::NV("Inst", &I);
});		});

// Metadata can be dependent on conditions we are hoisting above.		// Metadata can be dependent on conditions we are hoisting above.
// Conservatively strip all metadata on the instruction unless we were		// Conservatively strip all metadata on the instruction unless we were
// guaranteed to execute I if we entered the loop, in which case the metadata		// guaranteed to execute I if we entered the loop, in which case the metadata
// is valid in the loop preheader.		// is valid in the loop preheader.
if (I.hasMetadataOtherThanDebugLoc() &&		if (I.hasMetadataOtherThanDebugLoc() &&
// The check on hasMetadataOtherThanDebugLoc is to prevent us from burning		// The check on hasMetadataOtherThanDebugLoc is to prevent us from burning
// time in isGuaranteedToExecute if we don't actually have anything to		// time in isGuaranteedToExecute if we don't actually have anything to
// drop. It is a compile time optimization, not required for correctness.		// drop. It is a compile time optimization, not required for correctness.
!SafetyInfo->isGuaranteedToExecute(I, DT, CurLoop))		!SafetyInfo->isGuaranteedToExecute(I, DT, CurLoop))
I.dropUnknownNonDebugMetadata();		I.dropUnknownNonDebugMetadata();

// Move the new node to the Preheader, before its terminator.		if (isa<PHINode>(I))
moveInstructionBefore(I, *Preheader->getTerminator(), *SafetyInfo);		// Move the new node to the end of the phi list in the destination block.
		moveInstructionBefore(I, *Dest->getFirstNonPHI(), *SafetyInfo);
		else
		// Move the new node to the destination block, before its terminator.
		moveInstructionBefore(I, *Dest->getTerminator(), *SafetyInfo);
		mkazantsev: Braces not needed (here and in some other places like this).

// Do not retain debug locations when we are moving instructions to different		// Do not retain debug locations when we are moving instructions to different
// basic blocks, because we want to avoid jumpy line tables. Calls, however,		// basic blocks, because we want to avoid jumpy line tables. Calls, however,
// need to retain their debug locs because they may be inlined.		// need to retain their debug locs because they may be inlined.
// FIXME: How do we retain source locations without causing poor debugging		// FIXME: How do we retain source locations without causing poor debugging
// behavior?		// behavior?
if (!isa<CallInst>(I))		if (!isa<CallInst>(I))
I.setDebugLoc(DebugLoc());		I.setDebugLoc(DebugLoc());
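The new placement rule in `hoist()` exists because phis must stay grouped at the top of a block. A toy sketch of just that rule, using a hypothetical `ToyBlock` (a vector of named instructions whose last entry is the terminator) in place of a `BasicBlock`: a phi goes before the first non-phi, as `getFirstNonPHI()` arranges in the patch, and anything else goes before the terminator.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy block: instructions in order; phis form a prefix and the
// last entry is the terminator.
struct ToyInst {
  std::string Name;
  bool IsPhi;
};
using ToyBlock = std::vector<ToyInst>;

// Mirrors the placement rule in the updated hoist(): a phi goes after any
// existing phis (before the first non-phi); anything else goes before the
// terminator. Assumes Dest is non-empty and ends in a terminator.
void placeHoisted(ToyBlock &Dest, const ToyInst &I) {
  ToyBlock::iterator Pos;
  if (I.IsPhi) {
    Pos = Dest.begin();
    while (Pos != Dest.end() && Pos->IsPhi)
      ++Pos;
  } else {
    Pos = Dest.end() - 1;
  }
  Dest.insert(Pos, I);
}

std::vector<std::string> names(const ToyBlock &B) {
  std::vector<std::string> Out;
  for (const ToyInst &I : B)
    Out.push_back(I.Name);
  return Out;
}
```

Placing the phi before the terminator instead would leave it after `%cmp2`, which the IR verifier rejects because phi nodes must be grouped at the top of their block.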
… (533 lines not shown) …

test/Transforms/LICM/hoist-phi.ll

This file was added.

				; RUN: opt -S -licm < %s | FileCheck %s
				; RUN: opt -passes='require<opt-remark-emit>,loop(licm)' -S < %s | FileCheck %s

				; CHECK-LABEL: @triangle_phi
				define void @triangle_phi(i32 %x, i32* %p) {
				; CHECK-LABEL: entry:
				; CHECK: %cmp1 = icmp sgt i32 %x, 0
				; CHECK: br i1 %cmp1, label %[[IF_LICM:.*]], label %[[THEN_LICM:.*]]
				entry:
				br label %loop

				; CHECK: [[IF_LICM]]:
				; CHECK: %add = add i32 %x, 1
				; CHECK: br label %[[THEN_LICM]]

				; CHECK: [[THEN_LICM]]:
				; CHECK: phi i32 [ %add, %[[IF_LICM]] ], [ %x, %entry ]
				; CHECK: store i32 %phi, i32* %p
				; CHECK: %cmp2 = icmp ne i32 %phi, 0
				; CHECK: br label %loop

				loop:
				%cmp1 = icmp sgt i32 %x, 0
				br i1 %cmp1, label %if, label %then

				if:
				%add = add i32 %x, 1
				br label %then

				then:
				%phi = phi i32 [ %add, %if ], [ %x, %loop ]
				store i32 %phi, i32* %p
				%cmp2 = icmp ne i32 %phi, 0
				br i1 %cmp2, label %loop, label %end

				end:
				ret void
				}

				; CHECK-LABEL: @diamond_phi
				define void @diamond_phi(i32 %x, i32* %p) {
				; CHECK-LABEL: entry:
				; CHECK: %cmp1 = icmp sgt i32 %x, 0
				; CHECK: br i1 %cmp1, label %[[IF_LICM:.*]], label %[[ELSE_LICM:.*]]
				entry:
				br label %loop

				; CHECK: [[IF_LICM]]:
				; CHECK: %add = add i32 %x, 1
				; CHECK: br label %[[THEN_LICM:.*]]

				; CHECK: [[ELSE_LICM]]:
				; CHECK: %sub = sub i32 %x, 1
				; CHECK: br label %[[THEN_LICM]]

				; CHECK: [[THEN_LICM]]
				; CHECK: %phi = phi i32 [ %add, %[[IF_LICM]] ], [ %sub, %[[ELSE_LICM]] ]
				; CHECK: store i32 %phi, i32* %p
				; CHECK: %cmp2 = icmp ne i32 %phi, 0
				; CHECK: br label %loop

				loop:
				%cmp1 = icmp sgt i32 %x, 0
				br i1 %cmp1, label %if, label %else

				if:
				%add = add i32 %x, 1
				br label %then

				else:
				%sub = sub i32 %x, 1
				br label %then

				then:
				%phi = phi i32 [ %add, %if ], [ %sub, %else ]
				store i32 %phi, i32* %p
				%cmp2 = icmp ne i32 %phi, 0
				br i1 %cmp2, label %loop, label %end

				end:
				ret void
				}

				; TODO: This is currently too complicated for us to be able to hoist the phi.
				; CHECK-LABEL: @three_way_phi
				define void @three_way_phi(i32 %x, i32* %p) {
				; CHECK-LABEL: entry:
				; CHECK-DAG: %cmp1 = icmp sgt i32 %x, 0
				; CHECK-DAG: %add = add i32 %x, 1
				; CHECK-DAG: %cmp2 = icmp sgt i32 %add, 0
				; CHECK: br i1 %cmp1, label %[[IF_LICM:.*]], label %[[ELSE_LICM:.*]]

				; CHECK: [[IF_LICM]]:
				; CHECK: br label %[[THEN_LICM:.*]]

				; CHECK: [[THEN_LICM]]:
				; CHECK: %sub = sub i32 %x, 1
				; CHECK: br label %loop

				entry:
				br label %loop

				loop:
				%cmp1 = icmp sgt i32 %x, 0
				br i1 %cmp1, label %if, label %then

				if:
				%add = add i32 %x, 1
				%cmp2 = icmp sgt i32 %add, 0
				br i1 %cmp2, label %if.if, label %then

				if.if:
				%sub = sub i32 %x, 1
				br label %then

				then:
				%phi = phi i32 [ 0, %loop ], [ %add, %if ], [ %sub, %if.if ]
				store i32 %phi, i32* %p
				%cmp3 = icmp ne i32 %phi, 0
				br i1 %cmp3, label %loop, label %end

				end:
				ret void
				}

				; TODO: This is currently too complicated for us to be able to hoist the phi.
				; CHECK-LABEL: @tree_phi
				mkazantsev: Please add `TODO:` to the tests you think we should cover in the future; that will make it easier to track if we decide to expand this transform to something more general.
				define void @tree_phi(i32 %x, i32* %p) {
				; CHECK-LABEL: entry:
				; CHECK-DAG: %cmp1 = icmp sgt i32 %x, 0
				; CHECK-DAG: %add = add i32 %x, 1
				; CHECK-DAG: %cmp2 = icmp sgt i32 %add, 0
				; CHECK-DAG: %sub = sub i32 %x, 1
				; CHECK: br label %loop

				entry:
				br label %loop

				loop:
				%cmp1 = icmp sgt i32 %x, 0
				br i1 %cmp1, label %if, label %else

				if:
				%add = add i32 %x, 1
				%cmp2 = icmp sgt i32 %add, 0
				br i1 %cmp2, label %if.if, label %if.else

				if.if:
				br label %then

				if.else:
				br label %then

				else:
				%sub = sub i32 %x, 1
				br label %then

				then:
				%phi = phi i32 [ %add, %if.if ], [ 0, %if.else ], [ %sub, %else ]
				store i32 %phi, i32* %p
				%cmp3 = icmp ne i32 %phi, 0
				br i1 %cmp3, label %loop, label %end

				end:
				ret void
				}

				; TODO: We can hoist the first phi, but not the second.
				; CHECK-LABEL: @phi_phi
				define void @phi_phi(i32 %x, i32* %p) {
				; CHECK-LABEL: entry:
				; CHECK-DAG: %cmp1 = icmp sgt i32 %x, 0
				; CHECK-DAG: %add = add i32 %x, 1
				; CHECK-DAG: %cmp2 = icmp sgt i32 %add, 0
				; CHECK-DAG: %sub = sub i32 %x, 1
				; CHECK: br i1 %cmp2, label %[[IF_IF_LICM:.*]], label %[[IF_ELSE_LICM:.*]]

				; CHECK: [[IF_IF_LICM]]:
				; CHECK: br label %[[IF_THEN_LICM:.*]]

				; CHECK: [[IF_ELSE_LICM]]:
				; CHECK: br label %[[IF_THEN_LICM]]

				; CHECK: [[IF_THEN_LICM]]:
				; CHECK: %phi1 = phi i32 [ %add, %[[IF_IF_LICM]] ], [ 0, %[[IF_ELSE_LICM]] ]
				; CHECK: br label %loop

				entry:
				br label %loop

				loop:
				%cmp1 = icmp sgt i32 %x, 0
				br i1 %cmp1, label %if, label %else

				if:
				%add = add i32 %x, 1
				%cmp2 = icmp sgt i32 %add, 0
				br i1 %cmp2, label %if.if, label %if.else

				if.if:
				br label %if.then

				if.else:
				br label %if.then

				if.then:
				%phi1 = phi i32 [ %add, %if.if ], [ 0, %if.else ]
				br label %then

				else:
				%sub = sub i32 %x, 1
				br label %then

				then:
				%phi2 = phi i32 [ %phi1, %if.then ], [ %sub, %else ]
				store i32 %phi2, i32* %p
				%cmp3 = icmp ne i32 %phi2, 0
				br i1 %cmp3, label %loop, label %end

				end:
				ret void
				}

				; Check that we correctly duplicate empty control flow.
				; CHECK-LABEL: @empty_triangle_phi
				define i8 @empty_triangle_phi(i32 %x, i32 %y) {
				; CHECK-LABEL: entry:
				; CHECK: %cmp1 = icmp eq i32 %x, 0
				; CHECK: br i1 %cmp1, label %[[IF_LICM:.*]], label %[[THEN_LICM:.*]]
				entry:
				br label %loop

				; CHECK: [[IF_LICM]]:
				; CHECK: br label %[[THEN_LICM]]

				; CHECK: [[THEN_LICM]]:
				; CHECK: %phi = phi i8 [ 0, %[[IF_LICM]] ], [ 1, %entry ]
				; CHECK: %cmp2 = icmp eq i32 %y, 0
				; CHECK: br label %loop

				loop:
				%cmp1 = icmp eq i32 %x, 0
				br i1 %cmp1, label %if, label %then

				if:
				br label %then

				then:
				%phi = phi i8 [ 0, %if ], [ 1, %loop ]
				%cmp2 = icmp eq i32 %y, 0
				br i1 %cmp2, label %end, label %loop

				end:
				ret i8 %phi
				}

				; CHECK-LABEL: @empty_diamond_phi
				define i8 @empty_diamond_phi(i32 %x, i32 %y) {
				; CHECK-LABEL: entry:
				; CHECK: %cmp1 = icmp eq i32 %x, 0
				; CHECK: br i1 %cmp1, label %[[IF_LICM:.*]], label %[[ELSE_LICM:.*]]
				entry:
				br label %loop

				; CHECK: [[IF_LICM]]:
				; CHECK: br label %[[THEN_LICM:.*]]

				; CHECK: [[ELSE_LICM]]:
				; CHECK: br label %[[THEN_LICM]]

				; CHECK: [[THEN_LICM]]:
				; CHECK: %phi = phi i8 [ 0, %[[IF_LICM]] ], [ 1, %[[ELSE_LICM]] ]
				; CHECK: %cmp2 = icmp eq i32 %y, 0
				; CHECK: br label %loop

				loop:
				%cmp1 = icmp eq i32 %x, 0
				br i1 %cmp1, label %if, label %else

				if:
				br label %then

				else:
				br label %then

				then:
				%phi = phi i8 [ 0, %if ], [ 1, %else ]
				%cmp2 = icmp eq i32 %y, 0
				br i1 %cmp2, label %end, label %loop

				end:
				ret i8 %phi
				}

				; Check that we correctly handle the case that the first thing we try to hoist is a phi.
				; CHECK-LABEL: @empty_triangle_phi_first
				define i8 @empty_triangle_phi_first(i32 %x, i1 %cond) {
				; CHECK-LABEL: entry:
				; CHECK: br i1 %cond, label %[[IF_LICM:.*]], label %[[THEN_LICM:.*]]
				entry:
				br label %loop

				; CHECK: [[IF_LICM]]:
				; CHECK: br label %[[THEN_LICM]]

				; CHECK: [[THEN_LICM]]:
				; CHECK: %phi = phi i8 [ 0, %[[IF_LICM]] ], [ 1, %entry ]
				; CHECK: %cmp = icmp eq i32 %x, 0
				; CHECK: br label %loop

				loop:
				br i1 %cond, label %if, label %then

				if:
				br label %then

				then:
				%phi = phi i8 [ 0, %if ], [ 1, %loop ]
				%cmp = icmp eq i32 %x, 0
				br i1 %cmp, label %end, label %loop

				end:
				ret i8 %phi
				}

				; CHECK-LABEL: @empty_diamond_phi_first
				define i8 @empty_diamond_phi_first(i32 %x, i1 %cond) {
				; CHECK-LABEL: entry:
				; CHECK: br i1 %cond, label %[[IF_LICM:.*]], label %[[ELSE_LICM:.*]]
				entry:
				br label %loop

				; CHECK: [[IF_LICM]]:
				; CHECK: br label %[[THEN_LICM:.*]]

				; CHECK: [[ELSE_LICM]]:
				; CHECK: br label %[[THEN_LICM]]

				; CHECK: [[THEN_LICM]]:
				; CHECK: %phi = phi i8 [ 0, %[[IF_LICM]] ], [ 1, %[[ELSE_LICM]] ]
				; CHECK: %cmp = icmp eq i32 %x, 0
				; CHECK: br label %loop

				loop:
				br i1 %cond, label %if, label %else

				if:
				br label %then

				else:
				br label %then

				then:
				%phi = phi i8 [ 0, %if ], [ 1, %else ]
				%cmp = icmp eq i32 %x, 0
				br i1 %cmp, label %end, label %loop

				end:
				ret i8 %phi
				}

				; CHECK-LABEL: @empty_triangle_phi_first_empty_loop_head
				define i8 @empty_triangle_phi_first_empty_loop_head(i32 %x, i1 %cond) {
				; CHECK-LABEL: entry:
				; CHECK: br i1 %cond, label %[[IF_LICM:.*]], label %[[THEN_LICM:.*]]
				entry:
				br label %loop

				; CHECK: [[IF_LICM]]:
				; CHECK: br label %[[THEN_LICM]]

				; CHECK: [[THEN_LICM]]:
				; CHECK: %phi = phi i8 [ 0, %[[IF_LICM]] ], [ 1, %entry ]
				; CHECK: %cmp = icmp eq i32 %x, 0
				; CHECK: br label %loop

				loop:
				br label %test

				test:
				br i1 %cond, label %if, label %then

				if:
				br label %then

				then:
				%phi = phi i8 [ 0, %if ], [ 1, %test ]
				%cmp = icmp eq i32 %x, 0
				br i1 %cmp, label %end, label %loop

				end:
				ret i8 %phi
				}

				; CHECK-LABEL: @empty_diamond_phi_first_empty_loop_head
				define i8 @empty_diamond_phi_first_empty_loop_head(i32 %x, i1 %cond) {
				; CHECK-LABEL: entry:
				; CHECK: br i1 %cond, label %[[IF_LICM:.*]], label %[[ELSE_LICM:.*]]
				entry:
				br label %loop

				; CHECK: [[IF_LICM]]:
				; CHECK: br label %[[THEN_LICM:.*]]

				; CHECK: [[ELSE_LICM]]:
				; CHECK: br label %[[THEN_LICM]]

				; CHECK: [[THEN_LICM]]:
				; CHECK: %phi = phi i8 [ 0, %[[IF_LICM]] ], [ 1, %[[ELSE_LICM]] ]
				; CHECK: %cmp = icmp eq i32 %x, 0
				; CHECK: br label %loop

				loop:
				br label %test

				test:
				br i1 %cond, label %if, label %else

				if:
				br label %then

				else:
				br label %then

				then:
				%phi = phi i8 [ 0, %if ], [ 1, %else ]
				%cmp = icmp eq i32 %x, 0
				br i1 %cmp, label %end, label %loop

				end:
				ret i8 %phi
				}

				; The phi is on one branch of a diamond while simultaneously at the end of a
				; triangle. Check that we duplicate the triangle and not the diamond.
				; CHECK-LABEL: @triangle_diamond
				define void @triangle_diamond(i32* %ptr, i32 %x, i32 %y) {
				; CHECK-LABEL: entry:
				; CHECK-DAG: %cmp1 = icmp ne i32 %x, 0
				; CHECK-DAG: %cmp2 = icmp ne i32 %y, 0
				; CHECK: br i1 %cmp1, label %[[IF_LICM:.*]], label %[[THEN_LICM:.*]]
				entry:
				br label %loop

				; CHECK: [[IF_LICM]]:
				; CHECK: br label %[[THEN_LICM]]

				; CHECK: [[THEN_LICM]]:
				; CHECK: %phi = phi i32 [ 0, %[[IF_LICM]] ], [ 127, %entry ]

				loop:
				%cmp1 = icmp ne i32 %x, 0
				br i1 %cmp1, label %if, label %then

				if:
				%cmp2 = icmp ne i32 %y, 0
				br i1 %cmp2, label %if.then, label %then

				then:
				%phi = phi i32 [ 0, %if ], [ 127, %loop ]
				store i32 %phi, i32* %ptr
				br label %end

				if.then:
				br label %end

				end:
				br label %loop
				}

				; As the previous, but the end of the diamond is the head of the loop.
				; CHECK-LABEL: @triangle_diamond_backedge
				define void @triangle_diamond_backedge(i32* %ptr, i32 %x, i32 %y) {
				; CHECK-LABEL: entry:
				; CHECK-DAG: %cmp1 = icmp ne i32 %x, 0
				; CHECK-DAG: %cmp2 = icmp ne i32 %y, 0
				; CHECK: br i1 %cmp1, label %[[IF_LICM:.*]], label %[[THEN_LICM:.*]]
				entry:
				br label %loop

				; CHECK: [[IF_LICM]]:
				; CHECK: br label %[[THEN_LICM]]

				; CHECK: [[THEN_LICM]]:
				; CHECK: %phi = phi i32 [ 0, %[[IF_LICM]] ], [ 127, %entry ]

				loop:
				%cmp1 = icmp ne i32 %x, 0
				br i1 %cmp1, label %if, label %then

				if:
				%cmp2 = icmp ne i32 %y, 0
				br i1 %cmp2, label %backedge, label %then

				then:
				%phi = phi i32 [ 0, %if ], [ 127, %loop ]
				store i32 %phi, i32* %ptr
				br label %loop

				backedge:
				br label %loop
				}

				; TODO: The inner diamonds can be hoisted, but not currently the outer diamond
				; CHECK-LABEL: @diamonds_inside_diamond
				define void @diamonds_inside_diamond(i32 %x, i32* %p) {
				; CHECK-LABEL: entry:
				; CHECK-DAG: %cmp1 = icmp sgt i32 %x, 0
				; CHECK-DAG: %cmp3 = icmp slt i32 %x, -10
				; CHECK: br i1 %cmp3, label %[[ELSE_IF_LICM:.*]], label %[[ELSE_ELSE_LICM:.*]]
				entry:
				br label %loop

				; CHECK: [[ELSE_IF_LICM]]:
				; CHECK: br label %[[ELSE_THEN_LICM:.*]]

				; CHECK: [[ELSE_ELSE_LICM]]:
				; CHECK: br label %[[ELSE_THEN_LICM]]

				; CHECK: [[ELSE_THEN_LICM]]:
				; CHECK: %phi2 = phi i32 [ 2, %[[ELSE_IF_LICM]] ], [ 3, %[[ELSE_ELSE_LICM]] ]
				; CHECK: %cmp2 = icmp sgt i32 %x, 10
				; CHECK: br i1 %cmp2, label %[[IF_IF_LICM:.*]], label %[[IF_ELSE_LICM:.*]]

				; CHECK: [[IF_IF_LICM]]:
				; CHECK: br label %[[IF_THEN_LICM:.*]]

				; CHECK: [[IF_ELSE_LICM]]:
				; CHECK: br label %[[IF_THEN_LICM]]

				; CHECK: [[IF_THEN_LICM]]:
				; CHECK: %phi1 = phi i32 [ 0, %[[IF_IF_LICM]] ], [ 1, %[[IF_ELSE_LICM]] ]
				; CHECK: br label %loop

				loop:
				%cmp1 = icmp sgt i32 %x, 0
				br i1 %cmp1, label %if, label %else

				if:
				%cmp2 = icmp sgt i32 %x, 10
				br i1 %cmp2, label %if.if, label %if.else

				if.if:
				br label %if.then

				if.else:
				br label %if.then

				if.then:
				%phi1 = phi i32 [ 0, %if.if ], [ 1, %if.else ]
				br label %then

				else:
				%cmp3 = icmp slt i32 %x, -10
				br i1 %cmp3, label %else.if, label %else.else

				else.if:
				br label %else.then

				else.else:
				br label %else.then

				else.then:
				%phi2 = phi i32 [ 2, %else.if ], [ 3, %else.else ]
				br label %then

				then:
				%phi3 = phi i32 [ %phi1, %if.then ], [ %phi2, %else.then ]
				store i32 %phi3, i32* %p
				%cmp4 = icmp ne i32 %phi3, 0
				br i1 %cmp4, label %loop, label %end

				end:
				ret void
				}

				; We can hoist blocks that contain an edge that exits the loop by ignoring that
				; edge in the hoisted block.
				; CHECK-LABEL: @triangle_phi_loopexit
				define void @triangle_phi_loopexit(i32 %x, i32* %p) {
				; CHECK-LABEL: entry:
				; CHECK-DAG: %add = add i32 %x, 1
				; CHECK-DAG: %cmp1 = icmp sgt i32 %x, 0
				; CHECK-DAG: %cmp2 = icmp sgt i32 10, %add
				; CHECK: br i1 %cmp1, label %[[IF_LICM:.*]], label %[[THEN_LICM:.*]]
				entry:
				br label %loop

				; CHECK: [[IF_LICM]]:
				; CHECK: br label %[[THEN_LICM]]

				; CHECK: [[THEN_LICM]]:
				; CHECK: %phi = phi i32 [ %add, %[[IF_LICM]] ], [ %x, %entry ]

				loop:
				%cmp1 = icmp sgt i32 %x, 0
				br i1 %cmp1, label %if, label %then

				if:
				%add = add i32 %x, 1
				%cmp2 = icmp sgt i32 10, %add
				br i1 %cmp2, label %then, label %end

				then:
				%phi = phi i32 [ %add, %if ], [ %x, %loop ]
				store i32 %phi, i32* %p
				%cmp3 = icmp ne i32 %phi, 0
				br i1 %cmp3, label %loop, label %end

				end:
				ret void
				}

				; CHECK-LABEL: @diamond_phi_oneloopexit
				define void @diamond_phi_oneloopexit(i32 %x, i32* %p) {
				; CHECK-LABEL: entry:
				; CHECK-DAG: %add = add i32 %x, 1
				; CHECK-DAG: %cmp1 = icmp sgt i32 %x, 0
				; CHECK-DAG: %cmp2 = icmp sgt i32 10, %add
				; CHECK: br i1 %cmp1, label %[[IF_LICM:.*]], label %[[THEN_LICM:.*]]
				entry:
				br label %loop

				; CHECK: [[IF_LICM]]:
				; CHECK: br label %[[THEN_LICM:.*]]

				; CHECK: [[ELSE_LICM]]:
				; CHECK: %sub = sub i32 %x, 1
				; CHECK: br label %[[THEN_LICM]]

				; CHECK: [[THEN_LICM]]
				; CHECK: %phi = phi i32 [ %add, %[[IF_LICM]] ], [ %sub, %[[ELSE_LICM]] ]
				; CHECK: %cmp3 = icmp ne i32 %phi, 0
				; CHECK: br label %loop

				loop:
				%cmp1 = icmp sgt i32 %x, 0
				br i1 %cmp1, label %if, label %else

				if:
				%add = add i32 %x, 1
				%cmp2 = icmp sgt i32 10, %add
				br i1 %cmp2, label %then, label %end

				else:
				%sub = sub i32 %x, 1
				br label %then

				then:
				%phi = phi i32 [ %add, %if ], [ %sub, %else ]
				store i32 %phi, i32* %p
				%cmp3 = icmp ne i32 %phi, 0
				br i1 %cmp3, label %loop, label %end

				end:
				ret void
				}

				; CHECK-LABEL: @diamond_phi_twoloopexit
				define void @diamond_phi_twoloopexit(i32 %x, i32* %p) {
				; CHECK-LABEL: entry:
				; CHECK-DAG: %sub = sub i32 %x, 1
				; CHECK-DAG: %add = add i32 %x, 1
				; CHECK-DAG: %cmp1 = icmp sgt i32 %x, 0
				; CHECK-DAG: %cmp2 = icmp sgt i32 10, %add
				; CHECK-DAG: %cmp3 = icmp sgt i32 10, %sub
				; CHECK: br i1 %cmp1, label %[[IF_LICM:.*]], label %[[THEN_LICM:.*]]
				entry:
				br label %loop

				; CHECK: [[IF_LICM]]:
				; CHECK: br label %[[THEN_LICM:.*]]

				; CHECK: [[ELSE_LICM]]:
				; CHECK: br label %[[THEN_LICM]]

				; CHECK: [[THEN_LICM]]
				; CHECK: %phi = phi i32 [ %add, %[[IF_LICM]] ], [ %sub, %[[ELSE_LICM]] ]
				; CHECK: %cmp4 = icmp ne i32 %phi, 0
				; CHECK: br label %loop

				loop:
				%cmp1 = icmp sgt i32 %x, 0
				br i1 %cmp1, label %if, label %else

				if:
				%add = add i32 %x, 1
				%cmp2 = icmp sgt i32 10, %add
				br i1 %cmp2, label %then, label %end

				else:
				%sub = sub i32 %x, 1
				%cmp3 = icmp sgt i32 10, %sub
				br i1 %cmp3, label %then, label %end

				then:
				%phi = phi i32 [ %add, %if ], [ %sub, %else ]
				store i32 %phi, i32* %p
				%cmp4 = icmp ne i32 %phi, 0
				br i1 %cmp4, label %loop, label %end

				end:
				ret void
				}

				; The store cannot be hoisted, so add and shr cannot be hoisted into a
				; conditional block.
				; CHECK-LABEL: @conditional_use
				define void @conditional_use(i32 %x, i32* %p) {
				; CHECK-LABEL: entry:
				; CHECK-DAG: %cond = icmp ugt i32 %x, 0
				; CHECK-DAG: %add = add i32 %x, 5
				; CHECK-DAG: %shr = ashr i32 %add, 1
				; CHECK: br label %loop
				entry:
				br label %loop

				loop:
				%cond = icmp ugt i32 %x, 0
				br i1 %cond, label %if, label %else

				; CHECK-LABEL: if:
				; CHECK: store i32 %shr, i32* %p, align 4
				if:
				%add = add i32 %x, 5
				%shr = ashr i32 %add, 1
				store i32 %shr, i32* %p, align 4
				br label %then

				else:
				br label %then

				then:
				br label %loop
				}

				; A diamond with two triangles on the left and one on the right. This test is
				; to check that we have a unique loop preheader when we hoist the store (and so
				; don't fail an assertion).
				; CHECK-LABEL: @triangles_in_diamond
				define void @triangles_in_diamond(i32* %ptr) {
				; CHECK-LABEL: entry:
				; CHECK: store i32 0, i32* %ptr, align 4
				; CHECK: br label %loop
				entry:
				br label %loop

				loop:
				br i1 undef, label %left_triangle_1, label %right_triangle

				left_triangle_1:
				br i1 undef, label %left_triangle_1_if, label %left_triangle_2

				left_triangle_1_if:
				br label %left_triangle_2

				left_triangle_2:
				br i1 undef, label %left_triangle_2_if, label %left_triangle_2_then

				left_triangle_2_if:
				br label %left_triangle_2_then

				left_triangle_2_then:
				br label %loop.end

				right_triangle:
				br i1 undef, label %right_triangle.if, label %right_triangle.then

				right_triangle.if:
				br label %right_triangle.then

				right_triangle.then:
				br label %loop.end

				loop.end:
				store i32 0, i32* %ptr, align 4
				br label %loop
				}

				; %cmp dominates its use after being hoisted, but not after %brmerge is rehoisted.
				; CHECK-LABEL: @rehoist
				define void @rehoist(i8* %this, i32 %x) {
				; CHECK-LABEL: entry:
				; CHECK-DAG: %sub = add nsw i32 %x, -1
				; CHECK-DAG: %fptr = bitcast i8* %this to void (i8*)*
				; CHECK-DAG: %cmp = icmp eq i32 0, %sub
				; CHECK-DAG: %brmerge = or i1 %cmp, true
				entry:
				%sub = add nsw i32 %x, -1
				br label %loop

				loop:
				br i1 undef, label %if1, label %else1

				if1:
				%fptr = bitcast i8* %this to void (i8*)*
				call void %fptr(i8* %this)
				br label %then1

				else1:
				br label %then1

				then1:
				%cmp = icmp eq i32 0, %sub
				br i1 %cmp, label %end, label %else2

				else2:
				%brmerge = or i1 %cmp, true
				br i1 %brmerge, label %if3, label %end

				if3:
				br label %end

				end:
				br label %loop
				}

				; A test case that uses empty blocks in a way that can cause control flow
				; hoisting to get confused.
				; CHECK-LABEL: @empty_blocks_multiple_conditional_branches
				define void @empty_blocks_multiple_conditional_branches(float %arg, float* %ptr) {
				; CHECK-LABEL: entry
				; CHECK-DAG: %div1 = fmul float %arg, 4.000000e+00
				; CHECK-DAG: %div2 = fmul float %arg, 2.000000e+00
				entry:
				br label %loop

				; The exact path to the phi isn't checked here, because it depends on whether
				; cond2 or cond3 is hoisted first
				; CHECK: %phi = phi float [ 0.000000e+00, %{{.*}} ], [ %div1, %{{.*}} ]
				; CHECK: br label %loop

				loop:
				br i1 undef, label %backedge2, label %cond1

				cond1:
				br i1 undef, label %cond1.if, label %cond1.else

				cond1.else:
				br label %cond3

				cond1.if:
				br label %cond1.if.next

				cond1.if.next:
				br label %cond2

				cond2:
				%div1 = fmul float %arg, 4.000000e+00
				br i1 undef, label %cond2.if, label %cond2.then

				cond2.if:
				br label %cond2.then

				cond2.then:
				%phi = phi float [ 0.000000e+00, %cond2 ], [ %div1, %cond2.if ]
				store float %phi, float* %ptr
				br label %backedge2

				cond3:
				br i1 undef, label %cond3.then, label %cond3.if

				cond3.if:
				%div2 = fmul float %arg, 2.000000e+00
				store float %div2, float* %ptr
				br label %cond3.then

				cond3.then:
				br label %loop

				backedge2:
				br label %loop
				}

				; We can't do much here, so mainly just check that we don't crash.
				; CHECK-LABEL: @many_path_phi
				define void @many_path_phi(i32* %ptr1, i32* %ptr2) {
				; CHECK-LABEL: entry:
				; CHECK-DAG: %gep3 = getelementptr inbounds i32, i32* %ptr2, i32 2
				; CHECK-DAG: %gep2 = getelementptr inbounds i32, i32* %ptr2, i32 2
				; CHECK: br label %loop
				entry:
				br label %loop

				loop:
				%phi1 = phi i32 [ 0, %entry ], [ %phi2, %end ]
				%cmp1 = icmp ugt i32 %phi1, 3
				br i1 %cmp1, label %cond2, label %cond1

				cond1:
				br i1 undef, label %end, label %cond1.else

				cond1.else:
				%gep2 = getelementptr inbounds i32, i32* %ptr2, i32 2
				%val2 = load i32, i32* %gep2, align 4
				%cmp2 = icmp eq i32 %val2, 13
				br i1 %cmp2, label %cond1.end, label %end

				cond1.end:
				br label %end

				cond2:
				br i1 undef, label %end, label %cond2.else

				cond2.else:
				%gep3 = getelementptr inbounds i32, i32* %ptr2, i32 2
				%val3 = load i32, i32* %gep3, align 4
				%cmp3 = icmp eq i32 %val3, 13
				br i1 %cmp3, label %cond2.end, label %end

				cond2.end:
				br label %end

				end:
				%phi2 = phi i32 [ 1, %cond1 ], [ 2, %cond1.else ], [ 3, %cond1.end ], [ 4, %cond2 ], [ 5, %cond2.else ], [ 6, %cond2.end ]
				br label %loop
				}

				; Check that we correctly handle the hoisting of %gep when there's a critical
				; edge that branches to the preheader.
				; CHECK-LABEL: @crit_edge
				define void @crit_edge(i32* %ptr, i32 %idx, i1 %cond1, i1 %cond2) {
				; CHECK-LABEL: entry:
				; CHECK: %gep = getelementptr inbounds i32, i32* %ptr, i32 %idx
				; CHECK: br label %preheader
				entry:
				br label %preheader

				preheader:
				br label %loop

				loop:
				br i1 %cond1, label %then, label %if

				if:
				%gep = getelementptr inbounds i32, i32* %ptr, i32 %idx
				%val = load i32, i32* %gep
				br label %then

				then:
				%phi = phi i32 [ %val, %if ], [ 0, %loop ]
				store i32 %phi, i32* %ptr
				br i1 %cond2, label %loop, label %crit_edge

				crit_edge:
				br label %preheader
				}

				; Check that the conditional sub is correctly hoisted from the inner loop to the
				; preheader of the outer loop.
				; CHECK-LABEL: @hoist_from_innermost_loop
				define void @hoist_from_innermost_loop(i32 %nx, i32* %ptr) {
				; CHECK-LABEL: entry:
				; CHECK-DAG: %sub = sub nsw i32 0, %nx
				; CHECK: br label %outer_loop
				entry:
				br label %outer_loop

				outer_loop:
				br label %middle_loop

				middle_loop:
				br label %inner_loop

				inner_loop:
				br i1 undef, label %inner_loop_end, label %if

				if:
				%sub = sub nsw i32 0, %nx
				store i32 %sub, i32* %ptr, align 4
				br label %inner_loop_end

				inner_loop_end:
				br i1 undef, label %inner_loop, label %middle_loop_end

				middle_loop_end:
				br i1 undef, label %middle_loop, label %outer_loop_end

				outer_loop_end:
				br label %outer_loop
				}

				; We have a diamond starting from %if, but %if.if is also reachable from %loop,
				; so %gep should not be conditionally hoisted.
				; CHECK-LABEL: @diamond_with_extra_in_edge
				define void @diamond_with_extra_in_edge(i32* %ptr1, i32* %ptr2, i32 %arg) {
				; CHECK-LABEL: entry:
				; CHECK-DAG: %cmp2 = icmp ne i32 0, %arg
				; CHECK-DAG: %gep = getelementptr i32, i32* %ptr1, i32 4
				; CHECK: br label %loop
				entry:
				br label %loop

				loop:
				%phi1 = phi i32 [ 0, %entry ], [ %phi2, %then ]
				%cmp1 = icmp ugt i32 16, %phi1
				br i1 %cmp1, label %if, label %if.if

				if:
				%cmp2 = icmp ne i32 0, %arg
				br i1 %cmp2, label %if.if, label %if.else

				if.if:
				%gep = getelementptr i32, i32* %ptr1, i32 4
				%val = load i32, i32* %gep, align 4
				br label %then

				if.else:
				br label %then

				then:
				%phi2 = phi i32 [ %val, %if.if ], [ %phi1, %if.else ]
				store i32 %phi2, i32* %ptr2, align 4
				br label %loop
				}

				; %loop/%if/%then form a triangle, but %loop/%if/%then/%end also form a diamond.
				; The triangle should be picked for conditional hoisting.
				; CHECK-LABEL: @both_triangle_and_diamond
				define void @both_triangle_and_diamond(i32* %ptr1, i32* %ptr2, i32 %arg) {
				; CHECK-LABEL: entry:
				; CHECK-DAG: %cmp1 = icmp ne i32 0, %arg
				; CHECK-DAG: %gep = getelementptr i32, i32* %ptr1, i32 4
				; CHECK: br i1 %cmp1, label %[[IF_LICM:.*]], label %[[THEN_LICM:.*]]
				entry:
				br label %loop

				; CHECK: [[IF_LICM]]:
				; CHECK: br label %[[THEN_LICM]]

				; CHECK: [[THEN_LICM]]:
				; CHECK: %phi2 = phi i32 [ 0, %[[IF_LICM]] ], [ 1, %entry ]
				; CHECK: br label %loop

				loop:
				%phi1 = phi i32 [ 0, %entry ], [ %phi3, %end ]
				%cmp1 = icmp ne i32 0, %arg
				br i1 %cmp1, label %if, label %then

				if:
				%gep = getelementptr i32, i32* %ptr1, i32 4
				%val = load i32, i32* %gep, align 4
				%cmp2 = icmp ugt i32 16, %phi1
				br i1 %cmp2, label %end, label %then

				then:
				%phi2 = phi i32 [ 0, %if ], [ 1, %loop ]
				br label %end

				end:
				%phi3 = phi i32 [ %phi2, %then ], [ %val, %if ]
				store i32 %phi3, i32* %ptr2, align 4
				br label %loop
				}

				; We shouldn't duplicate the branch at the end of %loop and should instead hoist
				; %val to %entry.
				; CHECK-LABEL: @same_destination_branch
				define i32 @same_destination_branch(i32 %arg1, i32 %arg2) {
				; CHECK-LABEL: entry:
				; CHECK-DAG: %cmp1 = icmp ne i32 %arg2, 0
				; CHECK-DAG: %val = add i32 %arg1, 1
				; CHECK: br label %loop
				entry:
				br label %loop

				loop:
				%phi = phi i32 [ 0, %entry ], [ %add, %then ]
				%add = add i32 %phi, 1
				%cmp1 = icmp ne i32 %arg2, 0
				br i1 %cmp1, label %if, label %if

				if:
				%val = add i32 %arg1, 1
				br label %then

				then:
				%cmp2 = icmp ne i32 %val, %phi
				br i1 %cmp2, label %loop, label %end

				end:
				ret i32 %val
				}

				; Diamond-like control flow but the left/right blocks actually have the same
				; destinations.
				; TODO: We could potentially hoist all of phi2-4, but currently only hoist phi2.
				; CHECK-LABEL: @diamond_like_same_destinations
				define i32 @diamond_like_same_destinations(i32 %arg1, i32 %arg2) {
				; CHECK-LABEL: entry:
				; CHECK-DAG: %cmp1 = icmp ne i32 %arg1, 0
				; CHECK-DAG: %cmp2 = icmp ugt i32 %arg2, 1
				; CHECK-DAG: %cmp3 = icmp ugt i32 %arg2, 2
				; CHECK: br i1 %cmp1, label %[[LEFT1_LICM:.*]], label %[[RIGHT1_LICM:.*]]
				entry:
				br label %loop

				; CHECK: [[LEFT1_LICM]]:
				; CHECK: br label %[[LEFT2_LICM:.*]]

				; CHECK: [[RIGHT1_LICM]]:
				; CHECK: br label %[[LEFT2_LICM]]

				; CHECK: [[LEFT2_LICM]]:
				; CHECK: %phi2 = phi i32 [ 0, %[[LEFT1_LICM]] ], [ 1, %[[RIGHT1_LICM]] ]
				; CHECK: br label %loop

				loop:
				%phi1 = phi i32 [ 0, %entry ], [ %add, %loopend ]
				%add = add i32 %phi1, 1
				%cmp1 = icmp ne i32 %arg1, 0
				br i1 %cmp1, label %left1, label %right1

				left1:
				%cmp2 = icmp ugt i32 %arg2, 1
				br i1 %cmp2, label %left2, label %right2

				right1:
				%cmp3 = icmp ugt i32 %arg2, 2
				br i1 %cmp3, label %left2, label %right2

				left2:
				%phi2 = phi i32 [ 0, %left1 ], [ 1, %right1 ]
				br label %loopend

				right2:
				%phi3 = phi i32 [ 2, %left1 ], [ 3, %right1 ]
				br label %loopend

				loopend:
				%phi4 = phi i32 [ %phi2, %left2 ], [ %phi3, %right2 ]
				%cmp4 = icmp ne i32 %phi1, 32
				br i1 %cmp4, label %loop, label %end

				end:
				ret i32 %phi4
				}

				; A phi with multiple incoming values for the same block due to a branch with
				; two destinations that are actually the same. We can't hoist this.
				; TODO: This could be hoisted by erasing one of the incoming values.
				; CHECK-LABEL: @phi_multiple_values_same_block
				define i32 @phi_multiple_values_same_block(i32 %arg) {
				; CHECK-LABEL: entry:
				; CHECK: %cmp = icmp sgt i32 %arg, 4
				; CHECK-NOT: phi
				; CHECK: br label %loop
				entry:
				br label %loop

				loop:
				%cmp = icmp sgt i32 %arg, 4
				br i1 %cmp, label %if, label %then

				if:
				br i1 undef, label %then, label %then

				then:
				%phi = phi i32 [ %arg, %loop ], [ 1, %if ], [ 1, %if ]
				br i1 undef, label %exit, label %loop

				exit:
				ret i32 %phi
				}

				; %phi is conditionally used in %d, and the store that %d is used in cannot be
				; hoisted. This means that we have to rehoist %d, but have to make sure to
				; rehoist it after %phi.
				; CHECK-LABEL: @phi_conditional_use
				define i64 @phi_conditional_use(i32 %f, i32* %g) {
				; CHECK-LABEL: entry:
				; CHECK: %cmp1 = icmp eq i32 %f, 1
				; CHECK: %cmp2 = icmp eq i32 %f, 0
				; CHECK: br i1 %cmp1, label %[[IF_END_LICM:.*]], label %[[IF_THEN_LICM:.*]]
				entry:
				%cmp1 = icmp eq i32 %f, 1
				%cmp2 = icmp eq i32 %f, 0
				br label %loop

				; CHECK: [[IF_THEN_LICM]]:
				; CHECK: br label %[[IF_END_LICM]]

				; CHECK: [[IF_END_LICM]]:
				; CHECK: %phi = phi i64 [ 0, %entry ], [ 1, %[[IF_THEN_LICM]] ]
				; CHECK: %d = getelementptr inbounds i32, i32* %g, i64 %phi
				; CHECK: i1 %cmp2, label %[[LOOP_BACKEDGE_LICM:.*]], label %[[IF_THEN2_LICM:.*]]

				; CHECK: [[IF_THEN2_LICM]]:
				; CHECK: br label %[[LOOP_BACKEDGE_LICM]]

				; CHECK: [[LOOP_BACKEDGE_LICM]]:
				; CHECK: br label %loop

				loop:
				br i1 %cmp1, label %if.end, label %if.then

				if.then:
				br label %if.end

				if.end:
				%phi = phi i64 [ 0, %loop ], [ 1, %if.then ]
				br i1 %cmp2, label %loop.backedge, label %if.then2

				if.then2:
				%d = getelementptr inbounds i32, i32* %g, i64 %phi
				store i32 1, i32* %d, align 4
				br label %loop.backedge

				loop.backedge:
				br label %loop
				}

				; As above, but we have two such phis
				; CHECK-LABEL: @phi_conditional_use_twice
				define i64 @phi_conditional_use_twice(i32 %f, i32* %g) {
				; CHECK-LABEL: entry:
				; CHECK: %cmp1 = icmp eq i32 %f, 1
				; CHECK: %cmp2 = icmp eq i32 %f, 0
				; CHECK: br i1 %cmp1, label %[[IF_END_LICM:.*]], label %[[IF_THEN_LICM:.*]]
				entry:
				%cmp1 = icmp eq i32 %f, 1
				%cmp2 = icmp eq i32 %f, 0
				%cmp3 = icmp sgt i32 %f, 0
				br label %loop

				; CHECK: [[IF_THEN_LICM]]:
				; CHECK: br label %[[IF_END_LICM]]

				; CHECK: [[IF_END_LICM]]:
				; CHECK: %phi1 = phi i64 [ 0, %entry ], [ 1, %[[IF_THEN_LICM]] ]
				; CHECK: %d = getelementptr inbounds i32, i32* %g, i64 %phi1
				; CHECK: i1 %cmp2, label %[[IF_END2_LICM:.*]], label %[[IF_THEN2_LICM:.*]]

				; CHECK: [[IF_THEN2_LICM]]:
				; CHECK: br label %[[IF_END2_LICM]]

				; CHECK: [[IF_END2_LICM]]:
				; CHECK: %phi2 = phi i64 [ 2, %[[IF_END_LICM]] ], [ 3, %[[IF_THEN2_LICM]] ]
				; CHECK: %e = getelementptr inbounds i32, i32* %g, i64 %phi2
				; CHECK: i1 %cmp3, label %[[LOOP_BACKEDGE_LICM:.*]], label %[[IF_THEN3_LICM:.*]]

				; CHECK: [[IF_THEN3_LICM]]:
				; CHECK: br label %[[LOOP_BACKEDGE_LICM]]

				; CHECK: [[LOOP_BACKEDGE_LICM]]:
				; CHECK: br label %loop

				loop:
				br i1 %cmp1, label %if.end, label %if.then

				if.then:
				br label %if.end

				if.end:
				%phi1 = phi i64 [ 0, %loop ], [ 1, %if.then ]
				br i1 %cmp2, label %if.end2, label %if.then2

				if.then2:
				%d = getelementptr inbounds i32, i32* %g, i64 %phi1
				store i32 1, i32* %d, align 4
				br label %if.end2

				if.end2:
				%phi2 = phi i64 [ 2, %if.end ], [ 3, %if.then2 ]
				br i1 %cmp3, label %loop.backedge, label %if.then3

				if.then3:
				%e = getelementptr inbounds i32, i32* %g, i64 %phi2
				store i32 1, i32* %e, align 4
				br label %loop.backedge

				loop.backedge:
				br label %loop
				}

test/Transforms/LoopVectorize/invariant-store-vectorization.ll

 }

 ; invariant val stored to invariant address predicated on invariant condition
 ; This is not treated as a predicated store since the block the store belongs to
 ; is the latch block (which doesn't need to be predicated).
 ; variant/invariant values being stored to invariant address.
 ; test checks that the last element of the phi is extracted and scalar stored
 ; into the uniform address within the loop.
-; Since the condition and the phi is loop invariant, they are LICM'ed after
+; Since the condition and the phi is loop invariant, they are LICM'ed before
 ; vectorization.
 ; CHECK-LABEL: inv_val_store_to_inv_address_conditional_inv
 ; CHECK-NEXT: entry:
+; CHECK-NEXT: [[B1:%.*]] = bitcast i32* [[B:%.*]] to i8*
+; CHECK-NEXT: [[A4:%.*]] = bitcast i32* [[A:%.*]] to i8*
 ; CHECK-NEXT: [[NTRUNC:%.*]] = trunc i64 [[N:%.*]] to i32
 ; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[NTRUNC]], [[K:%.*]]
+; CHECK-NEXT: br i1 [[CMP]], label %[[COND_STORE_LICM:.*]], label %[[COND_STORE_K_LICM:.*]]
+; CHECK: [[COND_STORE_LICM]]:
+; CHECK-NEXT: br label %[[LATCH_LICM:.*]]
+; CHECK: [[COND_STORE_K_LICM]]:
+; CHECK-NEXT: br label %[[LATCH_LICM]]
+; CHECK: [[LATCH_LICM]]:
+; CHECK-NEXT: [[STOREVAL:%.*]] = phi i32 [ [[NTRUNC]], %[[COND_STORE_LICM]] ], [ [[K]], %[[COND_STORE_K_LICM]] ]
 ; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt i64 [[N]], 1
 ; CHECK-NEXT: [[SMAX:%.*]] = select i1 [[TMP0]], i64 [[N]], i64 1
 ; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[SMAX]], 4
 ; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
 ; CHECK: vector.memcheck:
-; CHECK-NEXT: [[A4:%.*]] = bitcast i32* [[A:%.*]] to i8*
-; CHECK-NEXT: [[B1:%.*]] = bitcast i32* [[B:%.*]] to i8*
 ; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i64 [[N]], 1
 ; CHECK-NEXT: [[SMAX2:%.*]] = select i1 [[TMP1]], i64 [[N]], i64 1
 ; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[B]], i64 [[SMAX2]]
 ; CHECK-NEXT: [[UGLYGEP:%.*]] = getelementptr i8, i8* [[A4]], i64 1
 ; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt i8* [[UGLYGEP]], [[B1]]
 ; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt i32* [[SCEVGEP]], [[A]]
 ; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
 ; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
 ; CHECK: vector.ph:
 ; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[SMAX]], 9223372036854775804
 ; CHECK-NEXT: [[BROADCAST_SPLATINSERT5:%.*]] = insertelement <4 x i32> undef, i32 [[NTRUNC]], i32 0
 ; CHECK-NEXT: [[BROADCAST_SPLAT6:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT5]], <4 x i32> undef, <4 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i1> undef, i1 [[CMP]], i32 3
-; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> undef, i32 [[K]], i32 3
-; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> [[BROADCAST_SPLAT6]], <4 x i32> [[TMP3]]
-; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[PREDPHI]], i32 3
 ; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
 ; CHECK: vector.body:
 ; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]
 ; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <4 x i32>*
 ; CHECK-NEXT: store <4 x i32> [[BROADCAST_SPLAT6]], <4 x i32>* [[TMP7]], align 4
-; CHECK-NEXT: store i32 [[TMP5]], i32* [[A]], align 4
+; CHECK-NEXT: store i32 [[STOREVAL]], i32* [[A]], align 4
 ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
 ; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[SMAX]], [[N_VEC]]
 ; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[LATCH:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
 ; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[I]]
 ; CHECK-NEXT: store i32 [[NTRUNC]], i32* [[TMP1]], align 4
 ; CHECK-NEXT: br i1 [[CMP]], label [[COND_STORE:%.*]], label [[COND_STORE_K:%.*]]
 ; CHECK: cond_store:
 ; CHECK-NEXT: br label [[LATCH]]
 ; CHECK: cond_store_k:
 ; CHECK-NEXT: br label [[LATCH]]
 ; CHECK: latch:
-; CHECK-NEXT: [[STOREVAL:%.*]] = phi i32 [ [[NTRUNC]], [[COND_STORE]] ], [ [[K]], [[COND_STORE_K]] ]
 ; CHECK-NEXT: store i32 [[STOREVAL]], i32* [[A]], align 4
 ; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1
 ; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]
 ; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT:%.*]]
 ; CHECK: for.end.loopexit:
 ; CHECK-NEXT: br label [[FOR_END]]
 ; CHECK: for.end:
 ; CHECK-NEXT: ret void