This is an archive of the discontinued LLVM Phabricator instance.

Differential D93764

[LoopUnswitch] Implement first version of partial unswitching.
Closed, Public

Authored by fhahn on Dec 23 2020, 7:25 AM.

Details

Reviewers

efriedma
reames
jdoerfert
jonpa

Commits

rGbee486851c1a: [LoopUnswitch] Implement first version of partial unswitching.

Summary

This patch applies the idea from D93734 to LoopUnswitch.

It adds support for unswitching on conditions that are only
invariant along certain paths through a loop.

In particular, it targets conditions in the loop header that
depend on values loaded from memory. If either path from
the true or false successor through the loop does not modify
memory, perform partial loop unswitching.

That is, duplicate the instructions feeding the condition in the pre-header.
Then unswitch on the duplicated condition. The condition is now known
in the unswitched version for the 'invariant' path through the original loop.
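
As a rough source-level illustration of the transformation described above, here is a hand-written sketch. It is not output of the pass, and the names count_original, count_unswitched and check_equivalent are made up for this example. The loop's branch condition is loaded from memory, and only one side of the branch writes to memory:

```cpp
#include <cassert>
#include <cstddef>

// Original loop: the condition depends on a value loaded from memory on
// every iteration. Only the "else" path modifies memory.
int count_original(int& flag, std::size_t n) {
  int count = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (flag == 100) {
      ++count;    // no memory writes on this path
    } else {
      flag += 1;  // may change the condition
    }
  }
  return count;
}

// Hand-written equivalent of the partially unswitched form.
int count_unswitched(int& flag, std::size_t n) {
  int count = 0;
  if (flag == 100) {  // duplicated load and compare, as in the pre-header
    // The true path never writes memory, so the condition stays true
    // for every remaining iteration and the branch disappears.
    for (std::size_t i = 0; i < n; ++i)
      ++count;
  } else {
    // Otherwise fall back to the original loop, condition check included.
    for (std::size_t i = 0; i < n; ++i) {
      if (flag == 100) {
        ++count;
      } else {
        flag += 1;
      }
    }
  }
  return count;
}

// Runs both versions from the same starting value and checks they agree.
void check_equivalent(int start, std::size_t n) {
  int a = start, b = start;
  int ra = count_original(a, n);
  int rb = count_unswitched(b, n);
  assert(ra == rb && a == b);
}
```

Note that the fallback loop still contains the check, because the else path can make the condition true later on.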

One caveat of this approach is that one of the loops created can be partially
unswitched again. To avoid this, llvm.loop.unswitch.partial.disable
metadata is added to the unswitched loops, which prevents subsequent partial
unswitching.

If that's the approach to go, I can move the code handling the metadata kind
into separate functions.

This increases the cases we unswitch quite a bit in SPEC2006/SPEC2000 &
MultiSource. It also allows us to eliminate a dead loop in SPEC2017's omnetpp.

Tests: 236
Same hash: 170 (filtered out)
Remaining: 66
Metric: loop-unswitch.NumBranches

Program                                        base   patch  diff
 test-suite...000/255.vortex/255.vortex.test     2.00  23.00 1050.0%
 test-suite...T2006/401.bzip2/401.bzip2.test     7.00  55.00 685.7%
 test-suite :: External/Nurbs/nurbs.test         5.00  26.00 420.0%
 test-suite...s-C/unix-smail/unix-smail.test     1.00   3.00 200.0%
 test-suite.../Prolangs-C++/ocean/ocean.test     1.00   3.00 200.0%
 test-suite...tions/lambda-0.1.3/lambda.test     1.00   3.00 200.0%
 test-suite...yApps-C++/PENNANT/PENNANT.test     2.00   5.00 150.0%
 test-suite...marks/Ptrdist/yacr2/yacr2.test     1.00   2.00 100.0%
 test-suite...lications/viterbi/viterbi.test     1.00   2.00 100.0%
 test-suite...plications/d/make_dparser.test    12.00  24.00 100.0%
 test-suite...CFP2006/433.milc/433.milc.test    14.00  27.00 92.9%
 test-suite.../Applications/lemon/lemon.test     7.00  12.00 71.4%
 test-suite...ce/Applications/Burg/burg.test     6.00  10.00 66.7%
 test-suite...T2006/473.astar/473.astar.test    16.00  26.00 62.5%
 test-suite...marks/7zip/7zip-benchmark.test    78.00 121.00 55.1%

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Dec 23 2020, 7:25 AM

Herald added a subscriber: hiraditya. Dec 23 2020, 7:25 AM

fhahn requested review of this revision. Dec 23 2020, 7:25 AM

Herald added a project: Restricted Project. Dec 23 2020, 7:25 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B83400: Diff 313557. Dec 23 2020, 7:32 AM

fhahn mentioned this in D93734: [LoopDeletion] Insert an early exit from dead path in loop. Dec 23 2020, 7:42 AM

Add comments and tests

fhahn retitled this revision from "[LoopUnswitch] Implement first version of partial unswitching. (WIP)" to "[LoopUnswitch] Implement first version of partial unswitching.". Dec 27 2020, 11:43 AM

fhahn edited the summary of this revision.

fhahn added reviewers: efriedma, reames, jdoerfert, jonpa. Dec 27 2020, 11:45 AM

xbolva00 added a subscriber: xbolva00. Dec 27 2020, 11:54 AM

Harbormaster completed remote builds in B83545: Diff 313796. Dec 27 2020, 12:12 PM

This patch applies the idea from D93734 to LoopUnswitch.

One disadvantage here is that LoopUnswitch has a size-limitation on the loop it handles, which is not really true: If a no-side-effects loop is created and the intent is to later delete it, it should not be limited by the size heuristic.

On the other hand, I wonder if the idea here is to only isolate totally dead paths, per this first draft, or perhaps also paths where specifically the condition is invariant? This would be the advantage I can see compared to D93734: to increase unswitching where there may be side effects that are known not to alias with the ToDuplicate instructions.

What about the original loop: If it is entered (instead of the new smaller loop your patch creates), it may still be true that the path where the condition becomes invariant is entered. So (at least theoretically) the full loop should in that case branch into the smaller loop. This would also eliminate the need to duplicate instructions in the preheader.

If we only want to achieve the "early exit" result, in other words that this patch will create a new loop with no side-effects (and then depend on later passes to remove it), it seems simpler to me to have this as a separate pass (or perhaps as a fix-up in LoopDeletion per D93734).

GCC has a separate pass for this, loopsplit. It can also handle the general case, like:

   for (i = 0; i < 100; i++)
     {
       if (i < 50)
         A;
       else
         B;
     }

into:

   for (i = 0; i < 50; i++)
     {
       A;
     }
   for (; i < 100; i++)
     {
       B;
     }

https://godbolt.org/z/zqsPY6

Updated to use MemorySSA to allow non-clobbering stores in the 'invariant' part. There's a crash while updating MemorySSA, probably because of a new kind of loop structure that the existing update logic does not account for. Will fix tomorrow.

In D93764#2472373, @jonpa wrote:

This patch applies the idea from D93734 to LoopUnswitch.

One disadvantage here is that LoopUnswitch has a size-limitation on the loop it handles, which is not really true: If a no-side-effects loop is created and the intent is to later delete it, it should not be limited by the size heuristic.

That is true, but the size of the overall loop is probably not too big in practice, if there is a no-op part. If we implement your suggestion below (to jump to the smaller one for the invariant case in the larger one), the size increase should be negligible. It would be interesting to check how many dead loops get missed by unswitching. If there are enough cases, it should be easy to share the detection logic between both passes and also use it in LoopDeletion.

On the other hand, I wonder if the idea here is to only isolate totally dead paths, per this first draft, or perhaps also paths where specifically the condition is invariant? This would be the advantage I can see compared to D93734: to increase unswitching where there may be side effects that are known not to alias with the ToDuplicate instructions.

Yes, I think the main advantage of the 'partial' unswitching is that neither path needs to be dead, it just cannot clobber the condition we unswitch on. So for example, reductions would be fine or stores to locations that do not alias any loads feeding the condition. The original version of the patch did not yet implement that. But I updated the patch to use MemorySSA to also catch this case.

What about the original loop: If it is entered (instead of the new smaller loop your patch creates), it may still be true that the path where the condition becomes invariant is entered. So (at least theoretically) the full loop should in that case branch into the smaller loop. This would also eliminate the need to duplicate instructions in the preheader.

Yes, in some cases it would be possible to just jump to the smaller loop. Unfortunately this will require some extra checks and work to set up the continuation values in the smaller loop. This is certainly something interesting worth exploring, but probably not in the initial version.

If we only want to achieve the "early exit" result, in other words that this patch will create a new loop with no side-effects (and then depend on later passes to remove it), it seems simpler to me to have this as a separate pass (or perhaps as a fix-up in LoopDeletion per D93734).

The aim for this patch is to catch more cases in general, including cases that are not no-ops. The updated version supports memory ops that do not alias the loads feeding the condition.

In D93764#2472579, @xbolva00 wrote:
GCC has a separate pass for this, loopsplit. It can also handle the general case, like:

   for (i = 0; i < 100; i++)
     {
       if (i < 50)
         A;
       else
         B;
     }

into:

   for (i = 0; i < 50; i++)
     {
       A;
     }
   for (; i < 100; i++)
     {
       B;
     }
https://godbolt.org/z/zqsPY6

I think this is a separate issue (the condition is dependent on the induction variable), but nevertheless interesting to pursue. IIRC there is already a pass that tries to do something similar, but I don't know which one off the top of my head. Will check.

Harbormaster completed remote builds in B83611: Diff 313906. Dec 28 2020, 3:38 PM

xbolva00 added a comment. Dec 28 2020, 8:32 PM

This comment was removed by xbolva00.

In D93764#2473241, @fhahn wrote:

I think this is a separate issue (the condition is dependent on the induction variable), but nevertheless interesting to pursue. IIRC there is already a pass that tries to do something similar, but I don't know which one off the top of my head. Will check.

This is InductiveRangeCheckElimination which implements a form of iteration set splitting. I'll warn that IRCE has been historically bug prone, and is not on by default upstream.

Overall, I think this is the right place for such logic. Some initial code comments.

TBH, I think it would be nicer to "search for the paths" instead. Though, a TODO might be sufficient for the start.
What I think would be nice is if we did not only deal with the conditional in the header; I am still thinking of *l && *r.

llvm/lib/Transforms/Scalar/LoopUnswitch.cpp
675	`.pop_back_val()` ?
677	Does code generation handle the case where a loop invariant instruction is inside the loop? With the comment below, this might be changed to `L->contains(I)` instead.
680	"Later" we could allow anything that is speculatively executable and loads, right?

In D93764#2473944, @jdoerfert wrote:

Overall, I think this is the right place for such logic. Some initial code comments.

Thanks for taking a look!

TBH, I think it would be nicer to "search for the paths" instead. Though, a TODO might be sufficient for the start.

Do you mean not using MemorySSA initially? In a way, the current approach first collects all blocks on a path and then walks MemorySSA downwards to check all potential clobbers in blocks on the path. Instead of that, we could just iterate over all instructions in the blocks and bail out if any has side-effects, as in the original version.

The main advantage of the MemorySSA version is that it allows us to just check the potentially clobbering instructions, so this is mostly an optimization. If it makes things easier, that could be done as a separate patch?

What I think would be nice is if we did not only deal with the conditional in the header; I am still thinking of *l && *r.

IIUC there are 2 things that could be improved: currently the condition must be an icmp, but this is mostly an artificial initial restriction that is easy to lift subsequently, e.g. to allow OR/AND instructions as conditions.

And we could also look for conditional branches in other blocks than the header. There very likely will be cases where this helps, but I am not sure how frequently it will trigger compared to the additional compile-time spent. Certainly something to explore further though.

Harbormaster completed remote builds in B83805: Diff 314219. Jan 1 2021, 7:56 AM

Add a few additional test cases.

llvm/lib/Transforms/Scalar/LoopUnswitch.cpp
677	Loop-invariant instructions will be duplicated outside the loop at the moment (added a test). With 'handling them', do you mean updating the uses inside the loop to use the hoisted instruction? Not sure if that will help a lot in practice, as they should already have been moved out I think and/or GVN/LICM will clean them up.
680	yep I think so.

Harbormaster completed remote builds in B83814: Diff 314234. Jan 1 2021, 3:27 PM

I refined one of my comments. I'm other than that OK with this. Let's wait for a few days and get a second opinion.

llvm/lib/Transforms/Scalar/LoopUnswitch.cpp
677	So, my problem was that I assumed something like X below might be considered "loop invariant". However, the API in `llvm::Loop` simply performs a `contains` check anyway, which is what I suggested instead. That said, I think it is confusing to call `isLoopInvariant` here because what we want/need is `contains`. If someone later recognizes in `isLoopInvariant` that `X` is not loop variant, we would not put the add in the `toDuplicate` set and fail to create valid code, agreed? (Right now it would only show for GEP but it's the same story.)

    int A[100];
    int X, a = ..., b = ...;
    for (...) {
      X = a + b;
      A[i] = X;
    }

aqjune mentioned this in D93065: [InstCombine] Disable optimizations of select instructions that causes propagation of poison values. Jan 3 2021, 7:34 AM

rebased after precommitting tests in edb52c626b5340a5a42ed833fc776bc815507283

Also updated to use !L->contains() instead of isLoopInvariant()

fhahn added inline comments. Jan 3 2021, 12:43 PM

llvm/lib/Transforms/Scalar/LoopUnswitch.cpp
677	"That said, I think it is confusing to call isLoopInvariant here because what we want/need is contains"

Yeah, given that we only call it for instructions anyway, using `contains` seems clearer. Updated.
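
A minimal sketch of the operand walk discussed in these inline comments, written against a toy value table rather than the LLVM API. The Node struct, the Op enum and collectToDuplicate are hypothetical names for illustration only. The sketch assumes, as the comments suggest, that only loads and GEPs are supported for now: starting from the operands of the compare, values defined outside the loop are skipped, loads and GEPs inside the loop are collected, and anything else aborts the analysis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for IR values; not LLVM types.
enum class Op { Argument, Load, GEP, ICmp, Add };

struct Node {
  Op op;
  bool inLoop;                // plays the role of L->contains(I)
  std::vector<int> operands;  // indices into the node table
};

// Collects the compare `cond` and the in-loop instructions feeding it.
// Returns false (and leaves toDuplicate empty) if the condition is not a
// compare or if an unsupported in-loop instruction is found.
bool collectToDuplicate(const std::vector<Node>& nodes, int cond,
                        std::vector<int>& toDuplicate) {
  assert(cond >= 0 && static_cast<std::size_t>(cond) < nodes.size());
  toDuplicate.clear();
  if (nodes[cond].op != Op::ICmp)
    return false;
  toDuplicate.push_back(cond);

  std::vector<bool> seen(nodes.size(), false);
  std::vector<int> worklist = nodes[cond].operands;
  while (!worklist.empty()) {
    int id = worklist.back();
    worklist.pop_back();
    const Node& n = nodes[id];
    // Values defined outside the loop are already available in the
    // pre-header, so they do not need to be duplicated.
    if (!n.inLoop || seen[id])
      continue;
    if (n.op != Op::Load && n.op != Op::GEP) {
      toDuplicate.clear();
      return false;
    }
    seen[id] = true;
    toDuplicate.push_back(id);
    worklist.insert(worklist.end(), n.operands.begin(), n.operands.end());
  }
  return true;
}
```

The membership test is deliberately a plain "is it inside the loop" flag, which mirrors the point made above: what matters for duplication is containment, not whether some analysis could prove the value invariant.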

Harbormaster completed remote builds in B83858: Diff 314300. Jan 3 2021, 1:14 PM

Rebase & ping :)

Harbormaster completed remote builds in B84693: Diff 315804. Jan 11 2021, 8:19 AM

Generally fine, I have one question though.

llvm/lib/Transforms/Scalar/LoopUnswitch.cpp
716	Nit: `const auto &`
748	Why do we need to check these uses?

Thanks! Addressed the nit and also added an option to limit the number of memory accesses to explore, to avoid excessive compile times in degenerate cases.

fhahn added inline comments. Jan 11 2021, 2:05 PM

llvm/lib/Transforms/Scalar/LoopUnswitch.cpp
748	Unfortunately a `MemoryDef` does not necessarily have all may/must-alias defs that follow it as users. For example, I think we could have something like

    %0 = MemoryDef (LiveOnEntry)
    %1 = MemoryPhi(%0,...)
    %2 = MemoryDef(%1,...) ; may-alias %0

Depending on what MemorySSA optimizations are applied, I think there could be similar examples with just MemoryDefs.

jdoerfert added inline comments. Jan 11 2021, 2:27 PM

llvm/lib/Transforms/Scalar/LoopUnswitch.cpp
748	I'm trying to wrap my head around this and it is probably me. I haven't worked with MSSA much.

So, `AccessesToCheck` starts with the defining access for each read location, so far so good. (correct me if I'm wrong at some point)
If that access is outside the loop, done.
If that access is inside and aliasing a location, done.
If that access is inside and not aliasing a location, why do we look at the uses?
I would understand if we look at the "operands":

    Header:
      %1 = MemDef(%0,...) // clobbers %3
      %2 = MemDef(%1,...) // defining access of %3
      %3 = MemUse(%2,...) // in AccessedLocations

Though, I assume I simply do not understand MSSA in various ways.

Harbormaster completed remote builds in B84758: Diff 315932. Jan 11 2021, 3:50 PM

fhahn added inline comments. Jan 12 2021, 11:53 AM

llvm/lib/Transforms/Scalar/LoopUnswitch.cpp
748	"So, AccessesToCheck starts with the defining access for each read location, so far so good. (correct me if I'm wrong at some point) If that access is outside the loop, done. If that access is inside and aliasing a location, done."

Sounds good so far.

"If that access is inside and not aliasing a location, why do we look at the uses?"

Because the uses of access `A` are the memory accesses that may read/write the memory location written by `A`, after `A`. For accesses that may access the same memory locations in a loop/cycle, there will be uses in `MemoryPhis`. A concrete example is below. To visit all potential clobbers of `; MemoryUse(1) ( %lv = load i32, i32* %ptr, align 4)`, we have to visit all uses of `%1 = MemoryDef(4) (call void @clobber())`, their uses and so on. The case where a clobber comes before the defining access in the header (as in your example) will be handled by this forward-scanning approach because there will be a `MemoryPhi` that's the defining access of `%1` in your example. Does this make sense?

While looking at this again, I realized that queuing the defining access at line 692 could be overly pessimistic, if there are scenarios where it may alias a location in `AccessedLocations`, but its defining access is outside the loop. It would probably be better to directly queue the uses of the defining access. But I am not sure if such a scenario can happen in practice.

    define i32 @test(i32* %ptr, i32 %N) {
    entry:
      br label %loop.header

    loop.header:                      ; preds = %loop.latch, %entry
    ; 4 = MemoryPhi({entry,liveOnEntry},{loop.latch,3})
      %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
    ; 1 = MemoryDef(4)
      call void @clobber()
    ; MemoryUse(1)
      %lv = load i32, i32* %ptr, align 4
      %sc = icmp eq i32 %lv, 100
      br i1 %sc, label %noclobber, label %clobber

    noclobber:                        ; preds = %loop.header
      br label %loop.latch

    clobber:                          ; preds = %loop.header
    ; 2 = MemoryDef(1)
      call void @clobber()
      br label %loop.latch

    loop.latch:                       ; preds = %clobber, %noclobber
    ; 3 = MemoryPhi({noclobber,1},{clobber,2})
      %c = icmp ult i32 %iv, %N
      %iv.next = add i32 %iv, 1
      br i1 %c, label %loop.header, label %exit

    exit:                             ; preds = %loop.latch
      ret i32 10
    }

I think I got it now. Due to the PHI nodes we can follow uses and eventually visit every definition in the loop (we care about). LGTM. Feel free to wait for someone else to look at it, or not.
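
The forward walk over uses described above can be sketched on a toy access graph. This is an illustrative model only, under simplifying assumptions: Access, Kind and hasNoClobbersOnPath are hypothetical names, the onPath and mayClobber flags stand in for the block-membership and alias queries, and the real implementation works on llvm::MemoryAccess objects instead of indices.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy stand-ins for MemorySSA accesses; not LLVM types.
enum class Kind { Use, Def, Phi };

struct Access {
  Kind kind;
  bool onPath;             // its block lies on the path being checked
  bool mayClobber;         // for a Def: may write a location the condition reads
  std::vector<int> users;  // accesses that use this one (indices)
};

// Starting from the queued accesses, follow users transitively and report
// whether no on-path Def may clobber the condition. Gives up (returns
// false) once `threshold` accesses have been visited.
bool hasNoClobbersOnPath(const std::vector<Access>& accesses,
                         std::vector<int> worklist, std::size_t threshold) {
  std::set<int> seen;
  while (!worklist.empty()) {
    int id = worklist.back();
    worklist.pop_back();
    assert(id >= 0 && static_cast<std::size_t>(id) < accesses.size());
    if (!seen.insert(id).second)
      continue;  // already visited
    const Access& a = accesses[id];
    if (!a.onPath)
      continue;  // not executed along this path
    if (seen.size() >= threshold)
      return false;  // conservative bail-out to bound compile time
    if (a.kind == Kind::Use)
      continue;  // reads cannot clobber anything
    if (a.kind == Kind::Def && a.mayClobber)
      return false;
    // Phis and harmless Defs: keep walking forward through their users.
    worklist.insert(worklist.end(), a.users.begin(), a.users.end());
  }
  return true;
}
```

Because every cycle in the loop goes through a phi, following users from the starting access eventually reaches each definition on the path, which is the property the reviewers converge on above.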

This revision is now accepted and ready to land. Jan 12 2021, 12:15 PM

If a dead path in a loop is unswitched into an empty loop, I suppose the idea is that LoopDeletion will later then delete it?

What about the remaining original loop: will it remain the same or will it get the edge into the dead path redirected to the new smaller loop?

mdchen added a subscriber: mdchen. Jan 13 2021, 6:53 PM

mdchen added inline comments. Jan 14 2021, 7:20 PM

llvm/lib/Transforms/Scalar/LoopUnswitch.cpp
671	I guess only the conditional value needs to be added here.
692	`MemUse` could be nullptr.
713	pop_back_val()

In D93764#2496964, @jonpa wrote:

If a dead path in a loop is unswitched into an empty loop, I suppose the idea is that LoopDeletion will later then delete it?

Yes exactly, this work is delegated to LoopDeletion; LoopUnswitch does not really care if the loop is dead at the moment or not, just if a condition in the body can be simplified by moving out a memory dependent check.

What about the remaining original loop: will it remain the same or will it get the edge into the dead path redirected to the new smaller loop?

At the moment, it remains the same. But it would be a great extension to update the body to jump to the unswitched version, if possible. I think this would best be done in a separate patch.

fhahn marked an inline comment as done. Jan 17 2021, 10:36 AM

fhahn added inline comments.

llvm/lib/Transforms/Scalar/LoopUnswitch.cpp
671	Yes, that would be sufficient if the loop below supported compares, which it does not in the current version. It would be good to extend the supported instructions below to any arithmetic/compare instruction in the future though.
692	I think we should not reach the code, if `nullptr` is assigned to `MemUse` in the `if` condition.
713	Thanks, I updated the code!

Harbormaster completed remote builds in B85527: Diff 317231. Jan 17 2021, 11:13 AM

Thanks everyone for the feedback. I am planning on landing the change tomorrow.

Closed by commit rGbee486851c1a: [LoopUnswitch] Implement first version of partial unswitching. (authored by fhahn). Jan 21 2021, 1:47 AM

This revision was automatically updated to reflect the committed changes.

fhahn added a commit: rGbee486851c1a: [LoopUnswitch] Implement first version of partial unswitching..

To give you an early heads up, we're seeing crashes in Firefox after this commit. Unfortunately it's only happening on Mac, which I am not equipped to debug. I may need to try enlisting a colleague to investigate and/or throw more test suites at it and hope we can capture something on Linux or Windows.

Since there's a release branch point coming soon, if we don't get anywhere by the end of the week, what do you think about reverting until after 12.0.0-rc1 is tagged?

I'm also seeing lots of hangs/infinite loops in code built after this commit. I don't have anything nicely reduced to dig in on (yet), but it broke a significant amount of my testing setup.

In D93764#2514037, @dmajor wrote:

To give you an early heads up, we're seeing crashes in Firefox after this commit. Unfortunately it's only happening on Mac, which I am not equipped to debug. I may need to try enlisting a colleague to investigate and/or throw more test suites at it and hope we can capture something on Linux or Windows.

Since there's a release branch point coming soon, if we don't get anywhere by the end of the week, what do you think about reverting until after 12.0.0-rc1 is tagged?

In D93764#2514813, @mstorsjo wrote:

I'm also seeing lots of hangs/infinite loops in code built after this commit. I don't have anything nicely reduced to dig in on (yet), but it broke a significant amount of my testing setup.

Thanks for the heads up! It would be great to get a reproducer, but I'll revert by the end of day if the issue is not resolved by then.

I've got one problem narrowed down to one source file in a project; https://martin.st/temp/dav1d-thread_task-preproc.c, compiled with clang -target x86_64-w64-mingw32 -c -O3 does trigger the issue - I haven't dug in closer to zoom in on exactly what it is though - hopefully you can spot the differences.

dmgreen added a subscriber: dmgreen. Jan 22 2021, 5:53 AM

In D93764#2515106, @mstorsjo wrote:

I've got one problem narrowed down to one source file in a project; https://martin.st/temp/dav1d-thread_task-preproc.c, compiled with clang -target x86_64-w64-mingw32 -c -O3 does trigger the issue - I haven't dug in closer to zoom in on exactly what it is though - hopefully you can spot the differences.

One issue with the file was that the patch did not account for atomic loads and would add additional atomic loads. I fixed that in 86991d323133. Please let me know if that fixes your issue. Otherwise I'll revert the patch to investigate further.

In D93764#2515449, @fhahn wrote:

One issue with the file was that the patch did not account for atomic loads and would add additional atomic loads. I fixed that in 86991d323133. Please let me know if that fixes your issue. Otherwise I'll revert the patch to investigate further.

Ah, that sounds promising! On our side we haven't got down to specific details yet but it _may_ be related to atomics. I've kicked off new tests on 86991d323133, should have results in 2-3 hours.

In D93764#2515449, @fhahn wrote:

One issue with the file was that the patch did not account for atomic loads and would add additional atomic loads. I fixed that in 86991d323133. Please let me know if that fixes your issue. Otherwise I'll revert the patch to investigate further.

Thanks! I think it looks mostly good now. I'm still seeing some breakage compared to Thursday, but I'll have to bisect those errors separately to see where they broke.

86991d323133 looks good from our side, thank you!

In D93764#2515922, @mstorsjo wrote:

Thanks! I think it looks mostly good now. I'm still seeing some breakage compared to Thursday, but I'll have to bisect those errors separately to see where they broke.

The other issues turned out to be false alarms, so I think everything should be in order now again. Thanks!

Thank you very much for the quick feedback! Glad that it looks like the issues have been resolved.

ChuanqiXu added a subscriber: ChuanqiXu. Jan 27 2021, 5:47 PM

Hi!

I started seeing crashes with this patch:

opt -o /dev/null bbi-52312.ll -loop-unswitch

Result:

opt: ../lib/Analysis/MemorySSAUpdater.cpp:1154: void llvm::MemorySSAUpdater::applyInsertUpdates(ArrayRef<llvm::CFGUpdate>, llvm::DominatorTree &, const GraphDiff<llvm::BasicBlock *> *): Assertion `IDom && "Block must have a valid IDom."' failed.

Attachment: bbi-52312.ll (4 KB)

For another program I see

opt: ../lib/Analysis/MemorySSA.cpp:2032: void llvm::MemorySSA::verifyOrderingDominationAndDefUses(llvm::Function &) const: Assertion `dominates(MD, U) && "Memory Def does not dominate it's uses"' failed.

when I run

opt -O1 -inline -loop-unswitch -verify-memoryssa

with this patch, but unfortunately I haven't been able to reduce that down to a simpler command line (it doesn't reproduce if I run the passes I get from -debug-pass=Arguments), and the input uses some features we only have downstream, so I don't have a reproducer I can share for this :(

In D93764#2527703, @uabelho wrote:

Hi!

I started seeing crashes with this patch:

opt -o /dev/null bbi-52312.ll -loop-unswitch

Result:

opt: ../lib/Analysis/MemorySSAUpdater.cpp:1154: void llvm::MemorySSAUpdater::applyInsertUpdates(ArrayRef<llvm::CFGUpdate>, llvm::DominatorTree &, const GraphDiff<llvm::BasicBlock *> *): Assertion `IDom && "Block must have a valid IDom."' failed.

Attachment: bbi-52312.ll (4 KB)

Thanks, let me take a look!

In D93764#2531332, @fhahn wrote:

Thanks, let me take a look!

Should be fixed by 10c57268c074. Please let me know if you are seeing any other issues.

In D93764#2532219, @fhahn wrote:

Should be fixed by 10c57268c074. Please let me know if you are seeing any other issues.

I guess that one needs to be backported to 12.x, after cooking in main for a while.

In D93764#2532220, @mstorsjo wrote:

In D93764#2532219, @fhahn wrote:

Should be fixed by 10c57268c074. Please let me know if you are seeing any other issues.

I guess that one needs to be backported to 12.x, after cooking in main for a while.

I picked the fix on 12.x already for https://bugs.llvm.org/show_bug.cgi?id=48865. I'll keep an eye out for any issues.

In D93764#2532219, @fhahn wrote:

Should be fixed by 10c57268c074. Please let me know if you are seeing any other issues.

Thanks! Both problems I saw went away with the fix!

Maybe I'm missing something, but this change doesn't seem to be effective anymore after the new pass manager switcheroo. Did this pass not get ported, in favour of SimpleLoopUnswitch? This now shows up as a regression relative to (what will be) LLVM 12.

In D93764#2549011, @sanwou01 wrote:

Maybe I'm missing something, but this change doesn't seem to be effective anymore after the new pass manager switcheroo. Did this pass not get ported, in favour of SimpleLoopUnswitch? This now shows up as a regression relative to (what will be) LLVM 12.

LoopUnswitch is legacy PM only, unfortunately. This needs to be ported to SimpleLoopUnswitch.

jaykang10 mentioned this in D99354: [SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch. Mar 25 2021, 10:30 AM

Revision Contents

Path                                                                    Size
llvm/lib/Transforms/Scalar/LoopUnswitch.cpp                             279 lines
llvm/test/Transforms/LoopUnswitch/partial-unswitch-mssa-threshold.ll    48 lines
llvm/test/Transforms/LoopUnswitch/partial-unswitch.ll                   295 lines

Diff 318134

llvm/lib/Transforms/Scalar/LoopUnswitch.cpp

[93 lines hidden]
  STATISTIC(TotalInsts, "Total number of instructions analyzed");

  // The specific value of 100 here was chosen based only on intuition and a
  // few specific examples.
  static cl::opt<unsigned>
  Threshold("loop-unswitch-threshold", cl::desc("Max loop size to unswitch"),
            cl::init(100), cl::Hidden);

+ static cl::opt<unsigned>
+     MSSAThreshold("loop-unswitch-memoryssa-threshold",
+                   cl::desc("Max number of memory uses to explore during "
+                            "partial unswitching analysis"),
+                   cl::init(100), cl::Hidden);
+
  namespace {

    class LUAnalysisCache {
      using UnswitchedValsMap =
          DenseMap<const SwitchInst *, SmallPtrSet<const Value *, 8>>;
      using UnswitchedValsIt = UnswitchedValsMap::iterator;

      struct LoopProperties {
[70 lines hidden] class LoopUnswitch : public LoopPass {
      LUAnalysisCache BranchesInfo;

      bool OptimizeForSize;
      bool RedoLoop = false;

      Loop *CurrentLoop = nullptr;
      DominatorTree *DT = nullptr;
      MemorySSA *MSSA = nullptr;
+     AAResults *AA = nullptr;
      std::unique_ptr<MemorySSAUpdater> MSSAU;
      BasicBlock *LoopHeader = nullptr;
      BasicBlock *LoopPreheader = nullptr;

      bool SanitizeMemory;
      SimpleLoopSafetyInfo SafetyInfo;

      // LoopBlocks contains all of the basic blocks of the loop, including the
▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	private:
/// Split all of the edges from inside the loop to their exit blocks.		/// Split all of the edges from inside the loop to their exit blocks.
/// Update the appropriate Phi nodes as we do so.		/// Update the appropriate Phi nodes as we do so.
void splitExitEdges(Loop *L,		void splitExitEdges(Loop *L,
const SmallVectorImpl<BasicBlock *> &ExitBlocks);		const SmallVectorImpl<BasicBlock *> &ExitBlocks);

bool tryTrivialLoopUnswitch(bool &Changed);		bool tryTrivialLoopUnswitch(bool &Changed);

bool unswitchIfProfitable(Value *LoopCond, Constant *Val,		bool unswitchIfProfitable(Value *LoopCond, Constant *Val,
Instruction *TI = nullptr);		Instruction *TI = nullptr,
		ArrayRef<Instruction *> ToDuplicate = {});
void unswitchTrivialCondition(Loop *L, Value *Cond, Constant *Val,		void unswitchTrivialCondition(Loop *L, Value *Cond, Constant *Val,
BasicBlock *ExitBlock, Instruction *TI);		BasicBlock *ExitBlock, Instruction *TI);
void unswitchNontrivialCondition(Value *LIC, Constant *OnVal, Loop *L,		void unswitchNontrivialCondition(Value *LIC, Constant *OnVal, Loop *L,
Instruction *TI);		Instruction *TI,
		ArrayRef<Instruction *> ToDuplicate = {});

void rewriteLoopBodyWithConditionConstant(Loop *L, Value *LIC,		void rewriteLoopBodyWithConditionConstant(Loop *L, Value *LIC,
Constant *Val, bool IsEqual);		Constant *Val, bool IsEqual);

void emitPreheaderBranchOnCondition(Value *LIC, Constant *Val,		void
BasicBlock *TrueDest,		emitPreheaderBranchOnCondition(Value *LIC, Constant *Val,
BasicBlock *FalseDest,		BasicBlock *TrueDest, BasicBlock *FalseDest,
BranchInst *OldBranch, Instruction *TI);		BranchInst *OldBranch, Instruction *TI,
		ArrayRef<Instruction *> ToDuplicate = {});

void simplifyCode(std::vector<Instruction *> &Worklist, Loop *L);		void simplifyCode(std::vector<Instruction *> &Worklist, Loop *L);

/// Given that the Invariant is not equal to Val. Simplify instructions		/// Given that the Invariant is not equal to Val. Simplify instructions
/// in the loop.		/// in the loop.
Value *simplifyInstructionWithNotEqual(Instruction *Inst, Value *Invariant,		Value *simplifyInstructionWithNotEqual(Instruction *Inst, Value *Invariant,
Constant *Val);		Constant *Val);
};		};
[250 lines elided]	bool LoopUnswitch::runOnLoop(Loop *L, LPPassManager &LPMRef) {
if (skipLoop(L))		if (skipLoop(L))
return false;		return false;

AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(		AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(
*L->getHeader()->getParent());		*L->getHeader()->getParent());
LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();		LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
LPM = &LPMRef;		LPM = &LPMRef;
DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();		DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
		AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
if (EnableMSSALoopDependency) {		if (EnableMSSALoopDependency) {
MSSA = &getAnalysis<MemorySSAWrapperPass>().getMSSA();		MSSA = &getAnalysis<MemorySSAWrapperPass>().getMSSA();
MSSAU = std::make_unique<MemorySSAUpdater>(MSSA);		MSSAU = std::make_unique<MemorySSAUpdater>(MSSA);
assert(DT && "Cannot update MemorySSA without a valid DomTree.");		assert(DT && "Cannot update MemorySSA without a valid DomTree.");
}		}
CurrentLoop = L;		CurrentLoop = L;
Function *F = CurrentLoop->getHeader()->getParent();		Function *F = CurrentLoop->getHeader()->getParent();

[85 lines elided]	static bool equalityPropUnSafe(Value &LoopCond) {
};		};
SelectInst *LSI = dyn_cast<SelectInst>(LHS);		SelectInst *LSI = dyn_cast<SelectInst>(LHS);
SelectInst *RSI = dyn_cast<SelectInst>(RHS);		SelectInst *RSI = dyn_cast<SelectInst>(RHS);
if ((LSI && HasUndefInSelect(LSI)) || (RSI && HasUndefInSelect(RSI)))		if ((LSI && HasUndefInSelect(LSI)) || (RSI && HasUndefInSelect(RSI)))
return true;		return true;
return false;		return false;
}		}

		/// Check if the loop header has a conditional branch that is not
		/// loop-invariant, because it involves load instructions. If all paths from
		/// either the true or false successor to the header or loop exits do not
		/// modify the memory feeding the condition, perform 'partial unswitching'. That
		/// is, duplicate the instructions feeding the condition in the pre-header. Then
		/// unswitch on the duplicated condition. The condition is now known in the
		/// unswitched version for the 'invariant' path through the original loop.
		///
		/// If the branch condition of the header is partially invariant, return a pair
		/// containing the instructions to duplicate and a boolean Constant to update
		/// the condition in the loops created for the true or false successors.
		static std::pair<SmallVector<Instruction *, 4>, Constant *>
		hasPartialIVCondition(Loop *L, MemorySSA &MSSA, AAResults *AA) {
		SmallVector<Instruction *, 4> ToDuplicate;

		auto *TI = dyn_cast<BranchInst>(L->getHeader()->getTerminator());
		if (!TI || !TI->isConditional())
		return {};

		auto *CondI = dyn_cast<CmpInst>(TI->getCondition());
		// The case with the condition outside the loop should already be handled
		// earlier.
		if (!CondI || !L->contains(CondI))
		return {};

		ToDuplicate.push_back(CondI);

		SmallVector<Value *, 4> WorkList;
		WorkList.append(CondI->op_begin(), CondI->op_end());
		mdchen: I guess only the conditional value needs to be added here.
		fhahn (author): Yes, that would be sufficient if the loop below supported compares, which it does not in the current version. It would be good to extend the supported instructions below to any arithmetic/compare instruction in the future though.

		SmallVector<MemoryAccess *, 4> AccessesToCheck;
		SmallVector<MemoryLocation, 4> AccessedLocs;
		while (!WorkList.empty()) {
		jdoerfert: `.pop_back_val()`?
		Instruction *I = dyn_cast<Instruction>(WorkList.pop_back_val());
		if (!I || !L->contains(I))
		jdoerfert: Does code generation handle the case where a loop-invariant instruction is inside the loop? With the comment below, this might be changed to `L->contains(I)` instead.
		fhahn (author): Loop-invariant instructions will be duplicated outside the loop at the moment (added a test). With 'handling them', do you mean updating the uses inside the loop to use the hoisted instruction? Not sure if that will help a lot in practice, as they should already have been moved out, I think, and/or GVN/LICM will clean them up.
		jdoerfert: So, my problem was that I assumed something like X below might be considered "loop invariant". However, the API in `llvm::Loop` simply performs a `contains` check anyway, which is what I suggested instead. That said, I think it is confusing to call `isLoopInvariant` here because what we want/need is `contains`. If someone later recognizes in `isLoopInvariant` that `X` is not loop variant, we would not put the add in the `ToDuplicate` set and fail to create valid code, agreed? (Right now it would only show for GEP but it's the same story.) Example: `int A[100]; int X, a = ..., b = ...; for (...) { X = a + b; A[i] = X; }`
		fhahn (author): > That said, I think it is confusing to call isLoopInvariant here because what we want/need is contains. -- Yeah, given that we only call it for instructions anyway, using `contains` seems clearer. Updated.
		continue;

		// TODO: support additional instructions.
		jdoerfert: "Later" we could allow anything that is speculatively executable and loads, right?
		fhahn (author): Yep, I think so.
		if (!isa<LoadInst>(I) && !isa<GetElementPtrInst>(I))
		return {};

		// Do not duplicate volatile loads.
		if (auto *LI = dyn_cast<LoadInst>(I))
		if (LI->isVolatile())
		return {};

		ToDuplicate.push_back(I);
		if (auto *MemUse = dyn_cast_or_null<MemoryUse>(MSSA.getMemoryAccess(I))) {
		// Queue the defining access to check for alias checks.
		AccessesToCheck.push_back(MemUse->getDefiningAccess());
		mdchen: `MemUse` could be nullptr.
		fhahn (author): I think we should not reach the code if `nullptr` is assigned to `MemUse` in the `if` condition.
		AccessedLocs.push_back(MemoryLocation::get(I));
		}
		WorkList.append(I->op_begin(), I->op_end());
		}

		if (ToDuplicate.size() <= 1)
		return {};

		auto HasNoClobbersOnPath =
		[L, AA, &AccessedLocs](BasicBlock *Succ, BasicBlock *Header,
		SmallVector<MemoryAccess *, 4> AccessesToCheck) {
		// First, collect all blocks in the loop that are on a path from Succ
		// to the header.
		SmallVector<BasicBlock *, 4> WorkList;
		WorkList.push_back(Succ);
		WorkList.push_back(Header);
		SmallPtrSet<BasicBlock *, 4> Seen;
		Seen.insert(Header);
		while (!WorkList.empty()) {
		BasicBlock *Current = WorkList.pop_back_val();
		if (!L->contains(Current))
		mdchen: pop_back_val()
		fhahn (author): Thanks, I updated the code!
		continue;
		const auto &SeenIns = Seen.insert(Current);
		if (!SeenIns.second)
		jdoerfert: Nit: `const auto &`
		continue;

		WorkList.append(succ_begin(Current), succ_end(Current));
		}

		// Next, check if there are any MemoryDefs that are on the path through
		// the loop (in the Seen set) and they may-alias any of the locations in
		// AccessedLocs. If that is the case, they may modify the condition and
		// partial unswitching is not possible.
		SmallPtrSet<MemoryAccess *, 4> SeenAccesses;
		while (!AccessesToCheck.empty()) {
		MemoryAccess *Current = AccessesToCheck.pop_back_val();
		auto SeenI = SeenAccesses.insert(Current);
		if (!SeenI.second || !Seen.contains(Current->getBlock()))
		continue;

		// Bail out if exceeded the threshold.
		if (SeenAccesses.size() >= MSSAThreshold)
		return false;

		// MemoryUse are read-only accesses.
		if (isa<MemoryUse>(Current))
		continue;

		// For a MemoryDef, check if it aliases any of the locations feeding
		// the original condition.
		if (auto *CurrentDef = dyn_cast<MemoryDef>(Current)) {
		if (any_of(AccessedLocs, [AA, CurrentDef](MemoryLocation &Loc) {
		return isModSet(
		AA->getModRefInfo(CurrentDef->getMemoryInst(), Loc));
		}))
		return false;
		jdoerfert: Why do we need to check these uses?
		fhahn (author): Unfortunately a `MemoryDef` does not necessarily have all may/must-alias defs that follow it as users. For example, I think we could have something like
			%0 = MemoryDef(LiveOnEntry)
			%1 = MemoryPhi(%0,...)
			%2 = MemoryDef(%1,...) ; may-alias %0
		Depending on what MemorySSA optimizations are applied, I think there could be similar examples with just MemoryDefs.
		jdoerfert: I'm trying to wrap my head around this and it is probably me. I haven't worked with MSSA much. So, `AccessesToCheck` starts with the defining access for each read location, so far so good (correct me if I'm wrong at some point). If that access is outside the loop, done. If that access is inside and aliasing a location, done. If that access is inside and not aliasing a location, why do we look at the uses? I would understand if we look at the "operands":
			Header:
			%1 = MemDef(%0,...) // clobbers %3
			%2 = MemDef(%1,...) // defining access of %3
			%3 = MemUse(%2,...) // in AccessedLocations
		Though, I assume I simply do not understand MSSA in various ways.
		fhahn (author):
		> So, AccessesToCheck starts with the defining access for each read location, so far so good. (correct me if I'm wrong at some point) If that access is outside the loop, done. If that access is inside and aliasing a location, done.
		Sounds good so far.
		> If that access is inside and not aliasing a location, why do we look at the uses?
		Because the uses of access `A` are the memory accesses that may read/write the memory location written by `A`, after `A`. For accesses that may access the same memory locations in a loop/cycle, there will be uses in `MemoryPhis`. A concrete example is below. To visit all potential clobbers of `; MemoryUse(1)` (`%lv = load i32, i32* %ptr, align 4`), we have to visit all uses of `%1 = MemoryDef(4)` (`call void @clobber()`), their uses and so on. The case where a clobber comes before the defining access in the header (as in your example) will be handled by this forward-scanning approach because there will be a `MemoryPhi` that's the defining access of `%1` in your example. Does this make sense?
		While looking at this again, I realized that queuing the defining access at line 692 could be overly pessimistic, if there are scenarios where it may alias a location in `AccessedLocations`, but its defining access is outside the loop. It would probably be better to directly queue the uses of the defining access. But I am not sure if such a scenario can happen in practice.
			define i32 @test(i32* %ptr, i32 %N) {
			entry:
			  br label %loop.header

			loop.header:                         ; preds = %loop.latch, %entry
			; 4 = MemoryPhi({entry,liveOnEntry},{loop.latch,3})
			  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
			; 1 = MemoryDef(4)
			  call void @clobber()
			; MemoryUse(1)
			  %lv = load i32, i32* %ptr, align 4
			  %sc = icmp eq i32 %lv, 100
			  br i1 %sc, label %noclobber, label %clobber

			noclobber:                           ; preds = %loop.header
			  br label %loop.latch

			clobber:                             ; preds = %loop.header
			; 2 = MemoryDef(1)
			  call void @clobber()
			  br label %loop.latch

			loop.latch:                          ; preds = %clobber, %noclobber
			; 3 = MemoryPhi({noclobber,1},{clobber,2})
			  %c = icmp ult i32 %iv, %N
			  %iv.next = add i32 %iv, 1
			  br i1 %c, label %loop.header, label %exit

			exit:                                ; preds = %loop.latch
			  ret i32 10
			}
		}

		for (Use &U : Current->uses())
		AccessesToCheck.push_back(cast<MemoryAccess>(U.getUser()));
		}

		return true;
		};

		if (HasNoClobbersOnPath(TI->getSuccessor(0), L->getHeader(), AccessesToCheck))
		return {ToDuplicate, ConstantInt::getTrue(TI->getContext())};
		if (HasNoClobbersOnPath(TI->getSuccessor(1), L->getHeader(), AccessesToCheck))
		return {ToDuplicate, ConstantInt::getFalse(TI->getContext())};

		return {};
		}
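
Editor's sketch (not part of the patch): the two steps of `HasNoClobbersOnPath` and the forward scan over MemorySSA users discussed in the thread above can be modelled without LLVM. The following is a minimal toy version under stated assumptions: `Access`, `Kind`, `blocksOnPath`, `exampleSuccs` and `exampleAccesses` are hypothetical names invented for illustration, the alias query is reduced to a `MayClobber` flag, and the example graph is only modelled on the first test in partial-unswitch.ll (the access numbering is assumed, not taken from real MemorySSA output).

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy stand-ins for MemorySSA accesses; these are NOT LLVM types.
enum class Kind { Use, Def, Phi };

struct Access {
  Kind K;
  int Block;              // id of the containing basic block
  bool MayClobber;        // only meaningful for Kind::Def
  std::vector<int> Users; // indices of the accesses that use this one
};

// Step 1: collect the loop blocks reachable from Succ without walking
// through Header again. Succs[B] lists the successors of block B.
std::set<int> blocksOnPath(const std::vector<std::vector<int>> &Succs,
                           const std::set<int> &LoopBlocks, int Succ,
                           int Header) {
  std::set<int> Seen = {Header};
  std::vector<int> WorkList = {Succ};
  while (!WorkList.empty()) {
    int Cur = WorkList.back();
    WorkList.pop_back();
    if (!LoopBlocks.count(Cur) || !Seen.insert(Cur).second)
      continue;
    WorkList.insert(WorkList.end(), Succs[Cur].begin(), Succs[Cur].end());
  }
  return Seen;
}

// Step 2: starting from the defining accesses, walk forward through users
// and fail if a clobbering def sits in one of the blocks on the path.
bool hasNoClobbersOnPath(const std::vector<Access> &Accesses,
                         const std::set<int> &OnPath,
                         std::vector<int> ToCheck, unsigned Threshold) {
  std::set<int> SeenAccesses;
  while (!ToCheck.empty()) {
    int Cur = ToCheck.back();
    ToCheck.pop_back();
    const Access &A = Accesses[Cur];
    if (!SeenAccesses.insert(Cur).second || !OnPath.count(A.Block))
      continue;
    if (SeenAccesses.size() >= Threshold) // give up once the budget is spent
      return false;
    if (A.K == Kind::Use) // reads cannot change the condition
      continue;
    if (A.K == Kind::Def && A.MayClobber)
      return false;
    ToCheck.insert(ToCheck.end(), A.Users.begin(), A.Users.end());
  }
  return true;
}

// Assumed toy CFG: 0=header, 1=noclobber, 2=clobber, 3=latch, 4=exit.
std::vector<std::vector<int>> exampleSuccs() {
  return {{1, 2}, {3}, {3}, {0, 4}, {}};
}

// Assumed toy accesses: 0=header phi, 1=load in header (use),
// 2=call in clobber block (def), 3=latch phi.
std::vector<Access> exampleAccesses() {
  return {{Kind::Phi, 0, false, {1, 2, 3}},
          {Kind::Use, 0, false, {}},
          {Kind::Def, 2, true, {3}},
          {Kind::Phi, 3, false, {0}}};
}
```

In this model the walk from the true successor skips the def in the `clobber` block because that block is not on the path, while the walk from the false successor reaches it and fails. A threshold of 0 rejects every query, which mirrors what the THRESHOLD-0 run line in the test file expects.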

/// Do actual work and unswitch loop if possible and profitable.		/// Do actual work and unswitch loop if possible and profitable.
bool LoopUnswitch::processCurrentLoop() {		bool LoopUnswitch::processCurrentLoop() {
bool Changed = false;		bool Changed = false;

initLoopData();		initLoopData();

// If LoopSimplify was unable to form a preheader, don't do any unswitching.		// If LoopSimplify was unable to form a preheader, don't do any unswitching.
if (!LoopPreheader)		if (!LoopPreheader)
[183 lines elided]	for (BasicBlock::iterator BBI = (*I)->begin(), E = (*I)->end();
.first;		.first;
if (LoopCond &&		if (LoopCond &&
unswitchIfProfitable(LoopCond, ConstantInt::getTrue(Context))) {		unswitchIfProfitable(LoopCond, ConstantInt::getTrue(Context))) {
++NumSelects;		++NumSelects;
return true;		return true;
}		}
}		}
}		}

		// Check if there is a header condition that is invariant along the path from
		// either the true or false successors to the header. This allows unswitching
		// conditions depending on memory accesses, if there's a path not clobbering
		// the memory locations. Check if this transform has been disabled using
		// metadata, to avoid unswitching the same loop multiple times.
		if (MSSA &&
		!findOptionMDForLoop(CurrentLoop, "llvm.loop.unswitch.partial.disable")) {
		auto ToDuplicate = hasPartialIVCondition(CurrentLoop, *MSSA, AA);
		if (!ToDuplicate.first.empty()) {
		++NumBranches;
		unswitchIfProfitable(ToDuplicate.first[0], ToDuplicate.second,
		CurrentLoop->getHeader()->getTerminator(),
		ToDuplicate.first);

		RedoLoop = false;
		return true;
		}
		}

return Changed;		return Changed;
}		}
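
Editor's sketch (not part of the patch): at source level, the effect of the partial unswitching triggered above can be pictured roughly as follows. This is a hand-written analogy under stated assumptions, not compiler output: `gCell`, `clobber`, `PathCounts`, `originalLoop`, `partiallyUnswitched` and `runWith` are hypothetical names, and the stand-in `clobber()` simply increments the cell that feeds the condition.

```cpp
#include <cassert>

// Hypothetical stand-in for the opaque @clobber() call in the tests: it may
// write the memory cell that feeds the loop condition.
static int *gCell = nullptr;
static void clobber() {
  if (gCell)
    ++*gCell;
}

struct PathCounts {
  unsigned Quiet = 0;      // iterations through the non-writing successor
  unsigned Clobbering = 0; // iterations through the successor calling clobber()
  bool operator==(const PathCounts &O) const {
    return Quiet == O.Quiet && Clobbering == O.Clobbering;
  }
};

// Original loop: the condition is reloaded from memory on every iteration.
PathCounts originalLoop(int *Ptr, unsigned N) {
  PathCounts C;
  for (unsigned I = 0; I < N; ++I) {
    if (*Ptr == 100) {
      ++C.Quiet;
    } else {
      clobber();
      ++C.Clobbering;
    }
  }
  return C;
}

// Rough shape after partial unswitching on the true successor.
PathCounts partiallyUnswitched(int *Ptr, unsigned N) {
  PathCounts C;
  if (*Ptr == 100) { // load and compare duplicated in front of the loop
    // Nothing on this path writes *Ptr, so the condition stays true and the
    // in-loop branch can be folded away.
    for (unsigned I = 0; I < N; ++I)
      ++C.Quiet;
  } else {
    // The other version keeps the original condition, because clobber() may
    // change *Ptr between iterations.
    for (unsigned I = 0; I < N; ++I) {
      if (*Ptr == 100) {
        ++C.Quiet;
      } else {
        clobber();
        ++C.Clobbering;
      }
    }
  }
  return C;
}

// Runs Fn on a fresh cell holding Start, with clobber() wired to that cell.
PathCounts runWith(PathCounts (*Fn)(int *, unsigned), int Start, unsigned N) {
  int Cell = Start;
  gCell = &Cell;
  PathCounts C = Fn(&Cell, N);
  gCell = nullptr;
  return C;
}
```

Only the version entered with a true condition is simplified. The fallback version is the one that could be partially unswitched again, which is why the patch tags it with `llvm.loop.unswitch.partial.disable`.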

/// Check to see if all paths from BB exit the loop with no side effects		/// Check to see if all paths from BB exit the loop with no side effects
/// (including infinite loops).		/// (including infinite loops).
///		///
/// If true, we return true and set ExitBB to the block we		/// If true, we return true and set ExitBB to the block we
/// exit through.		/// exit through.
[41 lines elided]	if (isTrivialLoopExitBlockHelper(L, BB, ExitBB, Visited))
return ExitBB;		return ExitBB;
return nullptr;		return nullptr;
}		}

/// We have found that we can unswitch CurrentLoop when LoopCond == Val to		/// We have found that we can unswitch CurrentLoop when LoopCond == Val to
/// simplify the loop. If we decide that this is profitable,		/// simplify the loop. If we decide that this is profitable,
/// unswitch the loop, reprocess the pieces, then return true.		/// unswitch the loop, reprocess the pieces, then return true.
bool LoopUnswitch::unswitchIfProfitable(Value *LoopCond, Constant *Val,		bool LoopUnswitch::unswitchIfProfitable(Value *LoopCond, Constant *Val,
Instruction *TI) {		Instruction *TI,
		ArrayRef<Instruction *> ToDuplicate) {
// Check to see if it would be profitable to unswitch current loop.		// Check to see if it would be profitable to unswitch current loop.
if (!BranchesInfo.costAllowsUnswitching()) {		if (!BranchesInfo.costAllowsUnswitching()) {
LLVM_DEBUG(dbgs() << "NOT unswitching loop %"		LLVM_DEBUG(dbgs() << "NOT unswitching loop %"
<< CurrentLoop->getHeader()->getName()		<< CurrentLoop->getHeader()->getName()
<< " at non-trivial condition '" << *Val		<< " at non-trivial condition '" << *Val
<< "' == " << *LoopCond << "\n"		<< "' == " << *LoopCond << "\n"
<< ". Cost too high.\n");		<< ". Cost too high.\n");
return false;		return false;
}		}
if (HasBranchDivergence &&		if (HasBranchDivergence &&
getAnalysis<LegacyDivergenceAnalysis>().isDivergent(LoopCond)) {		getAnalysis<LegacyDivergenceAnalysis>().isDivergent(LoopCond)) {
LLVM_DEBUG(dbgs() << "NOT unswitching loop %"		LLVM_DEBUG(dbgs() << "NOT unswitching loop %"
<< CurrentLoop->getHeader()->getName()		<< CurrentLoop->getHeader()->getName()
<< " at non-trivial condition '" << *Val		<< " at non-trivial condition '" << *Val
<< "' == " << *LoopCond << "\n"		<< "' == " << *LoopCond << "\n"
<< ". Condition is divergent.\n");		<< ". Condition is divergent.\n");
return false;		return false;
}		}

unswitchNontrivialCondition(LoopCond, Val, CurrentLoop, TI);		unswitchNontrivialCondition(LoopCond, Val, CurrentLoop, TI, ToDuplicate);
return true;		return true;
}		}

/// Emit a conditional branch on two values if LIC == Val, branch to TrueDst,		/// Emit a conditional branch on two values if LIC == Val, branch to TrueDst,
/// otherwise branch to FalseDest. Insert the code immediately before OldBranch		/// otherwise branch to FalseDest. Insert the code immediately before OldBranch
/// and remove (but not erase!) it from the function.		/// and remove (but not erase!) it from the function.
void LoopUnswitch::emitPreheaderBranchOnCondition(Value *LIC, Constant *Val,		void LoopUnswitch::emitPreheaderBranchOnCondition(
BasicBlock *TrueDest,		Value *LIC, Constant *Val, BasicBlock *TrueDest, BasicBlock *FalseDest,
BasicBlock *FalseDest,		BranchInst *OldBranch, Instruction *TI,
BranchInst *OldBranch,		ArrayRef<Instruction *> ToDuplicate) {
Instruction *TI) {
assert(OldBranch->isUnconditional() && "Preheader is not split correctly");		assert(OldBranch->isUnconditional() && "Preheader is not split correctly");
assert(TrueDest != FalseDest && "Branch targets should be different");		assert(TrueDest != FalseDest && "Branch targets should be different");

// Insert a conditional branch on LIC to the two preheaders. The original		// Insert a conditional branch on LIC to the two preheaders. The original
// code is the true version and the new code is the false version.		// code is the true version and the new code is the false version.
Value *BranchVal = LIC;		Value *BranchVal = LIC;
bool Swapped = false;		bool Swapped = false;

		if (!ToDuplicate.empty()) {
		ValueToValueMapTy Old2New;
		for (Instruction *I : reverse(ToDuplicate)) {
		auto *New = I->clone();
		New->insertBefore(OldBranch);
		RemapInstruction(New, Old2New,
		RF_NoModuleLevelChanges | RF_IgnoreMissingLocals);
		Old2New[I] = New;

		if (MSSAU) {
		MemorySSA *MSSA = MSSAU->getMemorySSA();
		auto *MemA = dyn_cast_or_null<MemoryUse>(MSSA->getMemoryAccess(I));
		if (!MemA)
		continue;

		Loop *L = LI->getLoopFor(I->getParent());
		auto *DefiningAccess = MemA->getDefiningAccess();
		// If the defining access is a MemoryPhi in the header, get the incoming
		// value for the pre-header as defining access.
		if (DefiningAccess->getBlock() == I->getParent()) {
		if (auto *MemPhi = dyn_cast<MemoryPhi>(DefiningAccess)) {
		DefiningAccess =
		MemPhi->getIncomingValueForBlock(L->getLoopPreheader());
		}
		}
		MSSAU->createMemoryAccessInBB(New, DefiningAccess, New->getParent(),
		MemorySSA::BeforeTerminator);
		}
		}
		BranchVal = Old2New[ToDuplicate[0]];
		} else {

if (!isa<ConstantInt>(Val) ||		if (!isa<ConstantInt>(Val) ||
Val->getType() != Type::getInt1Ty(LIC->getContext()))		Val->getType() != Type::getInt1Ty(LIC->getContext()))
BranchVal = new ICmpInst(OldBranch, ICmpInst::ICMP_EQ, LIC, Val);		BranchVal = new ICmpInst(OldBranch, ICmpInst::ICMP_EQ, LIC, Val);
else if (Val != ConstantInt::getTrue(Val->getContext())) {		else if (Val != ConstantInt::getTrue(Val->getContext())) {
// We want to enter the new loop when the condition is true.		// We want to enter the new loop when the condition is true.
std::swap(TrueDest, FalseDest);		std::swap(TrueDest, FalseDest);
Swapped = true;		Swapped = true;
}		}
		}

// Old branch will be removed, so save its parent and successor to update the		// Old branch will be removed, so save its parent and successor to update the
// DomTree.		// DomTree.
auto *OldBranchSucc = OldBranch->getSuccessor(0);		auto *OldBranchSucc = OldBranch->getSuccessor(0);
auto *OldBranchParent = OldBranch->getParent();		auto *OldBranchParent = OldBranch->getParent();

// Insert the new branch.		// Insert the new branch.
BranchInst *BI =		BranchInst *BI =
[266 lines elided]	for (unsigned I = 0, E = ExitBlocks.size(); I != E; ++I) {
SplitBlockPredecessors(ExitBlock, Preds, ".us-lcssa", DT, LI, MSSAU.get(),		SplitBlockPredecessors(ExitBlock, Preds, ".us-lcssa", DT, LI, MSSAU.get(),
/*PreserveLCSSA*/ true);		/*PreserveLCSSA*/ true);
}		}
}		}

/// We determined that the loop is profitable to unswitch when LIC equal Val.		/// We determined that the loop is profitable to unswitch when LIC equal Val.
/// Split it into loop versions and test the condition outside of either loop.		/// Split it into loop versions and test the condition outside of either loop.
/// Return the loops created as Out1/Out2.		/// Return the loops created as Out1/Out2.
void LoopUnswitch::unswitchNontrivialCondition(Value *LIC, Constant *Val,		void LoopUnswitch::unswitchNontrivialCondition(
Loop *L, Instruction *TI) {		Value *LIC, Constant *Val, Loop *L, Instruction *TI,
		ArrayRef<Instruction *> ToDuplicate) {
Function *F = LoopHeader->getParent();		Function *F = LoopHeader->getParent();
LLVM_DEBUG(dbgs() << "loop-unswitch: Unswitching loop %"		LLVM_DEBUG(dbgs() << "loop-unswitch: Unswitching loop %"
<< LoopHeader->getName() << " [" << L->getBlocks().size()		<< LoopHeader->getName() << " [" << L->getBlocks().size()
<< " blocks] in Function " << F->getName() << " when '"		<< " blocks] in Function " << F->getName() << " when '"
<< *Val << "' == " << *LIC << "\n");		<< *Val << "' == " << *LIC << "\n");

// We are going to make essential changes to CFG. This may invalidate cached		// We are going to make essential changes to CFG. This may invalidate cached
// information for L or one of its parent loops in SCEV.		// information for L or one of its parent loops in SCEV.
[115 lines elided]	if (MSSAU) {
// since that invalidates the 1:1 mapping of clones in VMap.		// since that invalidates the 1:1 mapping of clones in VMap.
LoopBlocksRPO LBRPO(L);		LoopBlocksRPO LBRPO(L);
LBRPO.perform(LI);		LBRPO.perform(LI);
MSSAU->updateForClonedLoop(LBRPO, ExitBlocks, VMap);		MSSAU->updateForClonedLoop(LBRPO, ExitBlocks, VMap);
}		}

// Emit the new branch that selects between the two versions of this loop.		// Emit the new branch that selects between the two versions of this loop.
emitPreheaderBranchOnCondition(LIC, Val, NewBlocks[0], LoopBlocks[0], OldBR,		emitPreheaderBranchOnCondition(LIC, Val, NewBlocks[0], LoopBlocks[0], OldBR,
TI);		TI, ToDuplicate);
if (MSSAU) {		if (MSSAU) {
// Update MemoryPhis in Exit blocks.		// Update MemoryPhis in Exit blocks.
MSSAU->updateExitBlocksForClonedLoop(ExitBlocks, VMap, *DT);		MSSAU->updateExitBlocksForClonedLoop(ExitBlocks, VMap, *DT);
if (VerifyMemorySSA)		if (VerifyMemorySSA)
MSSA->verifyMemorySSA();		MSSA->verifyMemorySSA();
}		}

// The OldBr was replaced by a new one and removed (but not erased) by		// The OldBr was replaced by a new one and removed (but not erased) by
// emitPreheaderBranchOnCondition. It is no longer needed, so delete it.		// emitPreheaderBranchOnCondition. It is no longer needed, so delete it.
delete OldBR;		delete OldBR;

LoopProcessWorklist.push_back(NewLoop);		LoopProcessWorklist.push_back(NewLoop);
RedoLoop = true;		RedoLoop = true;

// Keep a WeakTrackingVH holding onto LIC. If the first call to		// Keep a WeakTrackingVH holding onto LIC. If the first call to
// RewriteLoopBody		// RewriteLoopBody
// deletes the instruction (for example by simplifying a PHI that feeds into		// deletes the instruction (for example by simplifying a PHI that feeds into
// the condition that we're unswitching on), we don't rewrite the second		// the condition that we're unswitching on), we don't rewrite the second
// iteration.		// iteration.
WeakTrackingVH LICHandle(LIC);		WeakTrackingVH LICHandle(LIC);

// Now we rewrite the original code to know that the condition is true and the		if (ToDuplicate.empty()) {
// new code to know that the condition is false.		// Now we rewrite the original code to know that the condition is true and
		// the new code to know that the condition is false.
rewriteLoopBodyWithConditionConstant(L, LIC, Val, /*IsEqual=*/false);		rewriteLoopBodyWithConditionConstant(L, LIC, Val, /*IsEqual=*/false);

// It's possible that simplifying one loop could cause the other to be		// It's possible that simplifying one loop could cause the other to be
// changed to another value or a constant. If its a constant, don't simplify		// changed to another value or a constant. If its a constant, don't
// it.		// simplify it.
if (!LoopProcessWorklist.empty() && LoopProcessWorklist.back() == NewLoop &&		if (!LoopProcessWorklist.empty() && LoopProcessWorklist.back() == NewLoop &&
LICHandle && !isa<Constant>(LICHandle))		LICHandle && !isa<Constant>(LICHandle))
rewriteLoopBodyWithConditionConstant(NewLoop, LICHandle, Val,		rewriteLoopBodyWithConditionConstant(NewLoop, LICHandle, Val,
/*IsEqual=*/true);		/*IsEqual=*/true);
		} else {
		// Partial unswitching. Update the condition in the right loop with the
		// constant.
		auto *CC = cast<ConstantInt>(Val);
		if (CC->isOneValue()) {
		rewriteLoopBodyWithConditionConstant(NewLoop, VMap[LIC], Val,
		/*IsEqual=*/true);
		} else
		rewriteLoopBodyWithConditionConstant(L, LIC, Val, /*IsEqual=*/true);

		// Mark the new loop as partially unswitched, to avoid unswitching on the
		// same condition again.
		auto &Context = NewLoop->getHeader()->getContext();
		MDNode *DisableUnswitchMD = MDNode::get(
		Context, MDString::get(Context, "llvm.loop.unswitch.partial.disable"));
		MDNode *NewLoopID = makePostTransformationMetadata(
		Context, L->getLoopID(), {"llvm.loop.unswitch.partial"},
		{DisableUnswitchMD});
		NewLoop->setLoopID(NewLoopID);
		}

if (MSSA && VerifyMemorySSA)		if (MSSA && VerifyMemorySSA)
MSSA->verifyMemorySSA();		MSSA->verifyMemorySSA();
}		}

/// Remove all instances of I from the worklist vector specified.		/// Remove all instances of I from the worklist vector specified.
static void removeFromWorklist(Instruction *I,		static void removeFromWorklist(Instruction *I,
std::vector<Instruction *> &Worklist) {		std::vector<Instruction *> &Worklist) {
[260 lines elided]

llvm/test/Transforms/LoopUnswitch/partial-unswitch-mssa-threshold.ll

This file was added.

				; RUN: opt -loop-unswitch -loop-unswitch-memoryssa-threshold=0 -memssa-check-limit=1 -enable-new-pm=0 -S %s | FileCheck --check-prefix=THRESHOLD-0 %s
				; RUN: opt -loop-unswitch -memssa-check-limit=1 -S -enable-new-pm=0 %s | FileCheck --check-prefix=THRESHOLD-DEFAULT %s

				; Make sure -loop-unswitch-memoryssa-threshold works. The test uses
				; -memssa-check-limit=1 to effectively disable any MemorySSA optimizations
				; on construction, so the test can be kept simple.

				declare void @clobber()

				; Partial unswitching is possible, because the store in %noclobber does not
				; alias the load of the condition.
				define i32 @partial_unswitch_true_successor_noclobber(i32* noalias %ptr.1, i32* noalias %ptr.2, i32 %N) {
				; THRESHOLD-0-LABEL: @partial_unswitch_true_successor
				; THRESHOLD-0: entry:
				; THRESHOLD-0: br label %loop.header
				;
				; THRESHOLD-DEFAULT-LABEL: @partial_unswitch_true_successor
				; THRESHOLD-DEFAULT-NEXT: entry:
				; THRESHOLD-DEFAULT-NEXT: [[LV:%[0-9]+]] = load i32, i32* %ptr.1, align 4
				; THRESHOLD-DEFAULT-NEXT: [[C:%[0-9]+]] = icmp eq i32 [[LV]], 100
				; THRESHOLD-DEFAULT-NEXT: br i1 [[C]]
				;
				entry:
				br label %loop.header

				loop.header:
				%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
				%lv = load i32, i32* %ptr.1
				%sc = icmp eq i32 %lv, 100
				br i1 %sc, label %noclobber, label %clobber

				noclobber:
				%gep.1 = getelementptr i32, i32* %ptr.2, i32 %iv
				store i32 %lv, i32* %gep.1
				br label %loop.latch

				clobber:
				call void @clobber()
				br label %loop.latch

				loop.latch:
				%c = icmp ult i32 %iv, %N
				%iv.next = add i32 %iv, 1
				br i1 %c, label %loop.header, label %exit

				exit:
				ret i32 10
				}

llvm/test/Transforms/LoopUnswitch/partial-unswitch.ll

; RUN: opt -loop-unswitch -verify-dom-info -verify-memoryssa -S -enable-new-pm=0 %s | FileCheck %s		; RUN: opt -loop-unswitch -verify-dom-info -verify-memoryssa -S -enable-new-pm=0 %s | FileCheck %s

declare void @clobber()		declare void @clobber()

define i32 @partial_unswitch_true_successor(i32* %ptr, i32 %N) {		define i32 @partial_unswitch_true_successor(i32* %ptr, i32 %N) {
; CHECK-LABEL: @partial_unswitch_true_successor		; CHECK-LABEL: @partial_unswitch_true_successor
; CHECK-LABEL: entry:		; CHECK-LABEL: entry:
; CHECK-NEXT: br label %loop.header		; CHECK-NEXT: [[LV:%[0-9]+]] = load i32, i32* %ptr, align 4
		; CHECK-NEXT: [[C:%[0-9]+]] = icmp eq i32 [[LV]], 100
		; CHECK-NEXT: br i1 [[C]], label %[[SPLIT_TRUE_PH:[a-z._]+]], label %[[FALSE_CRIT:[a-z._]+]]

		; CHECK: [[FALSE_CRIT]]:
		; CHECK-NEXT: br label %[[FALSE_PH:[a-z.]+]]

		; CHECK: [[SPLIT_TRUE_PH]]:
		; CHECK-NEXT: br label %[[TRUE_HEADER:[a-z.]+]]

		; CHECK: [[TRUE_HEADER]]:
		; CHECK-NEXT: phi i32
		; CHECK-NEXT: [[TRUE_LV:%[a-z.0-9]+]] = load i32, i32* %ptr, align 4
		; CHECK-NEXT: [[TRUE_C:%[a-z.0-9]+]] = icmp eq i32 [[TRUE_LV]], 100
		; CHECK-NEXT: br i1 true, label %[[TRUE_NOCLOBBER:.+]], label %[[TRUE_CLOBBER:[a-z0-9._]+]]

		; CHECK: [[TRUE_CLOBBER]]:
		; CHECK-NEXT: call
		; CHECK-NEXT: br label %[[TRUE_LATCH:[a-z0-9._]+]]

		; CHECK: [[TRUE_NOCLOBBER]]:
		; CHECK-NEXT: br label %[[TRUE_LATCH:[a-z0-9._]+]]

		; CHECK: [[TRUE_LATCH]]:
		; CHECK-NEXT: icmp
		; CHECK-NEXT: add
		; CHECK-NEXT: br {{.*}} label %[[TRUE_HEADER]]


		; CHECK: [[FALSE_PH]]:
		; CHECK-NEXT: br label %[[FALSE_HEADER:[a-z.]+]]

		; CHECK: [[FALSE_HEADER]]:
		; CHECK-NEXT: phi i32
		; CHECK-NEXT: [[FALSE_LV:%[a-z.0-9]+]] = load i32, i32* %ptr, align 4
		; CHECK-NEXT: [[FALSE_C:%[a-z.0-9]+]] = icmp eq i32 [[FALSE_LV]], 100
		; CHECK-NEXT: br i1 [[FALSE_C]], label %[[FALSE_NOCLOBBER:.+]], label %[[FALSE_CLOBBER:[a-z0-9._]+]]

		; CHECK: [[FALSE_NOCLOBBER]]:
		; CHECK-NEXT: br label %[[FALSE_LATCH:[a-z0-9._]+]]

		; CHECK: [[FALSE_CLOBBER]]:
		; CHECK-NEXT: call
		; CHECK-NEXT: br label %[[FALSE_LATCH:[a-z0-9._]+]]

		; CHECK: [[FALSE_LATCH]]:
		; CHECK-NEXT: icmp
		; CHECK-NEXT: add
		; CHECK-NEXT: br {{.*}} label %[[FALSE_HEADER]]
;		;
entry:		entry:
br label %loop.header		br label %loop.header

loop.header:		loop.header:
%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]		%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
%lv = load i32, i32* %ptr		%lv = load i32, i32* %ptr
%sc = icmp eq i32 %lv, 100		%sc = icmp eq i32 %lv, 100
[... 13 lines not shown ...]

exit:		exit:
ret i32 10		ret i32 10
}		}

define i32 @partial_unswitch_false_successor(i32* %ptr, i32 %N) {		define i32 @partial_unswitch_false_successor(i32* %ptr, i32 %N) {
; CHECK-LABEL: @partial_unswitch_false_successor		; CHECK-LABEL: @partial_unswitch_false_successor
; CHECK-LABEL: entry:		; CHECK-LABEL: entry:
; CHECK-NEXT: br label %loop.header		; CHECK-NEXT: [[LV:%[0-9]+]] = load i32, i32* %ptr, align 4
		; CHECK-NEXT: [[C:%[0-9]+]] = icmp eq i32 [[LV]], 100
		; CHECK-NEXT: br i1 [[C]], label %[[SPLIT_TRUE_PH:[a-z._]+]], label %[[FALSE_CRIT:[a-z._]+]]

		; CHECK: [[FALSE_CRIT]]:
		; CHECK-NEXT: br label %[[FALSE_PH:[a-z.]+]]

		; CHECK: [[SPLIT_TRUE_PH]]:
		; CHECK-NEXT: br label %[[TRUE_HEADER:[a-z.]+]]

		; CHECK: [[TRUE_HEADER]]:
		; CHECK-NEXT: phi i32
		; CHECK-NEXT: [[TRUE_LV:%[a-z.0-9]+]] = load i32, i32* %ptr, align 4
		; CHECK-NEXT: [[TRUE_C:%[a-z.0-9]+]] = icmp eq i32 [[TRUE_LV]], 100
		; CHECK-NEXT: br i1 [[TRUE_C]], label %[[TRUE_CLOBBER:.+]], label %[[TRUE_NOCLOBBER:[a-z0-9._]+]]

		; CHECK: [[TRUE_NOCLOBBER]]:
		; CHECK-NEXT: br label %[[TRUE_LATCH:[a-z0-9._]+]]

		; CHECK: [[TRUE_CLOBBER]]:
		; CHECK-NEXT: call
		; CHECK-NEXT: br label %[[TRUE_LATCH:[a-z0-9._]+]]

		; CHECK: [[TRUE_LATCH]]:
		; CHECK-NEXT: icmp
		; CHECK-NEXT: add
		; CHECK-NEXT: br {{.*}} label %[[TRUE_HEADER]]


		; CHECK: [[FALSE_PH]]:
		; CHECK-NEXT: br label %[[FALSE_HEADER:[a-z.]+]]

		; CHECK: [[FALSE_HEADER]]:
		; CHECK-NEXT: phi i32
		; CHECK-NEXT: [[FALSE_LV:%[a-z.0-9]+]] = load i32, i32* %ptr, align 4
		; CHECK-NEXT: [[FALSE_C:%[a-z.0-9]+]] = icmp eq i32 [[FALSE_LV]], 100
		; CHECK-NEXT: br i1 false, label %[[FALSE_CLOBBER:.+]], label %[[FALSE_NOCLOBBER:[a-z0-9._]+]]

		; CHECK: [[FALSE_CLOBBER]]:
		; CHECK-NEXT: call
		; CHECK-NEXT: br label %[[FALSE_LATCH:[a-z0-9._]+]]

		; CHECK: [[FALSE_NOCLOBBER]]:
		; CHECK-NEXT: br label %[[FALSE_LATCH:[a-z0-9._]+]]

		; CHECK: [[FALSE_LATCH]]:
		; CHECK-NEXT: icmp
		; CHECK-NEXT: add
		; CHECK-NEXT: br {{.*}} label %[[FALSE_HEADER]]
;		;
entry:		entry:
br label %loop.header		br label %loop.header

loop.header:		loop.header:
%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]		%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
%lv = load i32, i32* %ptr		%lv = load i32, i32* %ptr
%sc = icmp eq i32 %lv, 100		%sc = icmp eq i32 %lv, 100
[... 13 lines not shown ...]

exit:		exit:
ret i32 10		ret i32 10
}		}

define i32 @partial_unswtich_gep_load_icmp(i32** %ptr, i32 %N) {		define i32 @partial_unswtich_gep_load_icmp(i32** %ptr, i32 %N) {
; CHECK-LABEL: @partial_unswtich_gep_load_icmp		; CHECK-LABEL: @partial_unswtich_gep_load_icmp
; CHECK-LABEL: entry:		; CHECK-LABEL: entry:
; CHECK-NEXT: br label %loop.header		; CHECK-NEXT: [[GEP:%[a-z.0-9]+]] = getelementptr i32*, i32** %ptr, i32 1
		; CHECK-NEXT: [[LV0:%[a-z.0-9]+]] = load i32*, i32** [[GEP]]
		; CHECK-NEXT: [[LV1:%[a-z.0-9]+]] = load i32, i32* [[LV0]]
		; CHECK-NEXT: [[C:%[a-z.0-9]+]] = icmp eq i32 [[LV1]], 100
		; CHECK-NEXT: br i1 [[C]], label %[[SPLIT_TRUE_PH:[a-z._]+]], label %[[FALSE_CRIT:[a-z._]+]]

		; CHECK: [[FALSE_CRIT]]:
		; CHECK-NEXT: br label %[[FALSE_PH:[a-z.]+]]

		; CHECK: [[SPLIT_TRUE_PH]]:
		; CHECK-NEXT: br label %[[TRUE_HEADER:[a-z.]+]]

		; CHECK: [[TRUE_HEADER]]:
		; CHECK-NEXT: phi i32
		; CHECK-NEXT: [[TRUE_GEP:%[a-z.0-9]+]] = getelementptr i32*, i32** %ptr, i32 1
		; CHECK-NEXT: [[TRUE_LV0:%[a-z.0-9]+]] = load i32*, i32** [[TRUE_GEP]]
		; CHECK-NEXT: [[TRUE_LV1:%[a-z.0-9]+]] = load i32, i32* [[TRUE_LV0]]
		; CHECK-NEXT: [[TRUE_C:%[a-z.0-9]+]] = icmp eq i32 [[TRUE_LV1]], 100
		; CHECK-NEXT: br i1 true, label %[[TRUE_NOCLOBBER:.+]], label %[[TRUE_CLOBBER:[a-z0-9._]+]]

		; CHECK: [[TRUE_CLOBBER]]:
		; CHECK-NEXT: call
		; CHECK-NEXT: br label %[[TRUE_LATCH:[a-z0-9._]+]]

		; CHECK: [[TRUE_NOCLOBBER]]:
		; CHECK-NEXT: br label %[[TRUE_LATCH:[a-z0-9._]+]]

		; CHECK: [[TRUE_LATCH]]:
		; CHECK-NEXT: icmp
		; CHECK-NEXT: add
		; CHECK-NEXT: br {{.*}} label %[[TRUE_HEADER]]

		; CHECK: [[FALSE_PH]]:
		; CHECK-NEXT: br label %[[FALSE_HEADER:[a-z.]+]]

		; CHECK: [[FALSE_HEADER]]:
		; CHECK-NEXT: phi i32
		; CHECK-NEXT: [[FALSE_GEP:%[a-z.0-9]+]] = getelementptr i32*, i32** %ptr, i32 1
		; CHECK-NEXT: [[FALSE_LV0:%[a-z.0-9]+]] = load i32*, i32** [[FALSE_GEP]]
		; CHECK-NEXT: [[FALSE_LV1:%[a-z.0-9]+]] = load i32, i32* [[FALSE_LV0]]
		; CHECK-NEXT: [[FALSE_C:%[a-z.0-9]+]] = icmp eq i32 [[FALSE_LV1]], 100
		; CHECK-NEXT: br i1 [[FALSE_C]], label %[[FALSE_NOCLOBBER:.+]], label %[[FALSE_CLOBBER:[a-z0-9._]+]]

		; CHECK: [[FALSE_NOCLOBBER]]:
		; CHECK-NEXT: br label %[[FALSE_LATCH:[a-z0-9._]+]]

		; CHECK: [[FALSE_CLOBBER]]:
		; CHECK-NEXT: call
		; CHECK-NEXT: br label %[[FALSE_LATCH:[a-z0-9._]+]]


		; CHECK: [[FALSE_LATCH]]:
		; CHECK-NEXT: icmp
		; CHECK-NEXT: add
		; CHECK-NEXT: br {{.*}} label %[[FALSE_HEADER]]
;		;
entry:		entry:
br label %loop.header		br label %loop.header

loop.header:		loop.header:
%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]		%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
%gep = getelementptr i32*, i32** %ptr, i32 1		%gep = getelementptr i32*, i32** %ptr, i32 1
%lv.1 = load i32*, i32** %gep		%lv.1 = load i32*, i32** %gep
[... 15 lines not shown ...]

exit:		exit:
ret i32 10		ret i32 10
}		}

define i32 @partial_unswitch_reduction_phi(i32* %ptr, i32 %N) {		define i32 @partial_unswitch_reduction_phi(i32* %ptr, i32 %N) {
; CHECK-LABEL: @partial_unswitch_reduction_phi		; CHECK-LABEL: @partial_unswitch_reduction_phi
; CHECK-LABEL: entry:		; CHECK-LABEL: entry:
; CHECK-NEXT: br label %loop.header		; CHECK-NEXT: [[LV:%[0-9]+]] = load i32, i32* %ptr, align 4
		; CHECK-NEXT: [[C:%[0-9]+]] = icmp eq i32 [[LV]], 100
		; CHECK-NEXT: br i1 [[C]], label %[[SPLIT_TRUE_PH:[a-z._]+]], label %[[FALSE_CRIT:[a-z._]+]]

		; CHECK: [[FALSE_CRIT]]:
		; CHECK-NEXT: br label %[[FALSE_PH:[a-z.]+]]

		; CHECK: [[SPLIT_TRUE_PH]]:
		; CHECK-NEXT: br label %[[TRUE_HEADER:[a-z.]+]]

		; CHECK: [[TRUE_HEADER]]:
		; CHECK-NEXT: phi i32
		; CHECK-NEXT: [[TRUE_RED:%[a-z.0-9]+]] = phi i32 [ 20, %[[SPLIT_TRUE_PH]] ], [ [[TRUE_RED_NEXT:%[a-z.0-9]+]], %[[TRUE_LATCH:[a-z.0-9]+]]
		; CHECK-NEXT: [[TRUE_LV:%[a-z.0-9]+]] = load i32, i32* %ptr, align 4
		; CHECK-NEXT: [[TRUE_C:%[a-z.0-9]+]] = icmp eq i32 [[TRUE_LV]], 100
		; CHECK-NEXT: br i1 [[TRUE_C]], label %[[TRUE_CLOBBER:.+]], label %[[TRUE_NOCLOBBER:[a-z0-9._]+]]

		; CHECK: [[TRUE_NOCLOBBER]]:
		; CHECK-NEXT: [[TRUE_ADD10:%.+]] = add i32 [[TRUE_RED]], 10
		; CHECK-NEXT: br label %[[TRUE_LATCH]]

		; CHECK: [[TRUE_CLOBBER]]:
		; CHECK-NEXT: call
		; CHECK-NEXT: [[TRUE_ADD5:%.+]] = add i32 [[TRUE_RED]], 5
		; CHECK-NEXT: br label %[[TRUE_LATCH]]

		; CHECK: [[TRUE_LATCH]]:
		; CHECK-NEXT: [[TRUE_RED_NEXT]] = phi i32 [ [[TRUE_ADD5]], %[[TRUE_CLOBBER]] ], [ [[TRUE_ADD10]], %[[TRUE_NOCLOBBER]] ]
		; CHECK-NEXT: icmp
		; CHECK-NEXT: add
		; CHECK-NEXT: br {{.*}} label %[[TRUE_HEADER]]


		; CHECK: [[FALSE_PH]]:
		; CHECK-NEXT: br label %[[FALSE_HEADER:[a-z.]+]]

		; CHECK: [[FALSE_HEADER]]:
		; CHECK-NEXT: phi i32
		; CHECK-NEXT: [[FALSE_RED:%[a-z.0-9]+]] = phi i32 [ 20, %[[FALSE_PH]] ], [ [[FALSE_RED_NEXT:%[a-z.0-9]+]], %[[FALSE_LATCH:[a-z.0-9]+]]
		; CHECK-NEXT: [[FALSE_LV:%[a-z.0-9]+]] = load i32, i32* %ptr, align 4
		; CHECK-NEXT: [[FALSE_C:%[a-z.0-9]+]] = icmp eq i32 [[FALSE_LV]], 100
		; CHECK-NEXT: br i1 false, label %[[FALSE_CLOBBER:.+]], label %[[FALSE_NOCLOBBER:[a-z0-9._]+]]

		; CHECK: [[FALSE_CLOBBER]]:
		; CHECK-NEXT: call
		; CHECK-NEXT: [[FALSE_ADD5:%.+]] = add i32 [[FALSE_RED]], 5
		; CHECK-NEXT: br label %[[FALSE_LATCH:[a-z0-9._]+]]

		; CHECK: [[FALSE_NOCLOBBER]]:
		; CHECK-NEXT: [[FALSE_ADD10:%.+]] = add i32 [[FALSE_RED]], 10
		; CHECK-NEXT: br label %[[FALSE_LATCH:[a-z0-9._]+]]

		; CHECK: [[FALSE_LATCH]]:
		; CHECK-NEXT: [[FALSE_RED_NEXT]] = phi i32 [ [[FALSE_ADD5]], %[[FALSE_CLOBBER]] ], [ [[FALSE_ADD10]], %[[FALSE_NOCLOBBER]] ]
		; CHECK-NEXT: icmp
		; CHECK-NEXT: add
		; CHECK-NEXT: br {{.*}} label %[[FALSE_HEADER]]
;		;
entry:		entry:
br label %loop.header		br label %loop.header

loop.header:		loop.header:
%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]		%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
%red = phi i32 [ 20, %entry ], [ %red.next, %loop.latch ]		%red = phi i32 [ 20, %entry ], [ %red.next, %loop.latch ]
%lv = load i32, i32* %ptr		%lv = load i32, i32* %ptr
[... 20 lines not shown ...]	exit:
ret i32 %red.next.lcssa		ret i32 %red.next.lcssa
}		}

; Partial unswitching is possible, because the store in %noclobber does not		; Partial unswitching is possible, because the store in %noclobber does not
; alias the load of the condition.		; alias the load of the condition.
define i32 @partial_unswitch_true_successor_noclobber(i32* noalias %ptr.1, i32* noalias %ptr.2, i32 %N) {		define i32 @partial_unswitch_true_successor_noclobber(i32* noalias %ptr.1, i32* noalias %ptr.2, i32 %N) {
; CHECK-LABEL: @partial_unswitch_true_successor		; CHECK-LABEL: @partial_unswitch_true_successor
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br label %loop.header		; CHECK-NEXT: [[LV:%[0-9]+]] = load i32, i32* %ptr.1, align 4
		; CHECK-NEXT: [[C:%[0-9]+]] = icmp eq i32 [[LV]], 100
		; CHECK-NEXT: br i1 [[C]], label %[[SPLIT_TRUE_PH:[a-z._]+]], label %[[FALSE_CRIT:[a-z._]+]]

		; CHECK: [[FALSE_CRIT]]:
		; CHECK-NEXT: br label %[[FALSE_PH:[a-z.]+]]

		; CHECK: [[SPLIT_TRUE_PH]]:
		; CHECK-NEXT: br label %[[TRUE_HEADER:[a-z.]+]]

		; CHECK: [[TRUE_HEADER]]:
		; CHECK-NEXT: phi i32
		; CHECK-NEXT: [[TRUE_LV:%[a-z.0-9]+]] = load i32, i32* %ptr.1, align 4
		; CHECK-NEXT: [[TRUE_C:%[a-z.0-9]+]] = icmp eq i32 [[TRUE_LV]], 100
		; CHECK-NEXT: br i1 true, label %[[TRUE_NOCLOBBER:.+]], label %[[TRUE_CLOBBER:[a-z0-9._]+]]

		; CHECK: [[TRUE_CLOBBER]]:
		; CHECK-NEXT: call
		; CHECK-NEXT: br label %[[TRUE_LATCH:[a-z0-9._]+]]

		; CHECK: [[TRUE_NOCLOBBER]]:
		; CHECK-NEXT: [[TRUE_GEP:%[a-z0-9._]+]] = getelementptr i32, i32* %ptr.2
		; CHECK-NEXT: store i32 [[TRUE_LV]], i32* [[TRUE_GEP]], align 4
		; CHECK-NEXT: br label %[[TRUE_LATCH:[a-z0-9._]+]]

		; CHECK: [[TRUE_LATCH]]:
		; CHECK-NEXT: icmp
		; CHECK-NEXT: add
		; CHECK-NEXT: br {{.*}} label %[[TRUE_HEADER]]


		; CHECK: [[FALSE_PH]]:
		; CHECK-NEXT: br label %[[FALSE_HEADER:[a-z.]+]]

		; CHECK: [[FALSE_HEADER]]:
		; CHECK-NEXT: phi i32
		; CHECK-NEXT: [[FALSE_LV:%[a-z.0-9]+]] = load i32, i32* %ptr.1, align 4
		; CHECK-NEXT: [[FALSE_C:%[a-z.0-9]+]] = icmp eq i32 [[FALSE_LV]], 100
		; CHECK-NEXT: br i1 [[FALSE_C]], label %[[FALSE_NOCLOBBER:.+]], label %[[FALSE_CLOBBER:[a-z0-9._]+]]

		; CHECK: [[FALSE_NOCLOBBER]]:
		; CHECK-NEXT: [[FALSE_GEP:%[a-z0-9._]+]] = getelementptr i32, i32* %ptr.2
		; CHECK-NEXT: store i32 [[FALSE_LV]], i32* [[FALSE_GEP]], align 4
		; CHECK-NEXT: br label %[[FALSE_LATCH:[a-z0-9._]+]]

		; CHECK: [[FALSE_CLOBBER]]:
		; CHECK-NEXT: call
		; CHECK-NEXT: br label %[[FALSE_LATCH:[a-z0-9._]+]]

		; CHECK: [[FALSE_LATCH]]:
		; CHECK-NEXT: icmp
		; CHECK-NEXT: add
		; CHECK-NEXT: br {{.*}} label %[[FALSE_HEADER]]
;		;
entry:		entry:
br label %loop.header		br label %loop.header

loop.header:		loop.header:
%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]		%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
%lv = load i32, i32* %ptr.1		%lv = load i32, i32* %ptr.1
%sc = icmp eq i32 %lv, 100		%sc = icmp eq i32 %lv, 100
[... 170 lines not shown ...]
exit:		exit:
ret i32 10		ret i32 10
}		}

; Check that MemorySSA updating can deal with a clobbering access of a		; Check that MemorySSA updating can deal with a clobbering access of a
; duplicated load being a MemoryPHI outside the loop.		; duplicated load being a MemoryPHI outside the loop.
define void @partial_unswitch_memssa_update(i32* noalias %ptr, i1 %c) {		define void @partial_unswitch_memssa_update(i32* noalias %ptr, i1 %c) {
; CHECK-LABEL: @partial_unswitch_memssa_update(		; CHECK-LABEL: @partial_unswitch_memssa_update(
; CHECK-NEXT: entry:		; CHECK-LABEL: loop.ph:
; CHECK-NEXT: br i1 %c, label %loop.ph, label %outside.clobber		; CHECK-NEXT: [[LV:%[a-z0-9]+]] = load i32, i32* %ptr, align 4
;		; CHECK-NEXT: [[C:%[a-z0-9]+]] = icmp eq i32 [[LV]], 0
		; CHECK-NEXT: br i1 [[C]]
entry:		entry:
br i1 %c, label %loop.ph, label %outside.clobber		br i1 %c, label %loop.ph, label %outside.clobber

outside.clobber:		outside.clobber:
call void @clobber()		call void @clobber()
br label %loop.ph		br label %loop.ph

loop.ph:		loop.ph:
[... 19 lines not shown ...]

; Make sure the duplicated instructions are moved to a preheader that always		; Make sure the duplicated instructions are moved to a preheader that always
; executes when the loop body also executes. Do not check the unswitched code,		; executes when the loop body also executes. Do not check the unswitched code,
; because it is already checked in the @partial_unswitch_true_successor test		; because it is already checked in the @partial_unswitch_true_successor test
; case.		; case.
define i32 @partial_unswitch_true_successor_preheader_insertion(i32* %ptr, i32 %N) {		define i32 @partial_unswitch_true_successor_preheader_insertion(i32* %ptr, i32 %N) {
; CHECK-LABEL: @partial_unswitch_true_successor_preheader_insertion(		; CHECK-LABEL: @partial_unswitch_true_successor_preheader_insertion(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: %ec = icmp ne i32* %ptr, null		; CHECK-NEXT: [[EC:%[a-z]+]] = icmp ne i32* %ptr, null
; CHECK-NEXT: br i1 %ec, label %loop.ph, label %exit		; CHECK-NEXT: br i1 [[EC]], label %[[PH:[a-z0-9.]+]], label %[[EXIT:[a-z0-9.]+]]

		; CHECK: [[PH]]:
		; CHECK-NEXT: [[LV:%[0-9]+]] = load i32, i32* %ptr, align 4
		; CHECK-NEXT: [[C:%[0-9]+]] = icmp eq i32 [[LV]], 100
		; CHECK-NEXT: br i1 [[C]]
;		;
entry:		entry:
%ec = icmp ne i32* %ptr, null		%ec = icmp ne i32* %ptr, null
br i1 %ec, label %loop.ph, label %exit		br i1 %ec, label %loop.ph, label %exit

loop.ph:		loop.ph:
br label %loop.header		br label %loop.header

[... 20 lines not shown ...]
}		}

; Make sure the duplicated instructions are hoisted just before the branch of		; Make sure the duplicated instructions are hoisted just before the branch of
; the preheader. Do not check the unswitched code, because it is already checked		; the preheader. Do not check the unswitched code, because it is already checked
; in the @partial_unswitch_true_successor test case		; in the @partial_unswitch_true_successor test case
define i32 @partial_unswitch_true_successor_insert_point(i32* %ptr, i32 %N) {		define i32 @partial_unswitch_true_successor_insert_point(i32* %ptr, i32 %N) {
; CHECK-LABEL: @partial_unswitch_true_successor_insert_point(		; CHECK-LABEL: @partial_unswitch_true_successor_insert_point(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: call void @clobber()		; CHECK-NEXT: call void @clobber()
; CHECK-NEXT: br label %loop.header		; CHECK-NEXT: [[LV:%[0-9]+]] = load i32, i32* %ptr, align 4
		; CHECK-NEXT: [[C:%[0-9]+]] = icmp eq i32 [[LV]], 100
		; CHECK-NEXT: br i1 [[C]]
;		;
entry:		entry:
call void @clobber()		call void @clobber()
br label %loop.header		br label %loop.header

loop.header:		loop.header:
%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]		%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
%lv = load i32, i32* %ptr		%lv = load i32, i32* %ptr
[... 17 lines not shown ...]
}		}

; Make sure invariant instructions in the loop are also hoisted to the preheader.		; Make sure invariant instructions in the loop are also hoisted to the preheader.
; Do not check the unswitched code, because it is already checked in the		; Do not check the unswitched code, because it is already checked in the
; @partial_unswitch_true_successor test case		; @partial_unswitch_true_successor test case
define i32 @partial_unswitch_true_successor_hoist_invariant(i32* %ptr, i32 %N) {		define i32 @partial_unswitch_true_successor_hoist_invariant(i32* %ptr, i32 %N) {
; CHECK-LABEL: @partial_unswitch_true_successor_hoist_invariant(		; CHECK-LABEL: @partial_unswitch_true_successor_hoist_invariant(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br label %loop.header		; CHECK-NEXT: [[GEP:%[0-9]+]] = getelementptr i32, i32* %ptr, i64 1
		; CHECK-NEXT: [[LV:%[0-9]+]] = load i32, i32* [[GEP]], align 4
		; CHECK-NEXT: [[C:%[0-9]+]] = icmp eq i32 [[LV]], 100
		; CHECK-NEXT: br i1 [[C]]
;		;
entry:		entry:
br label %loop.header		br label %loop.header

loop.header:		loop.header:
%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]		%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
%gep = getelementptr i32, i32* %ptr, i64 1		%gep = getelementptr i32, i32* %ptr, i64 1
%lv = load i32, i32* %gep		%lv = load i32, i32* %gep
[... 18 lines not shown ...]