This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/include/llvm/Analysis/IVDescriptors.h
llvm/include/llvm/Transforms/Utils/LoopUtils.h
llvm/lib/Analysis/IVDescriptors.cpp
llvm/lib/Transforms/Utils/LoopUtils.cpp
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
llvm/test/Transforms/LoopVectorize/iv-select-cmp-no-wrap.ll
llvm/test/Transforms/LoopVectorize/iv-select-cmp.ll
llvm/test/Transforms/LoopVectorize/select-min-index.ll

Differential D150851

[LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable
Accepted · Public

Authored by Mel-Chen on May 18 2023, 2:43 AM.

Details

Reviewers

david-arm
dmgreen
kmclaughlin
fhahn
Ayal
bmahjour
reames
shiva0217
artagnon

Summary

Consider the following loop:

int red = ii;
for (int i = 0; i < n; ++i)
  red = (a[i] > b[i]) ? i : red;

We can vectorize this loop if i is an increasing induction variable.
The final reduced value will be the maximum i for which the condition
a[i] > b[i] is satisfied, or the start value ii if it is never satisfied.
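
The reduction described in the summary can be emulated in plain C++. The following is only a sketch under stated assumptions, not the code the patch generates: the names findLastScalar and findLastLanes are hypothetical, the vector width of 4 is arbitrary, the lanes are modelled with std::array, and INT_MIN plays the role of the sentinel (valid because i starts at 0).

```cpp
#include <algorithm>
#include <array>
#include <climits>
#include <vector>

// Scalar reference: the loop from the summary.
int findLastScalar(const std::vector<int> &a, const std::vector<int> &b, int ii) {
  int red = ii;
  for (int i = 0; i < static_cast<int>(a.size()); ++i)
    red = (a[i] > b[i]) ? i : red;
  return red;
}

// Lane-wise emulation of a vectorized form (hypothetical width VF = 4).
// Assumes a and b have the same length.
int findLastLanes(const std::vector<int> &a, const std::vector<int> &b, int ii) {
  constexpr int VF = 4;
  const int Sentinel = INT_MIN; // below every index, because i starts at 0
  std::array<int, VF> part;
  part.fill(Sentinel);
  const int n = static_cast<int>(a.size());
  const int vecEnd = n - n % VF;
  for (int i = 0; i < vecEnd; i += VF) // "vector" body
    for (int l = 0; l < VF; ++l)
      part[l] = (a[i + l] > b[i + l]) ? i + l : part[l];
  // Signed max-reduce, then map the sentinel back to the start value.
  const int rdx = *std::max_element(part.begin(), part.end());
  int red = (rdx == Sentinel) ? ii : rdx;
  for (int i = vecEnd; i < n; ++i) // scalar remainder
    red = (a[i] > b[i]) ? i : red;
  return red;
}
```

Both functions should agree for any inputs of equal length, including the case where the condition is never true and ii is returned.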

This patch adds new RecurKind enums: IFindLastIV and FFindLastIV.

TODOs:

Handle casts of the increasing induction variable, such as truncate and SExt.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Mel-Chen created this revision. May 18 2023, 2:43 AM

Herald added a project: Restricted Project. May 18 2023, 2:43 AM

Herald added subscribers: hoy, shiva0217, arphaman and 2 others.

Mel-Chen requested review of this revision. May 18 2023, 2:43 AM

Herald added subscribers: llvm-commits, pcwang-thead, vkmr.

Mel-Chen added a child revision: D143465: [LoopVectorize] Vectorize the reduction pattern of integer min/max with index. May 18 2023, 3:01 AM

Harbormaster completed remote builds in B232801: Diff 523301. May 18 2023, 3:40 AM

Changes:

Add test cases
Check the bound of increasing induction variable

Mel-Chen edited the summary of this revision. May 31 2023, 3:14 AM

Harbormaster completed remote builds in B235529: Diff 526982. May 31 2023, 4:40 AM

Mel-Chen retitled this revision from [LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable (WIP) to [LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable. Jun 5 2023, 1:45 AM

Mel-Chen edited the summary of this revision.

Mel-Chen added reviewers: david-arm, dmgreen, kmclaughlin, fhahn, Ayal, bmahjour, reames.

Herald added a subscriber: StephenFan. Jun 5 2023, 1:45 AM

fhahn added inline comments. Jun 12 2023, 3:53 AM

llvm/include/llvm/Analysis/IVDescriptors.h
39	To keep the diff more compact, could you split the FP handling off? It also looks like codegen is at least not tested for the FP case?
llvm/lib/Analysis/IVDescriptors.cpp
670	Do we need to distinguish here between no signed/no unsigned wrap and then choose `smax/umax` during codegen?
llvm/lib/Transforms/Utils/LoopUtils.cpp
1088	What does this mean? The current patch only handles increasing inductions, so there's no mis-compile that needs fixing here (which the comment kind-of implies)?
llvm/test/Transforms/LoopVectorize/iv-select-cmp-no-wrap.ll
1	Could you submit the tests separately?

Mel-Chen added inline comments. Jun 12 2023, 6:23 AM

llvm/include/llvm/Analysis/IVDescriptors.h
39	I am afraid there is a misunderstanding. The test functions starting with "@select_fcmp" are testing the SelectIVFCmp reduction pattern. SelectIVFCmp is similar in semantics to SelectFCmp, where the operand types of the select instruction are integer, and the cmp instruction is fcmp.
llvm/lib/Analysis/IVDescriptors.cpp
670	That's a good point. Implementing the patch for both signed and unsigned induction variables can be challenging in practice. The current patch only focuses on the signed IV because in most applications, we often encounter induction variables in the form of IV {0, +, step}. When the IV is signed, we can use -1 or the minimum value of the signed data type as a sentinel value. However, when the IV is unsigned, we don't have a value smaller than 0 to use. This doesn't mean that unsigned IV cannot be vectorized, but rather they require additional handling and a more refined approach. Of course, if an unsigned IV is {1, +, step}, we can directly use the method implemented in this patch. However, such cases are less common, so we decided to focus on handling signed IV first.
llvm/lib/Transforms/Utils/LoopUtils.cpp
1088	My bad. This may be a note I left while implementing this patch. Will remove it.
llvm/test/Transforms/LoopVectorize/iv-select-cmp-no-wrap.ll
1	Sure. Will do.

fhahn mentioned this in D152693: LoopVectorize: introduce RecurKind::Induction(I|F)(Max|Min). Jun 14 2023, 6:25 AM

artagnon added a subscriber: artagnon. Jun 14 2023, 7:18 AM

artagnon added inline comments. Jun 14 2023, 9:05 AM

llvm/lib/Analysis/IVDescriptors.cpp
610–630	Please update these comments.
657	Why not split this out into its own function?
659–667	We should forbid these cases in the first place instead of checking them.
670	Doesn't the `InstDesc` tell you if it's signed or unsigned?
673–675	Why is this necessary? Even if it isn't the min signed value, we should codegen just fine, as the codegen just involves applying a mask and applying max/min-reduce.
677–678	`getStepRecurrence()` from the SCEV AddRec, instead of relying on `isInductionPhi()`?
llvm/lib/Transforms/Utils/LoopUtils.cpp
1089	Personally, I prefer straight-line codegen as I've done, but I'm probably biased.

artagnon added inline comments. Jun 14 2023, 9:11 AM

llvm/include/llvm/Analysis/IVDescriptors.h
37–39	Maybe have a `max` in the name to make it clear that we only do max-reductions?

shiva0217 added inline comments. Jun 21 2023, 1:18 AM

llvm/include/llvm/Transforms/Utils/LoopUtils.h
365–366	or an increasing loop induction variable?
llvm/lib/Transforms/Utils/LoopUtils.cpp
1155	It might be worth adding a comment explaining that the SelectIVICmp and SelectIVFCmp code generation uses the Identity value to determine whether the if condition in the following case has ever been true:

int r = 331;
for (int i = 0; i < n; i++)
  if (src[i] > 111)
    r = i;

When the reduction value (Rdx) equals the Identity value (Iden), the condition has never been true, so InitVal is selected. This may be why IsIncreasingLoopInduction checks the IV start value: to keep the IV from overlapping the Identity value.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3887	Could the function be renamed to createSelectIVCmpTargetReduction and be called from createSelectCmpTargetReduction? Perhaps CreateIntMaxReduce can be moved into that function?
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1593	Perhaps add a comment describing that SelectIVICmp and SelectIVFCmp initialize the reduction PHI with Iden, and that createSentinelValueHandling uses Iden to determine whether the if condition in the loop has ever been true?
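
The role of the Identity value in these comments can be illustrated with a small scalar model. This is a sketch only, assuming INT_MIN as the identity of a signed max-reduce; reduceWithIden is a hypothetical name and does not correspond to a function in the patch. It shows why the IV start value must differ from Iden: if the IV itself can take the value Iden, a genuine hit becomes indistinguishable from "never true".

```cpp
#include <algorithm>
#include <climits>
#include <vector>

// Max-reduce over the IV values where cond is true, followed by the fix-up
// that maps Iden back to InitVal.
// Assumes ivStart + cond.size() does not overflow int.
int reduceWithIden(const std::vector<bool> &cond, int ivStart, int initVal) {
  const int Iden = INT_MIN; // identity of signed max
  int rdx = Iden;
  int iv = ivStart;
  for (bool c : cond) {
    if (c)
      rdx = std::max(rdx, iv);
    ++iv;
  }
  // Rdx == Iden is read as "the condition was never true".
  return (rdx == Iden) ? initVal : rdx;
}
```

With ivStart equal to 0 the result matches the scalar loop. With ivStart equal to INT_MIN and a single hit in the first iteration, the model returns initVal although the scalar loop would return INT_MIN, which is the overlap the start-value check guards against.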

An important step forward - adding some general thoughts and terminology, going (admittedly nearly two years) back to D108136.

This FindLast compound pattern combines two header phi's: an induction and a reduction. The reduction is a "selecting reduction" which chooses one element from the reduced set, or none. The induction is recorded to return the index of the (last) element found. The induction must be monotone and have some out-of-bounds value, which this patch ensures by restricting to signed inductions that start at non-min-signed-value and increase w/o wrapping - restrictions which can be lifted.

Reducing a collection C of values into one value v can be classified as either a "selecting reduction" or a "combining reduction", depending on whether v is always a member of C or not, respectively. Min and max reductions are selecting reductions whereas add (including fmuladd), multiply, or, and, xor are in general combining reductions.
When C is a collection of boolean values, the selecting reductions "max" and "min" practically compute Any and All, respectively, as in std::any_of() and std::all_of(). Support for boolean selecting reductions was introduced in D108136, which should arguably be called [I|F]Any rather than Select[I|F]Cmp. The boolean values produced by Integer/Float Compares such as "(src[i] > 3)" or "(a[i] > b[i])" are essentially being "max" reduced; any desired pair of invariant return values can be set/selected after determining the outcome of the boolean reduction.
Note that an Any reduction can terminate once "true" is encountered, similar to a general max/min reduction encountering max/min-value.
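
To make the Any/All reading concrete, here is a minimal sketch, assuming integer inputs and a fixed threshold; the function names are hypothetical. It treats the compare results as booleans that are max-reduced (Any) or min-reduced (All), and selects between the two invariant results only after the reduction.

```cpp
#include <algorithm>
#include <vector>

// "Any" as a max-reduction over the boolean compare results.
bool anyGreaterThan(const std::vector<int> &src, int threshold) {
  bool red = false;
  for (int v : src)
    red = std::max(red, v > threshold); // max over {false, true} acts as OR
  return red;
}

// "All" as the matching min-reduction.
bool allGreaterThan(const std::vector<int> &src, int threshold) {
  bool red = true;
  for (int v : src)
    red = std::min(red, v > threshold); // min over {false, true} acts as AND
  return red;
}

// The pair of invariant results is selected once, after the reduction.
int selectInvariant(const std::vector<int> &src, int threshold, int ifAny,
                    int ifNone) {
  return anyGreaterThan(src, threshold) ? ifAny : ifNone;
}
```

These should give the same answers as std::any_of and std::all_of with the predicate v > threshold.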

A selecting reduction could report the index of the value reduced in addition to the value itself. If the reduced value appears multiple times, the index of the first or last appearance can be reported. Tests for such MinLast cases, aka argmin, were introduced in 4f04be564907f, and are yet to be vectorized by LV - hope this patch helps us get there! These are compound patterns combining three header phi's: an induction and two reductions.

A boolean selecting max/Any reduction reporting the index is typically interested in the index only if the reduced value is 1/"true", otherwise the index is obvious.
This patch deals with boolean selecting Any reductions that report the index of the (last) value reduced, provided it is "true", and may be called FindLast, as in std::find_if() being FindFirst.

When vectorizing and/or unroll-and-interleaving a selecting reduction with index, the indices of multiple candidates need to be compared to determine which is first (or last), during the reduction epilog (for in-loop reductions this is trivial). This comparison requires the indices to be monotone, i.e., to avoid wrapping.
When dealing with Any reductions with index of "true" values, the indicator that a "true" value was encountered can be folded together with the index found so far (of a true value) by using an "invalid" out-of-bounds index - preferably smaller than first iteration for FindLast or larger than last iteration for FindFirst. Such values are overwritten naturally by the (valid) index of any "true" value, when selecting the first or last index. This reduces the pattern to consider a single reduction (of the combined index+indicator value) rather than the two reductions of the general MaxLast case (of index and value).
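
The FindFirst direction of this folding can be sketched as follows. This is an illustration under assumptions, not part of the patch: findFirstLanes is a hypothetical name, four lanes are modelled with std::array, and INT_MAX serves as the "invalid" index larger than the last iteration, which is only sound while the trip count stays below INT_MAX.

```cpp
#include <algorithm>
#include <array>
#include <climits>
#include <vector>

// FindFirst with the found-indicator folded into the index itself.
// Assumes a and b have the same length.
int findFirstLanes(const std::vector<int> &a, const std::vector<int> &b, int ii) {
  constexpr int VF = 4;
  const int Invalid = INT_MAX; // above every valid index while n < INT_MAX
  std::array<int, VF> part;
  part.fill(Invalid);
  const int n = static_cast<int>(a.size());
  for (int i = 0; i < n; ++i) {
    const int lane = i % VF; // element i is processed by lane i % VF
    if (a[i] > b[i])
      part[lane] = std::min(part[lane], i); // a valid index overwrites Invalid
  }
  // Comparing the per-lane candidates requires monotone (non-wrapping) indices.
  const int rdx = *std::min_element(part.begin(), part.end());
  return (rdx == Invalid) ? ii : rdx;
}
```

Only one reduction value per lane is kept; no separate boolean reduction is needed.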

This FindLast patch currently meets these requirements by restricting to indices that are increasing, signed, and start from a non-min-signed value. It seems unnatural for such indices to wrap, or if PSCEV guards against AddRec wrapping in general(?), but even if an index may wrap and/or does not provide desired out-of-bounds values, a designated IV counting vector iterations could be used from which the original indices can later be reconstructed in the epilog and reduced. Such an IV is immune to wrapping and provides out-of-bound values. This is one of several possible ways to lift these restrictions.

Note that Any reductions reporting the first index can terminate once "true" is encountered, but seem more cumbersome to write (w/o a break), e.g.,:

// FindFirst w/o break.
int red = ii;
int red_set = false;
for (int i = 0; i < n; ++i)
  if (a[i] > b[i]) {
    red = red_set ? red : i;
    red_set = true;
  }

instead of

// FindLast.
int red = ii;
for (int i = 0; i < n; ++i)
  red = (a[i] > b[i]) ? i : red;

A FindLast loop could be optimized into a FindFirst one by reversing the loop.
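
A minimal sketch of that reversal, assuming the loop body has no other side effects and that every a[i] and b[i] may be read in any order (findLastReversed is a hypothetical name):

```cpp
#include <vector>

// FindLast computed as a FindFirst over the reversed iteration space.
// Assumes a and b have the same length.
int findLastReversed(const std::vector<int> &a, const std::vector<int> &b, int ii) {
  for (int i = static_cast<int>(a.size()) - 1; i >= 0; --i)
    if (a[i] > b[i])
      return i; // first hit walking backwards is the last hit walking forwards
  return ii;    // no iteration satisfied the condition
}
```

The early return is the "break" that the forward FindLast loop cannot take.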

llvm/test/Transforms/LoopVectorize/select-min-index.ll
89	This test now gets vectorized, being a `FindLast` loop that reports the last index where a[i] < a[i-1]+1, or zero if none are found. (I.e., proving that a sequence is not strictly increasing, rather than computing `MinLast`.) But the vector loop is never reached?

Ayal mentioned this in D153697: [LV] Freeze start value for select-reductions. Jun 26 2023, 4:19 AM

Changes:

Remove the redundant comment. (fhahn's comment)
Separate testing and implementation. (fhahn's comment)

Harbormaster completed remote builds in B241676: Diff 535225. Jun 27 2023, 8:43 PM

Mel-Chen added a parent revision: D153936: [LV] Add tests for select-cmp reduction pattern. (NFC). Jun 27 2023, 8:45 PM

Mel-Chen added reviewers: shiva0217, artagnon. Jun 28 2023, 1:11 AM

Mel-Chen marked 2 inline comments as done. Jun 28 2023, 2:52 AM

Mel-Chen added inline comments.

llvm/include/llvm/Analysis/IVDescriptors.h
37–39	It has some functional overlap with max reduction, but it is not exactly the same as max reduction. If we were to rename it, the change I suggest is:

SelectICmp --> SelectInvICmp
SelectFCmp --> SelectInvFCmp
SelectIVICmp --> SelectIncIVICmp
SelectIVFCmp --> SelectIncIVFCmp

This will also make it easier for you to expand it in the future.
llvm/lib/Analysis/IVDescriptors.cpp
610–630	Nice catch! Will update it.
657	I have also been thinking about this, because I personally feel that this function is a bit long. Do you think it's better to put it here as a static function, or to put it in another file?
659–667	I do not understand. I believe this series of checks already forbids these cases.
670	No, it does not provide signed or unsigned information.
673–675	This is complicated, so let me explain why this pattern requires this check. It is related to our goals: 1) we want to achieve the reduction with one reduce intrinsic; 2) we want to use a static sentinel value. Regarding 1), I understand that even without checking the boundaries or ensuring that the select operand is increasing or decreasing, we could still vectorize by using two reductions. However, the most common form of induction variable we encounter is {0, +, 1}. If this IV is signed, we can accomplish the reduction with a single reduction, so this choice is based on performance considerations. As for 2), it follows from the decision made in 1), which requires a sentinel value. For an increasing IV, the only restriction on the sentinel value is that it must be less than the start value of the IV. For {0, +, 1}, the sentinel value can be -1 or any smaller value, i.e. a dynamic sentinel value. However, for ease of implementation, we finally decided to use a static sentinel value, which is the minimum value of the data type. Additional note: a dynamic sentinel value would involve checking whether the data type can represent `IV start value - step` (for {0, +, 1}, that is -1). Within the LLVM framework, I currently don't have a way to implement a version with a dynamic sentinel value, so this is room for potential improvement.
677–678	Why? What are the shortcomings that make you think `isInductionPHI` should not be used? In my mind, better to rely on the existing function rather than trying to reimplement it.
llvm/lib/Transforms/Utils/LoopUtils.cpp
1089	I don't get it. The only thing I need is to create a signed max reduce.

Mel-Chen added inline comments. Jun 28 2023, 8:49 PM

llvm/include/llvm/Transforms/Utils/LoopUtils.h
365–366	Good find! Will correct it.
llvm/lib/Transforms/Utils/LoopUtils.cpp
1155	Sure, will do.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3887	I'm afraid I won't be able to meet this requirement. `createSentinelValueHandling` is placed here to handle the case where the vector width is 1. You could refer to CHECK-VF1IC4 in the test cases and focus on the `middle.block`. In the implementation, VF1IC4 doesn't call `createTargetReduction`, but `ReducedPartRdx` still needs the sentinel value fix-up. However, perhaps we can create a new bool function for `RK == RecurKind::SelectIVICmp || RK == RecurKind::SelectIVFCmp`. This condition will most likely expand further and make the if-condition too long.
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1593	Sure, will do.

In D150851#4440545, @Ayal wrote:

A selecting reduction could report the index of the value reduced in addition to the value itself. If the reduced value appears multiple times, the index of the first or last appearance can be reported. Tests for such MinLast cases, aka argmin, were introduced in 4f04be564907f, and are yet to be vectorized by LV - hope this patch helps us get there! These are compound patterns combining three header phi's: an induction and two reductions.

Yes, this patch was separated from D143465 based on @fhahn's suggestion. D143465 still needs to be refined, so more reviewers have not been invited to it yet.

This FindLast patch currently meets these requirements by restricting to indices that are increasing, signed, and start from a non-min-signed value. It seems unnatural for such indices to wrap, or if PSCEV guards against AddRec wrapping in general(?), but even if an index may wrap and/or does not provide desired out-of-bounds values, a designated IV counting vector iterations could be used from which the original indices can later be reconstructed in the epilog and reduced. Such an IV is immune to wrapping and provides out-of-bound values. This is one of several possible ways to lift these restrictions.

This comment inspired me deeply. Let me share my thoughts and plans regarding the select-cmp reduction pattern (referred to as [I|F]Any in your comment).

I believe that the select-cmp reduction pattern can be classified into several types based on the selecting variable. Currently, I have categorized them as follows:

1) Select operand is a loop invariant, i.e., Select[I|F]Cmp. This has already been implemented in D108136 by @david-arm.

2) Select operand is a monotonically increasing/decreasing induction variable, and the start value of the induction variable is not equal to the minimum/maximum value of the data type. This patch handles the case of signed increasing induction variables, while the case of decreasing induction variables is yet to be implemented. The decision to handle only signed variables stems from LLVM's design; the issues include the choice of sentinel values and the selection of umax|smax reduction intrinsics. If the compiler architecture allowed distinguishing between signed and unsigned, the unsigned induction variable case should be easily achievable.

3) Select operand is a monotonically increasing/decreasing induction variable, and there are no restrictions on the start value of the induction variable.

unsigned int red = start_value;
for (unsigned int i = 0; i < n; ++i)
  red = (a[i] > b[i]) ? i : red;

4) Select operand is an arbitrary variable.

int red = start_value;
for (int i = 0; i < n; ++i)
  red = (a[i] > b[i]) ? c[i] : red;

Both 1) and 2) can be handled with a single reduction. On the other hand, 3) and 4) are more complex, and require two reductions to be completed.

Although all select-cmp reduction patterns can be vectorized using the vectorization approach in 4), for performance, I believe that the cases in 1) and 2) should be handled with a single reduction first. Therefore, when identifying and classifying the RecurKind for select-cmp reduction patterns, it is preferable to first consider whether they can be handled with cases 1) or 2), and then consider whether cases 3) or 4) need to be applied.

Next, let's discuss cases 3) and 4), which have not been implemented yet.

For case 3), I currently have two approaches to solve it. The first approach is to perform reduction not only on the select part, but also on the boolean value of the cmp operation.

unsigned int red = start_value;
vec_bool cmp_red_part = splat(false);
vec_unsigned_int select_red_part = splat(DTypeMin);
vec_unsigned_int step_vec = {0, 1, 2, ...};
for (unsigned int i = 0; i < n; i+=vl) {
  cmp_red_part = cmp_red_part | (vec_a[i] > vec_b[i]);
  select_red_part = (vec_a[i] > vec_b[i]) ? step_vec: select_red_part;
  step_vec += {vl, vl, vl, ...};
}
bool cmp_red = reduce.or(cmp_red_part);
red = cmp_red ? reduce.smax|umax(select_red_part) : start_value;
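
The pseudocode above can be emulated in plain C++ to check its behaviour. This is only a sketch under assumptions: findLastTwoReductions is a hypothetical name, four lanes are modelled with std::array, element i is assigned to lane i % VL, and 0 stands in for DTypeMin of the unsigned type.

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Two reductions: an OR over the compare results and a umax over the indices.
// Assumes a and b have the same length.
unsigned findLastTwoReductions(const std::vector<int> &a,
                               const std::vector<int> &b, unsigned startValue) {
  constexpr unsigned VL = 4;
  std::array<bool, VL> cmpPart{};     // splat(false)
  std::array<unsigned, VL> selPart{}; // splat(0), the unsigned minimum
  const unsigned n = static_cast<unsigned>(a.size());
  for (unsigned i = 0; i < n; ++i) {
    const unsigned lane = i % VL;
    const bool hit = a[i] > b[i];
    cmpPart[lane] = cmpPart[lane] || hit;
    selPart[lane] = hit ? i : selPart[lane];
  }
  const bool any = std::any_of(cmpPart.begin(), cmpPart.end(),
                               [](bool x) { return x; }); // reduce.or
  const unsigned last = *std::max_element(selPart.begin(), selPart.end());
  return any ? last : startValue;
}
```

Because the boolean reduction carries the "found" information, a hit at index 0 is reported correctly even though 0 is also the initial value of the index lanes.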

The second approach is to directly use the vectorization approach in 4) to vectorize case 3).

int red = start_value;
vec_unsigned_int iter_red_part = splat(0);
vec_unsigned_int red_part = splat(start_value);
vec_unsigned_int step_vec = {0, 1, 2, ...};
for (int i = 0; i < n; i+=vl) {
  iter_red_part = (vec_a[i] > vec_b[i]) ? step_vec : iter_red_part;
  red_part = (vec_a[i] > vec_b[i]) ? vec_c[i] : red_part;
  step_vec += {vl, vl, vl, ...};
}
unsigned int iter_red = reduce.umax(iter_red_part);
mask_bool red_mask = (iter_red_part == splat(iter_red));
red = reduce.or(red_part, red_mask);  // unsure about which reduction operation would be best for the extracting the result at the position red_mask indicated so far
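
One way to resolve the open question about extracting the result is to select the lane that holds the largest iteration number. The sketch below is a variation on the pseudocode, not a transcription of it: it records 1-based iteration numbers so that 0 can mean "lane never updated", which is an added assumption that avoids confusing a hit in iteration 0 with an untouched lane. The name selectLastValue and the lane count are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Case 4): the selected value is c[i]; a second reduction tracks which lane
// saw the latest update. Assumes a, b and c have the same length and that
// the trip count fits in unsigned with room for the +1.
int selectLastValue(const std::vector<int> &a, const std::vector<int> &b,
                    const std::vector<int> &c, int startValue) {
  constexpr unsigned VL = 4;
  std::array<unsigned, VL> iterPart{}; // 0 = lane never updated
  std::array<int, VL> redPart;
  redPart.fill(startValue);
  const unsigned n = static_cast<unsigned>(a.size());
  for (unsigned i = 0; i < n; ++i) {
    const unsigned lane = i % VL;
    if (a[i] > b[i]) {
      iterPart[lane] = i + 1; // 1-based iteration number of the update
      redPart[lane] = c[i];
    }
  }
  // The lane with the largest iteration number holds the last update.
  const auto it = std::max_element(iterPart.begin(), iterPart.end());
  const auto lane = static_cast<std::size_t>(it - iterPart.begin());
  return redPart[lane];
}
```

If no lane was updated, every lane still holds startValue, so picking any of them gives the expected result.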

Both approaches require two reductions, and one of the reductions will be a reduction phi that does not appear in the original user code. In other words, the vectorizer needs to have the capability to create a new reduction phi.

These are my thoughts on the select-cmp reduction pattern so far.

Note that Any reductions reporting the first index can terminate once "true" is encountered, but seem more cumbersome to write (w/o a break), e.g.,:
// FindFirst w/o break.
int red = ii;
int red_set = false;
for (int i = 0; i < n; ++i)
  if (a[i] > b[i]) {
    red = red_set ? red : i;
    red_set = true;
  }
instead of
// FindLast.
int red = ii;
for (int i = 0; i < n; ++i)
  red = (a[i] > b[i]) ? i : red;
A FindLast loop could be optimized into a FindFirst one by reversing the loop.

Interesting, I haven't thought about FindFirst yet. If it includes a break statement, it will be another long story - uncountable loop vectorization. Although I haven't deeply considered the FindFirst case, I still have some rough ideas to share.

Perhaps we can simplify the FindFirst w/o break example to:

// FindFirst w/o break.
int red = ii;
int red_set = false;
for (int i = 0; i < n; ++i) {
  if ((a[i] > b[i]) && !red_set)   // reduction 1
    red = i;
  if (a[i] > b[i]) // reduction 2
    red_set = true;
}

In this way, we can clearly see that there are two reductions involved, and that the result of one reduction is masked by the result of the other. This is very interesting, and may be similar to the pattern in D143465.
If we can transform the code into:

// FindFirst w/o break.
int red = ii;
int red_set = false;
for (int i = 0; i < n; ++i){
  if ((a[i] > b[i]) && !red_set)   // reduction 1
    red = i;
  red_set = red_set | (a[i] > b[i]);  // reduction 2
}

then perhaps it will lead to better optimization results.

llvm/test/Transforms/LoopVectorize/select-min-index.ll
89	Impressive catch! We have been focusing only on the vector.body and ignoring the others. I will prioritize clarifying this bug and fixing it as soon as reasonable.

In D150851#4458987, @Mel-Chen wrote:

[snip]

I believe that the select-cmp reduction pattern can be classified into several types based on the selecting variable. Currently, I have categorized them as follows:

Would be good to try and find more accurate names than using Select*Cmp combinations. A Compare-Select pattern is also used in the existing Min/Max reduction, but has a much better name.

Select operand is a loop invariant, i.e., Select[I|F]Cmp. This has already been implemented in the D108136 by @david-arm.

and should be renamed [I|F]Any or something else more accurate. The two invariant operands should be sunk and selected after the loop, according to the outcome if "any" were found or not.

Select operand is a monotonic increasing/decreasing induction variable, and the start value of the induction variable is not equal to the minimum/maximum value of the data type. This patch handles the case of signed increasing induction variables, while the case of decreasing induction variables is yet to be implemented. The decision to only handle signed variables depends on LLVM's design, the issue including the choice of sentinel values, and the selection of umax|smax reduction intrinsics. If the compiler architecture allows distinguishing between signed and unsigned, the unsigned induction variable case should be easily achievable.

Select operand is a monotonic increasing/decreasing induction variable, and there are no restrictions on the start value of the induction variable.
unsigned int red = start_value;
for (unsigned int i = 0; i < n; ++i)
  red = (a[i] > b[i]) ? i : red;
Select operand is an any variable.
int red = start_value;
for (int i = 0; i < n; ++i)
  red = (a[i] > b[i]) ? c[i] : red;
Both 1) and 2) can be handled with a single reduction. On the other hand, 3) and 4) are more complex, and require two reductions to be completed.

Arguably, (2), (3) and (4) are essentially all FindLast reductions, interested in the last loop iteration i for which some predicate p(i) such as a[i]-b[i]>0 holds, plus an indicator if no such iteration was found, followed by some post-processing of these results: in (2), (3) and (4), if no such iteration was found then some invariant "start_value" is returned, regardless if it was originally out-of-bounds or not. In (4), in addition, if a loop iteration i was found satisfying p(i), some f(i) computation of the last such i should be returned, as in c[i]. This is analogous to sinking the invariants of case (1), and may indeed be more elaborate, but also covers simpler cases such as any (other) AddRec/IV that can be evaluated given i, e.g.,: red = (a[i] > b[i]) ? 3*i+8 : red; - so choose any IV you prefer, e.g., one which has the desired no-wrap plus out-of-bound value, even if the original one does not.

In any case, it should be helpful to distill what actually needs to be maintained throughout the reduction loop, even if more appear there originally; be it boolean indicators in Any reductions (cmp_red_part in your example), indices in Find reductions (select_red_part in your example initialized with "unfound" out-of-bound indicators), or compound value+index in e.g. min/max-with-index reductions. The compiler can surely and does introduce new phi's as needed, hopefully having minimal width, but could also try to eliminate existing phi's and reduce the number of values that are live-out of the loop, possibly at the cost of replicating code, e.g., if c[i] is also used inside the loop.

Here's a sketch minimizing the size of the indices maintained throughout the loop, so they would avoid wrapping, provide out-of-bound values, and possibly use narrower types depending on trip-count and vl:

return_type FindLast(return_type unfound_value, vec_predicate_func, found_func) {
  vec_unsigned_int select_red_part = splat(0); // Zero indicates unfound.
  vec_unsigned_int step_vec = splat(1); // Count vector iterations starting at 1.

  for (unsigned int i = 0; i < n; i+=vl, step_vec+=splat(1))
    select_red_part = vec_predicate_func(i) ? step_vec : select_red_part;

  unsigned vec_indices_ored = reduce.or(select_red_part);
  if (vec_indices_ored == 0)
    return unfound_value;
  vec_unsigned_int inflated_red_part = (select_red_part - splat(1)) * vl + <0,1,...,vl-1>;
  unsigned last_index = reduce.umax(inflated_red_part);
  return found_func(last_index);
}
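
The sketch above can be turned into a scalar emulation. The version below is an illustration under assumptions rather than the reviewer's exact proposal: the function name is hypothetical, four lanes are modelled with std::array, a bounds check stands in for tail masking, and lanes that still hold zero are skipped when indices are reconstructed. That last masking step is an addition of this sketch, since (0 - 1) * vl would wrap to a large unsigned value and win the umax.

```cpp
#include <algorithm>
#include <array>
#include <functional>

// FindLast that tracks 1-based vector-iteration counts instead of indices.
int findLastByVectorIteration(unsigned n, int unfoundValue,
                              const std::function<bool(unsigned)> &pred,
                              const std::function<int(unsigned)> &found) {
  constexpr unsigned VL = 4;
  std::array<unsigned, VL> selPart{}; // zero indicates unfound
  unsigned step = 1;                  // counts vector iterations from 1
  for (unsigned i = 0; i < n; i += VL, ++step)
    for (unsigned l = 0; l < VL; ++l)
      if (i + l < n && pred(i + l)) // bounds check models a tail mask
        selPart[l] = step;
  unsigned ored = 0;
  for (unsigned v : selPart)
    ored |= v; // reduce.or
  if (ored == 0)
    return unfoundValue;
  // Reconstruct original indices in the epilog, skipping unfound lanes.
  unsigned last = 0;
  for (unsigned l = 0; l < VL; ++l)
    if (selPart[l] != 0)
      last = std::max(last, (selPart[l] - 1) * VL + l);
  return found(last);
}
```

Passing a found function such as 3*i+8 covers the case where the selected value is another IV that can be evaluated from the last index.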

Regarding FindFirst, indeed the natural way of writing it with a break would provide the compiler with an uncountable loop that is harder to vectorize due to speculative execution, and so it is natural to start with FindLast. But if written as a countable loop, the compiler might be able to vectorize and optimize it by introducing a break, as in an [I|F]Any countable loop that is free of any other side-effects, or a FindLast loop that can be reversed into a FindFirst loop moving backwards and breaking on first finding.

Changes:

Fix the test cases, D154415. (Ayal's comment)

Herald added a subscriber: wangpc. Jul 4 2023, 1:34 AM

Harbormaster completed remote builds in B242961: Diff 536986. Jul 4 2023, 1:35 AM

Mel-Chen marked an inline comment as done. Jul 4 2023, 1:41 AM

Mel-Chen added inline comments.

llvm/test/Transforms/LoopVectorize/select-min-index.ll
89	Clarified, it has been confirmed that this is not a bug. The reason is that the loop trip count in the test case is 0, causing the simplification of `min.iters.check` to be `true`. The test case has been fixed in D154415.

Mel-Chen added a parent revision: D154415: [LV] Change the test cases to ensure that the trip count is not zero. (NFC). Jul 4 2023, 1:42 AM

Changes:

Update comments. (Artagnon and Shiva's comments)

Harbormaster completed remote builds in B243138: Diff 537231. Jul 5 2023, 12:23 AM

In D150851#4467261, @Ayal wrote:

Would be good to try and find more accurate names than using Select*Cmp combinations. A Compare-Select pattern is also used in the existing Min/Max reduction, but has a much better name.

Sure, perhaps @artagnon can join the discussion as well. Let me share my experience first: in GCC, I have seen a classification called ExtractLast, which has semantics similar to what you described as FindLast. Further input and opinions are welcome.

Select operand is a loop invariant, i.e., Select[I|F]Cmp. This has already been implemented in the D108136 by @david-arm.

and should be renamed [I|F]Any or something else more accurate. The two invariant operands should be sunk and selected after the loop, according to the outcome if "any" were found or not.

How about following the C++ STL, renaming it to [I|F]AnyOf? What do you think, @david-arm?

return_type FindLast(return_type unfound_value, vec_predicate_func, found_func) {
  vec_unsigned_int select_red_part = splat(0); // Zero indicates unfound.
  vec_unsigned_int step_vec = splat(1); // Count vector iterations starting at 1.

  for (unsigned int i = 0; i < n; i+=vl, step_vec+=splat(1))
    select_red_part = vec_predicate_func(i) ? step_vec : select_red_part;

  unsigned vec_indices_ored = reduce.or(select_red_part);
  if (vec_indices_ored == 0)
    return unfound_value;
  vec_unsigned_int inflated_red_part = (select_red_part - splat(1)) * vl + <0,1,...,vl-1>;
  unsigned last_index = reduce.umax(inflated_red_part);
  return found_func(last_index);
}

If we focus on removing the wrapping and bound restrictions, I think we can consider the approach proposed by @artagnon in D152693. This method cleverly extends the technique used by @david-arm in SelectICmp. The approach can be summarized as follows:
Consider the loop:

unsigned int red = start_value;
for (unsigned int i = 0; i < n; ++i)
  red = (a[i] > b[i]) ? i : red;

vectorize to:

unsigned int red = start_value;
vec_unsigned_int red_part = splat(start_value);
vec_unsigned_int step_vec = {0, 1, 2, ...};
for (unsigned int i = 0; i < n; i+=vl) {
  red_part = (vec_a[i] > vec_b[i]) ? step_vec : red_part;
  step_vec += {vl, vl, vl, ...};
}
vec_bool ne_start_value = red_part != splat(start_value);
bool may_update = reduce.or(ne_start_value);
vec_unsigned_int masked_red_part = ne_start_value ? red_part : splat(DataTypeMin);
red = may_update ? reduce.smax|umax(masked_red_part) : start_value;

While the conditions checked in this patch are stricter, I believe both approaches should coexist. In general, the IR generated by this patch should perform better in the same case, so it should be prioritized when possible. However, for cases that cannot be handled by this patch, we can apply the approach in D152693.
In addition, there is still room for optimization in this patch. We usually face source code like this:

j = -1;
for (int i = 0; i < n; i++) {
    if (a[i] < b[i]) {
        j = i;
    }
}

When the start value of the reduction is a known constant and is known to be smaller than the start value of the increasing induction variable, we may not even need to use a sentinel value. Simply using the reduce max operation would suffice.
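
A minimal sketch of that simplification, assuming the start value is the constant -1 and the induction variable starts at 0 (lastIndexLess is a hypothetical name and four lanes are modelled with std::array):

```cpp
#include <algorithm>
#include <array>
#include <vector>

// The start value -1 is already below every index, so it can seed the lanes
// directly and a plain signed max-reduce yields the final result.
// Assumes a and b have the same length.
int lastIndexLess(const std::vector<int> &a, const std::vector<int> &b) {
  constexpr int VF = 4;
  std::array<int, VF> part;
  part.fill(-1); // start value doubles as the "not found" marker
  const int n = static_cast<int>(a.size());
  for (int i = 0; i < n; ++i)
    if (a[i] < b[i])
      part[i % VF] = i; // indices only increase within a lane
  return *std::max_element(part.begin(), part.end());
}
```

No compare-and-select against a sentinel is needed after the reduction in this case.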

Update: I found an example where the approach in D152693 leads to an incorrect result:
Assume start_value is 3, and red_part is {0, 1, 2, 3} in the end. If the 3 was written by the loop rather than coming from start_value, red should be 3 instead of 2.

@artagnon, could you please help to verify it?


In D150851#4479839, @Mel-Chen wrote:

In D150851#4467261, @Ayal wrote:

Would be good to try and find more accurate names than using Select*Cmp combinations. A Compare-Select pattern is also used in the existing Min/Max reduction, but has a much better name.

Sure, perhaps @artagnon can join the discussion as well. Let me share my experience first: in GCC, I have seen a classification called ExtractLast, which has semantics similar to what you described as FindLast. Further input and opinions are welcome.

I agree with @Ayal: I also got confused by the SelectCmp naming convention initially. Perhaps [I|F]AnyOfInv, [I|F]AnyOfInc, and [I|F]AnyOfDec, and rename the corresponding function isAnyOfReduction? I don't have a strong preference for the rename.

If we focus on removing the wrapping and bound restrictions, I think we can consider the approach proposed by @artagnon in D152693. This method cleverly extends the technique used by @david-arm in SelectICmp. The approach can be summarized as follows:
Consider the loop:
unsigned int red = start_value;
for (unsigned int i = 0; i < n; ++i)
  red = (a[i] > b[i]) ? i : red;
vectorize to:
unsigned int red = start_value;
vec_unsigned_int red_part = splat(start_value);
vec_unsigned_int step_vec = {0, 1, 2, ...};
for (unsigned int i = 0; i < n; i+=vl) {
  red_part = (vec_a[i] > vec_b[i]) ? step_vec : red_part;
  step_vec += {vl, vl, vl, ...};
}
vec_bool ne_start_value = red_part != splat(start_value);
bool may_update = reduce.or(ne_start_value);
vec_unsigned_int masked_red_part = ne_start_value ? red_part : splat(DataTypeMin);
red = may_update ? reduce.smax|umax(masked_red_part) : start_value;
While the conditions checked in this patch are stricter, I believe both approaches should coexist. In general, the IR generated by this patch should perform better on the cases both can handle, so it should be prioritized when possible. For cases that this patch cannot handle, we can apply the approach in D152693.

Yes, the IR generated by this patch is quite optimized, and I suppose we can fall back to codegen like that in D152693 as a follow-up.

In addition, there is still room for optimization in this patch. We usually face source code like this:
j = -1;
for (int i = 0; i < n; i++) {
    if (a[i] < b[i]) {
        j = i;
    }
}
When the start value of the reduction is a known constant and is known to be smaller than the start value of the increasing induction variable, we may not even need to use a sentinel value. Simply using the reduce max operation would suffice.

I wouldn't worry about this for now. The patch already looks pretty good in its current state, and I think with the renaming that @Ayal proposed, it should be ready to land.

llvm/lib/Analysis/IVDescriptors.cpp
657	A static function would be nice.

In D150851#4480312, @Mel-Chen wrote:

Update: I found an example where the approach in D152693 leads to an incorrect result:
Assume start_value is 3 and red_part is {0, 1, 2, 3} at the end. If the 3 was written by the loop rather than carried over from start_value, red should be 3, but the result is 2.

@artagnon, could you please help to verify it?

Yes, I can confirm that there is indeed a bug. Thanks for catching it! I'm thinking about a fix now.

In D150851#4480316, @artagnon wrote:

Would be good to try and find more accurate names than using Select*Cmp combinations. A Compare-Select pattern is also used in the existing Min/Max reduction, but has a much better name.

I agree with @Ayal: I also got confused by the SelectCmp naming convention initially. Perhaps [I|F]AnyOfInv, [I|F]AnyOfInc, and [I|F]AnyOfDec, and rename the corresponding function isAnyOfReduction? I don't have a strong preference for the rename.

About the function name, I lean towards not renaming it, or renaming it to more easily understandable names like conditional select, conditional assignment, and so on.

In D150851#4480526, @artagnon wrote:

Yes, I can confirm that there is indeed a bug. Thanks for catching it! I'm thinking about a fix now.

Perhaps Ayal's approach can be helpful. The final found_func allows the approach to be more widely applicable. If we only focus on increasing/decreasing induction variables, we may be able to further streamline the process.

In D150851#4467261, @Ayal wrote:

return_type FindLast(return_type unfound_value, vec_predicate_func, found_func) {
  vec_unsigned_int select_red_part = splat(0); // Zero indicates unfound.
  vec_unsigned_int step_vec = splat(1); // Count vector iterations starting at 1.

  for (unsigned int i = 0; i < n; i+=vl, step_vec+=splat(1))
    select_red_part = vec_predicate_func(i) ? step_vec : select_red_part;

  unsigned vec_indices_ored = reduce.or(select_red_part);
  if (vec_indices_ored == 0)
    return unfound_value;
  vec_unsigned_int inflated_red_part = (select_red_part - splat(1)) * vl + <0,1,...,vl-1>;
  unsigned last_index = reduce.umax(inflated_red_part);
  return found_func(last_index);
}
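
A runnable scalar emulation of this pseudocode might look as follows. This is a minimal sketch under stated assumptions: a vector length of 4, a trip count that is a multiple of it, `found_func` taken to be the identity, and hypothetical names `kVL` and `findLast`. One detail is added that the pseudocode does not spell out: lanes that never matched are masked out before the final max, since subtracting 1 from an unfound lane would wrap around in unsigned arithmetic.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>

constexpr std::size_t kVL = 4; // hypothetical vector length

// Returns unfoundValue if pred(i) is false for every i in [0, n);
// otherwise returns the last i for which pred(i) is true.
long findLast(std::size_t n, long unfoundValue,
              const std::function<bool(std::size_t)> &pred) {
  assert(n % kVL == 0);
  std::array<std::size_t, kVL> selectRedPart{}; // zero indicates unfound
  std::size_t iter = 1; // vector iteration count, starting at 1
  for (std::size_t i = 0; i < n; i += kVL, ++iter)
    for (std::size_t lane = 0; lane < kVL; ++lane)
      if (pred(i + lane))
        selectRedPart[lane] = iter;

  std::size_t ored = 0; // reduce.or over the lanes
  for (std::size_t v : selectRedPart)
    ored |= v;
  if (ored == 0)
    return unfoundValue;

  // Inflate iteration counts back to scalar indices, then reduce.umax.
  std::size_t lastIndex = 0;
  for (std::size_t lane = 0; lane < kVL; ++lane)
    if (selectRedPart[lane] != 0) // skip lanes that never matched
      lastIndex = std::max(lastIndex, (selectRedPart[lane] - 1) * kVL + lane);
  return static_cast<long>(lastIndex);
}
```

Because the reduction tracks iteration counts rather than index values, it does not depend on the start value or range of the induction variable.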

Mel-Chen mentioned this in D153936: [LV] Add tests for select-cmp reduction pattern. (NFC).Jul 10 2023, 10:53 PM

Rebase.

Harbormaster completed remote builds in B244382: Diff 538971.Jul 11 2023, 3:40 AM

shiva0217 added inline comments.Jul 11 2023, 8:23 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3887	Thanks for the explanation! Could we use "} else if ((!VF.isVector() && !PhiR->isInLoop()))" to guard the generation? That would make it easier to see that the codegen is needed when VF is not a vector and createTargetReduction won't be invoked. Should we rename createSentinelValueHandling to createSelectInitValOrReduction? I feel it would better reflect the codegen, though I don't feel strongly about it.

Rebase.

Harbormaster completed remote builds in B246453: Diff 541875.Jul 19 2023, 2:44 AM

Changes:

Rename Select[I|F]Cmp to [I|F]AnyOf
Rename SelectIV[I|F]Cmp to [I|F]FindLastIV

Harbormaster completed remote builds in B246778: Diff 542324.Jul 19 2023, 11:49 PM

Mel-Chen added a parent revision: D155786: [LV] Rename the Select[I|F]Cmp reduction pattern to [I|F]AnyOf. (NFC).Jul 19 2023, 11:50 PM

Mel-Chen mentioned this in D155786: [LV] Rename the Select[I|F]Cmp reduction pattern to [I|F]AnyOf. (NFC).Jul 19 2023, 11:56 PM

Mel-Chen edited the summary of this revision. (Show Details)

Hi Mel,

I had the chance to try out your patch today, and it seems that it's really quite restricted in its scope. I tried it out on this simple C program:

#include <stdio.h>

int main() {
      int src[20000] = {4, 5, 2};
      int r = 331;
      for (int i = 0; i < 20000; i++) {
        if (src[i] > 3)
          r = i;
      }
      printf("%d\n", r);
      return 0;
}

Unfortunately, your patch failed to vectorize this simple case, because when you match NonPhi against a PHINode, it is blocked on a trunc:

for.body:                                         ; preds = %entry, %for.body
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %r.05 = phi i32 [ 331, %entry ], [ %spec.select, %for.body ]
  %arrayidx = getelementptr inbounds [20000 x i32], ptr %src, i64 0, i64 %indvars.iv
  %3 = load i32, ptr %arrayidx, align 4, !tbaa !6
  %cmp1 = icmp sgt i32 %3, 3
  %4 = trunc i64 %indvars.iv to i32 ; blocked here
  %spec.select = select i1 %cmp1, i32 %4, i32 %r.05
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, 20000
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body, !llvm.loop !10

May I suggest using the logic outlined in isInductionMinMaxPattern in D152693?

Thanks.

In D150851#4523028, @artagnon wrote:

May I suggest using the logic outlined in isInductionMinMaxPattern in D152693?

It's not that simple, as you'd lose the information about hasNoSignedWrap from the AR of the trunc. I've also been playing with attempting to extend this patch to the decreasing IV case, and it seems we've painted ourselves into a corner by being more concerned about the codegen performance than the generality of the patch: the constant IV start is a serious limitation when looking at the decreasing IV case. I think other reviewers like @fhahn and @Ayal have also expressed interest in greater generality (perhaps at the cost of codegen performance).

artagnon mentioned this in D156124: LoopVectorize/iv-select-cmp: add tests for truncated IV.Jul 24 2023, 7:14 AM

artagnon mentioned this in D156152: LoopVectorize/iv-select-cmp: add test for decreasing IV, const start.Jul 24 2023, 10:32 AM

In D150851#4528089, @artagnon wrote:

In D150851#4523028, @artagnon wrote:

May I suggest using the logic outlined in isInductionMinMaxPattern in D152693?

It's not that simple, as you'd lose the information about hasNoSignedWrap from the AR of the trunc. I've also been playing with attempting to extend this patch to the decreasing IV case, and it seems we've painted ourselves into a corner by being more concerned about the codegen performance than the generality of the patch: the constant IV start is a serious limitation when looking at the decreasing IV case. I think other reviewers like @fhahn and @Ayal have also expressed interest in greater generality (perhaps at the cost of codegen performance).

Yes, this issue is known, which is why I left a TODO in the summary.
I made the following modifications before:

   auto IsIncreasingLoopInduction = [&SE, &Loop](Value *V) {
+    // FIXME: Should focus on SExt and Trunc only?
+    if (auto *Cast = dyn_cast<CastInst>(V))
+      V = Cast->getOperand(0);
+
     auto *Phi = dyn_cast<PHINode>(V);

However, this approach is not rigorous.
Regardless of whether the nsw flag can still be used to determine whether to use signed max or unsigned max, proving whether the truncated induction variable is monotonically increasing is already a challenge.
Consider the following scenario:

previous step: i64:0000000000000000000000000000000001111111111111111111111111111111 -> i32:01111111111111111111111111111111
add one
current step: i64:0000000000000000000000000000000010000000000000000000000000000000 -> i32:10000000000000000000000000000000
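
The wrap can be checked directly. A minimal sketch with hypothetical helper names (`truncToI32`, `truncKeepsSignedOrder`), modelling the trunc as keeping the low 32 bits and reinterpreting them as a signed value:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Models `trunc i64 %x to i32`: keep the low 32 bits, reinterpret as signed.
std::int32_t truncToI32(std::int64_t wide) {
  // Conversion to an unsigned type is well defined (modulo 2^32).
  std::uint32_t low = static_cast<std::uint32_t>(wide);
  std::int32_t narrow;
  std::memcpy(&narrow, &low, sizeof(narrow)); // bit reinterpretation
  return narrow;
}

// True if the truncated value still increases (as a signed i32) when the
// wide induction variable steps from iv to iv + 1.
bool truncKeepsSignedOrder(std::int64_t iv) {
  assert(iv < std::numeric_limits<std::int64_t>::max());
  return truncToI32(iv) < truncToI32(iv + 1);
}
```

For the step shown above, the wide value goes from 0x7FFFFFFF to 0x80000000, while the truncated value goes from INT32_MAX to INT32_MIN, so a signed max over truncated values would no longer pick the last index.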

I have discussed this issue with my colleagues several times, and here are some directions we came up with:

Option 1: Use SCEV overflow check. There is an overflow check generator in PSE that we can use, but it seems to be designed for checking the type of the original induction variable, not the truncated type. Using this approach would require implementing a new check generator and inserting it in the preheader.

Option 2: Prevent the generation of truncated induction variables. In the IndVarSimplifyPass, there is an option called -indvars-widen-indvars https://llvm.org/doxygen/IndVarSimplify_8cpp.html#aea2c111bf1f82fd672acaad1931e7e2d. This transformation widens the type of the induction variable to reduce the number of sign/zero-extends. However, if there are users in the loop that require the narrower original type, truncation will occur. Perhaps we can make some adjustments in this area.

Before IndVarSimplifyPass:
; Preheader:
  for.body.preheader:                               ; preds = %entry
    br label %for.body

  ; Loop:
  for.body:                                         ; preds = %for.body.preheader, %for.body
    %i.011 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
    %idx.010 = phi i32 [ %cond, %for.body ], [ %ii, %for.body.preheader ]
    %idxprom = zext i32 %i.011 to i64
    %arrayidx = getelementptr inbounds i64, ptr %a, i64 %idxprom
    %0 = load i64, ptr %arrayidx, align 8, !tbaa !4
    %arrayidx2 = getelementptr inbounds i64, ptr %b, i64 %idxprom
    %1 = load i64, ptr %arrayidx2, align 8, !tbaa !4
    %cmp3 = icmp sgt i64 %0, %1
    %cond = select i1 %cmp3, i32 %i.011, i32 %idx.010
    %inc = add nuw nsw i32 %i.011, 1 
    %cmp = icmp slt i32 %inc, %n
    br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !8

  ; Exit blocks
  for.cond.cleanup.loopexit:                        ; preds = %for.body
    %cond.lcssa = phi i32 [ %cond, %for.body ]
    br label %for.cond.cleanup

After IndVarSimplifyPass:
; Preheader:
  for.body.preheader:                               ; preds = %entry
    %wide.trip.count = zext i32 %n to i64
    br label %for.body

  ; Loop:
  for.body:                                         ; preds = %for.body.preheader, %for.body
    %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
    %idx.010 = phi i32 [ %cond, %for.body ], [ %ii, %for.body.preheader ]
    %arrayidx = getelementptr inbounds i64, ptr %a, i64 %indvars.iv
    %0 = load i64, ptr %arrayidx, align 8, !tbaa !4
    %arrayidx2 = getelementptr inbounds i64, ptr %b, i64 %indvars.iv
    %1 = load i64, ptr %arrayidx2, align 8, !tbaa !4
    %cmp3 = icmp sgt i64 %0, %1
    %2 = trunc i64 %indvars.iv to i32
    %cond = select i1 %cmp3, i32 %2, i32 %idx.010
    %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
    %exitcond = icmp ne i64 %indvars.iv.next, %wide.trip.count
    br i1 %exitcond, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !8

  ; Exit blocks
  for.cond.cleanup.loopexit:                        ; preds = %for.body
    %cond.lcssa = phi i32 [ %cond, %for.body ]
    br label %for.cond.cleanup

Option 3: Only target induction variables that determine the branch condition in the loop latch. With this approach, we can indirectly determine whether the induction variable is signed by looking at the loop guards, and vectorize directly on the basis that signed overflow is undefined behavior.

Signed IV loop guard:
entry:
  %cmp9 = icmp sgt i32 %n, 0
  br i1 %cmp9, label %for.body.preheader, label %for.cond.cleanup

Unsigned IV loop guard:
entry:
  %cmp9.not = icmp eq i32 %n, 0
  br i1 %cmp9.not, label %for.cond.cleanup, label %for.body.preheader

Currently, I personally prefer option 2, but since the performance impact of this modification is uncertain, our final decision is to proceed with option 3.

And of course, the worst case would be to fall back on a general conditional-select reduction method to solve the problem.

artagnon mentioned this in rG110ec1863af6: LoopVectorize/iv-select-cmp: add test for decreasing IV, const start.Jul 26 2023, 6:15 AM

Thanks for the detailed thoughts, Mel! My initial reaction is that (2) is probably unworkable due to the fallout, and that (3) isn't general enough. I'm leaning towards trying out (1) -- I'll see how that works out over the next few days.

Joe added a subscriber: Joe.Jul 28 2023, 3:48 AM

Only target induction variables that determine the branch condition in the loop latch. With this approach, we can indirectly determine whether the induction variable is signed by looking at the loop guards, and vectorize directly on the basis that signed overflow is undefined behavior.

@Mel-Chen I like this option. But I don't think I fully understand: what's the purpose of determining whether the induction variable is signed?

To me, the problem stems from the loss of information that the IV had 32-bit NSW/NUW when IndVarSimplify widens it. Just thinking about alternative options, what if we could retain that information in loop metadata?

In D150851#4535194, @artagnon wrote:

Thanks for the detailed thoughts, Mel! My initial reaction is that (2) is probably unworkable due to the fallout, and that (3) isn't general enough. I'm leaning towards trying out (1) -- I'll see how that works out over the next few days.

Sounds good! Before you start implementing, I suggest looking at the benchmarks to gather more cases. We chose option 3 because, in our available benchmarks, all the opportunities for the FindLastIV pattern arise from induction variables that determine the branch condition at the loop latch. It is also not too challenging to implement and provides effective results.

In D150851#4542688, @Joe wrote:

Only target induction variables that determine the branch condition in the loop latch. With this approach, we can indirectly determine whether the induction variable is signed by looking at the loop guards, and vectorize directly on the basis that signed overflow is undefined behavior.

@Mel-Chen I like this option. But I don't think I fully understand: what's the purpose of determining whether the induction variable is signed?

We need to determine whether we should use umax or smax in the middle.block. Currently, we have only implemented smax because the application of {1,+,step} unsigned induction variables is relatively uncommon.
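
To illustrate why the choice matters: the same lane bits can reduce to different results under the two interpretations. A minimal sketch over 32-bit lanes, with hypothetical helpers `asSigned`, `reduceUMax`, and `reduceSMax` (these are not LLVM APIs):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Reinterprets 32 lane bits as a signed value.
std::int32_t asSigned(std::uint32_t bits) {
  std::int32_t out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

// Max over the lanes, comparing the bits as unsigned values.
std::uint32_t reduceUMax(const std::vector<std::uint32_t> &lanes) {
  assert(!lanes.empty());
  return *std::max_element(lanes.begin(), lanes.end());
}

// Max over the lanes, comparing the same bits as signed values.
std::uint32_t reduceSMax(const std::vector<std::uint32_t> &lanes) {
  assert(!lanes.empty());
  return *std::max_element(
      lanes.begin(), lanes.end(), [](std::uint32_t l, std::uint32_t r) {
        return asSigned(l) < asSigned(r);
      });
}
```

For lanes {0, 1, 2, 0x80000000}, `reduceUMax` picks 0x80000000 while `reduceSMax` picks 2, so the reduction in middle.block has to match the signedness in which the induction variable is known not to wrap.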

To me, the problem stems from the loss of information that the IV had 32-bit NSW/NUW when IndVarSimplify widens it. Just thinking about alternative options, what if we could retain that information in loop metadata?

Yes, this is due to the loss of information. If you would like to use metadata, would it be attached to the step instruction of the induction variable?

In D150851#4546075, @Mel-Chen wrote:

In D150851#4542688, @Joe wrote:

Only target induction variables that determine the branch condition in the loop latch. With this approach, we can indirectly determine whether the induction variable is signed by looking at the loop guards, and vectorize directly on the basis that signed overflow is undefined behavior.

@Mel-Chen I like this option. But I don't think I fully understand: what's the purpose of determining whether the induction variable is signed?

We need to determine whether we should use umax or smax in the middle.block. Currently, we have only implemented smax because the application of {1,+,step} unsigned induction variables is relatively uncommon.

That makes sense, thanks for clarifying :)

To me, the problem stems from the loss of information that the IV had 32-bit NSW/NUW when IndVarSimplify widens it. Just thinking about alternative options, what if we could retain that information in loop metadata?

Yes, this is due to the loss of information. If you would like to use metadata, would it be attached to the step instruction of the induction variable?

I (naively) thought this would be attached to the loop metadata, but I haven't looked at implementation details. To be honest, I don't think this is the best option - I prefer option 3.

Mel-Chen mentioned this in rG425e9e81a0c9: [LV] Rename the Select[I|F]Cmp reduction pattern to [I|F]AnyOf. (NFC).Aug 3 2023, 12:37 AM

shiva0217 added inline comments.Aug 7 2023, 11:53 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3887	Oops, I think I mix the patch with some local changes. Please ignore the comment.

Rebase, and here is a summary of the changes:

Avoid using the SelectCmp.* series to refer to both AnyOf and FindLastIV, as @Ayal expressed concerns about potential confusion with min/max reduction.
I attempted to use FindLast.* as a collective term for AnyOf and FindLastIV, but the result turned out to be less readable, so the current version uses separate functions for AnyOf and FindLastIV.
Discovered an issue with AnyOf reduction while inserting AddReductionVar. A pre-commit revision D157375 has been opened to discuss this.

Mel-Chen added inline comments.Aug 8 2023, 2:00 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3887	No problem. And I've just rebased this patch. Please continue with the review. Thank you.

Harbormaster completed remote builds in B251023: Diff 548110.Aug 8 2023, 5:23 AM

Minor changes. Remove the changes that should not be in this patch.

Rebase.

Harbormaster completed remote builds in B251269: Diff 548453.Aug 8 2023, 11:49 PM

artagnon mentioned this in D157861: LoopVectorize: vectorize finding first IV in select-cmp.Aug 14 2023, 5:53 AM

artagnon added a child revision: D157861: LoopVectorize: vectorize finding first IV in select-cmp.Aug 14 2023, 5:53 AM

artagnon mentioned this in D157862: LoopVectorize: handle casted indvars in iv-select-cmp.Aug 14 2023, 6:06 AM

artagnon mentioned this in D157969: LoopVectorize/iv-select-cmp: add test for decreasing IV out-of-bound.Aug 15 2023, 5:15 AM

Changes:

Rebase
Format some code
New test cases @not_vectorized_select_icmp_const_cmp_in_recurrence and @not_vectorized_select_icmp_cmp_in_recurrence in Transforms/LoopVectorize/iv-select-cmp.ll.
Disallow cmp instructions in the FindLast recurrence, as I mentioned in the pre-commit revision D157375. The examples are @not_vectorized_select_icmp_const_cmp_in_recurrence and @not_vectorized_select_icmp_cmp_in_recurrence in Transforms/LoopVectorize/iv-select-cmp.ll.

Herald added a subscriber: sunshaoce. · View Herald TranscriptAug 21 2023, 1:52 AM

Harbormaster completed remote builds in B253789: Diff 551932.Aug 21 2023, 3:07 AM

Remove the unused parameter Prev in RecurrenceDescriptor::isFindLastIVPattern.

Harbormaster completed remote builds in B254294: Diff 552652.Aug 23 2023, 4:39 AM

ping

artagnon mentioned this in rG04b1276ad3b8: LoopVectorize/iv-select-cmp: add tests for truncated IV.Aug 30 2023, 5:10 AM

I've looked at this patch several times, and applied it locally and played with it. I might not be the most qualified reviewer, but this patch is ready to land in my opinion. If anyone has any objections, please raise them now.

This revision is now accepted and ready to land.Sep 20 2023, 7:10 AM

In D150851#4648808, @artagnon wrote:

I've looked at this patch several times, and applied it locally and played with it. I might not be the most qualified reviewer, but this patch is ready to land in my opinion. If anyone has any objections, please raise them now.

Hi artagnon, thank you. We have some new fixes, but for some unknown reason, I can't use arc diff --update to update the changes. I'm considering moving the patch directly to a GitHub pull request.

artagnon mentioned this in rGef48e90489dc: LoopVectorize/iv-select-cmp: add test for decreasing IV out-of-bound.Sep 25 2023, 5:20 AM

The patch has been moved to GitHub. Please continue the review there, thank you.
https://github.com/llvm/llvm-project/pull/67812

Revision Contents

Path	Size
llvm/include/llvm/Analysis/IVDescriptors.h	29 lines
llvm/include/llvm/Transforms/Utils/LoopUtils.h	24 lines
llvm/lib/Analysis/IVDescriptors.cpp	104 lines
llvm/lib/Transforms/Utils/LoopUtils.cpp	28 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	17 lines
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	4 lines
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp	16 lines
llvm/test/Transforms/LoopVectorize/iv-select-cmp-no-wrap.ll	141 lines
llvm/test/Transforms/LoopVectorize/iv-select-cmp.ll	1645 lines
llvm/test/Transforms/LoopVectorize/select-min-index.ll	186 lines

Diff 552652

llvm/include/llvm/Analysis/IVDescriptors.h

[28 lines not shown]
class PredicatedScalarEvolution;		class PredicatedScalarEvolution;
class ScalarEvolution;		class ScalarEvolution;
class SCEV;		class SCEV;
class StoreInst;		class StoreInst;

/// These are the kinds of recurrences that we support.		/// These are the kinds of recurrences that we support.
enum class RecurKind {		enum class RecurKind {
None, ///< Not a recurrence.		None, ///< Not a recurrence.
Add, ///< Sum of integers.		Add, ///< Sum of integers.
Mul, ///< Product of integers.		Mul, ///< Product of integers.
Or, ///< Bitwise or logical OR of integers.		Or, ///< Bitwise or logical OR of integers.
		fhahn (inline comment, Not Done): To keep the diff more compact, could you split the FP handling off? It also looks like codegen is at least not tested for the FP case?
		Mel-Chen (author, Done): I am afraid there is a misunderstanding. The test functions starting with "@select_fcmp" test the SelectIVFCmp reduction pattern. SelectIVFCmp is similar in semantics to SelectFCmp: the operand types of the select instruction are integer, and the cmp instruction is fcmp.
		artagnon (inline comment, Not Done): Maybe have a `max` in the name to make it clear that we only do max-reductions?
		Mel-Chen (author, Done): It has some functional overlap with max reduction, but it is not exactly the same as max reduction. If we were to rename it, the changes I suggest are: SelectICmp --> SelectInvICmp, SelectFCmp --> SelectInvFCmp, SelectIVICmp --> SelectIncIVICmp, SelectIVFCmp --> SelectIncIVFCmp. This will also make it easier for you to expand it in the future.
And, ///< Bitwise or logical AND of integers.		And, ///< Bitwise or logical AND of integers.
Xor, ///< Bitwise or logical XOR of integers.		Xor, ///< Bitwise or logical XOR of integers.
SMin, ///< Signed integer min implemented in terms of select(cmp()).		SMin, ///< Signed integer min implemented in terms of select(cmp()).
SMax, ///< Signed integer max implemented in terms of select(cmp()).		SMax, ///< Signed integer max implemented in terms of select(cmp()).
UMin, ///< Unsigned integer min implemented in terms of select(cmp()).		UMin, ///< Unsigned integer min implemented in terms of select(cmp()).
UMax, ///< Unsigned integer max implemented in terms of select(cmp()).		UMax, ///< Unsigned integer max implemented in terms of select(cmp()).
FAdd, ///< Sum of floats.		FAdd, ///< Sum of floats.
FMul, ///< Product of floats.		FMul, ///< Product of floats.
FMin, ///< FP min implemented in terms of select(cmp()).		FMin, ///< FP min implemented in terms of select(cmp()).
FMax, ///< FP max implemented in terms of select(cmp()).		FMax, ///< FP max implemented in terms of select(cmp()).
FMinimum, ///< FP min with llvm.minimum semantics		FMinimum, ///< FP min with llvm.minimum semantics
FMaximum, ///< FP max with llvm.maximum semantics		FMaximum, ///< FP max with llvm.maximum semantics
FMulAdd, ///< Sum of float products with llvm.fmuladd(a * b + sum).		FMulAdd, ///< Sum of float products with llvm.fmuladd(a * b + sum).
IAnyOf, ///< Any_of reduction with select(icmp(),x,y) where one of (x,y) is		IAnyOf, ///< Any_of reduction with select(icmp(),x,y) where one of (x,y) is
///< loop invariant, and both x and y are integer type.		///< loop invariant, and both x and y are integer type.
FAnyOf ///< Any_of reduction with select(fcmp(),x,y) where one of (x,y) is		FAnyOf, ///< Any_of reduction with select(fcmp(),x,y) where one of (x,y) is
///< loop invariant, and both x and y are integer type.		///< loop invariant, and both x and y are integer type.
// TODO: Any_of reduction need not be restricted to integer type only.		IFindLastIV, ///< FindLast reduction with select(icmp(),x,y) where one of
		///< (x,y) is increasing loop induction PHI, and both x and y are
		///< integer type.
		FFindLastIV ///< FindLast reduction with select(fcmp(),x,y) where one of (x,y)
		///< is increasing loop induction PHI, and both x and y are
		///< integer type.
		// TODO: Any_of and FindLast reduction need not be restricted to integer type
		// only.
};		};

/// The RecurrenceDescriptor is used to identify recurrences variables in a		/// The RecurrenceDescriptor is used to identify recurrences variables in a
/// loop. Reduction is a special case of recurrence that has uses of the		/// loop. Reduction is a special case of recurrence that has uses of the
/// recurrence variable outside the loop. The method isReductionPHI identifies		/// recurrence variable outside the loop. The method isReductionPHI identifies
/// reductions that are basic recurrences.		/// reductions that are basic recurrences.
///		///
/// Basic recurrences are defined as the summation, product, OR, AND, XOR, min,		/// Basic recurrences are defined as the summation, product, OR, AND, XOR, min,
[55 lines not shown]	public:
/// Returns a struct describing if the instruction 'I' can be a recurrence		/// Returns a struct describing if the instruction 'I' can be a recurrence
/// variable of type 'Kind' for a Loop \p L and reduction PHI \p Phi.		/// variable of type 'Kind' for a Loop \p L and reduction PHI \p Phi.
/// If the recurrence is a min/max pattern of select(icmp()) this function		/// If the recurrence is a min/max pattern of select(icmp()) this function
/// advances the instruction pointer 'I' from the compare instruction to the		/// advances the instruction pointer 'I' from the compare instruction to the
/// select instruction and stores this pointer in 'PatternLastInst' member of		/// select instruction and stores this pointer in 'PatternLastInst' member of
/// the returned struct.		/// the returned struct.
static InstDesc isRecurrenceInstr(Loop *L, PHINode *Phi, Instruction *I,		static InstDesc isRecurrenceInstr(Loop *L, PHINode *Phi, Instruction *I,
RecurKind Kind, InstDesc &Prev,		RecurKind Kind, InstDesc &Prev,
FastMathFlags FuncFMF);		FastMathFlags FuncFMF, ScalarEvolution *SE);

/// Returns true if instruction I has multiple uses in Insts		/// Returns true if instruction I has multiple uses in Insts
static bool hasMultipleUsesOf(Instruction *I,		static bool hasMultipleUsesOf(Instruction *I,
SmallPtrSetImpl<Instruction *> &Insts,		SmallPtrSetImpl<Instruction *> &Insts,
unsigned MaxNumUses);		unsigned MaxNumUses);

/// Returns true if all uses of the instruction I is within the Set.		/// Returns true if all uses of the instruction I is within the Set.
static bool areAllUsesIn(Instruction *I, SmallPtrSetImpl<Instruction *> &Set);		static bool areAllUsesIn(Instruction *I, SmallPtrSetImpl<Instruction *> &Set);
[10 lines not shown]	public:
/// Select(ICmp(A, B), X, Y), or		/// Select(ICmp(A, B), X, Y), or
/// Select(FCmp(A, B), X, Y)		/// Select(FCmp(A, B), X, Y)
/// where one of (X, Y) is a loop invariant integer and the other is a PHI		/// where one of (X, Y) is a loop invariant integer and the other is a PHI
/// value. \p Prev specifies the description of an already processed select		/// value. \p Prev specifies the description of an already processed select
/// instruction, so its corresponding cmp can be matched to it.		/// instruction, so its corresponding cmp can be matched to it.
static InstDesc isAnyOfPattern(Loop *Loop, PHINode *OrigPhi, Instruction *I,		static InstDesc isAnyOfPattern(Loop *Loop, PHINode *OrigPhi, Instruction *I,
InstDesc &Prev);		InstDesc &Prev);

		/// Returns a struct describing whether the instruction is either a
		/// Select(ICmp(A, B), X, Y), or
		/// Select(FCmp(A, B), X, Y)
		/// where one of (X, Y) is an increasing loop induction variable, and the
		/// other is a PHI value.
		// TODO: FindLast does not need be restricted to increasing loop induction
		// variables.
		static InstDesc isFindLastIVPattern(Loop *Loop, PHINode *OrigPhi,
		Instruction *I, ScalarEvolution *SE);

/// Returns a struct describing if the instruction is a		/// Returns a struct describing if the instruction is a
/// Select(FCmp(X, Y), (Z = X op PHINode), PHINode) instruction pattern.		/// Select(FCmp(X, Y), (Z = X op PHINode), PHINode) instruction pattern.
static InstDesc isConditionalRdxPattern(RecurKind Kind, Instruction *I);		static InstDesc isConditionalRdxPattern(RecurKind Kind, Instruction *I);

/// Returns identity corresponding to the RecurrenceKind.		/// Returns identity corresponding to the RecurrenceKind.
Value *getRecurrenceIdentity(RecurKind K, Type *Tp, FastMathFlags FMF) const;		Value *getRecurrenceIdentity(RecurKind K, Type *Tp, FastMathFlags FMF) const;

/// Returns the opcode corresponding to the RecurrenceKind.		/// Returns the opcode corresponding to the RecurrenceKind.
▲ Show 20 Lines • Show All 72 Lines • ▼ Show 20 Lines	public:
}		}

/// Returns true if the recurrence kind is of the form		/// Returns true if the recurrence kind is of the form
/// select(cmp(),x,y) where one of (x,y) is loop invariant.		/// select(cmp(),x,y) where one of (x,y) is loop invariant.
static bool isAnyOfRecurrenceKind(RecurKind Kind) {		static bool isAnyOfRecurrenceKind(RecurKind Kind) {
return Kind == RecurKind::IAnyOf || Kind == RecurKind::FAnyOf;		return Kind == RecurKind::IAnyOf || Kind == RecurKind::FAnyOf;
}		}

		/// Returns true if the recurrence kind is of the form
		/// select(cmp(),x,y) where one of (x,y) is increasing loop induction.
		static bool isFindLastIVRecurrenceKind(RecurKind Kind) {
		return Kind == RecurKind::IFindLastIV || Kind == RecurKind::FFindLastIV;
		}

/// Returns the type of the recurrence. This type can be narrower than the		/// Returns the type of the recurrence. This type can be narrower than the
/// actual type of the Phi if the recurrence has been type-promoted.		/// actual type of the Phi if the recurrence has been type-promoted.
Type *getRecurrenceType() const { return RecurrenceType; }		Type *getRecurrenceType() const { return RecurrenceType; }

/// Returns a reference to the instructions used for type-promoting the		/// Returns a reference to the instructions used for type-promoting the
/// recurrence.		/// recurrence.
const SmallPtrSet<Instruction *, 8> &getCastInsts() const { return CastInsts; }		const SmallPtrSet<Instruction *, 8> &getCastInsts() const { return CastInsts; }


llvm/include/llvm/Transforms/Utils/LoopUtils.h

Show First 20 Lines • Show All 356 Lines • ▼ Show 20 Lines	bool canSinkOrHoistInst(Instruction &I, AAResults AA, DominatorTree DT,
SinkAndHoistLICMFlags &LICMFlags,		SinkAndHoistLICMFlags &LICMFlags,
OptimizationRemarkEmitter *ORE = nullptr);		OptimizationRemarkEmitter *ORE = nullptr);

/// Returns the min/max intrinsic used when expanding a min/max reduction.		/// Returns the min/max intrinsic used when expanding a min/max reduction.
Intrinsic::ID getMinMaxReductionIntrinsicOp(RecurKind RK);		Intrinsic::ID getMinMaxReductionIntrinsicOp(RecurKind RK);

/// Returns the comparison predicate used when expanding a min/max reduction.		/// Returns the comparison predicate used when expanding a min/max reduction.
CmpInst::Predicate getMinMaxReductionPredicate(RecurKind RK);		CmpInst::Predicate getMinMaxReductionPredicate(RecurKind RK);

/// See RecurrenceDescriptor::isAnyOfPattern for a description of the pattern we		/// See RecurrenceDescriptor::isAnyOfPattern for a description of the pattern we
		shiva0217 (inline comment): or an increasing loop induction variable?
		Mel-Chen (author): Good find! Will correct it.
/// are trying to match. In this pattern, we are only ever selecting between two		/// are trying to match. In this pattern, we are only ever selecting between two
/// values: 1) an initial start value \p StartVal of the reduction PHI, and 2) a		/// values: 1) an initial start value \p StartVal of the reduction PHI, and 2) a
/// loop invariant value. If any of lane value in \p Left, \p Right is not equal		/// loop invariant value. If any of lane value in \p Left, \p Right is not equal
/// to \p StartVal, select the loop invariant value. This is done by selecting		/// to \p StartVal, select the loop invariant value. This is done by selecting
/// \p Right iff \p Left is equal to \p StartVal.		/// \p Right iff \p Left is equal to \p StartVal.
Value *createAnyOfOp(IRBuilderBase &Builder, Value *StartVal, RecurKind RK,		Value *createAnyOfOp(IRBuilderBase &Builder, Value *StartVal, RecurKind RK,
Value *Left, Value *Right);		Value *Left, Value *Right);

		/// See RecurrenceDescriptor::isFindLastIVPattern for a description of the
		/// pattern we are trying to match. In this pattern, since the selected set of
		/// values forms an increasing sequence, we are selecting the maximum value from
		/// \p Left and \p Right.
		Value *createFindLastIVOp(IRBuilderBase &Builder, Value *Left, Value *Right);

/// Returns a Min/Max operation corresponding to MinMaxRecurrenceKind.		/// Returns a Min/Max operation corresponding to MinMaxRecurrenceKind.
/// The Builder's fast-math-flags must be set to propagate the expected values.		/// The Builder's fast-math-flags must be set to propagate the expected values.
Value createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value Left,		Value createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value Left,
Value *Right);		Value *Right);

/// Generates an ordered vector reduction using extracts to reduce the value.		/// Generates an ordered vector reduction using extracts to reduce the value.
Value getOrderedReduction(IRBuilderBase &Builder, Value Acc, Value *Src,		Value getOrderedReduction(IRBuilderBase &Builder, Value Acc, Value *Src,
unsigned Op, RecurKind MinMaxKind = RecurKind::None);		unsigned Op, RecurKind MinMaxKind = RecurKind::None);
/// Create a target reduction of the given vector \p Src for a reduction of the		/// Create a target reduction of the given vector \p Src for a reduction of the
/// kind RecurKind::IAnyOf or RecurKind::FAnyOf. The reduction operation is		/// kind RecurKind::IAnyOf or RecurKind::FAnyOf. The reduction operation is
/// described by \p Desc.		/// described by \p Desc.
Value *createAnyOfTargetReduction(IRBuilderBase &B,		Value *createAnyOfTargetReduction(IRBuilderBase &B,
const TargetTransformInfo TTI, Value Src,		const TargetTransformInfo TTI, Value Src,
const RecurrenceDescriptor &Desc,		const RecurrenceDescriptor &Desc,
PHINode *OrigPhi);		PHINode *OrigPhi);

		/// Create a target reduction of the given vector \p Src for a reduction of the
		/// kind RecurKind::IFindLastIV or RecurKind::FFindLastIV. The reduction
		/// operation is described by \p Desc.
		Value *createFindLastIVTargetReduction(IRBuilderBase &B,
		const TargetTransformInfo *TTI,
		Value *Src,
		const RecurrenceDescriptor &Desc);

/// Create a generic target reduction using a recurrence descriptor \p Desc		/// Create a generic target reduction using a recurrence descriptor \p Desc
/// The target is queried to determine if intrinsics or shuffle sequences are		/// The target is queried to determine if intrinsics or shuffle sequences are
/// required to implement the reduction.		/// required to implement the reduction.
/// Fast-math-flags are propagated using the RecurrenceDescriptor.		/// Fast-math-flags are propagated using the RecurrenceDescriptor.
Value createTargetReduction(IRBuilderBase &B, const TargetTransformInfo TTI,		Value createTargetReduction(IRBuilderBase &B, const TargetTransformInfo TTI,
const RecurrenceDescriptor &Desc, Value *Src,		const RecurrenceDescriptor &Desc, Value *Src,
PHINode *OrigPhi = nullptr);		PHINode *OrigPhi = nullptr);

/// Create an ordered reduction intrinsic using the given recurrence		/// Create an ordered reduction intrinsic using the given recurrence
/// descriptor \p Desc.		/// descriptor \p Desc.
Value *createOrderedReduction(IRBuilderBase &B,		Value *createOrderedReduction(IRBuilderBase &B,
const RecurrenceDescriptor &Desc, Value *Src,		const RecurrenceDescriptor &Desc, Value *Src,
Value *Start);		Value *Start);

		/// Returns a set of cmp and select instructions as shown below:
		/// Select(Cmp(NE, Rdx, Iden), Rdx, InitVal)
		/// where \p Rdx is a scalar value generated by target reduction, Iden is the
		/// sentinel value of the recurrence descriptor \p Desc, and InitVal is the
		/// start value of the recurrence descriptor \p Desc.
		Value *createSentinelValueHandling(IRBuilderBase &Builder,
		const TargetTransformInfo *TTI,
		const RecurrenceDescriptor &Desc,
		Value *Rdx);
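
The fix-up described by the `createSentinelValueHandling` comment can be modelled on plain integers. The sketch below is only an illustration under assumptions: `handleSentinel` is a hypothetical name, `int` stands in for the recurrence type, and the default sentinel is the signed minimum, which is what the patch returns as the identity for the FindLastIV kinds. It also shows why the sentinel must never be a value the induction variable can take: a genuine match equal to the sentinel would be indistinguishable from "no match".

```cpp
#include <cassert>
#include <limits>

// Hypothetical scalar model of Select(Cmp(NE, Rdx, Iden), Rdx, InitVal).
// Rdx     : result of the horizontal max reduction over all lanes.
// InitVal : start value of the original reduction PHI.
// Iden    : sentinel; assumed here to be the signed minimum of the type.
int handleSentinel(int Rdx, int InitVal,
                   int Iden = std::numeric_limits<int>::min()) {
  // If the reduction still holds the sentinel, no lane ever selected the
  // induction variable, so the original start value is the answer.
  return Rdx != Iden ? Rdx : InitVal;
}
```

For example, `handleSentinel(5, 7)` yields 5 (some iteration matched, the last at index 5), while `handleSentinel(std::numeric_limits<int>::min(), 7)` yields 7 (no iteration matched).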

/// Get the intersection (logical and) of all of the potential IR flags		/// Get the intersection (logical and) of all of the potential IR flags
/// of each scalar operation (VL) that will be converted into a vector (I).		/// of each scalar operation (VL) that will be converted into a vector (I).
/// If OpValue is non-null, we only consider operations similar to OpValue		/// If OpValue is non-null, we only consider operations similar to OpValue
/// when intersecting.		/// when intersecting.
/// Flag set: NSW, NUW (if IncludeWrapFlags is true), exact, and all of		/// Flag set: NSW, NUW (if IncludeWrapFlags is true), exact, and all of
/// fast-math.		/// fast-math.
void propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue = nullptr,		void propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue = nullptr,
bool IncludeWrapFlags = true);		bool IncludeWrapFlags = true);

llvm/lib/Analysis/IVDescriptors.cpp

Show First 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	bool RecurrenceDescriptor::isIntegerRecurrenceKind(RecurKind Kind) {
case RecurKind::And:		case RecurKind::And:
case RecurKind::Xor:		case RecurKind::Xor:
case RecurKind::SMax:		case RecurKind::SMax:
case RecurKind::SMin:		case RecurKind::SMin:
case RecurKind::UMax:		case RecurKind::UMax:
case RecurKind::UMin:		case RecurKind::UMin:
case RecurKind::IAnyOf:		case RecurKind::IAnyOf:
case RecurKind::FAnyOf:		case RecurKind::FAnyOf:
		case RecurKind::IFindLastIV:
		case RecurKind::FFindLastIV:
return true;		return true;
}		}
return false;		return false;
}		}

bool RecurrenceDescriptor::isFloatingPointRecurrenceKind(RecurKind Kind) {		bool RecurrenceDescriptor::isFloatingPointRecurrenceKind(RecurKind Kind) {
return (Kind != RecurKind::None) && !isIntegerRecurrenceKind(Kind);		return (Kind != RecurKind::None) && !isIntegerRecurrenceKind(Kind);
}		}
▲ Show 20 Lines • Show All 305 Lines • ▼ Show 20 Lines	if (!Cur->isCommutative() && !IsAPhi && !isa<SelectInst>(Cur) &&
!VisitedInsts.count(dyn_cast<Instruction>(Cur->getOperand(0))))		!VisitedInsts.count(dyn_cast<Instruction>(Cur->getOperand(0))))
return false;		return false;

// Any reduction instruction must be of one of the allowed kinds. We ignore		// Any reduction instruction must be of one of the allowed kinds. We ignore
// the starting value (the Phi or an AND instruction if the Phi has been		// the starting value (the Phi or an AND instruction if the Phi has been
// type-promoted).		// type-promoted).
if (Cur != Start) {		if (Cur != Start) {
ReduxDesc =		ReduxDesc =
isRecurrenceInstr(TheLoop, Phi, Cur, Kind, ReduxDesc, FuncFMF);		isRecurrenceInstr(TheLoop, Phi, Cur, Kind, ReduxDesc, FuncFMF, SE);
ExactFPMathInst = ExactFPMathInst == nullptr		ExactFPMathInst = ExactFPMathInst == nullptr
? ReduxDesc.getExactFPMathInst()		? ReduxDesc.getExactFPMathInst()
: ExactFPMathInst;		: ExactFPMathInst;
if (!ReduxDesc.isRecurrence())		if (!ReduxDesc.isRecurrence())
return false;		return false;
// FIXME: FMF is allowed on phi, but propagation is not handled correctly.		// FIXME: FMF is allowed on phi, but propagation is not handled correctly.
if (isa<FPMathOperator>(ReduxDesc.getPatternInst()) && !IsAPhi) {		if (isa<FPMathOperator>(ReduxDesc.getPatternInst()) && !IsAPhi) {
FastMathFlags CurFMF = ReduxDesc.getPatternInst()->getFastMathFlags();		FastMathFlags CurFMF = ReduxDesc.getPatternInst()->getFastMathFlags();
▲ Show 20 Lines • Show All 213 Lines • ▼ Show 20 Lines	bool RecurrenceDescriptor::AddReductionVar(
RecurrenceDescriptor RD(RdxStart, ExitInstruction, IntermediateStore, Kind,		RecurrenceDescriptor RD(RdxStart, ExitInstruction, IntermediateStore, Kind,
FMF, ExactFPMathInst, RecurrenceType, IsSigned,		FMF, ExactFPMathInst, RecurrenceType, IsSigned,
IsOrdered, CastInsts, MinWidthCastToRecurrenceType);		IsOrdered, CastInsts, MinWidthCastToRecurrenceType);
RedDes = RD;		RedDes = RD;

return true;		return true;
}		}

// We are looking for loops that do something like this:		// We are looking for loops that do something like this:
// int r = 0;		// int r = 0;
// for (int i = 0; i < n; i++) {		// for (int i = 0; i < n; i++) {
// if (src[i] > 3)		// if (src[i] > 3)
// r = 3;		// r = 3;
// }		// }
// where the reduction value (r) only has two states, in this example 0 or 3.		// where the reduction value (r) only has two states, in this example 0 or 3.
// The generated LLVM IR for this type of loop will be like this:		// The generated LLVM IR for this type of loop will be like this:
// for.body:		// for.body:
// %r = phi i32 [ %spec.select, %for.body ], [ 0, %entry ]		// %r = phi i32 [ %spec.select, %for.body ], [ 0, %entry ]
// ...		// ...
// %cmp = icmp sgt i32 %5, 3		// %cmp = icmp sgt i32 %5, 3
// %spec.select = select i1 %cmp, i32 3, i32 %r		// %spec.select = select i1 %cmp, i32 3, i32 %r
// ...		// ...
// In general we can support vectorization of loops where 'r' flips between		// In general we can support vectorization of loops where 'r' flips between
// any two non-constants, provided they are loop invariant. The only thing		// any two non-constants, provided they are loop invariant. The only thing
// we actually care about at the end of the loop is whether or not any lane		// we actually care about at the end of the loop is whether or not any lane
// in the selected vector is different from the start value. The final		// in the selected vector is different from the start value. The final
// across-vector reduction after the loop simply involves choosing the start		// across-vector reduction after the loop simply involves choosing the start
// value if nothing changed (0 in the example above) or the other selected		// value if nothing changed (0 in the example above) or the other selected
// value (3 in the example above).		// value (3 in the example above).
		artagnon (inline comment): Please update these comments.
		Mel-Chen (author): Nice catch! Will update it.
RecurrenceDescriptor::InstDesc		RecurrenceDescriptor::InstDesc
RecurrenceDescriptor::isAnyOfPattern(Loop *Loop, PHINode *OrigPhi,		RecurrenceDescriptor::isAnyOfPattern(Loop *Loop, PHINode *OrigPhi,
Instruction *I, InstDesc &Prev) {		Instruction *I, InstDesc &Prev) {
// We must handle the select(cmp(),x,y) as a single instruction. Advance to		// We must handle the select(cmp(),x,y) as a single instruction. Advance to
// the select.		// the select.
CmpInst::Predicate Pred;		CmpInst::Predicate Pred;
if (match(I, m_OneUse(m_Cmp(Pred, m_Value(), m_Value())))) {		if (match(I, m_OneUse(m_Cmp(Pred, m_Value(), m_Value())))) {
if (auto *Select = dyn_cast<SelectInst>(*I->user_begin()))		if (auto *Select = dyn_cast<SelectInst>(*I->user_begin()))
Show All 10 Lines	RecurrenceDescriptor::isAnyOfPattern(Loop Loop, PHINode OrigPhi,

if (OrigPhi == dyn_cast<PHINode>(SI->getTrueValue()))		if (OrigPhi == dyn_cast<PHINode>(SI->getTrueValue()))
NonPhi = SI->getFalseValue();		NonPhi = SI->getFalseValue();
else if (OrigPhi == dyn_cast<PHINode>(SI->getFalseValue()))		else if (OrigPhi == dyn_cast<PHINode>(SI->getFalseValue()))
NonPhi = SI->getTrueValue();		NonPhi = SI->getTrueValue();
else		else
return InstDesc(false, I);		return InstDesc(false, I);

// We are looking for selects of the form:		// We are looking for selects of the form:
		artagnon (inline comment): Why not split this out into its own function?
		Mel-Chen (author): I was also thinking about this, because I personally feel that this function is a bit long. Do you think it's better to put it here as a static function, or to put it in another file?
		artagnon (inline comment): A static function would be nice.
// select(cmp(), phi, loop_invariant) or		// select(cmp(), phi, loop_invariant) or
// select(cmp(), loop_invariant, phi)		// select(cmp(), loop_invariant, phi)
if (!Loop->isLoopInvariant(NonPhi))		if (!Loop->isLoopInvariant(NonPhi))
return InstDesc(false, I);		return InstDesc(false, I);

return InstDesc(I, isa<ICmpInst>(I->getOperand(0)) ? RecurKind::IAnyOf		return InstDesc(I, isa<ICmpInst>(I->getOperand(0)) ? RecurKind::IAnyOf
: RecurKind::FAnyOf);		: RecurKind::FAnyOf);
}		}
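
The comment above `isAnyOfPattern` describes the AnyOf reduction: the result has only two states, and after the loop the only question is whether any lane differs from the start value. A minimal sketch of that idea on plain arrays follows. It is not the vectorizer's generated code: `anyOfScalar`, `anyOfVectorized`, and the vector width `VF` are hypothetical names chosen for illustration, and a `std::vector` of lanes stands in for a SIMD register.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: R flips from Start to NewVal once any element of Src
// exceeds Threshold.
int anyOfScalar(const std::vector<int> &Src, int Threshold, int Start,
                int NewVal) {
  int R = Start;
  for (int V : Src)
    if (V > Threshold)
      R = NewVal;
  return R;
}

// Lane-wise model with an assumed vector width VF.
int anyOfVectorized(const std::vector<int> &Src, int Threshold, int Start,
                    int NewVal, std::size_t VF = 4) {
  assert(VF > 0 && "vector width must be positive");
  std::vector<int> Lanes(VF, Start);
  std::size_t I = 0;
  // Vector body: each lane independently selects NewVal or keeps its value.
  for (; I + VF <= Src.size(); I += VF)
    for (std::size_t L = 0; L < VF; ++L)
      Lanes[L] = Src[I + L] > Threshold ? NewVal : Lanes[L];
  // Final reduction: or-reduce the per-lane "differs from Start" predicate,
  // then pick NewVal if any lane changed and Start otherwise.
  bool AnyChanged = false;
  for (int Lane : Lanes)
    AnyChanged |= (Lane != Start);
  int R = AnyChanged ? NewVal : Start;
  // Scalar epilogue for the remaining elements.
  for (; I < Src.size(); ++I)
    if (Src[I] > Threshold)
      R = NewVal;
  return R;
}
```

Both functions agree on any input; the lane-wise version mirrors the compare, or-reduce, and select sequence that `createAnyOfTargetReduction` emits further down in this diff.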

		// We are looking for loops that do something like this:
		artagnon (inline comment): We should forbid these cases in the first place instead of checking them.
		Mel-Chen (author): I do not understand. I believe this series of checks already forbids these cases.
		// int r = 0;
		// for (int i = 0; i < n; i++) {
		// if (src[i] > 3)
		fhahn (inline comment): Do we need to distinguish here between no signed/no unsigned wrap and then choose `smax/umax` during codegen?
		Mel-Chen (author): That's a good point. Implementing the patch for both signed and unsigned induction variables can be challenging in practice. The current patch focuses only on the signed IV because in most applications we encounter induction variables of the form {0, +, step}. When the IV is signed, we can use -1 or the minimum value of the signed data type as a sentinel value. However, when the IV is unsigned, there is no value smaller than 0 to use. This doesn't mean that unsigned IVs cannot be vectorized, but rather that they require additional handling and a more refined approach. Of course, if an unsigned IV is {1, +, step}, we can directly use the method implemented in this patch. However, such cases are less common, so we decided to focus on handling signed IVs first.
		artagnon (inline comment): Doesn't the `InstDesc` tell you if it's signed or unsigned?
		Mel-Chen (author): No, it does not provide signed or unsigned information.
		// r = i;
		// }
		// The reduction value (r) is derived from either the values of an increasing
		// induction variable (i) sequence, or from the start value (0).
		// The LLVM IR generated for such loops would be as follows:
		artagnon (inline comment): Why is this necessary? Even if it isn't the min signed value, we should codegen just fine, as the codegen just involves applying a mask and applying max/min-reduce.
		Mel-Chen (author): This is complicated, so let me explain why this pattern requires this check. It is related to our goals:
		1) We want to achieve the reduction with one reduce intrinsic.
		2) We want to use a static sentinel value.
		About 1): I understand that even without checking the boundaries or ensuring the select operand is increasing or decreasing, we could still vectorize it by using two reductions. However, the most common form of induction variable we encounter is {0, +, 1}. If this IV is signed, we can accomplish the reduction with a single reduction. Therefore, this is based on performance considerations.
		As for 2): it follows from the decision made in 1), which requires us to have a sentinel value. In the case of an increasing IV, the only restriction on the sentinel value is that it must be less than the start value of the IV. In the case of {0, +, 1}, the sentinel value can be -1 or any smaller value, i.e. a dynamic sentinel value. However, for easier implementation, we finally decided to use a static sentinel value, which is the minimum value of the data type.
		Additional description: if we were to use a dynamic sentinel value, it would involve checking whether the data type can represent `IV start value - step` (in the case of {0, +, 1}, it would be -1). However, within the LLVM framework, I currently don't have a way to implement a version with a dynamic sentinel value, so this is room for potential improvement.
		// for.body:
		// %r = phi i32 [ %spec.select, %for.body ], [ 0, %entry ]
		// %i = phi i32 [ %inc, %for.body ], [ 0, %entry ]
		artagnon (inline comment): `getStepRecurrence()` from the SCEV AddRec, instead of relying on `isInductionPhi()`?
		Mel-Chen (author): Why? What are the shortcomings that make you think `isInductionPHI` should not be used? In my mind, it is better to rely on the existing function rather than trying to reimplement it.
		// ...
		// %cmp = icmp sgt i32 %5, 3
		// %spec.select = select i1 %cmp, i32 %i, i32 %r
		// %inc = add nsw i32 %i, 1
		// ...
		// Since 'i' is an increasing induction variable, the reduction value after the
		// loop will be the maximum value of 'i' for which the condition (src[i] > 3)
		// is satisfied, or the start value (0 in the example above) if it is never
		// satisfied. When the start value of the increasing induction variable 'i' is
		// greater than the minimum value of the data type, we can use the minimum
		// value of the data type as a sentinel value to replace the start value. This
		// allows us to perform a single max reduction to obtain the final result.
		// TODO: It is possible to solve the case where the start value is the minimum
		// value of the data type or a non-constant value by using mask and multiple
		// reduction operations.
		RecurrenceDescriptor::InstDesc
		RecurrenceDescriptor::isFindLastIVPattern(Loop *Loop, PHINode *OrigPhi,
		Instruction *I, ScalarEvolution *SE) {
		// Only match select with single use cmp condition.
		// TODO: Only handle single use for now.
		CmpInst::Predicate Pred;
		if (!match(I, m_Select(m_OneUse(m_Cmp(Pred, m_Value(), m_Value())), m_Value(),
		m_Value())))
		return InstDesc(false, I);

		SelectInst *SI = cast<SelectInst>(I);
		Value *NonRdxPhi = nullptr;

		if (OrigPhi == dyn_cast<PHINode>(SI->getTrueValue()))
		NonRdxPhi = SI->getFalseValue();
		else if (OrigPhi == dyn_cast<PHINode>(SI->getFalseValue()))
		NonRdxPhi = SI->getTrueValue();
		else
		return InstDesc(false, I);

		auto IsIncreasingLoopInduction = [&SE, &Loop](Value *V) {
		auto *Phi = dyn_cast<PHINode>(V);
		if (!Phi)
		return false;

		if (!SE)
		return false;

		InductionDescriptor ID;
		if (!InductionDescriptor::isInductionPHI(Phi, Loop, SE, ID))
		return false;

		const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(SE->getSCEV(Phi));
		if (!AR->hasNoSignedWrap())
		return false;

		ConstantInt *IVStartValue = dyn_cast<ConstantInt>(ID.getStartValue());
		if (!IVStartValue || IVStartValue->isMinSignedValue())
		return false;

		const SCEV *Step = ID.getStep();
		return SE->isKnownPositive(Step);
		};

		// We are looking for selects of the form:
		// select(cmp(), phi, loop_induction) or
		// select(cmp(), loop_induction, phi)
		if (!IsIncreasingLoopInduction(NonRdxPhi))
		return InstDesc(false, I);

		return InstDesc(I, isa<ICmpInst>(I->getOperand(0)) ? RecurKind::IFindLastIV
		: RecurKind::FFindLastIV);
		}
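
The comment above `isFindLastIVPattern` explains that, for an increasing induction variable, the result is the largest matching index, and that the signed minimum can serve as a sentinel so a single max reduction suffices. The sketch below models this on plain arrays. It is an illustration under assumptions, not the code the vectorizer emits: `findLastScalar`, `findLastVectorized`, and the width `VF` are hypothetical names, and the induction variable is assumed to be {0, +, 1} of type `int`.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Scalar reference: R ends as the last index whose element exceeds
// Threshold, or Start if there is none.
int findLastScalar(const std::vector<int> &Src, int Threshold, int Start) {
  int R = Start;
  for (int I = 0; I < static_cast<int>(Src.size()); ++I)
    if (Src[I] > Threshold)
      R = I;
  return R;
}

// Lane-wise model with an assumed vector width VF.
int findLastVectorized(const std::vector<int> &Src, int Threshold, int Start,
                       int VF = 4) {
  assert(VF > 0 && "vector width must be positive");
  // Static sentinel: the signed minimum, which the IV {0, +, 1} never takes.
  const int Sentinel = std::numeric_limits<int>::min();
  const int N = static_cast<int>(Src.size());
  std::vector<int> Lanes(VF, Sentinel);
  int I = 0;
  // Vector body: each lane keeps the most recent matching index it has seen.
  // Because the IV only increases, that is also the lane's maximum.
  for (; I + VF <= N; I += VF)
    for (int L = 0; L < VF; ++L)
      Lanes[L] = Src[I + L] > Threshold ? I + L : Lanes[L];
  // Horizontal signed-max reduction across lanes.
  int Rdx = Sentinel;
  for (int Lane : Lanes)
    Rdx = std::max(Rdx, Lane);
  // Sentinel handling: fall back to the original start value if no lane
  // ever matched.
  int R = Rdx != Sentinel ? Rdx : Start;
  // Scalar epilogue for the remaining elements.
  for (; I < N; ++I)
    if (Src[I] > Threshold)
      R = I;
  return R;
}
```

The start value of the reduction never enters the lanes in this model; it is substituted only at the end, which is why the start value of the induction variable (not of the reduction) is the one that must stay above the sentinel.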

RecurrenceDescriptor::InstDesc		RecurrenceDescriptor::InstDesc
RecurrenceDescriptor::isMinMaxPattern(Instruction *I, RecurKind Kind,		RecurrenceDescriptor::isMinMaxPattern(Instruction *I, RecurKind Kind,
const InstDesc &Prev) {		const InstDesc &Prev) {
assert((isa<CmpInst>(I) || isa<SelectInst>(I) || isa<CallInst>(I)) &&		assert((isa<CmpInst>(I) || isa<SelectInst>(I) || isa<CallInst>(I)) &&
"Expected a cmp or select or call instruction");		"Expected a cmp or select or call instruction");
if (!isMinMaxRecurrenceKind(Kind))		if (!isMinMaxRecurrenceKind(Kind))
return InstDesc(false, I);		return InstDesc(false, I);

▲ Show 20 Lines • Show All 87 Lines • ▼ Show 20 Lines	RecurrenceDescriptor::isConditionalRdxPattern(RecurKind Kind, Instruction *I) {
Instruction *IPhi = isa<PHINode>(Op1) ? dyn_cast<Instruction>(Op1)		Instruction *IPhi = isa<PHINode>(Op1) ? dyn_cast<Instruction>(Op1)
: dyn_cast<Instruction>(Op2);		: dyn_cast<Instruction>(Op2);
if (!IPhi || IPhi != FalseVal)		if (!IPhi || IPhi != FalseVal)
return InstDesc(false, I);		return InstDesc(false, I);

return InstDesc(true, SI);		return InstDesc(true, SI);
}		}

RecurrenceDescriptor::InstDesc		RecurrenceDescriptor::InstDesc RecurrenceDescriptor::isRecurrenceInstr(
RecurrenceDescriptor::isRecurrenceInstr(Loop L, PHINode OrigPhi,		Loop L, PHINode OrigPhi, Instruction *I, RecurKind Kind, InstDesc &Prev,
Instruction *I, RecurKind Kind,		FastMathFlags FuncFMF, ScalarEvolution *SE) {
InstDesc &Prev, FastMathFlags FuncFMF) {
assert(Prev.getRecKind() == RecurKind::None || Prev.getRecKind() == Kind);		assert(Prev.getRecKind() == RecurKind::None || Prev.getRecKind() == Kind);
switch (I->getOpcode()) {		switch (I->getOpcode()) {
default:		default:
return InstDesc(false, I);		return InstDesc(false, I);
case Instruction::PHI:		case Instruction::PHI:
return InstDesc(I, Prev.getRecKind(), Prev.getExactFPMathInst());		return InstDesc(I, Prev.getRecKind(), Prev.getExactFPMathInst());
case Instruction::Sub:		case Instruction::Sub:
case Instruction::Add:		case Instruction::Add:
Show All 13 Lines	RecurrenceDescriptor::InstDesc RecurrenceDescriptor::isRecurrenceInstr(
case Instruction::FSub:		case Instruction::FSub:
case Instruction::FAdd:		case Instruction::FAdd:
return InstDesc(Kind == RecurKind::FAdd, I,		return InstDesc(Kind == RecurKind::FAdd, I,
I->hasAllowReassoc() ? nullptr : I);		I->hasAllowReassoc() ? nullptr : I);
case Instruction::Select:		case Instruction::Select:
if (Kind == RecurKind::FAdd || Kind == RecurKind::FMul ||		if (Kind == RecurKind::FAdd || Kind == RecurKind::FMul ||
Kind == RecurKind::Add || Kind == RecurKind::Mul)		Kind == RecurKind::Add || Kind == RecurKind::Mul)
return isConditionalRdxPattern(Kind, I);		return isConditionalRdxPattern(Kind, I);
		if (isFindLastIVRecurrenceKind(Kind))
		return isFindLastIVPattern(L, OrigPhi, I, SE);
[[fallthrough]];		[[fallthrough]];
case Instruction::FCmp:		case Instruction::FCmp:
case Instruction::ICmp:		case Instruction::ICmp:
case Instruction::Call:		case Instruction::Call:
if (isAnyOfRecurrenceKind(Kind))		if (isAnyOfRecurrenceKind(Kind))
return isAnyOfPattern(L, OrigPhi, I, Prev);		return isAnyOfPattern(L, OrigPhi, I, Prev);
auto HasRequiredFMF = [&]() {		auto HasRequiredFMF = [&]() {
if (FuncFMF.noNaNs() && FuncFMF.noSignedZeros())		if (FuncFMF.noNaNs() && FuncFMF.noSignedZeros())
▲ Show 20 Lines • Show All 88 Lines • ▼ Show 20 Lines	if (AddReductionVar(Phi, RecurKind::UMin, TheLoop, FMF, RedDes, DB, AC, DT,
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::IAnyOf, TheLoop, FMF, RedDes, DB, AC, DT,		if (AddReductionVar(Phi, RecurKind::IAnyOf, TheLoop, FMF, RedDes, DB, AC, DT,
SE)) {		SE)) {
LLVM_DEBUG(dbgs() << "Found an integer conditional select reduction PHI."		LLVM_DEBUG(dbgs() << "Found an integer conditional select reduction PHI."
<< *Phi << "\n");		<< *Phi << "\n");
return true;		return true;
}		}
		if (AddReductionVar(Phi, RecurKind::IFindLastIV, TheLoop, FMF, RedDes, DB, AC,
		DT, SE)) {
		LLVM_DEBUG(dbgs() << "Found a FindLastIV reduction PHI." << *Phi << "\n");
		return true;
		}
if (AddReductionVar(Phi, RecurKind::FMul, TheLoop, FMF, RedDes, DB, AC, DT,		if (AddReductionVar(Phi, RecurKind::FMul, TheLoop, FMF, RedDes, DB, AC, DT,
SE)) {		SE)) {
LLVM_DEBUG(dbgs() << "Found an FMult reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found an FMult reduction PHI." << *Phi << "\n");
return true;		return true;
}		}
if (AddReductionVar(Phi, RecurKind::FAdd, TheLoop, FMF, RedDes, DB, AC, DT,		if (AddReductionVar(Phi, RecurKind::FAdd, TheLoop, FMF, RedDes, DB, AC, DT,
SE)) {		SE)) {
LLVM_DEBUG(dbgs() << "Found an FAdd reduction PHI." << *Phi << "\n");		LLVM_DEBUG(dbgs() << "Found an FAdd reduction PHI." << *Phi << "\n");
▲ Show 20 Lines • Show All 173 Lines • ▼ Show 20 Lines	Value RecurrenceDescriptor::getRecurrenceIdentity(RecurKind K, Type Tp,
case RecurKind::FMinimum:		case RecurKind::FMinimum:
return ConstantFP::getInfinity(Tp, false /*Negative*/);		return ConstantFP::getInfinity(Tp, false /*Negative*/);
case RecurKind::FMaximum:		case RecurKind::FMaximum:
return ConstantFP::getInfinity(Tp, true /*Negative*/);		return ConstantFP::getInfinity(Tp, true /*Negative*/);
case RecurKind::IAnyOf:		case RecurKind::IAnyOf:
case RecurKind::FAnyOf:		case RecurKind::FAnyOf:
return getRecurrenceStartValue();		return getRecurrenceStartValue();
break;		break;
		case RecurKind::IFindLastIV:
		case RecurKind::FFindLastIV:
		return getRecurrenceIdentity(RecurKind::SMax, Tp, FMF);
default:		default:
llvm_unreachable("Unknown recurrence kind");		llvm_unreachable("Unknown recurrence kind");
}		}
}		}

unsigned RecurrenceDescriptor::getOpcode(RecurKind Kind) {		unsigned RecurrenceDescriptor::getOpcode(RecurKind Kind) {
switch (Kind) {		switch (Kind) {
case RecurKind::Add:		case RecurKind::Add:
Show All 11 Lines	unsigned RecurrenceDescriptor::getOpcode(RecurKind Kind) {
case RecurKind::FMulAdd:		case RecurKind::FMulAdd:
case RecurKind::FAdd:		case RecurKind::FAdd:
return Instruction::FAdd;		return Instruction::FAdd;
case RecurKind::SMax:		case RecurKind::SMax:
case RecurKind::SMin:		case RecurKind::SMin:
case RecurKind::UMax:		case RecurKind::UMax:
case RecurKind::UMin:		case RecurKind::UMin:
case RecurKind::IAnyOf:		case RecurKind::IAnyOf:
		case RecurKind::IFindLastIV:
return Instruction::ICmp;		return Instruction::ICmp;
case RecurKind::FMax:		case RecurKind::FMax:
case RecurKind::FMin:		case RecurKind::FMin:
case RecurKind::FMaximum:		case RecurKind::FMaximum:
case RecurKind::FMinimum:		case RecurKind::FMinimum:
case RecurKind::FAnyOf:		case RecurKind::FAnyOf:
		case RecurKind::FFindLastIV:
return Instruction::FCmp;		return Instruction::FCmp;
default:		default:
llvm_unreachable("Unknown recurrence operation");		llvm_unreachable("Unknown recurrence operation");
}		}
}		}

SmallVector<Instruction *, 4>		SmallVector<Instruction *, 4>
RecurrenceDescriptor::getReductionOpChain(PHINode *Phi, Loop *L) const {		RecurrenceDescriptor::getReductionOpChain(PHINode *Phi, Loop *L) const {

llvm/lib/Transforms/Utils/LoopUtils.cpp

Show First 20 Lines • Show All 936 Lines • ▼ Show 20 Lines	Value llvm::createAnyOfOp(IRBuilderBase &Builder, Value StartVal,
RecurKind RK, Value Left, Value Right) {		RecurKind RK, Value Left, Value Right) {
if (auto VTy = dyn_cast<VectorType>(Left->getType()))		if (auto VTy = dyn_cast<VectorType>(Left->getType()))
StartVal = Builder.CreateVectorSplat(VTy->getElementCount(), StartVal);		StartVal = Builder.CreateVectorSplat(VTy->getElementCount(), StartVal);
Value *Cmp =		Value *Cmp =
Builder.CreateCmp(CmpInst::ICMP_NE, Left, StartVal, "rdx.select.cmp");		Builder.CreateCmp(CmpInst::ICMP_NE, Left, StartVal, "rdx.select.cmp");
return Builder.CreateSelect(Cmp, Left, Right, "rdx.select");		return Builder.CreateSelect(Cmp, Left, Right, "rdx.select");
}		}

		Value *llvm::createFindLastIVOp(IRBuilderBase &Builder, Value *Left,
		Value *Right) {
		return createMinMaxOp(Builder, RecurKind::SMax, Left, Right);
		}

Value *llvm::createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value *Left,		Value *llvm::createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value *Left,
Value *Right) {		Value *Right) {
Type *Ty = Left->getType();		Type *Ty = Left->getType();
if (Ty->isIntOrIntVectorTy() ||		if (Ty->isIntOrIntVectorTy() ||
(RK == RecurKind::FMinimum || RK == RecurKind::FMaximum)) {		(RK == RecurKind::FMinimum || RK == RecurKind::FMaximum)) {
// TODO: Add float minnum/maxnum support when FMF nnan is set.		// TODO: Add float minnum/maxnum support when FMF nnan is set.
Intrinsic::ID Id = getMinMaxReductionIntrinsicOp(RK);		Intrinsic::ID Id = getMinMaxReductionIntrinsicOp(RK);
return Builder.CreateIntrinsic(Ty, Id, {Left, Right}, nullptr,		return Builder.CreateIntrinsic(Ty, Id, {Left, Right}, nullptr,
[... 106 lines hidden ...]	Value *llvm::createAnyOfTargetReduction(IRBuilderBase &Builder,
Value *Cmp =		Value *Cmp =
Builder.CreateCmp(CmpInst::ICMP_NE, Src, Right, "rdx.select.cmp");		Builder.CreateCmp(CmpInst::ICMP_NE, Src, Right, "rdx.select.cmp");

// If any predicate is true it means that we want to select the new value.		// If any predicate is true it means that we want to select the new value.
Cmp = Builder.CreateOrReduce(Cmp);		Cmp = Builder.CreateOrReduce(Cmp);
return Builder.CreateSelect(Cmp, NewVal, InitVal, "rdx.select");		return Builder.CreateSelect(Cmp, NewVal, InitVal, "rdx.select");
}		}

		Value *llvm::createFindLastIVTargetReduction(IRBuilderBase &Builder,
		const TargetTransformInfo *TTI,
		Value *Src,
		const RecurrenceDescriptor &Desc) {
		assert(RecurrenceDescriptor::isFindLastIVRecurrenceKind(
		Desc.getRecurrenceKind()) &&
		"Unexpected reduction kind");
		return Builder.CreateIntMaxReduce(Src, true);
		}
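
The two helpers added above, createFindLastIVOp and createFindLastIVTargetReduction, both boil down to a signed max. As a rough illustration of why that is enough for an increasing induction variable, here is a minimal scalar model of the whole flow. It is only a sketch under stated assumptions: `findLastScalar`, `findLastModel`, `VF` and `UF` are hypothetical names invented for this example, the lanes are plain arrays rather than IR vectors, and the sentinel is taken to be the minimum i64 value, as the test expectations in this revision suggest.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

constexpr int VF = 4; // lanes per vector (illustrative choice)
constexpr int UF = 2; // interleaved parts (illustrative choice)
using Vec = std::array<int64_t, VF>;

// Scalar reference: the loop from the patch summary.
int64_t findLastScalar(const std::vector<int64_t> &a,
                       const std::vector<int64_t> &b, int64_t start) {
  assert(a.size() == b.size());
  int64_t red = start;
  for (int64_t i = 0; i < static_cast<int64_t>(a.size()); ++i)
    red = (a[i] > b[i]) ? i : red;
  return red;
}

// Hypothetical model of the vectorized form, not the actual codegen.
int64_t findLastModel(const std::vector<int64_t> &a,
                      const std::vector<int64_t> &b, int64_t start) {
  assert(a.size() == b.size());
  const int64_t sentinel = std::numeric_limits<int64_t>::min();
  const int64_t n = static_cast<int64_t>(a.size());
  const int64_t step = VF * UF;
  const int64_t nVec = n - n % step;

  // Every lane of every part starts at the sentinel.
  std::array<Vec, UF> parts;
  for (Vec &p : parts)
    p.fill(sentinel);

  // Vector body: each lane keeps the last index at which its condition held.
  for (int64_t i = 0; i < nVec; i += step)
    for (int p = 0; p < UF; ++p)
      for (int l = 0; l < VF; ++l) {
        const int64_t iv = i + p * VF + l;
        if (a[iv] > b[iv])
          parts[p][l] = iv;
      }

  // Combine the interleaved parts lane-wise with a signed max.
  Vec acc = parts[0];
  for (int p = 1; p < UF; ++p)
    for (int l = 0; l < VF; ++l)
      acc[l] = std::max(acc[l], parts[p][l]);

  // Horizontal signed-max reduction across the lanes.
  const int64_t rdx = *std::max_element(acc.begin(), acc.end());

  // Sentinel fix-up: if no lane ever matched, fall back to the start value.
  int64_t red = (rdx != sentinel) ? rdx : start;

  // Scalar remainder loop.
  for (int64_t i = nVec; i < n; ++i)
    red = (a[i] > b[i]) ? i : red;
  return red;
}
```

Because the index only grows, the last matching index seen by any lane is also the largest one, so taking the maximum over parts and lanes recovers the value the scalar loop would have produced.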

Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder,		Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder,
const TargetTransformInfo *TTI,		const TargetTransformInfo *TTI,
Value *Src, RecurKind RdxKind) {		Value *Src, RecurKind RdxKind) {
auto *SrcVecEltTy = cast<VectorType>(Src->getType())->getElementType();		auto *SrcVecEltTy = cast<VectorType>(Src->getType())->getElementType();
switch (RdxKind) {		switch (RdxKind) {
case RecurKind::Add:		case RecurKind::Add:
return Builder.CreateAddReduce(Src);		return Builder.CreateAddReduce(Src);
		fhahn: What does this mean? The current patch only handles increasing inductions, so there's no mis-compile that needs fixing here (which the comment kind of implies)?
		Mel-Chen (author): My bad. This may be a note I left when I implemented this patch. Will remove it.
case RecurKind::Mul:		case RecurKind::Mul:
		artagnon: Personally, I prefer straight-line codegen as I've done, but I'm probably biased.
		Mel-Chen (author): I don't get it. The only thing I need is to create a signed max reduce.
return Builder.CreateMulReduce(Src);		return Builder.CreateMulReduce(Src);
case RecurKind::And:		case RecurKind::And:
return Builder.CreateAndReduce(Src);		return Builder.CreateAndReduce(Src);
case RecurKind::Or:		case RecurKind::Or:
return Builder.CreateOrReduce(Src);		return Builder.CreateOrReduce(Src);
case RecurKind::Xor:		case RecurKind::Xor:
return Builder.CreateXorReduce(Src);		return Builder.CreateXorReduce(Src);
case RecurKind::FMulAdd:		case RecurKind::FMulAdd:
[... 31 lines hidden ...]	Value *llvm::createTargetReduction(IRBuilderBase &B,
// All ops in the reduction inherit fast-math-flags from the recurrence		// All ops in the reduction inherit fast-math-flags from the recurrence
// descriptor.		// descriptor.
IRBuilderBase::FastMathFlagGuard FMFGuard(B);		IRBuilderBase::FastMathFlagGuard FMFGuard(B);
B.setFastMathFlags(Desc.getFastMathFlags());		B.setFastMathFlags(Desc.getFastMathFlags());

RecurKind RK = Desc.getRecurrenceKind();		RecurKind RK = Desc.getRecurrenceKind();
if (RecurrenceDescriptor::isAnyOfRecurrenceKind(RK))		if (RecurrenceDescriptor::isAnyOfRecurrenceKind(RK))
return createAnyOfTargetReduction(B, TTI, Src, Desc, OrigPhi);		return createAnyOfTargetReduction(B, TTI, Src, Desc, OrigPhi);
		if (RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK))
		return createFindLastIVTargetReduction(B, TTI, Src, Desc);

return createSimpleTargetReduction(B, TTI, Src, RK);		return createSimpleTargetReduction(B, TTI, Src, RK);
}		}

Value *llvm::createOrderedReduction(IRBuilderBase &B,		Value *llvm::createOrderedReduction(IRBuilderBase &B,
const RecurrenceDescriptor &Desc,		const RecurrenceDescriptor &Desc,
Value *Src, Value *Start) {		Value *Src, Value *Start) {
assert((Desc.getRecurrenceKind() == RecurKind::FAdd ||		assert((Desc.getRecurrenceKind() == RecurKind::FAdd ||
Desc.getRecurrenceKind() == RecurKind::FMulAdd) &&		Desc.getRecurrenceKind() == RecurKind::FMulAdd) &&
"Unexpected reduction kind");		"Unexpected reduction kind");
assert(Src->getType()->isVectorTy() && "Expected a vector type");		assert(Src->getType()->isVectorTy() && "Expected a vector type");
assert(!Start->getType()->isVectorTy() && "Expected a scalar type");		assert(!Start->getType()->isVectorTy() && "Expected a scalar type");

return B.CreateFAddReduce(Start, Src);		return B.CreateFAddReduce(Start, Src);
}		}

		Value *llvm::createSentinelValueHandling(IRBuilderBase &Builder,
		shiva0217: It might be worth having a comment to describe that the SelectIVICmp and SelectIVFCmp code generation will use the Identity value to determine whether the if condition in the following case has ever been true.
		int r = 331;
		for (int i = 0; i < n; i++)
		  if (src[i] > 111)
		    r = i;
		When the reduction value (Rdx) equals the Identity value (Iden), it reveals that the condition has never been true, so it will select the InitVal. That might be the reason why, in IsIncreasingLoopInduction, the function checks the IV start value to avoid the IV overlapping the Identity value.
		Mel-Chen (author): Sure, will do.
		const TargetTransformInfo *TTI,
		const RecurrenceDescriptor &Desc,
		Value *Rdx) {
		Value *InitVal = Desc.getRecurrenceStartValue();
		Value *Iden = Desc.getRecurrenceIdentity(
		Desc.getRecurrenceKind(), Rdx->getType(), Desc.getFastMathFlags());
		Value *Cmp = Builder.CreateCmp(CmpInst::ICMP_NE, Rdx, Iden, "rdx.select.cmp");
		return Builder.CreateSelect(Cmp, Rdx, InitVal, "rdx.select");
		}
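
The review comment on this function suggests that the induction variable must never take the sentinel value, otherwise a genuine match becomes indistinguishable from "no match". The sketch below tries to make that concern concrete with an 8-bit model. Everything in it is an assumption made for illustration: `kSentinel`, `findLastReference` and `findLastWithSentinel` are hypothetical names, the condition `iv == hit` stands in for `a[i] > b[i]`, and the code is not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Sentinel for an 8-bit signed reduction: the smallest representable value.
constexpr int8_t kSentinel = std::numeric_limits<int8_t>::min();

// Scalar reference. The IV runs over [ivStart, ivStart + n) and the condition
// (a stand-in for a[i] > b[i]) holds only when the IV equals `hit`.
int8_t findLastReference(int8_t ivStart, int n, int8_t hit, int8_t start) {
  assert(ivStart + n - 1 <= std::numeric_limits<int8_t>::max());
  int8_t red = start;
  for (int k = 0; k < n; ++k) {
    const int8_t iv = static_cast<int8_t>(ivStart + k);
    red = (iv == hit) ? iv : red;
  }
  return red;
}

// Sentinel-based model: accumulate with a signed max starting from the
// sentinel, then map a sentinel result back to the start value.
int8_t findLastWithSentinel(int8_t ivStart, int n, int8_t hit, int8_t start) {
  assert(ivStart + n - 1 <= std::numeric_limits<int8_t>::max());
  int8_t rdx = kSentinel;
  for (int k = 0; k < n; ++k) {
    const int8_t iv = static_cast<int8_t>(ivStart + k);
    if (iv == hit)
      rdx = std::max(rdx, iv);
  }
  return (rdx != kSentinel) ? rdx : start;
}
```

With `ivStart = -127` both functions agree. With `ivStart = -128` and a match at the very first iteration, the reference returns -128 while the sentinel-based model returns the start value, which is the kind of collision a check on the IV start value would rule out.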

void llvm::propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue,		void llvm::propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue,
bool IncludeWrapFlags) {		bool IncludeWrapFlags) {
auto *VecOp = dyn_cast<Instruction>(I);		auto *VecOp = dyn_cast<Instruction>(I);
if (!VecOp)		if (!VecOp)
return;		return;
auto *Intersection = (OpValue == nullptr) ? dyn_cast<Instruction>(VL[0])		auto *Intersection = (OpValue == nullptr) ? dyn_cast<Instruction>(VL[0])
: dyn_cast<Instruction>(OpValue);		: dyn_cast<Instruction>(OpValue);
if (!Intersection)		if (!Intersection)

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


[... 3,856 lines hidden ...]	else {
for (unsigned Part = 1; Part < UF; ++Part) {		for (unsigned Part = 1; Part < UF; ++Part) {
Value *RdxPart = State.get(LoopExitInstDef, Part);		Value *RdxPart = State.get(LoopExitInstDef, Part);
if (Op != Instruction::ICmp && Op != Instruction::FCmp)		if (Op != Instruction::ICmp && Op != Instruction::FCmp)
ReducedPartRdx = Builder.CreateBinOp(		ReducedPartRdx = Builder.CreateBinOp(
(Instruction::BinaryOps)Op, RdxPart, ReducedPartRdx, "bin.rdx");		(Instruction::BinaryOps)Op, RdxPart, ReducedPartRdx, "bin.rdx");
else if (RecurrenceDescriptor::isAnyOfRecurrenceKind(RK))		else if (RecurrenceDescriptor::isAnyOfRecurrenceKind(RK))
ReducedPartRdx = createAnyOfOp(Builder, ReductionStartValue, RK,		ReducedPartRdx = createAnyOfOp(Builder, ReductionStartValue, RK,
ReducedPartRdx, RdxPart);		ReducedPartRdx, RdxPart);
		else if (RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK))
		ReducedPartRdx = createFindLastIVOp(Builder, ReducedPartRdx, RdxPart);
else		else
ReducedPartRdx = createMinMaxOp(Builder, RK, ReducedPartRdx, RdxPart);		ReducedPartRdx = createMinMaxOp(Builder, RK, ReducedPartRdx, RdxPart);
}		}
}		}

// Create the reduction after the loop. Note that inloop reductions create the		// Create the reduction after the loop. Note that inloop reductions create the
// target reduction in the loop using a Reduction recipe.		// target reduction in the loop using a Reduction recipe.
if (VF.isVector() && !PhiR->isInLoop()) {		if (VF.isVector() && !PhiR->isInLoop()) {
ReducedPartRdx =		ReducedPartRdx =
createTargetReduction(Builder, TTI, RdxDesc, ReducedPartRdx, OrigPhi);		createTargetReduction(Builder, TTI, RdxDesc, ReducedPartRdx, OrigPhi);
// If the reduction can be performed in a smaller type, we need to extend		// If the reduction can be performed in a smaller type, we need to extend
// the reduction to the wider type before we branch to the original loop.		// the reduction to the wider type before we branch to the original loop.
if (PhiTy != RdxDesc.getRecurrenceType())		if (PhiTy != RdxDesc.getRecurrenceType())
ReducedPartRdx = RdxDesc.isSigned()		ReducedPartRdx = RdxDesc.isSigned()
? Builder.CreateSExt(ReducedPartRdx, PhiTy)		? Builder.CreateSExt(ReducedPartRdx, PhiTy)
: Builder.CreateZExt(ReducedPartRdx, PhiTy);		: Builder.CreateZExt(ReducedPartRdx, PhiTy);
}		}

		if (RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK))
		ReducedPartRdx =
		createSentinelValueHandling(Builder, TTI, RdxDesc, ReducedPartRdx);
		shiva0217: Could the function be renamed to createSelectIVCmpTargetReduction and be called from createSelectCmpTargetReduction? Perhaps CreateIntMaxReduce can be moved into the function?
		Mel-Chen (author): I'm afraid I won't be able to meet this requirement. Placing `createSentinelValueHandling` in this position is for handling the case when the vector width is 1. You could refer to CHECK-VF1IC4 in the test cases and focus on the `middle.block`. In the implementation, VF1IC4 doesn't call `createTargetReduction`, but `ReducedPartRdx` still needs the sentinel value fix-up. However, perhaps we can create a new bool function for `RK == RecurKind::SelectIVICmp || RK == RecurKind::SelectIVFCmp`. This condition will most likely expand further and cause the if-condition to become too long.
		shiva0217: Thanks for the explanation! Could we use "} else if ((!VF.isVector() && !PhiR->isInLoop()))" to guard the generation? It could make it easier to understand that the codegen is needed when VF is not a vector and createTargetReduction won't be invoked. Should we rename createSentinelValueHandling to createSelectInitValOrReduction? I feel it could reflect the codegen better, but I don't hold this opinion strongly.
		shiva0217: Oops, I think I mixed up the patch with some local changes. Please ignore the comment.
		Mel-Chen (author): No problem. And I've just rebased this patch. Please continue with the review. Thank you.

PHINode *ResumePhi =		PHINode *ResumePhi =
dyn_cast<PHINode>(PhiR->getStartValue()->getUnderlyingValue());		dyn_cast<PHINode>(PhiR->getStartValue()->getUnderlyingValue());

// Create a phi node that merges control-flow from the backedge-taken check		// Create a phi node that merges control-flow from the backedge-taken check
// block and the middle block.		// block and the middle block.
PHINode *BCBlockPhi = PHINode::Create(PhiTy, 2, "bc.merge.rdx",		PHINode *BCBlockPhi = PHINode::Create(PhiTy, 2, "bc.merge.rdx",
LoopScalarPreHeader->getTerminator());		LoopScalarPreHeader->getTerminator());

[... 1,886 lines hidden ...]	if (!ScalarInterleavingRequiresRuntimePointerCheck &&
// There is little point in interleaving for reductions containing selects		// There is little point in interleaving for reductions containing selects
// and compares when VF=1 since it may just create more overhead than it's		// and compares when VF=1 since it may just create more overhead than it's
// worth for loops with small trip counts. This is because we still have to		// worth for loops with small trip counts. This is because we still have to
// do the final reduction after the loop.		// do the final reduction after the loop.
bool HasSelectCmpReductions =		bool HasSelectCmpReductions =
HasReductions &&		HasReductions &&
any_of(Legal->getReductionVars(), [&](auto &Reduction) -> bool {		any_of(Legal->getReductionVars(), [&](auto &Reduction) -> bool {
const RecurrenceDescriptor &RdxDesc = Reduction.second;		const RecurrenceDescriptor &RdxDesc = Reduction.second;
return RecurrenceDescriptor::isAnyOfRecurrenceKind(		RecurKind RK = RdxDesc.getRecurrenceKind();
RdxDesc.getRecurrenceKind());		return RecurrenceDescriptor::isAnyOfRecurrenceKind(RK) ||
		RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK);
});		});
if (HasSelectCmpReductions) {		if (HasSelectCmpReductions) {
LLVM_DEBUG(dbgs() << "LV: Not interleaving select-cmp reductions.\n");		LLVM_DEBUG(dbgs() << "LV: Not interleaving select-cmp reductions.\n");
return 1;		return 1;
}		}

// If we have a scalar reduction (vector reductions are already dealt with		// If we have a scalar reduction (vector reductions are already dealt with
// by this point), we can increase the critical path length if the loop		// by this point), we can increase the critical path length if the loop
[... 3,184 lines hidden ...]	for (VPRecipeBase &R :
if (!PhiR || !PhiR->isInLoop() || (MinVF.isScalar() && !PhiR->isOrdered()))		if (!PhiR || !PhiR->isInLoop() || (MinVF.isScalar() && !PhiR->isOrdered()))
continue;		continue;
InLoopReductionPhis.push_back(PhiR);		InLoopReductionPhis.push_back(PhiR);
}		}

for (VPReductionPHIRecipe *PhiR : InLoopReductionPhis) {		for (VPReductionPHIRecipe *PhiR : InLoopReductionPhis) {
const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();		const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
RecurKind Kind = RdxDesc.getRecurrenceKind();		RecurKind Kind = RdxDesc.getRecurrenceKind();
assert(!RecurrenceDescriptor::isAnyOfRecurrenceKind(Kind) &&		assert(
"AnyOf reductions are not allowed for in-loop reductions");		(!RecurrenceDescriptor::isAnyOfRecurrenceKind(Kind) &&
		!RecurrenceDescriptor::isFindLastIVRecurrenceKind(Kind)) &&
		"AnyOf and FindLast reductions are not allowed for in-loop reductions");

// Collect the chain of "link" recipes for the reduction starting at PhiR.		// Collect the chain of "link" recipes for the reduction starting at PhiR.
SetVector<VPRecipeBase *> Worklist;		SetVector<VPRecipeBase *> Worklist;
Worklist.insert(PhiR);		Worklist.insert(PhiR);
for (unsigned I = 0; I != Worklist.size(); ++I) {		for (unsigned I = 0; I != Worklist.size(); ++I) {
VPRecipeBase *Cur = Worklist[I];		VPRecipeBase *Cur = Worklist[I];
for (VPUser *U : Cur->getVPSingleValue()->users()) {		for (VPUser *U : Cur->getVPSingleValue()->users()) {
auto *UserRecipe = dyn_cast<VPRecipeBase>(U);		auto *UserRecipe = dyn_cast<VPRecipeBase>(U);

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp


[... 14,238 lines hidden ...]	Value *emitScaleForReusedOps(Value *VectorizedValue, IRBuilderBase &Builder,
case RecurKind::FMinimum:		case RecurKind::FMinimum:
// res = vv		// res = vv
return VectorizedValue;		return VectorizedValue;
case RecurKind::Mul:		case RecurKind::Mul:
case RecurKind::FMul:		case RecurKind::FMul:
case RecurKind::FMulAdd:		case RecurKind::FMulAdd:
case RecurKind::IAnyOf:		case RecurKind::IAnyOf:
case RecurKind::FAnyOf:		case RecurKind::FAnyOf:
		case RecurKind::IFindLastIV:
		case RecurKind::FFindLastIV:
case RecurKind::None:		case RecurKind::None:
llvm_unreachable("Unexpected reduction kind for repeated scalar.");		llvm_unreachable("Unexpected reduction kind for repeated scalar.");
}		}
return nullptr;		return nullptr;
}		}

/// Emits actual operation for the scalar identity values, found during		/// Emits actual operation for the scalar identity values, found during
/// horizontal reduction analysis.		/// horizontal reduction analysis.
[... 73 lines hidden ...]	case RecurKind::FAdd: {
auto *Scale = ConstantVector::get(Vals);		auto *Scale = ConstantVector::get(Vals);
return Builder.CreateFMul(VectorizedValue, Scale);		return Builder.CreateFMul(VectorizedValue, Scale);
}		}
case RecurKind::Mul:		case RecurKind::Mul:
case RecurKind::FMul:		case RecurKind::FMul:
case RecurKind::FMulAdd:		case RecurKind::FMulAdd:
case RecurKind::IAnyOf:		case RecurKind::IAnyOf:
case RecurKind::FAnyOf:		case RecurKind::FAnyOf:
		case RecurKind::IFindLastIV:
		case RecurKind::FFindLastIV:
case RecurKind::None:		case RecurKind::None:
llvm_unreachable("Unexpected reduction kind for reused scalars.");		llvm_unreachable("Unexpected reduction kind for reused scalars.");
}		}
return nullptr;		return nullptr;
}		}
};		};
} // end anonymous namespace		} // end anonymous namespace


llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

[... 1,584 lines hidden ...]	if (RecurrenceDescriptor::isMinMaxRecurrenceKind(RK) ||
if (ScalarPHI) {		if (ScalarPHI) {
Iden = StartV;		Iden = StartV;
} else {		} else {
IRBuilderBase::InsertPointGuard IPBuilder(Builder);		IRBuilderBase::InsertPointGuard IPBuilder(Builder);
Builder.SetInsertPoint(VectorPH->getTerminator());		Builder.SetInsertPoint(VectorPH->getTerminator());
StartV = Iden =		StartV = Iden =
Builder.CreateVectorSplat(State.VF, StartV, "minmax.ident");		Builder.CreateVectorSplat(State.VF, StartV, "minmax.ident");
}		}
		} else if (RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK)) {
		shiva0217: Perhaps add a comment to describe that SelectIVICmp and SelectIVFCmp will initialize the reduction PHI with Iden, and that createSentinelValueHandling will use Iden to determine whether the if condition in the loop has ever been true?
		Mel-Chen (author): Sure, will do.
		// [I|F]FindLastIV will use a sentinel value as the identity to initialize
		// the reduction phi. In the middle block, createSentinelValueHandling will
		// generate checks to verify if the reduction result is the sentinel value.
		// If the result is the sentinel value, it will be corrected back to the
		// start value.
		// TODO: The sentinel value is not always necessary. When the start value is
		// a constant, and smaller than the start value of the induction variable,
		// the start value can be directly used to initialize the reduction phi.
		StartV = Iden = RdxDesc.getRecurrenceIdentity(RK, VecTy->getScalarType(),
		RdxDesc.getFastMathFlags());
		if (!ScalarPHI) {
		IRBuilderBase::InsertPointGuard IPBuilder(Builder);
		Builder.SetInsertPoint(VectorPH->getTerminator());
		StartV = Iden = Builder.CreateVectorSplat(State.VF, Iden);
		}
} else {		} else {
Iden = RdxDesc.getRecurrenceIdentity(RK, VecTy->getScalarType(),		Iden = RdxDesc.getRecurrenceIdentity(RK, VecTy->getScalarType(),
RdxDesc.getFastMathFlags());		RdxDesc.getFastMathFlags());

if (!ScalarPHI) {		if (!ScalarPHI) {
Iden = Builder.CreateVectorSplat(State.VF, Iden);		Iden = Builder.CreateVectorSplat(State.VF, Iden);
IRBuilderBase::InsertPointGuard IPBuilder(Builder);		IRBuilderBase::InsertPointGuard IPBuilder(Builder);
Builder.SetInsertPoint(VectorPH->getTerminator());		Builder.SetInsertPoint(VectorPH->getTerminator());

llvm/test/Transforms/LoopVectorize/iv-select-cmp-no-wrap.ll

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
				fhahn: Could you submit the tests separately?
				Mel-Chen (author): Sure. Will do.
	; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -S < %s | FileCheck %s --check-prefix=CHECK			; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -S < %s | FileCheck %s --check-prefix=CHECK

	define i64 @select_icmp_nuw_nsw(ptr nocapture readonly %a, ptr nocapture readonly %b, i64 %ii, i64 %n) {			define i64 @select_icmp_nuw_nsw(ptr nocapture readonly %a, ptr nocapture readonly %b, i64 %ii, i64 %n) {
	; CHECK-LABEL: define i64 @select_icmp_nuw_nsw			; CHECK-LABEL: define i64 @select_icmp_nuw_nsw
	; CHECK-NOT: vector.body:			; CHECK-SAME: (ptr nocapture readonly [[A:%.*]], ptr nocapture readonly [[B:%.*]], i64 [[II:%.*]], i64 [[N:%.*]]) {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 8
				; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[TMP3]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x i64>, ptr [[TMP4]], align 8
				; CHECK-NEXT: [[TMP5:%.*]] = icmp sgt <4 x i64> [[WIDE_LOAD]], [[WIDE_LOAD1]]
				; CHECK-NEXT: [[TMP6]] = select <4 x i1> [[TMP5]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP6]])
				; CHECK-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP8]], -9223372036854775808
				; CHECK-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP8]], i64 [[II]]
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
				; CHECK-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
				; CHECK-NEXT: [[TMP9:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
				; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[IV]]
				; CHECK-NEXT: [[TMP10:%.*]] = load i64, ptr [[ARRAYIDX1]], align 8
				; CHECK-NEXT: [[CMP2:%.*]] = icmp sgt i64 [[TMP9]], [[TMP10]]
				; CHECK-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
				; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
				; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
				; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
				; CHECK: exit:
				; CHECK-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
				; CHECK-NEXT: ret i64 [[COND_LCSSA]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]			%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]
	%rdx = phi i64 [ %cond, %for.body ], [ %ii, %entry ]			%rdx = phi i64 [ %cond, %for.body ], [ %ii, %entry ]
	%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv			%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv
	%0 = load i64, ptr %arrayidx, align 8			%0 = load i64, ptr %arrayidx, align 8
	%arrayidx1 = getelementptr inbounds i64, ptr %b, i64 %iv			%arrayidx1 = getelementptr inbounds i64, ptr %b, i64 %iv
	%1 = load i64, ptr %arrayidx1, align 8			%1 = load i64, ptr %arrayidx1, align 8
	%cmp2 = icmp sgt i64 %0, %1			%cmp2 = icmp sgt i64 %0, %1
	%cond = select i1 %cmp2, i64 %iv, i64 %rdx			%cond = select i1 %cmp2, i64 %iv, i64 %rdx
	%inc = add nuw nsw i64 %iv, 1			%inc = add nuw nsw i64 %iv, 1
	%exitcond.not = icmp eq i64 %inc, %n			%exitcond.not = icmp eq i64 %inc, %n
	br i1 %exitcond.not, label %exit, label %for.body			br i1 %exitcond.not, label %exit, label %for.body

	exit: ; preds = %for.body			exit: ; preds = %for.body
	ret i64 %cond			ret i64 %cond
	}			}

	define i64 @select_icmp_nsw(ptr nocapture readonly %a, ptr nocapture readonly %b, i64 %ii, i64 %n) {			define i64 @select_icmp_nsw(ptr nocapture readonly %a, ptr nocapture readonly %b, i64 %ii, i64 %n) {
	; CHECK-LABEL: define i64 @select_icmp_nsw			; CHECK-LABEL: define i64 @select_icmp_nsw
	; CHECK-NOT: vector.body:			; CHECK-SAME: (ptr nocapture readonly [[A:%.*]], ptr nocapture readonly [[B:%.*]], i64 [[II:%.*]], i64 [[N:%.*]]) {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 8
				; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[TMP3]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x i64>, ptr [[TMP4]], align 8
				; CHECK-NEXT: [[TMP5:%.*]] = icmp sgt <4 x i64> [[WIDE_LOAD]], [[WIDE_LOAD1]]
				; CHECK-NEXT: [[TMP6]] = select <4 x i1> [[TMP5]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP6]])
				; CHECK-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP8]], -9223372036854775808
				; CHECK-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP8]], i64 [[II]]
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
				; CHECK-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
				; CHECK-NEXT: [[TMP9:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
				; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[IV]]
				; CHECK-NEXT: [[TMP10:%.*]] = load i64, ptr [[ARRAYIDX1]], align 8
				; CHECK-NEXT: [[CMP2:%.*]] = icmp sgt i64 [[TMP9]], [[TMP10]]
				; CHECK-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
				; CHECK-NEXT: [[INC]] = add nsw i64 [[IV]], 1
				; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
				; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
				; CHECK: exit:
				; CHECK-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
				; CHECK-NEXT: ret i64 [[COND_LCSSA]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]			%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]
	%rdx = phi i64 [ %cond, %for.body ], [ %ii, %entry ]			%rdx = phi i64 [ %cond, %for.body ], [ %ii, %entry ]
	%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv			%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv
	%0 = load i64, ptr %arrayidx, align 8			%0 = load i64, ptr %arrayidx, align 8
	%arrayidx1 = getelementptr inbounds i64, ptr %b, i64 %iv			%arrayidx1 = getelementptr inbounds i64, ptr %b, i64 %iv
	%1 = load i64, ptr %arrayidx1, align 8			%1 = load i64, ptr %arrayidx1, align 8
	%cmp2 = icmp sgt i64 %0, %1			%cmp2 = icmp sgt i64 %0, %1
	%cond = select i1 %cmp2, i64 %iv, i64 %rdx			%cond = select i1 %cmp2, i64 %iv, i64 %rdx
	%inc = add nsw i64 %iv, 1			%inc = add nsw i64 %iv, 1
	%exitcond.not = icmp eq i64 %inc, %n			%exitcond.not = icmp eq i64 %inc, %n
	br i1 %exitcond.not, label %exit, label %for.body			br i1 %exitcond.not, label %exit, label %for.body

	exit: ; preds = %for.body			exit: ; preds = %for.body
	ret i64 %cond			ret i64 %cond
	}			}

	define i64 @select_icmp_nuw(ptr nocapture readonly %a, ptr nocapture readonly %b, i64 %ii, i64 %n) {			define i64 @select_icmp_nuw(ptr nocapture readonly %a, ptr nocapture readonly %b, i64 %ii, i64 %n) {
	; CHECK-LABEL: define i64 @select_icmp_nuw			; CHECK-LABEL: define i64 @select_icmp_nuw
	; CHECK-NOT: vector.body:			; CHECK-SAME: (ptr nocapture readonly [[A:%.*]], ptr nocapture readonly [[B:%.*]], i64 [[II:%.*]], i64 [[N:%.*]]) {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[II]], [[ENTRY]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
				; CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
				; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[IV]]
				; CHECK-NEXT: [[TMP1:%.*]] = load i64, ptr [[ARRAYIDX1]], align 8
				; CHECK-NEXT: [[CMP2:%.*]] = icmp sgt i64 [[TMP0]], [[TMP1]]
				; CHECK-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
				; CHECK-NEXT: [[INC]] = add nuw i64 [[IV]], 1
				; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
				; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT:%.*]], label [[FOR_BODY]]
				; CHECK: exit:
				; CHECK-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ]
				; CHECK-NEXT: ret i64 [[COND_LCSSA]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]			%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]
	%rdx = phi i64 [ %cond, %for.body ], [ %ii, %entry ]			%rdx = phi i64 [ %cond, %for.body ], [ %ii, %entry ]
	%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv			%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv
	%0 = load i64, ptr %arrayidx, align 8			%0 = load i64, ptr %arrayidx, align 8
	%arrayidx1 = getelementptr inbounds i64, ptr %b, i64 %iv			%arrayidx1 = getelementptr inbounds i64, ptr %b, i64 %iv
	%1 = load i64, ptr %arrayidx1, align 8			%1 = load i64, ptr %arrayidx1, align 8
	%cmp2 = icmp sgt i64 %0, %1			%cmp2 = icmp sgt i64 %0, %1
	%cond = select i1 %cmp2, i64 %iv, i64 %rdx			%cond = select i1 %cmp2, i64 %iv, i64 %rdx
	%inc = add nuw i64 %iv, 1			%inc = add nuw i64 %iv, 1
	%exitcond.not = icmp eq i64 %inc, %n			%exitcond.not = icmp eq i64 %inc, %n
	br i1 %exitcond.not, label %exit, label %for.body			br i1 %exitcond.not, label %exit, label %for.body

	exit: ; preds = %for.body			exit: ; preds = %for.body
	ret i64 %cond			ret i64 %cond
	}			}

	define i64 @select_icmp_noflag(ptr nocapture readonly %a, ptr nocapture readonly %b, i64 %ii, i64 %n) {			define i64 @select_icmp_noflag(ptr nocapture readonly %a, ptr nocapture readonly %b, i64 %ii, i64 %n) {
	; CHECK-LABEL: define i64 @select_icmp_noflag			; CHECK-LABEL: define i64 @select_icmp_noflag
	; CHECK-NOT: vector.body:			; CHECK-SAME: (ptr nocapture readonly [[A:%.*]], ptr nocapture readonly [[B:%.*]], i64 [[II:%.*]], i64 [[N:%.*]]) {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[II]], [[ENTRY]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
				; CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
				; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[IV]]
				; CHECK-NEXT: [[TMP1:%.*]] = load i64, ptr [[ARRAYIDX1]], align 8
				; CHECK-NEXT: [[CMP2:%.*]] = icmp sgt i64 [[TMP0]], [[TMP1]]
				; CHECK-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
				; CHECK-NEXT: [[INC]] = add i64 [[IV]], 1
				; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
				; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT:%.*]], label [[FOR_BODY]]
				; CHECK: exit:
				; CHECK-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ]
				; CHECK-NEXT: ret i64 [[COND_LCSSA]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]			%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]
	%rdx = phi i64 [ %cond, %for.body ], [ %ii, %entry ]			%rdx = phi i64 [ %cond, %for.body ], [ %ii, %entry ]
	%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv			%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv

llvm/test/Transforms/LoopVectorize/iv-select-cmp.ll

; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -S < %s | FileCheck %s --check-prefix=CHECK		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
; RUN: opt -passes=loop-vectorize -force-vector-interleave=4 -force-vector-width=4 -S < %s | FileCheck %s --check-prefix=CHECK		; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -S < %s | FileCheck %s --check-prefix=CHECK-VF4IC1 --check-prefix=CHECK
; RUN: opt -passes=loop-vectorize -force-vector-interleave=4 -force-vector-width=1 -S < %s | FileCheck %s --check-prefix=CHECK		; RUN: opt -passes=loop-vectorize -force-vector-interleave=4 -force-vector-width=4 -S < %s | FileCheck %s --check-prefix=CHECK-VF4IC4 --check-prefix=CHECK
		; RUN: opt -passes=loop-vectorize -force-vector-interleave=4 -force-vector-width=1 -S < %s | FileCheck %s --check-prefix=CHECK-VF1IC4 --check-prefix=CHECK

define i64 @select_icmp_const_1(ptr nocapture readonly %a, i64 %n) {		define i64 @select_icmp_const_1(ptr nocapture readonly %a, i64 %n) {
; CHECK-LABEL: define i64 @select_icmp_const_1		; CHECK-VF4IC1-LABEL: define i64 @select_icmp_const_1
; CHECK-NOT: vector.body:		; CHECK-VF4IC1-SAME: (ptr nocapture readonly [[A:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC1-NEXT: entry:
		; CHECK-VF4IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC1: vector.ph:
		; CHECK-VF4IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC1: vector.body:
		; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 8
		; CHECK-VF4IC1-NEXT: [[TMP3:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 3, i64 3, i64 3, i64 3>
		; CHECK-VF4IC1-NEXT: [[TMP4]] = select <4 x i1> [[TMP3]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC1-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
		; CHECK-VF4IC1: middle.block:
		; CHECK-VF4IC1-NEXT: [[TMP6:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP4]])
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP6]], -9223372036854775808
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP6]], i64 3
		; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC1: scalar.ph:
		; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 3, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC1: for.body:
		; CHECK-VF4IC1-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF4IC1-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF4IC1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
		; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
		; CHECK-VF4IC1-NEXT: [[CMP2:%.*]] = icmp eq i64 [[TMP7]], 3
		; CHECK-VF4IC1-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF4IC1-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
		; CHECK-VF4IC1: exit:
		; CHECK-VF4IC1-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: ret i64 [[COND_LCSSA]]
		;
		; CHECK-VF4IC4-LABEL: define i64 @select_icmp_const_1
		; CHECK-VF4IC4-SAME: (ptr nocapture readonly [[A:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC4-NEXT: entry:
		; CHECK-VF4IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC4: vector.ph:
		; CHECK-VF4IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC4: vector.body:
		; CHECK-VF4IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP16:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP17:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP18:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP19:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD1:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD1]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
		; CHECK-VF4IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
		; CHECK-VF4IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
		; CHECK-VF4IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF4IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF4IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF4IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF4IC4-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 0
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP8]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 4
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD7:%.*]] = load <4 x i64>, ptr [[TMP9]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 8
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD8:%.*]] = load <4 x i64>, ptr [[TMP10]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 12
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD9:%.*]] = load <4 x i64>, ptr [[TMP11]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP12:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 3, i64 3, i64 3, i64 3>
		; CHECK-VF4IC4-NEXT: [[TMP13:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD7]], <i64 3, i64 3, i64 3, i64 3>
		; CHECK-VF4IC4-NEXT: [[TMP14:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD8]], <i64 3, i64 3, i64 3, i64 3>
		; CHECK-VF4IC4-NEXT: [[TMP15:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD9]], <i64 3, i64 3, i64 3, i64 3>
		; CHECK-VF4IC4-NEXT: [[TMP16]] = select <4 x i1> [[TMP12]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC4-NEXT: [[TMP17]] = select <4 x i1> [[TMP13]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI4]]
		; CHECK-VF4IC4-NEXT: [[TMP18]] = select <4 x i1> [[TMP14]], <4 x i64> [[STEP_ADD1]], <4 x i64> [[VEC_PHI5]]
		; CHECK-VF4IC4-NEXT: [[TMP19]] = select <4 x i1> [[TMP15]], <4 x i64> [[STEP_ADD2]], <4 x i64> [[VEC_PHI6]]
		; CHECK-VF4IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
		; CHECK-VF4IC4-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
		; CHECK-VF4IC4: middle.block:
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[TMP16]], <4 x i64> [[TMP17]])
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX10:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[RDX_MINMAX]], <4 x i64> [[TMP18]])
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX11:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[RDX_MINMAX10]], <4 x i64> [[TMP19]])
		; CHECK-VF4IC4-NEXT: [[TMP21:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX11]])
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP21]], -9223372036854775808
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP21]], i64 3
		; CHECK-VF4IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC4: scalar.ph:
		; CHECK-VF4IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 3, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC4: for.body:
		; CHECK-VF4IC4-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF4IC4-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF4IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
		; CHECK-VF4IC4-NEXT: [[TMP22:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
		; CHECK-VF4IC4-NEXT: [[CMP2:%.*]] = icmp eq i64 [[TMP22]], 3
		; CHECK-VF4IC4-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF4IC4-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF4IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
		; CHECK-VF4IC4: exit:
		; CHECK-VF4IC4-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: ret i64 [[COND_LCSSA]]
		;
		; CHECK-VF1IC4-LABEL: define i64 @select_icmp_const_1
		; CHECK-VF1IC4-SAME: (ptr nocapture readonly [[A:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF1IC4-NEXT: entry:
		; CHECK-VF1IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF1IC4: vector.ph:
		; CHECK-VF1IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF1IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF1IC4: vector.body:
		; CHECK-VF1IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP16:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP17:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP18:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP19:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF1IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
		; CHECK-VF1IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
		; CHECK-VF1IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
		; CHECK-VF1IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF1IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF1IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF1IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF1IC4-NEXT: [[TMP8:%.*]] = load i64, ptr [[TMP4]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP9:%.*]] = load i64, ptr [[TMP5]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP10:%.*]] = load i64, ptr [[TMP6]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP11:%.*]] = load i64, ptr [[TMP7]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP12:%.*]] = icmp eq i64 [[TMP8]], 3
		; CHECK-VF1IC4-NEXT: [[TMP13:%.*]] = icmp eq i64 [[TMP9]], 3
		; CHECK-VF1IC4-NEXT: [[TMP14:%.*]] = icmp eq i64 [[TMP10]], 3
		; CHECK-VF1IC4-NEXT: [[TMP15:%.*]] = icmp eq i64 [[TMP11]], 3
		; CHECK-VF1IC4-NEXT: [[TMP16]] = select i1 [[TMP12]], i64 [[TMP0]], i64 [[VEC_PHI]]
		; CHECK-VF1IC4-NEXT: [[TMP17]] = select i1 [[TMP13]], i64 [[TMP1]], i64 [[VEC_PHI1]]
		; CHECK-VF1IC4-NEXT: [[TMP18]] = select i1 [[TMP14]], i64 [[TMP2]], i64 [[VEC_PHI2]]
		; CHECK-VF1IC4-NEXT: [[TMP19]] = select i1 [[TMP15]], i64 [[TMP3]], i64 [[VEC_PHI3]]
		; CHECK-VF1IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF1IC4-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
		; CHECK-VF1IC4: middle.block:
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX:%.*]] = call i64 @llvm.smax.i64(i64 [[TMP16]], i64 [[TMP17]])
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX4:%.*]] = call i64 @llvm.smax.i64(i64 [[RDX_MINMAX]], i64 [[TMP18]])
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX5:%.*]] = call i64 @llvm.smax.i64(i64 [[RDX_MINMAX4]], i64 [[TMP19]])
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[RDX_MINMAX5]], -9223372036854775808
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[RDX_MINMAX5]], i64 3
		; CHECK-VF1IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF1IC4: scalar.ph:
		; CHECK-VF1IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 3, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF1IC4: for.body:
		; CHECK-VF1IC4-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF1IC4-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF1IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
		; CHECK-VF1IC4-NEXT: [[TMP21:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
		; CHECK-VF1IC4-NEXT: [[CMP2:%.*]] = icmp eq i64 [[TMP21]], 3
		; CHECK-VF1IC4-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF1IC4-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF1IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF1IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
		; CHECK-VF1IC4: exit:
		; CHECK-VF1IC4-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: ret i64 [[COND_LCSSA]]
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %entry, %for.body		for.body: ; preds = %entry, %for.body
%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]		%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]
%rdx = phi i64 [ %cond, %for.body ], [ 3, %entry ]		%rdx = phi i64 [ %cond, %for.body ], [ 3, %entry ]
%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv		%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv
%0 = load i64, ptr %arrayidx, align 8		%0 = load i64, ptr %arrayidx, align 8
%cmp2 = icmp eq i64 %0, 3		%cmp2 = icmp eq i64 %0, 3
%cond = select i1 %cmp2, i64 %iv, i64 %rdx		%cond = select i1 %cmp2, i64 %iv, i64 %rdx
%inc = add nuw nsw i64 %iv, 1		%inc = add nuw nsw i64 %iv, 1
%exitcond.not = icmp eq i64 %inc, %n		%exitcond.not = icmp eq i64 %inc, %n
br i1 %exitcond.not, label %exit, label %for.body		br i1 %exitcond.not, label %exit, label %for.body

exit: ; preds = %for.body		exit: ; preds = %for.body
ret i64 %cond		ret i64 %cond
}		}

define i64 @select_icmp_const_2(ptr nocapture readonly %a, i64 %n) {		define i64 @select_icmp_const_2(ptr nocapture readonly %a, i64 %n) {
; CHECK-LABEL: define i64 @select_icmp_const_2		; CHECK-VF4IC1-LABEL: define i64 @select_icmp_const_2
; CHECK-NOT: vector.body:		; CHECK-VF4IC1-SAME: (ptr nocapture readonly [[A:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC1-NEXT: entry:
		; CHECK-VF4IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC1: vector.ph:
		; CHECK-VF4IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC1: vector.body:
		; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 8
		; CHECK-VF4IC1-NEXT: [[TMP3:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 3, i64 3, i64 3, i64 3>
		; CHECK-VF4IC1-NEXT: [[TMP4]] = select <4 x i1> [[TMP3]], <4 x i64> [[VEC_PHI]], <4 x i64> [[VEC_IND]]
		; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC1-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
		; CHECK-VF4IC1: middle.block:
		; CHECK-VF4IC1-NEXT: [[TMP6:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP4]])
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP6]], -9223372036854775808
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP6]], i64 3
		; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC1: scalar.ph:
		; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 3, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC1: for.body:
		; CHECK-VF4IC1-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF4IC1-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF4IC1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
		; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
		; CHECK-VF4IC1-NEXT: [[CMP2:%.*]] = icmp eq i64 [[TMP7]], 3
		; CHECK-VF4IC1-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[RDX]], i64 [[IV]]
		; CHECK-VF4IC1-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
		; CHECK-VF4IC1: exit:
		; CHECK-VF4IC1-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: ret i64 [[COND_LCSSA]]
		;
		; CHECK-VF4IC4-LABEL: define i64 @select_icmp_const_2
		; CHECK-VF4IC4-SAME: (ptr nocapture readonly [[A:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC4-NEXT: entry:
		; CHECK-VF4IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC4: vector.ph:
		; CHECK-VF4IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC4: vector.body:
		; CHECK-VF4IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP16:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP17:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP18:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP19:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD1:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD1]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
		; CHECK-VF4IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
		; CHECK-VF4IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
		; CHECK-VF4IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF4IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF4IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF4IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF4IC4-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 0
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP8]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 4
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD7:%.*]] = load <4 x i64>, ptr [[TMP9]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 8
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD8:%.*]] = load <4 x i64>, ptr [[TMP10]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 12
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD9:%.*]] = load <4 x i64>, ptr [[TMP11]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP12:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 3, i64 3, i64 3, i64 3>
		; CHECK-VF4IC4-NEXT: [[TMP13:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD7]], <i64 3, i64 3, i64 3, i64 3>
		; CHECK-VF4IC4-NEXT: [[TMP14:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD8]], <i64 3, i64 3, i64 3, i64 3>
		; CHECK-VF4IC4-NEXT: [[TMP15:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD9]], <i64 3, i64 3, i64 3, i64 3>
		; CHECK-VF4IC4-NEXT: [[TMP16]] = select <4 x i1> [[TMP12]], <4 x i64> [[VEC_PHI]], <4 x i64> [[VEC_IND]]
		; CHECK-VF4IC4-NEXT: [[TMP17]] = select <4 x i1> [[TMP13]], <4 x i64> [[VEC_PHI4]], <4 x i64> [[STEP_ADD]]
		; CHECK-VF4IC4-NEXT: [[TMP18]] = select <4 x i1> [[TMP14]], <4 x i64> [[VEC_PHI5]], <4 x i64> [[STEP_ADD1]]
		; CHECK-VF4IC4-NEXT: [[TMP19]] = select <4 x i1> [[TMP15]], <4 x i64> [[VEC_PHI6]], <4 x i64> [[STEP_ADD2]]
		; CHECK-VF4IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
		; CHECK-VF4IC4-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
		; CHECK-VF4IC4: middle.block:
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[TMP16]], <4 x i64> [[TMP17]])
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX10:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[RDX_MINMAX]], <4 x i64> [[TMP18]])
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX11:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[RDX_MINMAX10]], <4 x i64> [[TMP19]])
		; CHECK-VF4IC4-NEXT: [[TMP21:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX11]])
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP21]], -9223372036854775808
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP21]], i64 3
		; CHECK-VF4IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC4: scalar.ph:
		; CHECK-VF4IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 3, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC4: for.body:
		; CHECK-VF4IC4-NEXT: [[IV:%.]] = phi i64 [ [[INC:%.]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF4IC4-NEXT: [[RDX:%.]] = phi i64 [ [[COND:%.]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF4IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
		; CHECK-VF4IC4-NEXT: [[TMP22:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
		; CHECK-VF4IC4-NEXT: [[CMP2:%.*]] = icmp eq i64 [[TMP22]], 3
		; CHECK-VF4IC4-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[RDX]], i64 [[IV]]
		; CHECK-VF4IC4-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF4IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
		; CHECK-VF4IC4: exit:
		; CHECK-VF4IC4-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: ret i64 [[COND_LCSSA]]
		;
		; CHECK-VF1IC4-LABEL: define i64 @select_icmp_const_2
		; CHECK-VF1IC4-SAME: (ptr nocapture readonly [[A:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF1IC4-NEXT: entry:
		; CHECK-VF1IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF1IC4: vector.ph:
		; CHECK-VF1IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF1IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF1IC4: vector.body:
		; CHECK-VF1IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP16:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP17:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP18:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP19:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF1IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
		; CHECK-VF1IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
		; CHECK-VF1IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
		; CHECK-VF1IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF1IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF1IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF1IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF1IC4-NEXT: [[TMP8:%.*]] = load i64, ptr [[TMP4]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP9:%.*]] = load i64, ptr [[TMP5]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP10:%.*]] = load i64, ptr [[TMP6]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP11:%.*]] = load i64, ptr [[TMP7]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP12:%.*]] = icmp eq i64 [[TMP8]], 3
		; CHECK-VF1IC4-NEXT: [[TMP13:%.*]] = icmp eq i64 [[TMP9]], 3
		; CHECK-VF1IC4-NEXT: [[TMP14:%.*]] = icmp eq i64 [[TMP10]], 3
		; CHECK-VF1IC4-NEXT: [[TMP15:%.*]] = icmp eq i64 [[TMP11]], 3
		; CHECK-VF1IC4-NEXT: [[TMP16]] = select i1 [[TMP12]], i64 [[VEC_PHI]], i64 [[TMP0]]
		; CHECK-VF1IC4-NEXT: [[TMP17]] = select i1 [[TMP13]], i64 [[VEC_PHI1]], i64 [[TMP1]]
		; CHECK-VF1IC4-NEXT: [[TMP18]] = select i1 [[TMP14]], i64 [[VEC_PHI2]], i64 [[TMP2]]
		; CHECK-VF1IC4-NEXT: [[TMP19]] = select i1 [[TMP15]], i64 [[VEC_PHI3]], i64 [[TMP3]]
		; CHECK-VF1IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF1IC4-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
		; CHECK-VF1IC4: middle.block:
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX:%.*]] = call i64 @llvm.smax.i64(i64 [[TMP16]], i64 [[TMP17]])
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX4:%.*]] = call i64 @llvm.smax.i64(i64 [[RDX_MINMAX]], i64 [[TMP18]])
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX5:%.*]] = call i64 @llvm.smax.i64(i64 [[RDX_MINMAX4]], i64 [[TMP19]])
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[RDX_MINMAX5]], -9223372036854775808
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[RDX_MINMAX5]], i64 3
		; CHECK-VF1IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF1IC4: scalar.ph:
		; CHECK-VF1IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 3, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF1IC4: for.body:
		; CHECK-VF1IC4-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF1IC4-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF1IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
		; CHECK-VF1IC4-NEXT: [[TMP21:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
		; CHECK-VF1IC4-NEXT: [[CMP2:%.*]] = icmp eq i64 [[TMP21]], 3
		; CHECK-VF1IC4-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[RDX]], i64 [[IV]]
		; CHECK-VF1IC4-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF1IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF1IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
		; CHECK-VF1IC4: exit:
		; CHECK-VF1IC4-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: ret i64 [[COND_LCSSA]]
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %entry, %for.body		for.body: ; preds = %entry, %for.body
%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]		%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]
%rdx = phi i64 [ %cond, %for.body ], [ 3, %entry ]		%rdx = phi i64 [ %cond, %for.body ], [ 3, %entry ]
%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv		%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv
%0 = load i64, ptr %arrayidx, align 8		%0 = load i64, ptr %arrayidx, align 8
%cmp2 = icmp eq i64 %0, 3		%cmp2 = icmp eq i64 %0, 3
%cond = select i1 %cmp2, i64 %rdx, i64 %iv		%cond = select i1 %cmp2, i64 %rdx, i64 %iv
%inc = add nuw nsw i64 %iv, 1		%inc = add nuw nsw i64 %iv, 1
%exitcond.not = icmp eq i64 %inc, %n		%exitcond.not = icmp eq i64 %inc, %n
br i1 %exitcond.not, label %exit, label %for.body		br i1 %exitcond.not, label %exit, label %for.body

exit: ; preds = %for.body		exit: ; preds = %for.body
ret i64 %cond		ret i64 %cond
}		}

define i64 @select_icmp_const_3_variable_rdx_start(ptr nocapture readonly %a, i64 %rdx.start, i64 %n) {		define i64 @select_icmp_const_3_variable_rdx_start(ptr nocapture readonly %a, i64 %rdx.start, i64 %n) {
; CHECK-LABEL: define i64 @select_icmp_const_3_variable_rdx_start		; CHECK-VF4IC1-LABEL: define i64 @select_icmp_const_3_variable_rdx_start
; CHECK-NOT: vector.body:		; CHECK-VF4IC1-SAME: (ptr nocapture readonly [[A:%.*]], i64 [[RDX_START:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC1-NEXT: entry:
		; CHECK-VF4IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC1: vector.ph:
		; CHECK-VF4IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC1: vector.body:
		; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 8
		; CHECK-VF4IC1-NEXT: [[TMP3:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 3, i64 3, i64 3, i64 3>
		; CHECK-VF4IC1-NEXT: [[TMP4]] = select <4 x i1> [[TMP3]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC1-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
		; CHECK-VF4IC1: middle.block:
		; CHECK-VF4IC1-NEXT: [[TMP6:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP4]])
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP6]], -9223372036854775808
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP6]], i64 [[RDX_START]]
		; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC1: scalar.ph:
		; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[RDX_START]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC1: for.body:
		; CHECK-VF4IC1-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF4IC1-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF4IC1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
		; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
		; CHECK-VF4IC1-NEXT: [[CMP2:%.*]] = icmp eq i64 [[TMP7]], 3
		; CHECK-VF4IC1-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF4IC1-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
		; CHECK-VF4IC1: exit:
		; CHECK-VF4IC1-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: ret i64 [[COND_LCSSA]]
		;
		; CHECK-VF4IC4-LABEL: define i64 @select_icmp_const_3_variable_rdx_start
		; CHECK-VF4IC4-SAME: (ptr nocapture readonly [[A:%.*]], i64 [[RDX_START:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC4-NEXT: entry:
		; CHECK-VF4IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC4: vector.ph:
		; CHECK-VF4IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC4: vector.body:
		; CHECK-VF4IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP16:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP17:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP18:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP19:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD1:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD1]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
		; CHECK-VF4IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
		; CHECK-VF4IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
		; CHECK-VF4IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF4IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF4IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF4IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF4IC4-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 0
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP8]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 4
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD7:%.*]] = load <4 x i64>, ptr [[TMP9]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 8
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD8:%.*]] = load <4 x i64>, ptr [[TMP10]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 12
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD9:%.*]] = load <4 x i64>, ptr [[TMP11]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP12:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD]], <i64 3, i64 3, i64 3, i64 3>
		; CHECK-VF4IC4-NEXT: [[TMP13:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD7]], <i64 3, i64 3, i64 3, i64 3>
		; CHECK-VF4IC4-NEXT: [[TMP14:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD8]], <i64 3, i64 3, i64 3, i64 3>
		; CHECK-VF4IC4-NEXT: [[TMP15:%.*]] = icmp eq <4 x i64> [[WIDE_LOAD9]], <i64 3, i64 3, i64 3, i64 3>
		; CHECK-VF4IC4-NEXT: [[TMP16]] = select <4 x i1> [[TMP12]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC4-NEXT: [[TMP17]] = select <4 x i1> [[TMP13]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI4]]
		; CHECK-VF4IC4-NEXT: [[TMP18]] = select <4 x i1> [[TMP14]], <4 x i64> [[STEP_ADD1]], <4 x i64> [[VEC_PHI5]]
		; CHECK-VF4IC4-NEXT: [[TMP19]] = select <4 x i1> [[TMP15]], <4 x i64> [[STEP_ADD2]], <4 x i64> [[VEC_PHI6]]
		; CHECK-VF4IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
		; CHECK-VF4IC4-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
		; CHECK-VF4IC4: middle.block:
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[TMP16]], <4 x i64> [[TMP17]])
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX10:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[RDX_MINMAX]], <4 x i64> [[TMP18]])
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX11:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[RDX_MINMAX10]], <4 x i64> [[TMP19]])
		; CHECK-VF4IC4-NEXT: [[TMP21:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX11]])
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP21]], -9223372036854775808
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP21]], i64 [[RDX_START]]
		; CHECK-VF4IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC4: scalar.ph:
		; CHECK-VF4IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[RDX_START]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC4: for.body:
		; CHECK-VF4IC4-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF4IC4-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF4IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
		; CHECK-VF4IC4-NEXT: [[TMP22:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
		; CHECK-VF4IC4-NEXT: [[CMP2:%.*]] = icmp eq i64 [[TMP22]], 3
		; CHECK-VF4IC4-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF4IC4-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF4IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
		; CHECK-VF4IC4: exit:
		; CHECK-VF4IC4-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: ret i64 [[COND_LCSSA]]
		;
		; CHECK-VF1IC4-LABEL: define i64 @select_icmp_const_3_variable_rdx_start
		; CHECK-VF1IC4-SAME: (ptr nocapture readonly [[A:%.*]], i64 [[RDX_START:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF1IC4-NEXT: entry:
		; CHECK-VF1IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF1IC4: vector.ph:
		; CHECK-VF1IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF1IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF1IC4: vector.body:
		; CHECK-VF1IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP16:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP17:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP18:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP19:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF1IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
		; CHECK-VF1IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
		; CHECK-VF1IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
		; CHECK-VF1IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF1IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF1IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF1IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF1IC4-NEXT: [[TMP8:%.*]] = load i64, ptr [[TMP4]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP9:%.*]] = load i64, ptr [[TMP5]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP10:%.*]] = load i64, ptr [[TMP6]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP11:%.*]] = load i64, ptr [[TMP7]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP12:%.*]] = icmp eq i64 [[TMP8]], 3
		; CHECK-VF1IC4-NEXT: [[TMP13:%.*]] = icmp eq i64 [[TMP9]], 3
		; CHECK-VF1IC4-NEXT: [[TMP14:%.*]] = icmp eq i64 [[TMP10]], 3
		; CHECK-VF1IC4-NEXT: [[TMP15:%.*]] = icmp eq i64 [[TMP11]], 3
		; CHECK-VF1IC4-NEXT: [[TMP16]] = select i1 [[TMP12]], i64 [[TMP0]], i64 [[VEC_PHI]]
		; CHECK-VF1IC4-NEXT: [[TMP17]] = select i1 [[TMP13]], i64 [[TMP1]], i64 [[VEC_PHI1]]
		; CHECK-VF1IC4-NEXT: [[TMP18]] = select i1 [[TMP14]], i64 [[TMP2]], i64 [[VEC_PHI2]]
		; CHECK-VF1IC4-NEXT: [[TMP19]] = select i1 [[TMP15]], i64 [[TMP3]], i64 [[VEC_PHI3]]
		; CHECK-VF1IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF1IC4-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
		; CHECK-VF1IC4: middle.block:
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX:%.*]] = call i64 @llvm.smax.i64(i64 [[TMP16]], i64 [[TMP17]])
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX4:%.*]] = call i64 @llvm.smax.i64(i64 [[RDX_MINMAX]], i64 [[TMP18]])
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX5:%.*]] = call i64 @llvm.smax.i64(i64 [[RDX_MINMAX4]], i64 [[TMP19]])
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[RDX_MINMAX5]], -9223372036854775808
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[RDX_MINMAX5]], i64 [[RDX_START]]
		; CHECK-VF1IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF1IC4: scalar.ph:
		; CHECK-VF1IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[RDX_START]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF1IC4: for.body:
		; CHECK-VF1IC4-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF1IC4-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF1IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
		; CHECK-VF1IC4-NEXT: [[TMP21:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
		; CHECK-VF1IC4-NEXT: [[CMP2:%.*]] = icmp eq i64 [[TMP21]], 3
		; CHECK-VF1IC4-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF1IC4-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF1IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF1IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
		; CHECK-VF1IC4: exit:
		; CHECK-VF1IC4-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: ret i64 [[COND_LCSSA]]
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %entry, %for.body		for.body: ; preds = %entry, %for.body
%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]		%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]
%rdx = phi i64 [ %cond, %for.body ], [ %rdx.start, %entry ]		%rdx = phi i64 [ %cond, %for.body ], [ %rdx.start, %entry ]
%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv		%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv
%0 = load i64, ptr %arrayidx, align 8		%0 = load i64, ptr %arrayidx, align 8
%cmp2 = icmp eq i64 %0, 3		%cmp2 = icmp eq i64 %0, 3
%cond = select i1 %cmp2, i64 %iv, i64 %rdx		%cond = select i1 %cmp2, i64 %iv, i64 %rdx
%inc = add nuw nsw i64 %iv, 1		%inc = add nuw nsw i64 %iv, 1
%exitcond.not = icmp eq i64 %inc, %n		%exitcond.not = icmp eq i64 %inc, %n
br i1 %exitcond.not, label %exit, label %for.body		br i1 %exitcond.not, label %exit, label %for.body

exit: ; preds = %for.body		exit: ; preds = %for.body
ret i64 %cond		ret i64 %cond
}		}
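
For readers following the CHECK lines above, here is a minimal C++ sketch of what the VF4IC1 expectations for `@select_icmp_const_3_variable_rdx_start` appear to compute. It is a model of the generated IR, not the LoopVectorize implementation: the function names, the `kSentinel` constant, and the `std::vector` interface are hypothetical and chosen only for illustration. Each lane starts at the signed-minimum sentinel, records the induction value whenever the compare holds, and the middle block takes a signed max and falls back to the start value if no lane matched. The sketch assumes every index fits in `int64_t` and is greater than the sentinel.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Scalar reference, mirroring the for.body loop:
//   rdx = (a[iv] == 3) ? iv : rdx
int64_t find_last_scalar(const std::vector<int64_t> &a, int64_t rdx_start) {
  int64_t rdx = rdx_start;
  for (int64_t iv = 0; iv < static_cast<int64_t>(a.size()); ++iv)
    rdx = (a[iv] == 3) ? iv : rdx;
  return rdx;
}

// Model of the VF=4, IC=1 vector loop plus the scalar remainder loop.
int64_t find_last_vf4(const std::vector<int64_t> &a, int64_t rdx_start) {
  // Sentinel seen in the CHECK lines: -9223372036854775808 (INT64_MIN).
  constexpr int64_t kSentinel = std::numeric_limits<int64_t>::min();
  constexpr int64_t VF = 4;
  const int64_t n = static_cast<int64_t>(a.size());
  const int64_t n_vec = n - n % VF; // n.vec = n - (n urem 4)

  // vec.phi: one partial result per lane, seeded with the sentinel.
  int64_t lanes[VF] = {kSentinel, kSentinel, kSentinel, kSentinel};
  for (int64_t index = 0; index < n_vec; index += VF)
    for (int64_t l = 0; l < VF; ++l)
      if (a[index + l] == 3)
        lanes[l] = index + l; // select(cmp, vec.ind, vec.phi)

  // middle.block: reduce.smax, then replace the sentinel by the start value.
  const int64_t red = *std::max_element(lanes, lanes + VF);
  int64_t rdx = (red != kSentinel) ? red : rdx_start;

  // Scalar epilogue handles the remaining n % 4 iterations.
  for (int64_t iv = n_vec; iv < n; ++iv)
    rdx = (a[iv] == 3) ? iv : rdx;
  return rdx;
}
```

Because the induction variable is increasing, the largest recorded index is also the last one the scalar loop would have selected, which is why a signed max over the lanes is sufficient in this model.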

define i64 @select_fcmp_const_fast(ptr nocapture readonly %a, i64 %n) {		define i64 @select_fcmp_const_fast(ptr nocapture readonly %a, i64 %n) {
; CHECK-LABEL: define i64 @select_fcmp_const_fast		; CHECK-VF4IC1-LABEL: define i64 @select_fcmp_const_fast
; CHECK-NOT: vector.body:		; CHECK-VF4IC1-SAME: (ptr nocapture readonly [[A:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC1-NEXT: entry:
		; CHECK-VF4IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC1: vector.ph:
		; CHECK-VF4IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC1: vector.body:
		; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, ptr [[TMP1]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP2]], align 4
		; CHECK-VF4IC1-NEXT: [[TMP3:%.*]] = fcmp fast ueq <4 x float> [[WIDE_LOAD]], <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>
		; CHECK-VF4IC1-NEXT: [[TMP4]] = select <4 x i1> [[TMP3]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC1-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
		; CHECK-VF4IC1: middle.block:
		; CHECK-VF4IC1-NEXT: [[TMP6:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP4]])
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP6]], -9223372036854775808
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP6]], i64 2
		; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC1: scalar.ph:
		; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 2, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC1: for.body:
		; CHECK-VF4IC1-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF4IC1-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF4IC1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IV]]
		; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = load float, ptr [[ARRAYIDX]], align 4
		; CHECK-VF4IC1-NEXT: [[CMP2:%.*]] = fcmp fast ueq float [[TMP7]], 3.000000e+00
		; CHECK-VF4IC1-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF4IC1-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
		; CHECK-VF4IC1: exit:
		; CHECK-VF4IC1-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: ret i64 [[COND_LCSSA]]
		;
		; CHECK-VF4IC4-LABEL: define i64 @select_fcmp_const_fast
		; CHECK-VF4IC4-SAME: (ptr nocapture readonly [[A:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC4-NEXT: entry:
		; CHECK-VF4IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC4: vector.ph:
		; CHECK-VF4IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC4: vector.body:
		; CHECK-VF4IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP16:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP17:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP18:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP19:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD1:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD1]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
		; CHECK-VF4IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
		; CHECK-VF4IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
		; CHECK-VF4IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF4IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF4IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF4IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF4IC4-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, ptr [[TMP4]], i32 0
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP8]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, ptr [[TMP4]], i32 4
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD7:%.*]] = load <4 x float>, ptr [[TMP9]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, ptr [[TMP4]], i32 8
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD8:%.*]] = load <4 x float>, ptr [[TMP10]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, ptr [[TMP4]], i32 12
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD9:%.*]] = load <4 x float>, ptr [[TMP11]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP12:%.*]] = fcmp fast ueq <4 x float> [[WIDE_LOAD]], <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>
		; CHECK-VF4IC4-NEXT: [[TMP13:%.*]] = fcmp fast ueq <4 x float> [[WIDE_LOAD7]], <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>
		; CHECK-VF4IC4-NEXT: [[TMP14:%.*]] = fcmp fast ueq <4 x float> [[WIDE_LOAD8]], <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>
		; CHECK-VF4IC4-NEXT: [[TMP15:%.*]] = fcmp fast ueq <4 x float> [[WIDE_LOAD9]], <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>
		; CHECK-VF4IC4-NEXT: [[TMP16]] = select <4 x i1> [[TMP12]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC4-NEXT: [[TMP17]] = select <4 x i1> [[TMP13]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI4]]
		; CHECK-VF4IC4-NEXT: [[TMP18]] = select <4 x i1> [[TMP14]], <4 x i64> [[STEP_ADD1]], <4 x i64> [[VEC_PHI5]]
		; CHECK-VF4IC4-NEXT: [[TMP19]] = select <4 x i1> [[TMP15]], <4 x i64> [[STEP_ADD2]], <4 x i64> [[VEC_PHI6]]
		; CHECK-VF4IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
		; CHECK-VF4IC4-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
		; CHECK-VF4IC4: middle.block:
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[TMP16]], <4 x i64> [[TMP17]])
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX10:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[RDX_MINMAX]], <4 x i64> [[TMP18]])
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX11:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[RDX_MINMAX10]], <4 x i64> [[TMP19]])
		; CHECK-VF4IC4-NEXT: [[TMP21:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX11]])
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP21]], -9223372036854775808
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP21]], i64 2
		; CHECK-VF4IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC4: scalar.ph:
		; CHECK-VF4IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 2, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC4: for.body:
		; CHECK-VF4IC4-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF4IC4-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF4IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IV]]
		; CHECK-VF4IC4-NEXT: [[TMP22:%.*]] = load float, ptr [[ARRAYIDX]], align 4
		; CHECK-VF4IC4-NEXT: [[CMP2:%.*]] = fcmp fast ueq float [[TMP22]], 3.000000e+00
		; CHECK-VF4IC4-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF4IC4-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF4IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
		; CHECK-VF4IC4: exit:
		; CHECK-VF4IC4-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: ret i64 [[COND_LCSSA]]
		;
		; CHECK-VF1IC4-LABEL: define i64 @select_fcmp_const_fast
		; CHECK-VF1IC4-SAME: (ptr nocapture readonly [[A:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF1IC4-NEXT: entry:
		; CHECK-VF1IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF1IC4: vector.ph:
		; CHECK-VF1IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF1IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF1IC4: vector.body:
		; CHECK-VF1IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP16:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP17:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP18:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP19:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF1IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
		; CHECK-VF1IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
		; CHECK-VF1IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
		; CHECK-VF1IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF1IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF1IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF1IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF1IC4-NEXT: [[TMP8:%.*]] = load float, ptr [[TMP4]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP9:%.*]] = load float, ptr [[TMP5]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP11:%.*]] = load float, ptr [[TMP7]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP12:%.*]] = fcmp fast ueq float [[TMP8]], 3.000000e+00
		; CHECK-VF1IC4-NEXT: [[TMP13:%.*]] = fcmp fast ueq float [[TMP9]], 3.000000e+00
		; CHECK-VF1IC4-NEXT: [[TMP14:%.*]] = fcmp fast ueq float [[TMP10]], 3.000000e+00
		; CHECK-VF1IC4-NEXT: [[TMP15:%.*]] = fcmp fast ueq float [[TMP11]], 3.000000e+00
		; CHECK-VF1IC4-NEXT: [[TMP16]] = select i1 [[TMP12]], i64 [[TMP0]], i64 [[VEC_PHI]]
		; CHECK-VF1IC4-NEXT: [[TMP17]] = select i1 [[TMP13]], i64 [[TMP1]], i64 [[VEC_PHI1]]
		; CHECK-VF1IC4-NEXT: [[TMP18]] = select i1 [[TMP14]], i64 [[TMP2]], i64 [[VEC_PHI2]]
		; CHECK-VF1IC4-NEXT: [[TMP19]] = select i1 [[TMP15]], i64 [[TMP3]], i64 [[VEC_PHI3]]
		; CHECK-VF1IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF1IC4-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
		; CHECK-VF1IC4: middle.block:
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX:%.*]] = call i64 @llvm.smax.i64(i64 [[TMP16]], i64 [[TMP17]])
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX4:%.*]] = call i64 @llvm.smax.i64(i64 [[RDX_MINMAX]], i64 [[TMP18]])
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX5:%.*]] = call i64 @llvm.smax.i64(i64 [[RDX_MINMAX4]], i64 [[TMP19]])
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[RDX_MINMAX5]], -9223372036854775808
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[RDX_MINMAX5]], i64 2
		; CHECK-VF1IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF1IC4: scalar.ph:
		; CHECK-VF1IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 2, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF1IC4: for.body:
		; CHECK-VF1IC4-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF1IC4-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF1IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IV]]
		; CHECK-VF1IC4-NEXT: [[TMP21:%.*]] = load float, ptr [[ARRAYIDX]], align 4
		; CHECK-VF1IC4-NEXT: [[CMP2:%.*]] = fcmp fast ueq float [[TMP21]], 3.000000e+00
		; CHECK-VF1IC4-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF1IC4-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF1IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF1IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
		; CHECK-VF1IC4: exit:
		; CHECK-VF1IC4-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: ret i64 [[COND_LCSSA]]
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %entry, %for.body		for.body: ; preds = %entry, %for.body
%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]		%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]
%rdx = phi i64 [ %cond, %for.body ], [ 2, %entry ]		%rdx = phi i64 [ %cond, %for.body ], [ 2, %entry ]
%arrayidx = getelementptr inbounds float, ptr %a, i64 %iv		%arrayidx = getelementptr inbounds float, ptr %a, i64 %iv
%0 = load float, ptr %arrayidx, align 4		%0 = load float, ptr %arrayidx, align 4
%cmp2 = fcmp fast ueq float %0, 3.0		%cmp2 = fcmp fast ueq float %0, 3.0
%cond = select i1 %cmp2, i64 %iv, i64 %rdx		%cond = select i1 %cmp2, i64 %iv, i64 %rdx
%inc = add nuw nsw i64 %iv, 1		%inc = add nuw nsw i64 %iv, 1
%exitcond.not = icmp eq i64 %inc, %n		%exitcond.not = icmp eq i64 %inc, %n
br i1 %exitcond.not, label %exit, label %for.body		br i1 %exitcond.not, label %exit, label %for.body

exit: ; preds = %for.body		exit: ; preds = %for.body
ret i64 %cond		ret i64 %cond
}		}

define i64 @select_fcmp_const(ptr nocapture readonly %a, i64 %n) {		define i64 @select_fcmp_const(ptr nocapture readonly %a, i64 %n) {
; CHECK-LABEL: define i64 @select_fcmp_const		; CHECK-VF4IC1-LABEL: define i64 @select_fcmp_const
; CHECK-NOT: vector.body:		; CHECK-VF4IC1-SAME: (ptr nocapture readonly [[A:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC1-NEXT: entry:
		; CHECK-VF4IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC1: vector.ph:
		; CHECK-VF4IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC1: vector.body:
		; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, ptr [[TMP1]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP2]], align 4
		; CHECK-VF4IC1-NEXT: [[TMP3:%.*]] = fcmp ueq <4 x float> [[WIDE_LOAD]], <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>
		; CHECK-VF4IC1-NEXT: [[TMP4]] = select <4 x i1> [[TMP3]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC1-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
		; CHECK-VF4IC1: middle.block:
		; CHECK-VF4IC1-NEXT: [[TMP6:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP4]])
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP6]], -9223372036854775808
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP6]], i64 2
		; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC1: scalar.ph:
		; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 2, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC1: for.body:
		; CHECK-VF4IC1-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF4IC1-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF4IC1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IV]]
		; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = load float, ptr [[ARRAYIDX]], align 4
		; CHECK-VF4IC1-NEXT: [[CMP2:%.*]] = fcmp ueq float [[TMP7]], 3.000000e+00
		; CHECK-VF4IC1-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF4IC1-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
		; CHECK-VF4IC1: exit:
		; CHECK-VF4IC1-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: ret i64 [[COND_LCSSA]]
		;
		; CHECK-VF4IC4-LABEL: define i64 @select_fcmp_const
		; CHECK-VF4IC4-SAME: (ptr nocapture readonly [[A:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC4-NEXT: entry:
		; CHECK-VF4IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC4: vector.ph:
		; CHECK-VF4IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC4: vector.body:
		; CHECK-VF4IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP16:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP17:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP18:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP19:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD1:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD1]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
		; CHECK-VF4IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
		; CHECK-VF4IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
		; CHECK-VF4IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF4IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF4IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF4IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF4IC4-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, ptr [[TMP4]], i32 0
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP8]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, ptr [[TMP4]], i32 4
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD7:%.*]] = load <4 x float>, ptr [[TMP9]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, ptr [[TMP4]], i32 8
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD8:%.*]] = load <4 x float>, ptr [[TMP10]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, ptr [[TMP4]], i32 12
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD9:%.*]] = load <4 x float>, ptr [[TMP11]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP12:%.*]] = fcmp ueq <4 x float> [[WIDE_LOAD]], <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>
		; CHECK-VF4IC4-NEXT: [[TMP13:%.*]] = fcmp ueq <4 x float> [[WIDE_LOAD7]], <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>
		; CHECK-VF4IC4-NEXT: [[TMP14:%.*]] = fcmp ueq <4 x float> [[WIDE_LOAD8]], <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>
		; CHECK-VF4IC4-NEXT: [[TMP15:%.*]] = fcmp ueq <4 x float> [[WIDE_LOAD9]], <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>
		; CHECK-VF4IC4-NEXT: [[TMP16]] = select <4 x i1> [[TMP12]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC4-NEXT: [[TMP17]] = select <4 x i1> [[TMP13]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI4]]
		; CHECK-VF4IC4-NEXT: [[TMP18]] = select <4 x i1> [[TMP14]], <4 x i64> [[STEP_ADD1]], <4 x i64> [[VEC_PHI5]]
		; CHECK-VF4IC4-NEXT: [[TMP19]] = select <4 x i1> [[TMP15]], <4 x i64> [[STEP_ADD2]], <4 x i64> [[VEC_PHI6]]
		; CHECK-VF4IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
		; CHECK-VF4IC4-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
		; CHECK-VF4IC4: middle.block:
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[TMP16]], <4 x i64> [[TMP17]])
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX10:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[RDX_MINMAX]], <4 x i64> [[TMP18]])
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX11:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[RDX_MINMAX10]], <4 x i64> [[TMP19]])
		; CHECK-VF4IC4-NEXT: [[TMP21:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX11]])
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP21]], -9223372036854775808
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP21]], i64 2
		; CHECK-VF4IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC4: scalar.ph:
		; CHECK-VF4IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 2, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC4: for.body:
		; CHECK-VF4IC4-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF4IC4-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF4IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IV]]
		; CHECK-VF4IC4-NEXT: [[TMP22:%.*]] = load float, ptr [[ARRAYIDX]], align 4
		; CHECK-VF4IC4-NEXT: [[CMP2:%.*]] = fcmp ueq float [[TMP22]], 3.000000e+00
		; CHECK-VF4IC4-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF4IC4-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF4IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
		; CHECK-VF4IC4: exit:
		; CHECK-VF4IC4-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: ret i64 [[COND_LCSSA]]
		;
		; CHECK-VF1IC4-LABEL: define i64 @select_fcmp_const
		; CHECK-VF1IC4-SAME: (ptr nocapture readonly [[A:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF1IC4-NEXT: entry:
		; CHECK-VF1IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF1IC4: vector.ph:
		; CHECK-VF1IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF1IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF1IC4: vector.body:
		; CHECK-VF1IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP16:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP17:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP18:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP19:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF1IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
		; CHECK-VF1IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
		; CHECK-VF1IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
		; CHECK-VF1IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF1IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF1IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF1IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF1IC4-NEXT: [[TMP8:%.*]] = load float, ptr [[TMP4]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP9:%.*]] = load float, ptr [[TMP5]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP11:%.*]] = load float, ptr [[TMP7]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP12:%.*]] = fcmp ueq float [[TMP8]], 3.000000e+00
		; CHECK-VF1IC4-NEXT: [[TMP13:%.*]] = fcmp ueq float [[TMP9]], 3.000000e+00
		; CHECK-VF1IC4-NEXT: [[TMP14:%.*]] = fcmp ueq float [[TMP10]], 3.000000e+00
		; CHECK-VF1IC4-NEXT: [[TMP15:%.*]] = fcmp ueq float [[TMP11]], 3.000000e+00
		; CHECK-VF1IC4-NEXT: [[TMP16]] = select i1 [[TMP12]], i64 [[TMP0]], i64 [[VEC_PHI]]
		; CHECK-VF1IC4-NEXT: [[TMP17]] = select i1 [[TMP13]], i64 [[TMP1]], i64 [[VEC_PHI1]]
		; CHECK-VF1IC4-NEXT: [[TMP18]] = select i1 [[TMP14]], i64 [[TMP2]], i64 [[VEC_PHI2]]
		; CHECK-VF1IC4-NEXT: [[TMP19]] = select i1 [[TMP15]], i64 [[TMP3]], i64 [[VEC_PHI3]]
		; CHECK-VF1IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF1IC4-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
		; CHECK-VF1IC4: middle.block:
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX:%.*]] = call i64 @llvm.smax.i64(i64 [[TMP16]], i64 [[TMP17]])
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX4:%.*]] = call i64 @llvm.smax.i64(i64 [[RDX_MINMAX]], i64 [[TMP18]])
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX5:%.*]] = call i64 @llvm.smax.i64(i64 [[RDX_MINMAX4]], i64 [[TMP19]])
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[RDX_MINMAX5]], -9223372036854775808
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[RDX_MINMAX5]], i64 2
		; CHECK-VF1IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF1IC4: scalar.ph:
		; CHECK-VF1IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 2, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF1IC4: for.body:
		; CHECK-VF1IC4-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF1IC4-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF1IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IV]]
		; CHECK-VF1IC4-NEXT: [[TMP21:%.*]] = load float, ptr [[ARRAYIDX]], align 4
		; CHECK-VF1IC4-NEXT: [[CMP2:%.*]] = fcmp ueq float [[TMP21]], 3.000000e+00
		; CHECK-VF1IC4-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF1IC4-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF1IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF1IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
		; CHECK-VF1IC4: exit:
		; CHECK-VF1IC4-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: ret i64 [[COND_LCSSA]]
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %entry, %for.body		for.body: ; preds = %entry, %for.body
%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]		%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]
%rdx = phi i64 [ %cond, %for.body ], [ 2, %entry ]		%rdx = phi i64 [ %cond, %for.body ], [ 2, %entry ]
%arrayidx = getelementptr inbounds float, ptr %a, i64 %iv		%arrayidx = getelementptr inbounds float, ptr %a, i64 %iv
%0 = load float, ptr %arrayidx, align 4		%0 = load float, ptr %arrayidx, align 4
%cmp2 = fcmp ueq float %0, 3.0		%cmp2 = fcmp ueq float %0, 3.0
%cond = select i1 %cmp2, i64 %iv, i64 %rdx		%cond = select i1 %cmp2, i64 %iv, i64 %rdx
%inc = add nuw nsw i64 %iv, 1		%inc = add nuw nsw i64 %iv, 1
%exitcond.not = icmp eq i64 %inc, %n		%exitcond.not = icmp eq i64 %inc, %n
br i1 %exitcond.not, label %exit, label %for.body		br i1 %exitcond.not, label %exit, label %for.body

exit: ; preds = %for.body		exit: ; preds = %for.body
ret i64 %cond		ret i64 %cond
}		}

define i64 @select_icmp(ptr nocapture readonly %a, ptr nocapture readonly %b, i64 %rdx.start, i64 %n) {		define i64 @select_icmp(ptr nocapture readonly %a, ptr nocapture readonly %b, i64 %rdx.start, i64 %n) {
; CHECK-LABEL: define i64 @select_icmp		; CHECK-VF4IC1-LABEL: define i64 @select_icmp
; CHECK-NOT: vector.body:		; CHECK-VF4IC1-SAME: (ptr nocapture readonly [[A:%.*]], ptr nocapture readonly [[B:%.*]], i64 [[RDX_START:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC1-NEXT: entry:
		; CHECK-VF4IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC1: vector.ph:
		; CHECK-VF4IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC1: vector.body:
		; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 8
		; CHECK-VF4IC1-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[TMP3]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x i64>, ptr [[TMP4]], align 8
		; CHECK-VF4IC1-NEXT: [[TMP5:%.*]] = icmp sgt <4 x i64> [[WIDE_LOAD]], [[WIDE_LOAD1]]
		; CHECK-VF4IC1-NEXT: [[TMP6]] = select <4 x i1> [[TMP5]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
		; CHECK-VF4IC1: middle.block:
		; CHECK-VF4IC1-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP6]])
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP8]], -9223372036854775808
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP8]], i64 [[RDX_START]]
		; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC1: scalar.ph:
		; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[RDX_START]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC1: for.body:
		; CHECK-VF4IC1-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF4IC1-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF4IC1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
		; CHECK-VF4IC1-NEXT: [[TMP9:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
		; CHECK-VF4IC1-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[IV]]
		; CHECK-VF4IC1-NEXT: [[TMP10:%.*]] = load i64, ptr [[ARRAYIDX1]], align 8
		; CHECK-VF4IC1-NEXT: [[CMP2:%.*]] = icmp sgt i64 [[TMP9]], [[TMP10]]
		; CHECK-VF4IC1-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF4IC1-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]]
		; CHECK-VF4IC1: exit:
		; CHECK-VF4IC1-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: ret i64 [[COND_LCSSA]]
		;
		; CHECK-VF4IC4-LABEL: define i64 @select_icmp
		; CHECK-VF4IC4-SAME: (ptr nocapture readonly [[A:%.*]], ptr nocapture readonly [[B:%.*]], i64 [[RDX_START:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC4-NEXT: entry:
		; CHECK-VF4IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC4: vector.ph:
		; CHECK-VF4IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC4: vector.body:
		; CHECK-VF4IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP24:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP25:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP26:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP27:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD1:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD1]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
		; CHECK-VF4IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
		; CHECK-VF4IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
		; CHECK-VF4IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF4IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF4IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF4IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF4IC4-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 0
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP8]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 4
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD7:%.*]] = load <4 x i64>, ptr [[TMP9]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 8
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD8:%.*]] = load <4 x i64>, ptr [[TMP10]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 12
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD9:%.*]] = load <4 x i64>, ptr [[TMP11]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP12:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
		; CHECK-VF4IC4-NEXT: [[TMP13:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP1]]
		; CHECK-VF4IC4-NEXT: [[TMP14:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP2]]
		; CHECK-VF4IC4-NEXT: [[TMP15:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP3]]
		; CHECK-VF4IC4-NEXT: [[TMP16:%.*]] = getelementptr inbounds i64, ptr [[TMP12]], i32 0
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD10:%.*]] = load <4 x i64>, ptr [[TMP16]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP17:%.*]] = getelementptr inbounds i64, ptr [[TMP12]], i32 4
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD11:%.*]] = load <4 x i64>, ptr [[TMP17]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP18:%.*]] = getelementptr inbounds i64, ptr [[TMP12]], i32 8
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD12:%.*]] = load <4 x i64>, ptr [[TMP18]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP19:%.*]] = getelementptr inbounds i64, ptr [[TMP12]], i32 12
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD13:%.*]] = load <4 x i64>, ptr [[TMP19]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP20:%.*]] = icmp sgt <4 x i64> [[WIDE_LOAD]], [[WIDE_LOAD10]]
		; CHECK-VF4IC4-NEXT: [[TMP21:%.*]] = icmp sgt <4 x i64> [[WIDE_LOAD7]], [[WIDE_LOAD11]]
		; CHECK-VF4IC4-NEXT: [[TMP22:%.*]] = icmp sgt <4 x i64> [[WIDE_LOAD8]], [[WIDE_LOAD12]]
		; CHECK-VF4IC4-NEXT: [[TMP23:%.*]] = icmp sgt <4 x i64> [[WIDE_LOAD9]], [[WIDE_LOAD13]]
		; CHECK-VF4IC4-NEXT: [[TMP24]] = select <4 x i1> [[TMP20]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC4-NEXT: [[TMP25]] = select <4 x i1> [[TMP21]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI4]]
		; CHECK-VF4IC4-NEXT: [[TMP26]] = select <4 x i1> [[TMP22]], <4 x i64> [[STEP_ADD1]], <4 x i64> [[VEC_PHI5]]
		; CHECK-VF4IC4-NEXT: [[TMP27]] = select <4 x i1> [[TMP23]], <4 x i64> [[STEP_ADD2]], <4 x i64> [[VEC_PHI6]]
		; CHECK-VF4IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
		; CHECK-VF4IC4-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[TMP28]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
		; CHECK-VF4IC4: middle.block:
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[TMP24]], <4 x i64> [[TMP25]])
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX14:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[RDX_MINMAX]], <4 x i64> [[TMP26]])
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX15:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[RDX_MINMAX14]], <4 x i64> [[TMP27]])
		; CHECK-VF4IC4-NEXT: [[TMP29:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX15]])
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP29]], -9223372036854775808
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP29]], i64 [[RDX_START]]
		; CHECK-VF4IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC4: scalar.ph:
		; CHECK-VF4IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[RDX_START]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC4: for.body:
		; CHECK-VF4IC4-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF4IC4-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF4IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
		; CHECK-VF4IC4-NEXT: [[TMP30:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
		; CHECK-VF4IC4-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[IV]]
		; CHECK-VF4IC4-NEXT: [[TMP31:%.*]] = load i64, ptr [[ARRAYIDX1]], align 8
		; CHECK-VF4IC4-NEXT: [[CMP2:%.*]] = icmp sgt i64 [[TMP30]], [[TMP31]]
		; CHECK-VF4IC4-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF4IC4-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF4IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]]
		; CHECK-VF4IC4: exit:
		; CHECK-VF4IC4-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: ret i64 [[COND_LCSSA]]
		;
		; CHECK-VF1IC4-LABEL: define i64 @select_icmp
		; CHECK-VF1IC4-SAME: (ptr nocapture readonly [[A:%.*]], ptr nocapture readonly [[B:%.*]], i64 [[RDX_START:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF1IC4-NEXT: entry:
		; CHECK-VF1IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF1IC4: vector.ph:
		; CHECK-VF1IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF1IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF1IC4: vector.body:
		; CHECK-VF1IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP24:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP25:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP26:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP27:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF1IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
		; CHECK-VF1IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
		; CHECK-VF1IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
		; CHECK-VF1IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF1IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF1IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF1IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF1IC4-NEXT: [[TMP8:%.*]] = load i64, ptr [[TMP4]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP9:%.*]] = load i64, ptr [[TMP5]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP10:%.*]] = load i64, ptr [[TMP6]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP11:%.*]] = load i64, ptr [[TMP7]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP12:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
		; CHECK-VF1IC4-NEXT: [[TMP13:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP1]]
		; CHECK-VF1IC4-NEXT: [[TMP14:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP2]]
		; CHECK-VF1IC4-NEXT: [[TMP15:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP3]]
		; CHECK-VF1IC4-NEXT: [[TMP16:%.*]] = load i64, ptr [[TMP12]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP17:%.*]] = load i64, ptr [[TMP13]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP18:%.*]] = load i64, ptr [[TMP14]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP19:%.*]] = load i64, ptr [[TMP15]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP20:%.*]] = icmp sgt i64 [[TMP8]], [[TMP16]]
		; CHECK-VF1IC4-NEXT: [[TMP21:%.*]] = icmp sgt i64 [[TMP9]], [[TMP17]]
		; CHECK-VF1IC4-NEXT: [[TMP22:%.*]] = icmp sgt i64 [[TMP10]], [[TMP18]]
		; CHECK-VF1IC4-NEXT: [[TMP23:%.*]] = icmp sgt i64 [[TMP11]], [[TMP19]]
		; CHECK-VF1IC4-NEXT: [[TMP24]] = select i1 [[TMP20]], i64 [[TMP0]], i64 [[VEC_PHI]]
		; CHECK-VF1IC4-NEXT: [[TMP25]] = select i1 [[TMP21]], i64 [[TMP1]], i64 [[VEC_PHI1]]
		; CHECK-VF1IC4-NEXT: [[TMP26]] = select i1 [[TMP22]], i64 [[TMP2]], i64 [[VEC_PHI2]]
		; CHECK-VF1IC4-NEXT: [[TMP27]] = select i1 [[TMP23]], i64 [[TMP3]], i64 [[VEC_PHI3]]
		; CHECK-VF1IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF1IC4-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[TMP28]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
		; CHECK-VF1IC4: middle.block:
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX:%.*]] = call i64 @llvm.smax.i64(i64 [[TMP24]], i64 [[TMP25]])
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX4:%.*]] = call i64 @llvm.smax.i64(i64 [[RDX_MINMAX]], i64 [[TMP26]])
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX5:%.*]] = call i64 @llvm.smax.i64(i64 [[RDX_MINMAX4]], i64 [[TMP27]])
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[RDX_MINMAX5]], -9223372036854775808
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[RDX_MINMAX5]], i64 [[RDX_START]]
		; CHECK-VF1IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF1IC4: scalar.ph:
		; CHECK-VF1IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[RDX_START]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF1IC4: for.body:
		; CHECK-VF1IC4-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF1IC4-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF1IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
		; CHECK-VF1IC4-NEXT: [[TMP29:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
		; CHECK-VF1IC4-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[IV]]
		; CHECK-VF1IC4-NEXT: [[TMP30:%.*]] = load i64, ptr [[ARRAYIDX1]], align 8
		; CHECK-VF1IC4-NEXT: [[CMP2:%.*]] = icmp sgt i64 [[TMP29]], [[TMP30]]
		; CHECK-VF1IC4-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF1IC4-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF1IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF1IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]]
		; CHECK-VF1IC4: exit:
		; CHECK-VF1IC4-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: ret i64 [[COND_LCSSA]]
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %entry, %for.body		for.body: ; preds = %entry, %for.body
%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]		%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]
%rdx = phi i64 [ %cond, %for.body ], [ %rdx.start, %entry ]		%rdx = phi i64 [ %cond, %for.body ], [ %rdx.start, %entry ]
%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv		%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv
%0 = load i64, ptr %arrayidx, align 8		%0 = load i64, ptr %arrayidx, align 8
%arrayidx1 = getelementptr inbounds i64, ptr %b, i64 %iv		%arrayidx1 = getelementptr inbounds i64, ptr %b, i64 %iv
%1 = load i64, ptr %arrayidx1, align 8		%1 = load i64, ptr %arrayidx1, align 8
%cmp2 = icmp sgt i64 %0, %1		%cmp2 = icmp sgt i64 %0, %1
%cond = select i1 %cmp2, i64 %iv, i64 %rdx		%cond = select i1 %cmp2, i64 %iv, i64 %rdx
%inc = add nuw nsw i64 %iv, 1		%inc = add nuw nsw i64 %iv, 1
%exitcond.not = icmp eq i64 %inc, %n		%exitcond.not = icmp eq i64 %inc, %n
br i1 %exitcond.not, label %exit, label %for.body		br i1 %exitcond.not, label %exit, label %for.body

exit: ; preds = %for.body		exit: ; preds = %for.body
ret i64 %cond		ret i64 %cond
}		}

define i64 @select_fcmp(ptr nocapture readonly %a, ptr nocapture readonly %b, i64 %rdx.start, i64 %n) {		define i64 @select_fcmp(ptr nocapture readonly %a, ptr nocapture readonly %b, i64 %rdx.start, i64 %n) {
; CHECK-LABEL: define i64 @select_fcmp		; CHECK-VF4IC1-LABEL: define i64 @select_fcmp
; CHECK-NOT: vector.body:		; CHECK-VF4IC1-SAME: (ptr nocapture readonly [[A:%.*]], ptr nocapture readonly [[B:%.*]], i64 [[RDX_START:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC1-NEXT: entry:
		; CHECK-VF4IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC1: vector.ph:
		; CHECK-VF4IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC1: vector.body:
		; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, ptr [[TMP1]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP2]], align 4
		; CHECK-VF4IC1-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, ptr [[TMP3]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x float>, ptr [[TMP4]], align 4
		; CHECK-VF4IC1-NEXT: [[TMP5:%.*]] = fcmp ogt <4 x float> [[WIDE_LOAD]], [[WIDE_LOAD1]]
		; CHECK-VF4IC1-NEXT: [[TMP6]] = select <4 x i1> [[TMP5]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
		; CHECK-VF4IC1: middle.block:
		; CHECK-VF4IC1-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP6]])
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP8]], -9223372036854775808
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP8]], i64 [[RDX_START]]
		; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC1: scalar.ph:
		; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[RDX_START]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC1: for.body:
		; CHECK-VF4IC1-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF4IC1-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF4IC1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IV]]
		; CHECK-VF4IC1-NEXT: [[TMP9:%.*]] = load float, ptr [[ARRAYIDX]], align 4
		; CHECK-VF4IC1-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[IV]]
		; CHECK-VF4IC1-NEXT: [[TMP10:%.*]] = load float, ptr [[ARRAYIDX1]], align 4
		; CHECK-VF4IC1-NEXT: [[CMP2:%.*]] = fcmp ogt float [[TMP9]], [[TMP10]]
		; CHECK-VF4IC1-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF4IC1-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP15:![0-9]+]]
		; CHECK-VF4IC1: exit:
		; CHECK-VF4IC1-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: ret i64 [[COND_LCSSA]]
		;
		; CHECK-VF4IC4-LABEL: define i64 @select_fcmp
		; CHECK-VF4IC4-SAME: (ptr nocapture readonly [[A:%.*]], ptr nocapture readonly [[B:%.*]], i64 [[RDX_START:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC4-NEXT: entry:
		; CHECK-VF4IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC4: vector.ph:
		; CHECK-VF4IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC4: vector.body:
		; CHECK-VF4IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP24:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP25:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP26:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP27:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD1:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD1]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
		; CHECK-VF4IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
		; CHECK-VF4IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
		; CHECK-VF4IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF4IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF4IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF4IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF4IC4-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, ptr [[TMP4]], i32 0
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP8]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, ptr [[TMP4]], i32 4
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD7:%.*]] = load <4 x float>, ptr [[TMP9]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, ptr [[TMP4]], i32 8
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD8:%.*]] = load <4 x float>, ptr [[TMP10]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, ptr [[TMP4]], i32 12
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD9:%.*]] = load <4 x float>, ptr [[TMP11]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[TMP0]]
		; CHECK-VF4IC4-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[TMP1]]
		; CHECK-VF4IC4-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[TMP2]]
		; CHECK-VF4IC4-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[TMP3]]
		; CHECK-VF4IC4-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, ptr [[TMP12]], i32 0
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD10:%.*]] = load <4 x float>, ptr [[TMP16]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP17:%.*]] = getelementptr inbounds float, ptr [[TMP12]], i32 4
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD11:%.*]] = load <4 x float>, ptr [[TMP17]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP18:%.*]] = getelementptr inbounds float, ptr [[TMP12]], i32 8
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD12:%.*]] = load <4 x float>, ptr [[TMP18]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP19:%.*]] = getelementptr inbounds float, ptr [[TMP12]], i32 12
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD13:%.*]] = load <4 x float>, ptr [[TMP19]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP20:%.*]] = fcmp ogt <4 x float> [[WIDE_LOAD]], [[WIDE_LOAD10]]
		; CHECK-VF4IC4-NEXT: [[TMP21:%.*]] = fcmp ogt <4 x float> [[WIDE_LOAD7]], [[WIDE_LOAD11]]
		; CHECK-VF4IC4-NEXT: [[TMP22:%.*]] = fcmp ogt <4 x float> [[WIDE_LOAD8]], [[WIDE_LOAD12]]
		; CHECK-VF4IC4-NEXT: [[TMP23:%.*]] = fcmp ogt <4 x float> [[WIDE_LOAD9]], [[WIDE_LOAD13]]
		; CHECK-VF4IC4-NEXT: [[TMP24]] = select <4 x i1> [[TMP20]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC4-NEXT: [[TMP25]] = select <4 x i1> [[TMP21]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI4]]
		; CHECK-VF4IC4-NEXT: [[TMP26]] = select <4 x i1> [[TMP22]], <4 x i64> [[STEP_ADD1]], <4 x i64> [[VEC_PHI5]]
		; CHECK-VF4IC4-NEXT: [[TMP27]] = select <4 x i1> [[TMP23]], <4 x i64> [[STEP_ADD2]], <4 x i64> [[VEC_PHI6]]
		; CHECK-VF4IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
		; CHECK-VF4IC4-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[TMP28]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
		; CHECK-VF4IC4: middle.block:
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[TMP24]], <4 x i64> [[TMP25]])
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX14:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[RDX_MINMAX]], <4 x i64> [[TMP26]])
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX15:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[RDX_MINMAX14]], <4 x i64> [[TMP27]])
		; CHECK-VF4IC4-NEXT: [[TMP29:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX15]])
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP29]], -9223372036854775808
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP29]], i64 [[RDX_START]]
		; CHECK-VF4IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC4: scalar.ph:
		; CHECK-VF4IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[RDX_START]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC4: for.body:
		; CHECK-VF4IC4-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF4IC4-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF4IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IV]]
		; CHECK-VF4IC4-NEXT: [[TMP30:%.*]] = load float, ptr [[ARRAYIDX]], align 4
		; CHECK-VF4IC4-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[IV]]
		; CHECK-VF4IC4-NEXT: [[TMP31:%.*]] = load float, ptr [[ARRAYIDX1]], align 4
		; CHECK-VF4IC4-NEXT: [[CMP2:%.*]] = fcmp ogt float [[TMP30]], [[TMP31]]
		; CHECK-VF4IC4-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF4IC4-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF4IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP15:![0-9]+]]
		; CHECK-VF4IC4: exit:
		; CHECK-VF4IC4-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: ret i64 [[COND_LCSSA]]
		;
		; CHECK-VF1IC4-LABEL: define i64 @select_fcmp
		; CHECK-VF1IC4-SAME: (ptr nocapture readonly [[A:%.*]], ptr nocapture readonly [[B:%.*]], i64 [[RDX_START:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF1IC4-NEXT: entry:
		; CHECK-VF1IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF1IC4: vector.ph:
		; CHECK-VF1IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF1IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF1IC4: vector.body:
		; CHECK-VF1IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP24:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP25:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP26:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP27:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF1IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
		; CHECK-VF1IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
		; CHECK-VF1IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
		; CHECK-VF1IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF1IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF1IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF1IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF1IC4-NEXT: [[TMP8:%.*]] = load float, ptr [[TMP4]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP9:%.*]] = load float, ptr [[TMP5]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP11:%.*]] = load float, ptr [[TMP7]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[TMP0]]
		; CHECK-VF1IC4-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[TMP1]]
		; CHECK-VF1IC4-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[TMP2]]
		; CHECK-VF1IC4-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[TMP3]]
		; CHECK-VF1IC4-NEXT: [[TMP16:%.*]] = load float, ptr [[TMP12]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP17:%.*]] = load float, ptr [[TMP13]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP18:%.*]] = load float, ptr [[TMP14]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP19:%.*]] = load float, ptr [[TMP15]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP20:%.*]] = fcmp ogt float [[TMP8]], [[TMP16]]
		; CHECK-VF1IC4-NEXT: [[TMP21:%.*]] = fcmp ogt float [[TMP9]], [[TMP17]]
		; CHECK-VF1IC4-NEXT: [[TMP22:%.*]] = fcmp ogt float [[TMP10]], [[TMP18]]
		; CHECK-VF1IC4-NEXT: [[TMP23:%.*]] = fcmp ogt float [[TMP11]], [[TMP19]]
		; CHECK-VF1IC4-NEXT: [[TMP24]] = select i1 [[TMP20]], i64 [[TMP0]], i64 [[VEC_PHI]]
		; CHECK-VF1IC4-NEXT: [[TMP25]] = select i1 [[TMP21]], i64 [[TMP1]], i64 [[VEC_PHI1]]
		; CHECK-VF1IC4-NEXT: [[TMP26]] = select i1 [[TMP22]], i64 [[TMP2]], i64 [[VEC_PHI2]]
		; CHECK-VF1IC4-NEXT: [[TMP27]] = select i1 [[TMP23]], i64 [[TMP3]], i64 [[VEC_PHI3]]
		; CHECK-VF1IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF1IC4-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[TMP28]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
		; CHECK-VF1IC4: middle.block:
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX:%.*]] = call i64 @llvm.smax.i64(i64 [[TMP24]], i64 [[TMP25]])
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX4:%.*]] = call i64 @llvm.smax.i64(i64 [[RDX_MINMAX]], i64 [[TMP26]])
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX5:%.*]] = call i64 @llvm.smax.i64(i64 [[RDX_MINMAX4]], i64 [[TMP27]])
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[RDX_MINMAX5]], -9223372036854775808
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[RDX_MINMAX5]], i64 [[RDX_START]]
		; CHECK-VF1IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF1IC4: scalar.ph:
		; CHECK-VF1IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[RDX_START]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF1IC4: for.body:
		; CHECK-VF1IC4-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF1IC4-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF1IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IV]]
		; CHECK-VF1IC4-NEXT: [[TMP29:%.*]] = load float, ptr [[ARRAYIDX]], align 4
		; CHECK-VF1IC4-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[IV]]
		; CHECK-VF1IC4-NEXT: [[TMP30:%.*]] = load float, ptr [[ARRAYIDX1]], align 4
		; CHECK-VF1IC4-NEXT: [[CMP2:%.*]] = fcmp ogt float [[TMP29]], [[TMP30]]
		; CHECK-VF1IC4-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV]], i64 [[RDX]]
		; CHECK-VF1IC4-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF1IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF1IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP15:![0-9]+]]
		; CHECK-VF1IC4: exit:
		; CHECK-VF1IC4-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: ret i64 [[COND_LCSSA]]
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %entry, %for.body		for.body: ; preds = %entry, %for.body
%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]		%iv = phi i64 [ %inc, %for.body ], [ 0, %entry ]
%rdx = phi i64 [ %cond, %for.body ], [ %rdx.start, %entry ]		%rdx = phi i64 [ %cond, %for.body ], [ %rdx.start, %entry ]
%arrayidx = getelementptr inbounds float, ptr %a, i64 %iv		%arrayidx = getelementptr inbounds float, ptr %a, i64 %iv
%0 = load float, ptr %arrayidx, align 4		%0 = load float, ptr %arrayidx, align 4
%arrayidx1 = getelementptr inbounds float, ptr %b, i64 %iv		%arrayidx1 = getelementptr inbounds float, ptr %b, i64 %iv
%1 = load float, ptr %arrayidx1, align 4		%1 = load float, ptr %arrayidx1, align 4
%cmp2 = fcmp ogt float %0, %1		%cmp2 = fcmp ogt float %0, %1
%cond = select i1 %cmp2, i64 %iv, i64 %rdx		%cond = select i1 %cmp2, i64 %iv, i64 %rdx
%inc = add nuw nsw i64 %iv, 1		%inc = add nuw nsw i64 %iv, 1
%exitcond.not = icmp eq i64 %inc, %n		%exitcond.not = icmp eq i64 %inc, %n
br i1 %exitcond.not, label %exit, label %for.body		br i1 %exitcond.not, label %exit, label %for.body

exit: ; preds = %for.body		exit: ; preds = %for.body
ret i64 %cond		ret i64 %cond
}		}

define i64 @select_icmp_min_valid_iv_start(ptr nocapture readonly %a, ptr nocapture readonly %b, i64 %rdx.start, i64 %n) {		define i64 @select_icmp_min_valid_iv_start(ptr nocapture readonly %a, ptr nocapture readonly %b, i64 %rdx.start, i64 %n) {
; CHECK-LABEL: define i64 @select_icmp_min_valid_iv_start		; CHECK-VF4IC1-LABEL: define i64 @select_icmp_min_valid_iv_start
; CHECK-NOT: vector.body:		; CHECK-VF4IC1-SAME: (ptr nocapture readonly [[A:%.*]], ptr nocapture readonly [[B:%.*]], i64 [[RDX_START:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC1-NEXT: entry:
		; CHECK-VF4IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC1: vector.ph:
		; CHECK-VF4IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC1-NEXT: [[IND_END:%.*]] = add i64 -9223372036854775807, [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC1: vector.body:
		; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 -9223372036854775807, i64 -9223372036854775806, i64 -9223372036854775805, i64 -9223372036854775804>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 8
		; CHECK-VF4IC1-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[TMP3]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD2:%.*]] = load <4 x i64>, ptr [[TMP4]], align 8
		; CHECK-VF4IC1-NEXT: [[TMP5:%.*]] = icmp sgt <4 x i64> [[WIDE_LOAD]], [[WIDE_LOAD2]]
		; CHECK-VF4IC1-NEXT: [[TMP6]] = select <4 x i1> [[TMP5]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
		; CHECK-VF4IC1: middle.block:
		; CHECK-VF4IC1-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP6]])
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP8]], -9223372036854775808
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP8]], i64 [[RDX_START]]
		; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC1: scalar.ph:
		; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ -9223372036854775807, [[ENTRY:%.*]] ]
		; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[RDX_START]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC1: for.body:
		; CHECK-VF4IC1-NEXT: [[IV_J:%.*]] = phi i64 [ [[INC3:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF4IC1-NEXT: [[IV_I:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL1]], [[SCALAR_PH]] ]
		; CHECK-VF4IC1-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF4IC1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV_I]]
		; CHECK-VF4IC1-NEXT: [[TMP9:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
		; CHECK-VF4IC1-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[IV_I]]
		; CHECK-VF4IC1-NEXT: [[TMP10:%.*]] = load i64, ptr [[ARRAYIDX1]], align 8
		; CHECK-VF4IC1-NEXT: [[CMP2:%.*]] = icmp sgt i64 [[TMP9]], [[TMP10]]
		; CHECK-VF4IC1-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV_J]], i64 [[RDX]]
		; CHECK-VF4IC1-NEXT: [[INC]] = add nuw nsw i64 [[IV_I]], 1
		; CHECK-VF4IC1-NEXT: [[INC3]] = add nsw i64 [[IV_J]], 1
		; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP17:![0-9]+]]
		; CHECK-VF4IC1: exit:
		; CHECK-VF4IC1-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: ret i64 [[COND_LCSSA]]
		;
		; CHECK-VF4IC4-LABEL: define i64 @select_icmp_min_valid_iv_start
		; CHECK-VF4IC4-SAME: (ptr nocapture readonly [[A:%.*]], ptr nocapture readonly [[B:%.*]], i64 [[RDX_START:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC4-NEXT: entry:
		; CHECK-VF4IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC4: vector.ph:
		; CHECK-VF4IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC4-NEXT: [[IND_END:%.*]] = add i64 -9223372036854775807, [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC4: vector.body:
		; CHECK-VF4IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 -9223372036854775807, i64 -9223372036854775806, i64 -9223372036854775805, i64 -9223372036854775804>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP24:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP25:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP26:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI7:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP27:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD3:%.*]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
		; CHECK-VF4IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
		; CHECK-VF4IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
		; CHECK-VF4IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF4IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF4IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF4IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF4IC4-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 0
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP8]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 4
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD8:%.*]] = load <4 x i64>, ptr [[TMP9]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 8
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD9:%.*]] = load <4 x i64>, ptr [[TMP10]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 12
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD10:%.*]] = load <4 x i64>, ptr [[TMP11]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP12:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
		; CHECK-VF4IC4-NEXT: [[TMP13:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP1]]
		; CHECK-VF4IC4-NEXT: [[TMP14:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP2]]
		; CHECK-VF4IC4-NEXT: [[TMP15:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP3]]
		; CHECK-VF4IC4-NEXT: [[TMP16:%.*]] = getelementptr inbounds i64, ptr [[TMP12]], i32 0
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD11:%.*]] = load <4 x i64>, ptr [[TMP16]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP17:%.*]] = getelementptr inbounds i64, ptr [[TMP12]], i32 4
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD12:%.*]] = load <4 x i64>, ptr [[TMP17]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP18:%.*]] = getelementptr inbounds i64, ptr [[TMP12]], i32 8
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD13:%.*]] = load <4 x i64>, ptr [[TMP18]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP19:%.*]] = getelementptr inbounds i64, ptr [[TMP12]], i32 12
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD14:%.*]] = load <4 x i64>, ptr [[TMP19]], align 8
		; CHECK-VF4IC4-NEXT: [[TMP20:%.*]] = icmp sgt <4 x i64> [[WIDE_LOAD]], [[WIDE_LOAD11]]
		; CHECK-VF4IC4-NEXT: [[TMP21:%.*]] = icmp sgt <4 x i64> [[WIDE_LOAD8]], [[WIDE_LOAD12]]
		; CHECK-VF4IC4-NEXT: [[TMP22:%.*]] = icmp sgt <4 x i64> [[WIDE_LOAD9]], [[WIDE_LOAD13]]
		; CHECK-VF4IC4-NEXT: [[TMP23:%.*]] = icmp sgt <4 x i64> [[WIDE_LOAD10]], [[WIDE_LOAD14]]
		; CHECK-VF4IC4-NEXT: [[TMP24]] = select <4 x i1> [[TMP20]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC4-NEXT: [[TMP25]] = select <4 x i1> [[TMP21]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI5]]
		; CHECK-VF4IC4-NEXT: [[TMP26]] = select <4 x i1> [[TMP22]], <4 x i64> [[STEP_ADD2]], <4 x i64> [[VEC_PHI6]]
		; CHECK-VF4IC4-NEXT: [[TMP27]] = select <4 x i1> [[TMP23]], <4 x i64> [[STEP_ADD3]], <4 x i64> [[VEC_PHI7]]
		; CHECK-VF4IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
		; CHECK-VF4IC4-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD3]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[TMP28]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
		; CHECK-VF4IC4: middle.block:
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[TMP24]], <4 x i64> [[TMP25]])
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX15:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[RDX_MINMAX]], <4 x i64> [[TMP26]])
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX16:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[RDX_MINMAX15]], <4 x i64> [[TMP27]])
		; CHECK-VF4IC4-NEXT: [[TMP29:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX16]])
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP29]], -9223372036854775808
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP29]], i64 [[RDX_START]]
		; CHECK-VF4IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC4: scalar.ph:
		; CHECK-VF4IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ -9223372036854775807, [[ENTRY:%.*]] ]
		; CHECK-VF4IC4-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
		; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[RDX_START]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC4: for.body:
		; CHECK-VF4IC4-NEXT: [[IV_J:%.*]] = phi i64 [ [[INC3:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF4IC4-NEXT: [[IV_I:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL1]], [[SCALAR_PH]] ]
		; CHECK-VF4IC4-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF4IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV_I]]
		; CHECK-VF4IC4-NEXT: [[TMP30:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
		; CHECK-VF4IC4-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[IV_I]]
		; CHECK-VF4IC4-NEXT: [[TMP31:%.*]] = load i64, ptr [[ARRAYIDX1]], align 8
		; CHECK-VF4IC4-NEXT: [[CMP2:%.*]] = icmp sgt i64 [[TMP30]], [[TMP31]]
		; CHECK-VF4IC4-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV_J]], i64 [[RDX]]
		; CHECK-VF4IC4-NEXT: [[INC]] = add nuw nsw i64 [[IV_I]], 1
		; CHECK-VF4IC4-NEXT: [[INC3]] = add nsw i64 [[IV_J]], 1
		; CHECK-VF4IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF4IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP17:![0-9]+]]
		; CHECK-VF4IC4: exit:
		; CHECK-VF4IC4-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: ret i64 [[COND_LCSSA]]
		;
		; CHECK-VF1IC4-LABEL: define i64 @select_icmp_min_valid_iv_start
		; CHECK-VF1IC4-SAME: (ptr nocapture readonly [[A:%.*]], ptr nocapture readonly [[B:%.*]], i64 [[RDX_START:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF1IC4-NEXT: entry:
		; CHECK-VF1IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF1IC4: vector.ph:
		; CHECK-VF1IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF1IC4-NEXT: [[IND_END:%.*]] = add i64 -9223372036854775807, [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF1IC4: vector.body:
		; CHECK-VF1IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP28:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP29:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP30:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI4:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP31:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF1IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
		; CHECK-VF1IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
		; CHECK-VF1IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
		; CHECK-VF1IC4-NEXT: [[OFFSET_IDX:%.*]] = add i64 -9223372036854775807, [[INDEX]]
		; CHECK-VF1IC4-NEXT: [[TMP4:%.*]] = add i64 [[OFFSET_IDX]], 0
		; CHECK-VF1IC4-NEXT: [[TMP5:%.*]] = add i64 [[OFFSET_IDX]], 1
		; CHECK-VF1IC4-NEXT: [[TMP6:%.*]] = add i64 [[OFFSET_IDX]], 2
		; CHECK-VF1IC4-NEXT: [[TMP7:%.*]] = add i64 [[OFFSET_IDX]], 3
		; CHECK-VF1IC4-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
		; CHECK-VF1IC4-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF1IC4-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF1IC4-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF1IC4-NEXT: [[TMP12:%.*]] = load i64, ptr [[TMP8]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP13:%.*]] = load i64, ptr [[TMP9]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP14:%.*]] = load i64, ptr [[TMP10]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP15:%.*]] = load i64, ptr [[TMP11]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP16:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
		; CHECK-VF1IC4-NEXT: [[TMP17:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP1]]
		; CHECK-VF1IC4-NEXT: [[TMP18:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP2]]
		; CHECK-VF1IC4-NEXT: [[TMP19:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP3]]
		; CHECK-VF1IC4-NEXT: [[TMP20:%.*]] = load i64, ptr [[TMP16]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP21:%.*]] = load i64, ptr [[TMP17]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP22:%.*]] = load i64, ptr [[TMP18]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP23:%.*]] = load i64, ptr [[TMP19]], align 8
		; CHECK-VF1IC4-NEXT: [[TMP24:%.*]] = icmp sgt i64 [[TMP12]], [[TMP20]]
		; CHECK-VF1IC4-NEXT: [[TMP25:%.*]] = icmp sgt i64 [[TMP13]], [[TMP21]]
		; CHECK-VF1IC4-NEXT: [[TMP26:%.*]] = icmp sgt i64 [[TMP14]], [[TMP22]]
		; CHECK-VF1IC4-NEXT: [[TMP27:%.*]] = icmp sgt i64 [[TMP15]], [[TMP23]]
		; CHECK-VF1IC4-NEXT: [[TMP28]] = select i1 [[TMP24]], i64 [[TMP4]], i64 [[VEC_PHI]]
		; CHECK-VF1IC4-NEXT: [[TMP29]] = select i1 [[TMP25]], i64 [[TMP5]], i64 [[VEC_PHI2]]
		; CHECK-VF1IC4-NEXT: [[TMP30]] = select i1 [[TMP26]], i64 [[TMP6]], i64 [[VEC_PHI3]]
		; CHECK-VF1IC4-NEXT: [[TMP31]] = select i1 [[TMP27]], i64 [[TMP7]], i64 [[VEC_PHI4]]
		; CHECK-VF1IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF1IC4-NEXT: [[TMP32:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[TMP32]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
		; CHECK-VF1IC4: middle.block:
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX:%.*]] = call i64 @llvm.smax.i64(i64 [[TMP28]], i64 [[TMP29]])
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX5:%.*]] = call i64 @llvm.smax.i64(i64 [[RDX_MINMAX]], i64 [[TMP30]])
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX6:%.*]] = call i64 @llvm.smax.i64(i64 [[RDX_MINMAX5]], i64 [[TMP31]])
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[RDX_MINMAX6]], -9223372036854775808
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[RDX_MINMAX6]], i64 [[RDX_START]]
		; CHECK-VF1IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF1IC4: scalar.ph:
		; CHECK-VF1IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ -9223372036854775807, [[ENTRY:%.*]] ]
		; CHECK-VF1IC4-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
		; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[RDX_START]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF1IC4: for.body:
		; CHECK-VF1IC4-NEXT: [[IV_J:%.*]] = phi i64 [ [[INC3:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-VF1IC4-NEXT: [[IV_I:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL1]], [[SCALAR_PH]] ]
		; CHECK-VF1IC4-NEXT: [[RDX:%.*]] = phi i64 [ [[COND:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-VF1IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV_I]]
		; CHECK-VF1IC4-NEXT: [[TMP33:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
		; CHECK-VF1IC4-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[IV_I]]
		; CHECK-VF1IC4-NEXT: [[TMP34:%.*]] = load i64, ptr [[ARRAYIDX1]], align 8
		; CHECK-VF1IC4-NEXT: [[CMP2:%.*]] = icmp sgt i64 [[TMP33]], [[TMP34]]
		; CHECK-VF1IC4-NEXT: [[COND]] = select i1 [[CMP2]], i64 [[IV_J]], i64 [[RDX]]
		; CHECK-VF1IC4-NEXT: [[INC]] = add nuw nsw i64 [[IV_I]], 1
		; CHECK-VF1IC4-NEXT: [[INC3]] = add nsw i64 [[IV_J]], 1
		; CHECK-VF1IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
		; CHECK-VF1IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP17:![0-9]+]]
		; CHECK-VF1IC4: exit:
		; CHECK-VF1IC4-NEXT: [[COND_LCSSA:%.*]] = phi i64 [ [[COND]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: ret i64 [[COND_LCSSA]]
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %entry, %for.body		for.body: ; preds = %entry, %for.body
%iv.j = phi i64 [ %inc3, %for.body ], [ -9223372036854775807, %entry]		%iv.j = phi i64 [ %inc3, %for.body ], [ -9223372036854775807, %entry]
%iv.i = phi i64 [ %inc, %for.body ], [ 0, %entry ]		%iv.i = phi i64 [ %inc, %for.body ], [ 0, %entry ]
%rdx = phi i64 [ %cond, %for.body ], [ %rdx.start, %entry ]		%rdx = phi i64 [ %cond, %for.body ], [ %rdx.start, %entry ]
[... 130 lines hidden in the diff view ...]
%cond = select i1 %cmp2, i64 %iv, i64 %rdx		%cond = select i1 %cmp2, i64 %iv, i64 %rdx
%inc = add nuw nsw i64 %iv, 1		%inc = add nuw nsw i64 %iv, 1
%exitcond.not = icmp eq i64 %inc, %n		%exitcond.not = icmp eq i64 %inc, %n
br i1 %exitcond.not, label %exit, label %for.body		br i1 %exitcond.not, label %exit, label %for.body

exit: ; preds = %for.body		exit: ; preds = %for.body
ret i64 %cond		ret i64 %cond
}		}

		;
		; The test case is modified from @select_i32_from_icmp_same_inputs at
		; Transforms/LoopVectorize/select-cmp.ll
		;
		define i64 @not_vectorized_select_icmp_const_cmp_in_recurrence(i64 %a, i64 %b, i64 %n) {
		; CHECK-LABEL: define i64 @not_vectorized_select_icmp_const_cmp_in_recurrence
		; CHECK-NOT: vector.body:
		;
		entry:
		br label %for.body

		for.body: ; preds = %entry, %for.body
		%0 = phi i64 [ 0, %entry ], [ %4, %for.body ]
		%1 = phi i64 [ %a, %entry ], [ %3, %for.body ]
		%2 = icmp eq i64 %1, 3
		%3 = select i1 %2, i64 %1, i64 %0
		%4 = add nuw nsw i64 %0, 1
		%5 = icmp eq i64 %4, %n
		br i1 %5, label %exit, label %for.body

		exit: ; preds = %for.body
		ret i64 %3
		}

		define i64 @not_vectorized_select_icmp_cmp_in_recurrence(i64 %a, i64 %b, i64 %n, ptr %c) {
		; CHECK-LABEL: define i64 @not_vectorized_select_icmp_cmp_in_recurrence
		; CHECK-NOT: vector.body:
		;
		entry:
		br label %for.body

		for.body: ; preds = %entry, %for.body
		%0 = phi i64 [ 0, %entry ], [ %6, %for.body ]
		%1 = phi i64 [ %a, %entry ], [ %5, %for.body ]
		%2 = getelementptr inbounds i64, ptr %c, i64 %0
		%3 = load i64, ptr %2, align 8
		%4 = icmp eq i64 %1, %3
		%5 = select i1 %4, i64 %1, i64 %0
		%6 = add nuw nsw i64 %0, 1
		%7 = icmp eq i64 %6, %n
		br i1 %7, label %exit, label %for.body

		exit: ; preds = %for.body
		ret i64 %5
		}

llvm/test/Transforms/LoopVectorize/select-min-index.ll

; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -S %s | FileCheck %s		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function test_not_vectorize_select_no_min_reduction --version 2
; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=2 -S %s | FileCheck %s		; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -S %s | FileCheck %s --check-prefix=CHECK-VF4IC1 --check-prefix=CHECK
; RUN: opt -passes=loop-vectorize -force-vector-width=1 -force-vector-interleave=2 -S %s | FileCheck %s		; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=2 -S %s | FileCheck %s --check-prefix=CHECK-VF4IC2 --check-prefix=CHECK
		; RUN: opt -passes=loop-vectorize -force-vector-width=1 -force-vector-interleave=2 -S %s | FileCheck %s --check-prefix=CHECK-VF1IC2 --check-prefix=CHECK

; Test cases for selecting the index with the minimum value.		; Test cases for selecting the index with the minimum value.

define i64 @test_vectorize_select_umin_idx(ptr %src, i64 %n) {		define i64 @test_vectorize_select_umin_idx(ptr %src, i64 %n) {
; CHECK-LABEL: @test_vectorize_select_umin_idx(		; CHECK-LABEL: @test_vectorize_select_umin_idx(
; CHECK-NOT: vector.body:		; CHECK-NOT: vector.body:
;		;
entry:		entry:
[... 65 lines hidden in the diff view ...]
br i1 %exitcond.not, label %exit, label %loop		br i1 %exitcond.not, label %exit, label %loop

exit:		exit:
%res = phi i64 [ %min.idx.next, %loop ]		%res = phi i64 [ %min.idx.next, %loop ]
ret i64 %res		ret i64 %res
}		}

define i64 @test_not_vectorize_select_no_min_reduction(ptr %src, i64 %n) {		define i64 @test_not_vectorize_select_no_min_reduction(ptr %src, i64 %n) {
; CHECK-LABEL: @test_not_vectorize_select_no_min_reduction(		; CHECK-VF4IC1-LABEL: define i64 @test_not_vectorize_select_no_min_reduction
; CHECK-NOT: vector.body:		; CHECK-VF4IC1-SAME: (ptr [[SRC:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC1-NEXT: entry:
		; CHECK-VF4IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
		Ayal: This test now gets vectorized, being a `FindLast` loop that reports the last index where a[i] < a[i-1]+1, or zero if none are found. (I.e., proving that a sequence is not strictly increasing, rather than computing `MinLast`.) But the vector loop is never reached?
		Mel-Chen (author): Impressive catch! We have been focusing only on the vector.body and ignoring the others. I will prioritize clarifying this bug and fixing it as soon as reasonable.
		Mel-Chen (author): Clarified: this has been confirmed not to be a bug. The loop trip count in the test case is 0, which causes `min.iters.check` to be simplified to `true`. The test case has been fixed in D154415.
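For readers following the thread, the `FindLast` shape that the CHECK lines in this diff encode can be sketched in plain C++. This is only an illustrative emulation, not the LoopVectorize implementation: the function names, the fixed lane count of 4, and the array-based lanes are assumptions made for the example. It mirrors what the CHECK lines show: the reduction phi starts at the `i64` minimum as a sentinel, the middle block takes a signed max over the lanes, and the original start value is selected when the result still equals the sentinel. It assumes the induction variable never takes the sentinel value.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Scalar reference: last index i with a[i] > b[i], or `start` if none.
int64_t findLastScalar(const std::vector<int64_t> &a,
                       const std::vector<int64_t> &b, int64_t start) {
  int64_t rdx = start;
  for (std::size_t i = 0; i < a.size(); ++i)
    rdx = (a[i] > b[i]) ? static_cast<int64_t>(i) : rdx;
  return rdx;
}

// Hypothetical 4-lane emulation of the vectorized form (names are
// illustrative, not taken from LLVM).
int64_t findLastLanes(const std::vector<int64_t> &a,
                      const std::vector<int64_t> &b, int64_t start) {
  constexpr std::size_t VF = 4;
  // Sentinel plays the role of the -9223372036854775808 phi start value.
  constexpr int64_t Sentinel = std::numeric_limits<int64_t>::min();
  const std::size_t n = a.size();
  const std::size_t nVec = n - n % VF; // corresponds to n.vec

  // "vector.body": each lane keeps the last matching index it has seen.
  std::array<int64_t, VF> lanes;
  lanes.fill(Sentinel);
  for (std::size_t base = 0; base < nVec; base += VF)
    for (std::size_t l = 0; l < VF; ++l)
      if (a[base + l] > b[base + l])
        lanes[l] = static_cast<int64_t>(base + l);

  // "middle.block": signed-max reduction, then fall back to `start`
  // if no lane ever matched (result is still the sentinel).
  const int64_t maxIdx = *std::max_element(lanes.begin(), lanes.end());
  int64_t rdx = (maxIdx != Sentinel) ? maxIdx : start;

  // Scalar remainder loop for the last n % VF iterations.
  for (std::size_t i = nVec; i < n; ++i)
    rdx = (a[i] > b[i]) ? static_cast<int64_t>(i) : rdx;
  return rdx;
}
```

When the input has fewer than 4 elements, the lane loop in this sketch does not execute and the whole computation is done by the scalar remainder loop.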
		; CHECK-VF4IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC1: vector.ph:
		; CHECK-VF4IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC1: vector.body:
		; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i64> [ <i64 poison, i64 poison, i64 poison, i64 0>, [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[TMP1]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 4
		; CHECK-VF4IC1-NEXT: [[TMP3]] = add <4 x i64> [[WIDE_LOAD]], <i64 1, i64 1, i64 1, i64 1>
		; CHECK-VF4IC1-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[VECTOR_RECUR]], <4 x i64> [[TMP3]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
		; CHECK-VF4IC1-NEXT: [[TMP5:%.*]] = icmp ugt <4 x i64> [[TMP4]], [[WIDE_LOAD]]
		; CHECK-VF4IC1-NEXT: [[TMP6]] = select <4 x i1> [[TMP5]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
		; CHECK-VF4IC1: middle.block:
		; CHECK-VF4IC1-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP6]])
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP8]], -9223372036854775808
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP8]], i64 0
		; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i64> [[TMP3]], i32 3
		; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC1: scalar.ph:
		; CHECK-VF4IC1-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: br label [[LOOP:%.*]]
		; CHECK-VF4IC1: loop:
		; CHECK-VF4IC1-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC1-NEXT: [[MIN_IDX:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[MIN_IDX_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC1-NEXT: [[SCALAR_RECUR:%.*]] = phi i64 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[MIN_VAL_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC1-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]
		; CHECK-VF4IC1-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4
		; CHECK-VF4IC1-NEXT: [[CMP:%.*]] = icmp ugt i64 [[SCALAR_RECUR]], [[L]]
		; CHECK-VF4IC1-NEXT: [[MIN_VAL_NEXT]] = add i64 [[L]], 1
		; CHECK-VF4IC1-NEXT: [[FOO:%.*]] = call i64 @llvm.umin.i64(i64 [[SCALAR_RECUR]], i64 [[L]])
		; CHECK-VF4IC1-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]
		; CHECK-VF4IC1-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
		; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
		; CHECK-VF4IC1: exit:
		; CHECK-VF4IC1-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: ret i64 [[RES]]
		;
		; CHECK-VF4IC2-LABEL: define i64 @test_not_vectorize_select_no_min_reduction
		; CHECK-VF4IC2-SAME: (ptr [[SRC:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF4IC2-NEXT: entry:
		; CHECK-VF4IC2-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 8
		; CHECK-VF4IC2-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC2: vector.ph:
		; CHECK-VF4IC2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 8
		; CHECK-VF4IC2-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC2-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC2: vector.body:
		; CHECK-VF4IC2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VEC_PHI2:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i64> [ <i64 poison, i64 poison, i64 poison, i64 0>, [[VECTOR_PH]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
		; CHECK-VF4IC2-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[TMP0]]
		; CHECK-VF4IC2-NEXT: [[TMP3:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[TMP1]]
		; CHECK-VF4IC2-NEXT: [[TMP4:%.*]] = getelementptr i64, ptr [[TMP2]], i32 0
		; CHECK-VF4IC2-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP4]], align 4
		; CHECK-VF4IC2-NEXT: [[TMP5:%.*]] = getelementptr i64, ptr [[TMP2]], i32 4
		; CHECK-VF4IC2-NEXT: [[WIDE_LOAD3:%.*]] = load <4 x i64>, ptr [[TMP5]], align 4
		; CHECK-VF4IC2-NEXT: [[TMP6:%.*]] = add <4 x i64> [[WIDE_LOAD]], <i64 1, i64 1, i64 1, i64 1>
		; CHECK-VF4IC2-NEXT: [[TMP7]] = add <4 x i64> [[WIDE_LOAD3]], <i64 1, i64 1, i64 1, i64 1>
		; CHECK-VF4IC2-NEXT: [[TMP8:%.*]] = shufflevector <4 x i64> [[VECTOR_RECUR]], <4 x i64> [[TMP6]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
		; CHECK-VF4IC2-NEXT: [[TMP9:%.*]] = shufflevector <4 x i64> [[TMP6]], <4 x i64> [[TMP7]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
		; CHECK-VF4IC2-NEXT: [[TMP10:%.*]] = icmp ugt <4 x i64> [[TMP8]], [[WIDE_LOAD]]
		; CHECK-VF4IC2-NEXT: [[TMP11:%.*]] = icmp ugt <4 x i64> [[TMP9]], [[WIDE_LOAD3]]
		; CHECK-VF4IC2-NEXT: [[TMP12]] = select <4 x i1> [[TMP10]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC2-NEXT: [[TMP13]] = select <4 x i1> [[TMP11]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI2]]
		; CHECK-VF4IC2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
		; CHECK-VF4IC2-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC2-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC2-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
		; CHECK-VF4IC2: middle.block:
		; CHECK-VF4IC2-NEXT: [[RDX_MINMAX:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[TMP12]], <4 x i64> [[TMP13]])
		; CHECK-VF4IC2-NEXT: [[TMP15:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX]])
		; CHECK-VF4IC2-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP15]], -9223372036854775808
		; CHECK-VF4IC2-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP15]], i64 0
		; CHECK-VF4IC2-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC2-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i64> [[TMP7]], i32 3
		; CHECK-VF4IC2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC2: scalar.ph:
		; CHECK-VF4IC2-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
		; CHECK-VF4IC2-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC2-NEXT: br label [[LOOP:%.*]]
		; CHECK-VF4IC2: loop:
		; CHECK-VF4IC2-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC2-NEXT: [[MIN_IDX:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[MIN_IDX_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC2-NEXT: [[SCALAR_RECUR:%.*]] = phi i64 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[MIN_VAL_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC2-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]
		; CHECK-VF4IC2-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4
		; CHECK-VF4IC2-NEXT: [[CMP:%.*]] = icmp ugt i64 [[SCALAR_RECUR]], [[L]]
		; CHECK-VF4IC2-NEXT: [[MIN_VAL_NEXT]] = add i64 [[L]], 1
		; CHECK-VF4IC2-NEXT: [[FOO:%.*]] = call i64 @llvm.umin.i64(i64 [[SCALAR_RECUR]], i64 [[L]])
		; CHECK-VF4IC2-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]
		; CHECK-VF4IC2-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC2-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
		; CHECK-VF4IC2-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
		; CHECK-VF4IC2: exit:
		; CHECK-VF4IC2-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC2-NEXT: ret i64 [[RES]]
		;
		; CHECK-VF1IC2-LABEL: define i64 @test_not_vectorize_select_no_min_reduction
		; CHECK-VF1IC2-SAME: (ptr [[SRC:%.*]], i64 [[N:%.*]]) {
		; CHECK-VF1IC2-NEXT: entry:
		; CHECK-VF1IC2-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 2
		; CHECK-VF1IC2-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF1IC2: vector.ph:
		; CHECK-VF1IC2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 2
		; CHECK-VF1IC2-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF1IC2-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF1IC2: vector.body:
		; CHECK-VF1IC2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[VEC_PHI:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP10:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP11:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[VECTOR_RECUR:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF1IC2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
		; CHECK-VF1IC2-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[TMP0]]
		; CHECK-VF1IC2-NEXT: [[TMP3:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[TMP1]]
		; CHECK-VF1IC2-NEXT: [[TMP4:%.*]] = load i64, ptr [[TMP2]], align 4
		; CHECK-VF1IC2-NEXT: [[TMP5:%.*]] = load i64, ptr [[TMP3]], align 4
		; CHECK-VF1IC2-NEXT: [[TMP6:%.*]] = add i64 [[TMP4]], 1
		; CHECK-VF1IC2-NEXT: [[TMP7]] = add i64 [[TMP5]], 1
		; CHECK-VF1IC2-NEXT: [[TMP8:%.*]] = icmp ugt i64 [[VECTOR_RECUR]], [[TMP4]]
		; CHECK-VF1IC2-NEXT: [[TMP9:%.*]] = icmp ugt i64 [[TMP6]], [[TMP5]]
		; CHECK-VF1IC2-NEXT: [[TMP10]] = select i1 [[TMP8]], i64 [[TMP0]], i64 [[VEC_PHI]]
		; CHECK-VF1IC2-NEXT: [[TMP11]] = select i1 [[TMP9]], i64 [[TMP1]], i64 [[VEC_PHI1]]
		; CHECK-VF1IC2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
		; CHECK-VF1IC2-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF1IC2-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
		; CHECK-VF1IC2: middle.block:
		; CHECK-VF1IC2-NEXT: [[RDX_MINMAX:%.*]] = call i64 @llvm.smax.i64(i64 [[TMP10]], i64 [[TMP11]])
		; CHECK-VF1IC2-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[RDX_MINMAX]], -9223372036854775808
		; CHECK-VF1IC2-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[RDX_MINMAX]], i64 0
		; CHECK-VF1IC2-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF1IC2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF1IC2: scalar.ph:
		; CHECK-VF1IC2-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
		; CHECK-VF1IC2-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC2-NEXT: br label [[LOOP:%.*]]
		; CHECK-VF1IC2: loop:
		; CHECK-VF1IC2-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF1IC2-NEXT: [[MIN_IDX:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[MIN_IDX_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF1IC2-NEXT: [[SCALAR_RECUR:%.*]] = phi i64 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[MIN_VAL_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF1IC2-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]
		; CHECK-VF1IC2-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4
		; CHECK-VF1IC2-NEXT: [[CMP:%.*]] = icmp ugt i64 [[SCALAR_RECUR]], [[L]]
		; CHECK-VF1IC2-NEXT: [[MIN_VAL_NEXT]] = add i64 [[L]], 1
		; CHECK-VF1IC2-NEXT: [[FOO:%.*]] = call i64 @llvm.umin.i64(i64 [[SCALAR_RECUR]], i64 [[L]])
		; CHECK-VF1IC2-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]
		; CHECK-VF1IC2-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF1IC2-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
		; CHECK-VF1IC2-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
		; CHECK-VF1IC2: exit:
		; CHECK-VF1IC2-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC2-NEXT: ret i64 [[RES]]
;
entry:
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %min.idx = phi i64 [ 0, %entry ], [ %min.idx.next, %loop ]
  %min.val = phi i64 [ 0, %entry ], [ %min.val.next, %loop ]

Diff 552652
