This is an archive of the discontinued LLVM Phabricator instance.

Differential D143465

[LoopVectorize] Vectorize the reduction pattern of integer min/max with index.
Needs Review · Public

Authored by Mel-Chen on Feb 6 2023, 11:09 PM.

Details

Reviewers

ABataev
fhahn

Summary

The Concept and Approach

Here is an example of the min/max-with-index pattern:

int idx = ii;
int max = mm;
for (int i = 0; i < n; ++i) {
  int x = a[i];
  if (max < x) {
    max = x;
    idx = i;
  }
}

After lowering to LLVM IR, it looks like this:

define i64 @smax_idx(ptr nocapture readonly %a, i64 %mm, i64 %ii, ptr nocapture writeonly %res_max, i64 %n) {
entry:
  br label %for.body

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]
  %idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]
  %arrayidx = getelementptr inbounds i64, ptr %a, i64 %indvars.iv
  %0 = load i64, ptr %arrayidx
  %1 = tail call i64 @llvm.smax.i64(i64 %max.09, i64 %0)
  %cmp1 = icmp slt i64 %max.09, %0
  %spec.select7 = select i1 %cmp1, i64 %indvars.iv, i64 %idx.011
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, %n
  br i1 %exitcond.not, label %exit, label %for.body

exit:
  store i64 %1, ptr %res_max
  ret i64 %spec.select7
}

Next, we draw a def-use graph for illustration (focusing on for.body):

                   ┌──────────────────┐
                   ▼                  │
          %indvars.iv                 │
         /         │ \                │
        ▼          │  ▼               │
%arrayidx          │%indvars.iv.next  │
     │             │      │        └──┘
     ▼             │      ▼
    %0             │   %exitcond.not
┌─┐ /└───────────┐ │      │
│ ▼▼             │ │      ▼
│ %1             │ │      br
│  │             │ │
│  ▼             │ │
│phi_max:%max.09 │ │
└──┘  \    ┌─────┘ │
       ▼   ▼       │ 
       %cmp1       │
      ┌─┐ \        │
      │ ▼  ▼       ▼
      │ %spec.select7
      │       │
      │       ▼
      │    phi_idx:%idx.011 
      └───────┘

Generally, when recognizing a reduction pattern, we traverse the def-use graph starting from a phi in the loop header block, that is, phi_max or phi_idx in the graph. Taking a simple max reduction as an example, we start at phi_max and perform a depth-first traversal of the def-use graph to check whether we can get back to phi_max, which means the operations form a cycle. Besides that, two things must be confirmed. First, at least one reduction operation (an operation in the cycle) must be used outside the loop; if there is no external user, there is nothing for the vectorizer to vectorize. Second, a reduction operation cannot have users inside the loop unless those internal users are also reduction operations. This second restriction is one of the issues this patch needs to handle.
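
The cycle check described above can be illustrated with a small standalone sketch. It is not the code in IVDescriptors.cpp: the graph is a plain map of strings, and the names DefUse, exampleGraph and reachesPhi are hypothetical. It only covers the "can we get back to the phi" part, not the checks on external and internal users.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy def-use graph: each value maps to the list of values that use it.
using DefUse = std::map<std::string, std::vector<std::string>>;

// Def-use edges of for.body, transcribed from the graph above.
DefUse exampleGraph() {
  return {
      {"%indvars.iv", {"%arrayidx", "%indvars.iv.next", "%spec.select7"}},
      {"%indvars.iv.next", {"%indvars.iv", "%exitcond.not"}},
      {"%arrayidx", {"%0"}},
      {"%0", {"%1", "%cmp1"}},
      {"%1", {"%max.09"}},
      {"%max.09", {"%1", "%cmp1"}},
      {"%cmp1", {"%spec.select7"}},
      {"%spec.select7", {"%idx.011"}},
      {"%idx.011", {"%spec.select7"}},
      {"%exitcond.not", {"br"}},
  };
}

// Depth-first walk over users, starting at `phi`. Returns true if the
// walk can get back to `phi`, i.e. the phi sits on a def-use cycle.
bool reachesPhi(const DefUse &users, const std::string &phi) {
  assert(users.count(phi) && "phi must be a node of the graph");
  std::set<std::string> visited;
  std::vector<std::string> stack{phi};
  while (!stack.empty()) {
    std::string cur = stack.back();
    stack.pop_back();
    auto it = users.find(cur);
    if (it == users.end())
      continue;  // value without users inside the loop body
    for (const std::string &user : it->second) {
      if (user == phi)
        return true;
      if (visited.insert(user).second)
        stack.push_back(user);
    }
  }
  return false;
}
```

With this graph, both %max.09 and %idx.011 are found to sit on a cycle, while a non-phi value such as %0 is not.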

Let’s go back to the def-use graph. Finding two cycles in a single traversal is clearly difficult. Besides making the algorithm more complicated, we would also face the issue of the order of phi_max and phi_idx, because we cannot control the order of the input IR. A better way is to find the cycles of phi_max and phi_idx separately with the original algorithm, and then perform a second stage that combines phi_max and phi_idx. This solves the phi order issue. The input may list the header phis in either of the following orders:

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]
  %idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]
…

and

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]
  %max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]
…

Next, let's see what kind of cycle phi_max and phi_idx each form. Ignoring the edge (phi_max, %cmp1), phi_max is an ordinary max reduction and phi_idx is a select-cmp reduction. In other words, the first stage of reduction recognition needs to recognize a max reduction and a select-cmp reduction. The second stage then uses the relationship between the two reductions found in the first stage to combine them.

Once recognition is complete, the next step is code generation. First, let's look at how code is generated when there is no dependency between the max reduction and the select-cmp reduction (that is, when the loop is not a min/max-with-index pattern):

/* Normal two independent reductions*/
int idx = ii;
int max = mm;
for (int i = 0; i < n; ++i) {
  int x = a[i];
  if (max < x) 
    max = x; 
  if (b[i] < a[i])
    idx = i;
}

vec_max = broadcast(mm)
vec_step = iota
vec_idx = broadcast(MIN_VALUE(DType))
for (int i = 0; i < n; i += vf) {
  vec_a = load(a, i, vf)
  vec_b = load(b, i, vf)
  vec_cmp = vec_b < vec_a
  vec_max = max(vec_max, vec_a)
  vec_idx = select(vec_cmp, vec_step, vec_idx)
  vec_step += vf;
}
red_max = reduce_max(vec_max)
red_idx_candidate = reduce_max(vec_idx)
red_idx = red_idx_candidate == MIN_VALUE(DType) ? ii : red_idx_candidate

And when there is a dependency between the max reduction and the select-cmp reduction (that is, the loop is a min/max-with-index pattern), how does code generation change?

/* Two dependent reductions, the max with first index pattern */
int idx = ii;
int max = mm;
for (int i = 0; i < n; ++i) {
  int x = a[i];
  if (max < x) {  // strict
    max = x; 
    idx = i;
  }
}

vec_max = broadcast(mm)
vec_step = iota
vec_idx = broadcast(MIN_VALUE(DType))
for (int i = 0; i < n; i += vf) {
  vec_a = load(a, i, vf)
  vec_cmp = vec_max < vec_a
  vec_max = max(vec_max, vec_a)
  vec_idx = select(vec_cmp, vec_step, vec_idx)
  vec_step += vf;
}
red_max = reduce_max(vec_max)
vec_all_max = broadcast(red_max)
mask = (vec_max == vec_all_max)
red_idx_candidate = reduce_min(vec_idx, mask)  // since this case is strict max reduction
red_idx = red_idx_candidate == MIN_VALUE(DType) ? ii : red_idx_candidate

The biggest difference is that a mask relating the max reduction to the select-cmp reduction needs to be created in the exit block (the middle block in the LLVM vectorizer). Second, whether the min/max is strict or non-strict decides whether the minimum or the maximum index is used: a strict comparison keeps the first occurrence, so the minimum index is taken, while a non-strict comparison keeps the last occurrence, so the maximum index is taken. Therefore, as long as the correct reduction dependency is established in the recognition stage and code generation follows that dependency, the min/max-with-index pattern can be vectorized.
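
To make the masked fix-up concrete, here is a lane-by-lane emulation of the pseudocode for the strict max with first index. It is only a sketch under a few assumptions: the vector width is fixed to 4, the trip count is a multiple of the vector width (no scalar epilogue), and the names scalarMaxIdx, vectorMaxIdx, VF and Sentinel are made up for this illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

using i64 = std::int64_t;

constexpr std::size_t VF = 4;  // illustrative vector width
constexpr i64 Sentinel = std::numeric_limits<i64>::min();  // MIN_VALUE(DType)

// Scalar reference: strict max with first index.
std::pair<i64, i64> scalarMaxIdx(const std::vector<i64> &a, i64 mm, i64 ii) {
  i64 max = mm, idx = ii;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (max < a[i]) {
      max = a[i];
      idx = static_cast<i64>(i);
    }
  return {max, idx};
}

// Emulation of the vector loop followed by the middle-block fix-up.
std::pair<i64, i64> vectorMaxIdx(const std::vector<i64> &a, i64 mm, i64 ii) {
  assert(a.size() % VF == 0 && "sketch has no remainder loop");
  std::array<i64, VF> vecMax, vecIdx;
  vecMax.fill(mm);        // vec_max = broadcast(mm)
  vecIdx.fill(Sentinel);  // vec_idx = broadcast(MIN_VALUE(DType))
  for (std::size_t i = 0; i < a.size(); i += VF)
    for (std::size_t lane = 0; lane < VF; ++lane) {
      i64 x = a[i + lane];
      if (vecMax[lane] < x) {                        // vec_cmp
        vecMax[lane] = x;                            // max(vec_max, vec_a)
        vecIdx[lane] = static_cast<i64>(i + lane);   // select(vec_cmp, ...)
      }
    }
  i64 redMax = *std::max_element(vecMax.begin(), vecMax.end());
  // Masked reduce_min: only lanes that hold the global max take part.
  i64 candidate = std::numeric_limits<i64>::max();
  for (std::size_t lane = 0; lane < VF; ++lane)
    if (vecMax[lane] == redMax)
      candidate = std::min(candidate, vecIdx[lane]);
  // A sentinel result means no lane ever updated, so restore the start value.
  return {redMax, candidate == Sentinel ? ii : candidate};
}
```

For an input such as {3, 9, 2, 9, 9, 1, 9, 0} with mm = 0, every lane ends up holding the maximum 9, and the masked reduce_min picks index 1, the same result as the scalar loop.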

The Implementation in LLVM

Following the description in the previous section, the vectorizer must first be able to recognize the select-cmp reduction. Next, we need to solve the issue of internal reduction users, so that a min/max reduction can accept loop-internal users in the first recognition stage. Third, the min/max reduction and the select-cmp reduction must be combined. Finally, according to the relationship between the select-cmp reduction and the min/max reduction, a mask is generated in the middle block and the reduction fix is performed.

Select-Cmp Reduction

LLVM already has a select-cmp implementation, developed by david-arm. However, the current implementation requires the non-reduction-phi operand of the select to be loop invariant, which does not meet our needs. Therefore, we extend this feature with new kinds, namely SelectIVICmp/SelectIVFCmp.
Taking SelectIVICmp as an example, the result of vectorization is as follows:

/* A SelectIVICmp example */
#include <stdint.h>

int64_t idx_scalar(int64_t *a, int64_t *b, int64_t ii, int64_t n) {
  int64_t idx = ii;
  for (int64_t i = 0; i < n; ++i)
    idx = (a[i] > b[i]) ? i : idx;

  return idx;
}

/* LLVM IR for vectorized SelectIVICmp reduction */
define dso_local i64 @idx_scalar(ptr nocapture noundef readonly %a, ptr nocapture noundef readonly %b, i64 noundef %ii, i64 noundef %n) local_unnamed_addr #0 {
…

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %vec.ind = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, %vector.ph ], [ %vec.ind.next, %vector.body ]
  %vec.phi = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, %vector.ph ], [ %6, %vector.body ]
  %0 = add i64 %index, 0
  %1 = getelementptr inbounds i64, ptr %a, i64 %0
  %2 = getelementptr inbounds i64, ptr %1, i32 0
  %wide.load = load <4 x i64>, ptr %2, align 8, !tbaa !4
  %3 = getelementptr inbounds i64, ptr %b, i64 %0
  %4 = getelementptr inbounds i64, ptr %3, i32 0
  %wide.load1 = load <4 x i64>, ptr %4, align 8, !tbaa !4
  %5 = icmp sgt <4 x i64> %wide.load, %wide.load1
  %6 = select <4 x i1> %5, <4 x i64> %vec.ind, <4 x i64> %vec.phi
  %index.next = add nuw i64 %index, 4
  %vec.ind.next = add <4 x i64> %vec.ind, <i64 4, i64 4, i64 4, i64 4>
  %7 = icmp eq i64 %index.next, %n.vec
  br i1 %7, label %middle.block, label %vector.body, !llvm.loop !8

middle.block:                                     ; preds = %vector.body
  %8 = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> %6)
  %rdx.select.cmp = icmp ne i64 %8, -9223372036854775808
  %rdx.select = select i1 %rdx.select.cmp, i64 %8, i64 %ii
  %cmp.n = icmp eq i64 %n, %n.vec
  br i1 %cmp.n, label %for.exit.loopexit, label %scalar.ph
…
}

Assuming the induction variable i has the form {start, +, step}, SelectIVICmp should use start - step as its identity. However, to keep the implementation simple, I directly use MIN_VALUE(DType) as the identity (-9223372036854775808 in the example above). The identity acts as a sentinel value that stands for the start value of SelectIVICmp (%ii in the example above). At the end, if the result of reduce_max is the identity, the result of the reduction is fixed up to the start value.

Note that this is a temporary implementation. Unexpected errors may occur when the start of the induction variable is MIN_VALUE(DType), or when the maximum value of the induction variable exceeds SignedMax(DType). In general, it should be implemented with two reductions.
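
The sentinel handling can be emulated in a few lines. The following is a sketch of the idea only, assuming a vector width of 4 and a trip count that is a multiple of it; the function name selectIVICmp is made up here and does not refer to anything in the patch.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Emulates idx = (a[i] > b[i]) ? i : idx with 4 lanes and a sentinel.
std::int64_t selectIVICmp(const std::vector<std::int64_t> &a,
                          const std::vector<std::int64_t> &b,
                          std::int64_t ii) {
  constexpr std::size_t VF = 4;
  constexpr std::int64_t Sentinel = std::numeric_limits<std::int64_t>::min();
  assert(a.size() == b.size() && a.size() % VF == 0 &&
         "sketch has no remainder loop");
  std::array<std::int64_t, VF> vecIdx;
  vecIdx.fill(Sentinel);  // %vec.phi starts as a splat of the sentinel
  for (std::size_t i = 0; i < a.size(); i += VF)
    for (std::size_t lane = 0; lane < VF; ++lane)
      if (a[i + lane] > b[i + lane])  // icmp sgt + select on %vec.ind
        vecIdx[lane] = static_cast<std::int64_t>(i + lane);
  // middle.block: reduce with smax, then map the sentinel back to %ii.
  std::int64_t red = *std::max_element(vecIdx.begin(), vecIdx.end());
  return red != Sentinel ? red : ii;
}
```

Because the induction variable is increasing, the largest recorded index is the last iteration in which the condition held, which is what the scalar loop returns. The sketch inherits the limitation noted above: it assumes no valid index equals the sentinel.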

Internal User Issue: UserRecurPhi and UserRecurKind

I modified the function isMinMaxPattern so that, while identifying a min/max reduction, it can also identify whether there is an index pattern that may depend on that min/max reduction.

isMinMaxIdxPattern sets UserRecurPhi and UserRecurKind according to the select and cmp instructions currently being traversed and the kind of min/max reduction currently being recognized. Please refer to the table below:

Take the first example:
  %1 = tail call i64 @llvm.smax.i64(i64 %max.09, i64 %0)
  %cmp1 = icmp slt i64 %max.09, %0
  %spec.select7 = select i1 %cmp1, i64 %indvars.iv, i64 %idx.011

cmp format          UMax / SMax                                UMin / SMin
                    UserRecurPhi              UserRecurKind    UserRecurPhi              UserRecurKind
%max.09 <  %0       select.getFalseValue()    MinMaxFirstIdx   select.getTrueValue()     MinMaxLastIdx
%max.09 <= %0       select.getFalseValue()    MinMaxLastIdx    select.getTrueValue()     MinMaxFirstIdx
%max.09 >  %0       select.getTrueValue()     MinMaxLastIdx    select.getFalseValue()    MinMaxFirstIdx
%max.09 >= %0       select.getTrueValue()     MinMaxFirstIdx   select.getFalseValue()    MinMaxLastIdx
%0 <  %max.09       select.getTrueValue()     MinMaxLastIdx    select.getFalseValue()    MinMaxFirstIdx
%0 <= %max.09       select.getTrueValue()     MinMaxFirstIdx   select.getFalseValue()    MinMaxLastIdx
%0 >  %max.09       select.getFalseValue()    MinMaxFirstIdx   select.getTrueValue()     MinMaxLastIdx
%0 >= %max.09       select.getFalseValue()    MinMaxLastIdx    select.getTrueValue()     MinMaxFirstIdx

According to this table, UserRecurPhi is set to %idx.011 (the false value of the select) and UserRecurKind to MinMaxFirstIdx.

Once UserRecurPhi is set, the select is expected to belong to another, not yet recognized reduction: UserRecurPhi should be the phi of that reduction, and UserRecurKind is its expected reduction kind. Both UserRecurPhi and UserRecurKind are used in the second phase of recognition.
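
The table can also be read as a small rule: normalize the comparison so that the min/max recurrence is on the left, check whether the select takes the new index when the comparison is true, and pick first or last index from the strictness of the update condition. The sketch below encodes that reading. All types and names in it (MinMax, Pred, PhiSide, IdxKind, UserRecur, swapped, classify) are hypothetical and are not the ones used in the patch; signedness is left out because it does not affect the result.

```cpp
#include <cassert>

enum class MinMax { Max, Min };        // UMax/SMax vs. UMin/SMin
enum class Pred { LT, LE, GT, GE };    // comparison predicate
enum class PhiSide { TrueValue, FalseValue };
enum class IdxKind { MinMaxFirstIdx, MinMaxLastIdx };

struct UserRecur {
  PhiSide Phi;   // which select operand is the candidate UserRecurPhi
  IdxKind Kind;  // the expected UserRecurKind
};

// Predicate after swapping the comparison operands: a < b  <=>  b > a.
Pred swapped(Pred P) {
  switch (P) {
  case Pred::LT: return Pred::GT;
  case Pred::LE: return Pred::GE;
  case Pred::GT: return Pred::LT;
  case Pred::GE: return Pred::LE;
  }
  assert(false && "unknown predicate");
  return P;
}

// RecurIsLHS is true when the min/max phi is the left operand of the cmp.
UserRecur classify(MinMax MM, Pred P, bool RecurIsLHS) {
  if (!RecurIsLHS)
    P = swapped(P);  // normalize to "recurrence <pred> new value"
  bool Strict = (P == Pred::LT || P == Pred::GT);
  // Does a true comparison mean "the new value replaces the old one"?
  bool UpdatesOnTrue = (MM == MinMax::Max)
                           ? (P == Pred::LT || P == Pred::LE)
                           : (P == Pred::GT || P == Pred::GE);
  if (UpdatesOnTrue)
    return {PhiSide::FalseValue,
            Strict ? IdxKind::MinMaxFirstIdx : IdxKind::MinMaxLastIdx};
  // The update happens on the inverse predicate, which flips strictness.
  return {PhiSide::TrueValue,
          Strict ? IdxKind::MinMaxLastIdx : IdxKind::MinMaxFirstIdx};
}
```

For the first example (smax with icmp slt %max.09, %0), classify(MinMax::Max, Pred::LT, true) yields the false value and MinMaxFirstIdx, matching the table.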

The Second Phase of Recognition

At the end of the function canVectorizeInstrs, there is a second check on each reduction that has a UserRecurPhi.
At this stage, all reductions should have been found, so the vectorizer only needs to check whether UserRecurPhi is a reduction phi. At the same time, the function fixUserRecurrence converts the SelectIVICmp into MinMaxFirstIdx or MinMaxLastIdx according to UserRecurKind.

Code Generation and Reduction Fix

Min/max-with-index vectorization does not need to change the contents of vector.body, but it requires changes in the function fixReduction.

We must ensure that the min/max reduction is fixed before the index reduction, because the index reduction needs the mask generated by the min/max reduction. We achieve this by modifying the function fixCrossIterationPHIs. The map DependRecurrenceMasks keeps the mask generated by the min/max reduction. If the mask required by an index reduction is not ready, the fix of that index reduction is postponed until the mask is ready.
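
The postponement can be pictured as a simple work queue. This is a sketch of the ordering idea only, not of fixCrossIterationPHIs itself: the Reduction struct and the fixOrder function are invented for the illustration, and a set of names stands in for the DependRecurrenceMasks map.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <string>
#include <vector>

struct Reduction {
  std::string Name;
  std::string DependsOn;  // empty if no mask from another reduction is needed
};

// Returns the order in which the reductions get fixed. A reduction whose
// required mask is not ready yet goes back to the end of the queue.
std::vector<std::string> fixOrder(const std::vector<Reduction> &Reds) {
  std::deque<Reduction> Work(Reds.begin(), Reds.end());
  std::set<std::string> MaskReady;  // stand-in for DependRecurrenceMasks
  std::vector<std::string> Order;
  std::size_t Postponed = 0;  // consecutive postponements without progress
  while (!Work.empty()) {
    Reduction R = Work.front();
    Work.pop_front();
    if (!R.DependsOn.empty() && !MaskReady.count(R.DependsOn)) {
      Work.push_back(R);  // mask not ready: postpone this fix
      ++Postponed;
      assert(Postponed <= Work.size() && "dependency can never be satisfied");
      continue;
    }
    Postponed = 0;
    MaskReady.insert(R.Name);  // fixing R makes its mask available
    Order.push_back(R.Name);
  }
  return Order;
}
```

Whether the index reduction or the max reduction comes first in the input, the max reduction is fixed first and the index reduction second.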

Consider the following loops:

int idx = ii;
int max = mm;
for (int i = 0; i < n; ++i) {
  int x = a[i];
  if (max < x) {
    max = x;
    idx = i;
  }
}

and

int idx = ii;
int max = mm;
for (int i = 0; i < n; ++i) {
  int x = a[i];
  if (max <= x) {
    max = x;
    idx = i;
  }
}
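
The two loops differ only in how ties are handled, which is what separates MinMaxFirstIdx from MinMaxLastIdx. The following standalone sketch, with the hypothetical names firstMaxIdx, lastMaxIdx and demo, shows the difference on an input that contains the maximum twice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strict compare: a tie keeps the old index, so the first maximum wins.
int firstMaxIdx(const std::vector<int> &a, int mm, int ii) {
  int max = mm, idx = ii;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (max < a[i]) {
      max = a[i];
      idx = static_cast<int>(i);
    }
  return idx;
}

// Non-strict compare: a tie overwrites the index, so the last maximum wins.
int lastMaxIdx(const std::vector<int> &a, int mm, int ii) {
  int max = mm, idx = ii;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (max <= a[i]) {
      max = a[i];
      idx = static_cast<int>(i);
    }
  return idx;
}

// Small self-check: the maximum 7 occurs at positions 1 and 3.
void demo() {
  std::vector<int> a{1, 7, 3, 7};
  assert(firstMaxIdx(a, 0, -1) == 1);
  assert(lastMaxIdx(a, 0, -1) == 3);
}
```

Note that the start value matters too: with mm equal to the maximum, the strict loop never updates and returns ii, while the non-strict loop still returns the last matching position.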

Changes:

New recurrence kinds: MinMaxFirstIdx and MinMaxLastIdx. These kinds are not generated directly by the function AddReductionVar, but are converted from SelectIVICmp/SelectIVFCmp.

TODOs:

Min/max recurrences without an exit instruction are not supported yet. Refer to test case smax_idx_max_no_exit_user.
Support the min/max recurrence in select(cmp()). Refer to test case smax_idx_select_cmp.
Support FP min/max recurrence.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Mel-Chen created this revision. · Feb 6 2023, 11:09 PM

Herald added a project: Restricted Project. · Feb 6 2023, 11:09 PM

Herald added subscribers: shiva0217, arphaman, rogfer01, hiraditya.

Mel-Chen requested review of this revision. · Feb 6 2023, 11:09 PM

Herald added a project: Restricted Project. · Feb 6 2023, 11:09 PM

Herald added subscribers: llvm-commits, pcwang-thead, vkmr.

Mel-Chen added reviewers: ABataev, fhahn. · Feb 6 2023, 11:56 PM

Herald added a subscriber: StephenFan. · Feb 6 2023, 11:56 PM

Mel-Chen mentioned this in D132063: [LV] Support vectorizing 'select index of minimum element' idiom. (WIP). · Feb 7 2023, 12:01 AM

Mel-Chen retitled this revision from [LoopVectorize] Vectorize the reduction pattern of integer min/max with index. to [LoopVectorize] Vectorize the reduction pattern of integer min/max with index. (WIP). · Feb 7 2023, 12:16 AM

Harbormaster completed remote builds in B212285: Diff 495388. · Feb 7 2023, 12:18 AM

Mel-Chen edited the summary of this revision. · Feb 7 2023, 12:21 AM

rui.zhang added a subscriber: rui.zhang. · Feb 7 2023, 10:06 AM

Rebase and update the command in test case.

Harbormaster completed remote builds in B212749: Diff 496034. · Feb 9 2023, 12:22 AM

Changes:

Fix interleave code generation for SelectIVICmp and SelectIVFCmp.
Fix the internal compiler error for MinMaxFirstIdx and MinMaxLastIdx when -force-vector-width=1.
Update test cases. Add more run command lines.

Harbormaster completed remote builds in B213366: Diff 496879. · Feb 13 2023, 3:24 AM

Changes:

Rebase
Split the patch of test case and implementation
Update FIXME

Harbormaster completed remote builds in B213407: Diff 496947. · Feb 13 2023, 5:52 AM

Mel-Chen added a parent revision: D143905: [LV] Harden the test of the minmax with index pattern. (NFC). · Feb 13 2023, 5:53 AM

Mel-Chen edited the summary of this revision. · Feb 13 2023, 5:57 AM

huntergr added a subscriber: huntergr. · Feb 13 2023, 6:34 AM

Changes:

Remove function createInductionSelectCmpTargetReduction
Add function createSentinelValueHandling
Fix the start value of SelectCmp and MinMaxIdx

Also, all the FIXMEs now appear to be fixed and I don't know why yet; I will confirm.

Harbormaster completed remote builds in B213658: Diff 497320. · Feb 14 2023, 9:12 AM

Rebase and fix the check prefix in test cases.

Harbormaster completed remote builds in B213805: Diff 497548. · Feb 14 2023, 11:06 PM

Changes:

Confirm the operands of intrinsic and cmp. Fixed test case @smax_idx_not_vec_1.
Format code.

Mel-Chen retitled this revision from [LoopVectorize] Vectorize the reduction pattern of integer min/max with index. (WIP) to [LoopVectorize] Vectorize the reduction pattern of integer min/max with index.. · Feb 15 2023, 3:24 AM

Mel-Chen edited the summary of this revision.

Harbormaster completed remote builds in B213854: Diff 497615. · Feb 15 2023, 5:06 AM

Rebase.

Harbormaster completed remote builds in B213881: Diff 497652. · Feb 15 2023, 6:46 AM

@fhahn Ping. What do you think of my approach? I am looking forward to your reply.

Changes:

Rebase
Fix the bug of predicate normalization

Harbormaster completed remote builds in B215445: Diff 499753. · Feb 23 2023, 1:32 AM

Rebase and update test case result.

Harbormaster completed remote builds in B218666: Diff 504121. · Mar 10 2023, 7:43 AM

fhahn added inline comments. · Mar 19 2023, 2:15 PM

llvm/include/llvm/Analysis/IVDescriptors.h
379	It would be helpful to document how the new system of recurrences depending on other recurrences would work I think, possibly also with an explanation of the whole approach in the patch description.
llvm/lib/Analysis/IVDescriptors.cpp
733	nit: Variables should start with an upper-case letter. Also, move the definition closer to its use?
758	The naming here is a bit confusing now, `NonPhi` can be an increasing loop induction? In that case it would be a phi, right?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
777	We are in the process of removing those kinds of global maps that are used to carry information used during codegen and later. Ideally the combination of values would be modeled explicitly in the exit block of the plan, but we are not there yet. This is the main reason for D132063 doing things the way it does.
3875	this would need documenting.
llvm/test/Transforms/LoopVectorize/smax-idx.ll
1	Could you add new tests as a separate patch?

Mel-Chen added inline comments. · Mar 22 2023, 7:31 AM

llvm/include/llvm/Analysis/IVDescriptors.h
379	Sure. I will document the whole approach and update the summary tomorrow. A quick explanation of `UserRecurPhi`: its purpose is to allow the recurrence to be used inside the loop (loop-internal use) while ensuring that the user is also a recurrence. `UserRecurPhi` records the candidate user recurrence phi, and `UserRecurKind` records the expected user recurrence kind. Currently I'm limiting candidates to one, but it should be possible to have more than one.
llvm/lib/Analysis/IVDescriptors.cpp
758	Yes, it's a little confusing here. It would be better to replace `NonPhi` with `NonRecurPhi`. By the way, are you interested in supporting a fully functional SelectCmp pattern? I think the min/max-with-index pattern really needs to depend on SelectCmp to be safe.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
777	I see, but `DependRecurrenceMasks` exists for a reason. Consider the following case:

int idx = ii;
int foo = jj;
int max = mm;
for (int i = 0; i < n; ++i) {
  int x = a[i];
  if (max < x) {
    max = x;
    idx = i;
    foo = b[i];
  }
}

That mask has the chance to be reused, and I try to keep that flexibility. Of course, we could recalculate the mask for each reduction that needs one, but using a global map to preserve the mask is the relatively simple method I could think of for now. I have heard that VPlan is going to be extended to other blocks; could you share links to the relevant discussion?
3875	Sure. Quick explanation: the min/max recurrence should be fixed earlier than the min/max idx recurrence, because the idx recurrence depends on the mask produced by the min/max recurrence. This code ensures that the recurrence dependencies are handled in the correct order.
llvm/test/Transforms/LoopVectorize/smax-idx.ll
1	Of course. I will split an NFC patch tomorrow.

Mel-Chen added a parent revision: D146718: [LV] Add tests for integer min max with index reduction pattern. (NFC). · Mar 23 2023, 6:03 AM

Split test case into parent revision.

Mel-Chen added inline comments. · Mar 23 2023, 6:13 AM

llvm/include/llvm/Analysis/IVDescriptors.h
379	Too busy today. The document will be available next week.

Harbormaster completed remote builds in B221296: Diff 507716. · Mar 23 2023, 7:03 AM

Mel-Chen edited the summary of this revision. · Mar 30 2023, 11:14 PM

Herald added subscribers: jeroen.dobbelaere, kosarev, kristof.beyls. · Mar 30 2023, 11:14 PM

@fhahn Updated my approach introduction in the summary.
If you have any questions, please contact me. Looking forward to discussing with you again. Thank you.

@fhahn Ping. I'd be glad to discuss this patch with you.

fhahn added inline comments. · Apr 23 2023, 2:44 PM

llvm/lib/Analysis/IVDescriptors.cpp
741	Using this API seems unnecessarily strict; we don't need the bounds (and getBounds may fail if it cannot identify them), we just need to check the direction of the IV, which can be done by checking whether it is an induction PHI and using `SE.getMonotonicPredicateTyp`.
llvm/test/Transforms/LoopVectorize/select-min-index.ll
264–265	Is this incorrectly vectorized or does the test name need fixing? It looks like `%min.val` isn't an actual minimum value phi?

Herald added a subscriber: hoy. · Apr 23 2023, 2:44 PM

Rebased this patch, and created revision D149731 to preserve the min/max operation in select-cmp form.

Mel-Chen added a parent revision: D149731: [IR] New function llvm::createMinMaxSelectCmpOp for creating min/max operation in select-cmp form. · May 3 2023, 1:15 AM

Harbormaster completed remote builds in B229636: Diff 519006. · May 3 2023, 2:21 AM

Changes:

Split SelectIVICmp and SelectIVFCmp out
Add Comment
Minor code refinements

Harbormaster completed remote builds in B232802: Diff 523302. · May 18 2023, 2:47 AM

Mel-Chen added a parent revision: D150851: [LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable. · May 18 2023, 3:01 AM

Mel-Chen edited the summary of this revision.

RKSimon edited the summary of this revision. · Jun 5 2023, 2:31 AM

Changes:

Format and minor fix

Harbormaster completed remote builds in B236626: Diff 528430. · Jun 5 2023, 7:17 AM

artagnon added a subscriber: artagnon. · Jun 14 2023, 7:16 AM

Matt added a subscriber: Matt. · Jun 14 2023, 3:35 PM

artagnon added inline comments. · Jun 15 2023, 3:52 AM

llvm/lib/Analysis/IVDescriptors.cpp
417	Why?
426–428	If you separate out the MinMaxIdx pattern into its own function, we can check `NumCmpSelectPatternInst` for it separately.
861–890	This is a bit cryptic: would you consider adding more `RecurKind`s to make this less cryptic?
907–908	Can we avoid the expensive call to `isInductionPHI()` by checking that the `SCEVAddRec` is a `SCEVConstant`?
1336–1343	Why not merge this with the `RecurKind::SMax` case?
1371–1372	Rename these to `IMinMaxFirstIdx` and `IMinMaxLastIdx`?
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
881	Typo: comfirm.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4171–4174	`RdxDesc.isOrdered()` can help you pick between `FCMP_OEQ` and `FCMP_UEQ`.

Mel-Chen mentioned this in D150851: [LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable. · Jun 29 2023, 3:08 AM

shiva0217 added inline comments. · Aug 17 2023, 1:59 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
892	Instead of having fixUserRecurrence call setDependMinMaxRecurDes and change the user RecurKind, is it possible to call setDependMinMaxRecurDes when isReductionPHI returns true? If we are able to propagate the parent (dependent) RecurDes to isReductionPHI, perhaps we can create the reduction as follows:

RecurKind ParentKind = RedDes.getRecurrenceKind();
if (ParentKind == RecurKind::SMax) {
  if (AddReductionVar(Phi, RecurKind::MinMaxFirstIdx, TheLoop, FMF, RedDes, DB, AC, DT, SE)) {
    LLVM_DEBUG(dbgs() << "Found an MinMaxFirstIdx reduction PHI." << *Phi << "\n");
    return true;
  }
}

The dependency for the RecurKind could then be explicit, which avoids the user RecurKind fixup.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3875	Perhaps we could do the sorting according to the reduction dependency before calling fixReduction which may be similar to https://reviews.llvm.org/D157631.
4144	Should we check isMinMaxRecurrenceKind(Kind) and isMinMaxIdxRecurrenceKind for user kind? Although it would be the only dependency currently, it might be explicit for the reader and avoid unexpected codegen in the future.
4146	Could we encapsulate the mask generation to createMinMaxIdxMaskOp or other name you prefer?
4167	Should we check isMinMaxRecurrenceKind(Kind) and isMinMaxIdxRecurrenceKind for user kind?
4168	Could we encapsulate the mask generation to createMinMaxIdxMask or similar?

Herald added a subscriber: wangpc. · Aug 17 2023, 1:59 AM

Revision Contents

Path                                                           Size
llvm/include/llvm/Analysis/IVDescriptors.h                     131 lines
llvm/include/llvm/Transforms/Utils/LoopUtils.h                 34 lines
llvm/lib/Analysis/IVDescriptors.cpp                            279 lines
llvm/lib/Transforms/Utils/LoopUtils.cpp                        116 lines
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp    15 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp                92 lines
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp                 12 lines
llvm/test/Transforms/LoopVectorize/select-min-index.ll         361 lines
llvm/test/Transforms/LoopVectorize/smax-idx.ll                 1200 lines

Diff 504121

llvm/include/llvm/Analysis/IVDescriptors.h

Show All 27 Lines
class Loop;		class Loop;
class PredicatedScalarEvolution;		class PredicatedScalarEvolution;
class ScalarEvolution;		class ScalarEvolution;
class SCEV;		class SCEV;
class StoreInst;		class StoreInst;

/// These are the kinds of recurrences that we support.		/// These are the kinds of recurrences that we support.
enum class RecurKind {		enum class RecurKind {
None, ///< Not a recurrence.		None, ///< Not a recurrence.
Add, ///< Sum of integers.		Add, ///< Sum of integers.
Mul, ///< Product of integers.		Mul, ///< Product of integers.
Or, ///< Bitwise or logical OR of integers.		Or, ///< Bitwise or logical OR of integers.
And, ///< Bitwise or logical AND of integers.		And, ///< Bitwise or logical AND of integers.
Xor, ///< Bitwise or logical XOR of integers.		Xor, ///< Bitwise or logical XOR of integers.
SMin, ///< Signed integer min implemented in terms of select(cmp()).		SMin, ///< Signed integer min implemented in terms of select(cmp()).
SMax, ///< Signed integer max implemented in terms of select(cmp()).		SMax, ///< Signed integer max implemented in terms of select(cmp()).
UMin, ///< Unisgned integer min implemented in terms of select(cmp()).		UMin, ///< Unisgned integer min implemented in terms of select(cmp()).
UMax, ///< Unsigned integer max implemented in terms of select(cmp()).		UMax, ///< Unsigned integer max implemented in terms of select(cmp()).
FAdd, ///< Sum of floats.		FAdd, ///< Sum of floats.
FMul, ///< Product of floats.		FMul, ///< Product of floats.
FMin, ///< FP min implemented in terms of select(cmp()).		FMin, ///< FP min implemented in terms of select(cmp()).
FMax, ///< FP max implemented in terms of select(cmp()).		FMax, ///< FP max implemented in terms of select(cmp()).
FMulAdd, ///< Fused multiply-add of floats (a * b + c).		FMulAdd, ///< Fused multiply-add of floats (a * b + c).
SelectICmp, ///< Integer select(icmp(),x,y) where one of (x,y) is loop		SelectICmp, ///< Integer select(icmp(),x,y) where one of (x,y) is loop
///< invariant		///< invariant
SelectFCmp ///< Integer select(fcmp(),x,y) where one of (x,y) is loop		SelectFCmp, ///< Integer select(fcmp(),x,y) where one of (x,y) is loop
///< invariant		///< invariant
		SelectIVICmp, ///< Integer select(icmp(),x,y) where one of (x,y) is increasing
		///< loop induction PHI
		SelectIVFCmp, ///< Integer select(fcmp(),x,y) where one of (x,y) is increasing
		///< loop induction PHI
		MinMaxFirstIdx, ///< Min/Max with first index
		MinMaxLastIdx ///< Min/Max with last index
};		};

/// The RecurrenceDescriptor is used to identify recurrences variables in a		/// The RecurrenceDescriptor is used to identify recurrences variables in a
/// loop. Reduction is a special case of recurrence that has uses of the		/// loop. Reduction is a special case of recurrence that has uses of the
/// recurrence variable outside the loop. The method isReductionPHI identifies		/// recurrence variable outside the loop. The method isReductionPHI identifies
/// reductions that are basic recurrences.		/// reductions that are basic recurrences.
///		///
/// Basic recurrences are defined as the summation, product, OR, AND, XOR, min,		/// Basic recurrences are defined as the summation, product, OR, AND, XOR, min,
/// or max of a set of terms. For example: for(i=0; i<n; i++) { total +=		/// or max of a set of terms. For example: for(i=0; i<n; i++) { total +=
/// array[i]; } is a summation of array elements. Basic recurrences are a		/// array[i]; } is a summation of array elements. Basic recurrences are a
/// special case of chains of recurrences (CR). See ScalarEvolution for CR		/// special case of chains of recurrences (CR). See ScalarEvolution for CR
/// references.		/// references.

/// This struct holds information about recurrence variables.		/// This struct holds information about recurrence variables.
class RecurrenceDescriptor {		class RecurrenceDescriptor {
public:		public:
RecurrenceDescriptor() = default;		RecurrenceDescriptor() = default;

RecurrenceDescriptor(Value Start, Instruction Exit, StoreInst *Store,		RecurrenceDescriptor(Value Start, Instruction Exit, StoreInst *Store,
RecurKind K, FastMathFlags FMF, Instruction *ExactFP,		RecurKind K, FastMathFlags FMF, Instruction *ExactFP,
Type *RT, bool Signed, bool Ordered,		Type *RT, bool Signed, bool Ordered,
SmallPtrSetImpl<Instruction *> &CI,		SmallPtrSetImpl<Instruction *> &CI,
unsigned MinWidthCastToRecurTy)		unsigned MinWidthCastToRecurTy, PHINode *UserRecurPhi,
		RecurKind UserRecurKind)
: IntermediateStore(Store), StartValue(Start), LoopExitInstr(Exit),		: IntermediateStore(Store), StartValue(Start), LoopExitInstr(Exit),
Kind(K), FMF(FMF), ExactFPMathInst(ExactFP), RecurrenceType(RT),		Kind(K), FMF(FMF), ExactFPMathInst(ExactFP), RecurrenceType(RT),
IsSigned(Signed), IsOrdered(Ordered),		IsSigned(Signed), IsOrdered(Ordered),
MinWidthCastToRecurrenceType(MinWidthCastToRecurTy) {		MinWidthCastToRecurrenceType(MinWidthCastToRecurTy),
		UserRecurPhi(UserRecurPhi), UserRecurKind(UserRecurKind) {
CastInsts.insert(CI.begin(), CI.end());		CastInsts.insert(CI.begin(), CI.end());
}		}

/// This POD struct holds information about a potential recurrence operation.		/// This POD struct holds information about a potential recurrence operation.
class InstDesc {		class InstDesc {
public:		public:
InstDesc(bool IsRecur, Instruction I, Instruction ExactFP = nullptr)		InstDesc(bool IsRecur, Instruction I, Instruction ExactFP = nullptr)
: IsRecurrence(IsRecur), PatternLastInst(I),		: IsRecurrence(IsRecur), PatternLastInst(I),
RecKind(RecurKind::None), ExactFPMathInst(ExactFP) {}		RecKind(RecurKind::None), ExactFPMathInst(ExactFP) {}

InstDesc(Instruction *I, RecurKind K, Instruction *ExactFP = nullptr)		InstDesc(Instruction *I, RecurKind K, Instruction *ExactFP = nullptr)
: IsRecurrence(true), PatternLastInst(I), RecKind(K),		: IsRecurrence(true), PatternLastInst(I), RecKind(K),
ExactFPMathInst(ExactFP) {}		ExactFPMathInst(ExactFP) {}

		InstDesc(bool IsRecur, Instruction *I, PHINode *CandUserRecurPhi,
		RecurKind CandUserRecurKind, Instruction *ExactFP = nullptr)
		: IsRecurrence(IsRecur), PatternLastInst(I), RecKind(RecurKind::None),
		CandUserRecurPhi(CandUserRecurPhi),
		CandUserRecurKind(CandUserRecurKind), ExactFPMathInst(ExactFP) {}

bool isRecurrence() const { return IsRecurrence; }		bool isRecurrence() const { return IsRecurrence; }

bool needsExactFPMath() const { return ExactFPMathInst != nullptr; }		bool needsExactFPMath() const { return ExactFPMathInst != nullptr; }

Instruction *getExactFPMathInst() const { return ExactFPMathInst; }		Instruction *getExactFPMathInst() const { return ExactFPMathInst; }

RecurKind getRecKind() const { return RecKind; }		RecurKind getRecKind() const { return RecKind; }

Instruction *getPatternInst() const { return PatternLastInst; }		Instruction *getPatternInst() const { return PatternLastInst; }

		PHINode *getCandUserRecurPhi() const { return CandUserRecurPhi; }

		RecurKind getCandUserRecurKind() const { return CandUserRecurKind; }

		bool isCandidateUser() const {
		return getCandUserRecurPhi() && getCandUserRecurKind() != RecurKind::None;
		}

private:		private:
// Is this instruction a recurrence candidate.		// Is this instruction a recurrence candidate.
bool IsRecurrence;		bool IsRecurrence;
// The last instruction in a min/max pattern (select of the select(icmp())		// The last instruction in a min/max pattern (select of the select(icmp())
// pattern), or the current recurrence instruction otherwise.		// pattern), or the current recurrence instruction otherwise.
Instruction *PatternLastInst;		Instruction *PatternLastInst;
// If this is a min/max pattern.		// If this is a min/max pattern.
RecurKind RecKind;		RecurKind RecKind;
// Recurrence does not allow floating-point reassociation.		// Recurrence does not allow floating-point reassociation.
Instruction *ExactFPMathInst;		Instruction *ExactFPMathInst;
		// This instruction may be the operation of another recurrence.
		// Record potential recurrence phi.
		PHINode *CandUserRecurPhi = nullptr;
		// And expected recurrence kind.
		RecurKind CandUserRecurKind = RecurKind::None;
};		};

/// Returns a struct describing if the instruction 'I' can be a recurrence		/// Returns a struct describing if the instruction 'I' can be a recurrence
/// variable of type 'Kind' for a Loop \p L and reduction PHI \p Phi.		/// variable of type 'Kind' for a Loop \p L and reduction PHI \p Phi.
/// If the recurrence is a min/max pattern of select(icmp()) this function		/// If the recurrence is a min/max pattern of select(icmp()) this function
/// advances the instruction pointer 'I' from the compare instruction to the		/// advances the instruction pointer 'I' from the compare instruction to the
/// select instruction and stores this pointer in 'PatternLastInst' member of		/// select instruction and stores this pointer in 'PatternLastInst' member of
/// the returned struct.		/// the returned struct.
static InstDesc isRecurrenceInstr(Loop *L, PHINode *Phi, Instruction *I,		static InstDesc isRecurrenceInstr(Loop *L, PHINode *Phi, Instruction *I,
RecurKind Kind, InstDesc &Prev,		RecurKind Kind, InstDesc &Prev,
FastMathFlags FuncFMF);		FastMathFlags FuncFMF, ScalarEvolution *SE);

/// Returns true if instruction I has multiple uses in Insts		/// Returns true if instruction I has multiple uses in Insts
static bool hasMultipleUsesOf(Instruction *I,		static bool hasMultipleUsesOf(Instruction *I,
SmallPtrSetImpl<Instruction *> &Insts,		SmallPtrSetImpl<Instruction *> &Insts,
unsigned MaxNumUses);		unsigned MaxNumUses);

/// Returns true if all uses of the instruction I is within the Set.		/// Returns true if all uses of the instruction I is within the Set.
static bool areAllUsesIn(Instruction *I, SmallPtrSetImpl<Instruction *> &Set);		static bool areAllUsesIn(Instruction *I, SmallPtrSetImpl<Instruction *> &Set);

/// Returns a struct describing if the instruction is a llvm.(s/u)(min/max),		/// Returns a struct describing if the instruction is a llvm.(s/u)(min/max),
/// llvm.minnum/maxnum or a Select(ICmp(X, Y), X, Y) pair of instructions		/// llvm.minnum/maxnum or a Select(ICmp(X, Y), X, Y) pair of instructions
/// corresponding to a min(X, Y) or max(X, Y), matching the recurrence kind \p		/// corresponding to a min(X, Y) or max(X, Y), matching the recurrence kind \p
/// Kind. \p Prev specifies the description of an already processed select		/// Kind. \p Prev specifies the description of an already processed select
/// instruction, so its corresponding cmp can be matched to it.		/// instruction, so its corresponding cmp can be matched to it.
static InstDesc isMinMaxPattern(Instruction *I, RecurKind Kind,		static InstDesc isMinMaxPattern(Instruction *I, RecurKind Kind,
const InstDesc &Prev);		const InstDesc &Prev, Loop *Loop,
		PHINode *OrigPhi, ScalarEvolution *SE);

		/// Returns the RecurKind describing which min/max recurrence kind the
		/// instruction \p I belongs to. Returns RecurKind::None if \p I does not
		/// match any min/max recurrence kind. Unlike isMinMaxPattern, this function
		/// does not require the cmp value to have exactly one use.
		static RecurKind isMinMaxOperation(Instruction *I);

		static InstDesc isMinMaxIdxPattern(Loop *Loop, Instruction *I,
		PHINode *MinMaxPhi, RecurKind MinMaxKind,
		ScalarEvolution *SE);

/// Returns a struct describing whether the instruction is either a		/// Returns a struct describing whether the instruction is either a
/// Select(ICmp(A, B), X, Y), or		/// Select(ICmp(A, B), X, Y), or
/// Select(FCmp(A, B), X, Y)		/// Select(FCmp(A, B), X, Y)
/// where one of (X, Y) is a loop invariant integer and the other is a PHI		/// where one of (X, Y) is a loop invariant integer and the other is a PHI
/// value. \p Prev specifies the description of an already processed select		/// value. \p Prev specifies the description of an already processed select
/// instruction, so its corresponding cmp can be matched to it.		/// instruction, so its corresponding cmp can be matched to it.
static InstDesc isSelectCmpPattern(Loop *Loop, PHINode *OrigPhi,		static InstDesc isSelectCmpPattern(Loop *Loop, PHINode *OrigPhi,
Instruction *I, InstDesc &Prev);		Instruction *I, InstDesc &Prev,
		ScalarEvolution *SE);

/// Returns a struct describing if the instruction is a		/// Returns a struct describing if the instruction is a
/// Select(FCmp(X, Y), (Z = X op PHINode), PHINode) instruction pattern.		/// Select(FCmp(X, Y), (Z = X op PHINode), PHINode) instruction pattern.
static InstDesc isConditionalRdxPattern(RecurKind Kind, Instruction *I);		static InstDesc isConditionalRdxPattern(RecurKind Kind, Instruction *I);

/// Returns identity corresponding to the RecurrenceKind.		/// Returns identity corresponding to the RecurrenceKind.
Value *getRecurrenceIdentity(RecurKind K, Type *Tp, FastMathFlags FMF) const;		Value *getRecurrenceIdentity(RecurKind K, Type *Tp, FastMathFlags FMF) const;

▲ Show 20 Lines • Show All 69 Lines • ▼ Show 20 Lines	static bool isFPMinMaxRecurrenceKind(RecurKind Kind) {
return Kind == RecurKind::FMin || Kind == RecurKind::FMax;		return Kind == RecurKind::FMin || Kind == RecurKind::FMax;
}		}

/// Returns true if the recurrence kind is any min/max kind.		/// Returns true if the recurrence kind is any min/max kind.
static bool isMinMaxRecurrenceKind(RecurKind Kind) {		static bool isMinMaxRecurrenceKind(RecurKind Kind) {
return isIntMinMaxRecurrenceKind(Kind) || isFPMinMaxRecurrenceKind(Kind);		return isIntMinMaxRecurrenceKind(Kind) || isFPMinMaxRecurrenceKind(Kind);
}		}

		/// Returns true if the recurrence kind is a max kind.
		static bool isMaxRecurrenceKind(RecurKind Kind) {
		return Kind == RecurKind::UMax || Kind == RecurKind::SMax ||
		Kind == RecurKind::FMax;
		}

		static bool isMinMaxIdxRecurrenceKind(RecurKind Kind) {
		return Kind == RecurKind::MinMaxFirstIdx ||
		Kind == RecurKind::MinMaxLastIdx;
		}

/// Returns true if the recurrence kind is of the form		/// Returns true if the recurrence kind is of the form
/// select(cmp(),x,y) where one of (x,y) is loop invariant.		/// select(cmp(),x,y) where one of (x,y) is loop invariant.
static bool isSelectCmpRecurrenceKind(RecurKind Kind) {		static bool isSelectCmpRecurrenceKind(RecurKind Kind) {
return Kind == RecurKind::SelectICmp || Kind == RecurKind::SelectFCmp;		return Kind == RecurKind::SelectICmp || Kind == RecurKind::SelectFCmp ||
		Kind == RecurKind::SelectIVICmp || Kind == RecurKind::SelectIVFCmp ||
		isMinMaxIdxRecurrenceKind(Kind);
}		}

/// Returns the type of the recurrence. This type can be narrower than the		/// Returns the type of the recurrence. This type can be narrower than the
/// actual type of the Phi if the recurrence has been type-promoted.		/// actual type of the Phi if the recurrence has been type-promoted.
Type *getRecurrenceType() const { return RecurrenceType; }		Type *getRecurrenceType() const { return RecurrenceType; }

/// Returns a reference to the instructions used for type-promoting the		/// Returns a reference to the instructions used for type-promoting the
/// recurrence.		/// recurrence.
const SmallPtrSet<Instruction *, 8> &getCastInsts() const { return CastInsts; }		const SmallPtrSet<Instruction *, 8> &getCastInsts() const { return CastInsts; }

		PHINode *getUserRecurPhi() const { return UserRecurPhi; }

		void setRecurKind(RecurKind K) {
		assert((K != RecurKind::None) && "Unexpected recurrence kind.");
		Kind = K;
		}

		void setDependMinMaxRecDes(RecurrenceDescriptor *MMRD) {
		assert(isMinMaxRecurrenceKind(MMRD->getRecurrenceKind()) &&
		"DependMinMaxRecDes must be a min/max recurrence.");
		DependMinMaxRecDes = MMRD;
		}

		RecurrenceDescriptor *getDependMinMaxRecDes() const {
		return DependMinMaxRecDes;
		}

		bool hasUserRecurrence() const {
		return UserRecurPhi && UserRecurKind != RecurKind::None;
		}

		bool fixUserRecurrence(RecurrenceDescriptor &UserRedDes);

/// Returns the minimum width used by the recurrence in bits.		/// Returns the minimum width used by the recurrence in bits.
unsigned getMinWidthCastToRecurrenceTypeInBits() const {		unsigned getMinWidthCastToRecurrenceTypeInBits() const {
return MinWidthCastToRecurrenceType;		return MinWidthCastToRecurrenceType;
}		}

/// Returns true if all source operands of the recurrence are SExtInsts.		/// Returns true if all source operands of the recurrence are SExtInsts.
bool isSigned() const { return IsSigned; }		bool isSigned() const { return IsSigned; }

Show All 36 Lines	private:
// True if this recurrence can be treated as an in-order reduction.		// True if this recurrence can be treated as an in-order reduction.
// Currently only a non-reassociative FAdd can be considered in-order,		// Currently only a non-reassociative FAdd can be considered in-order,
// if it is also the only FAdd in the PHI's use chain.		// if it is also the only FAdd in the PHI's use chain.
bool IsOrdered = false;		bool IsOrdered = false;
// Instructions used for type-promoting the recurrence.		// Instructions used for type-promoting the recurrence.
SmallPtrSet<Instruction *, 8> CastInsts;		SmallPtrSet<Instruction *, 8> CastInsts;
// The minimum width used by the recurrence.		// The minimum width used by the recurrence.
unsigned MinWidthCastToRecurrenceType;		unsigned MinWidthCastToRecurrenceType;

		PHINode *UserRecurPhi = nullptr;
		fhahn: It would be helpful to document how the new system of recurrences depending on other recurrences would work, I think, possibly also with an explanation of the whole approach in the patch description.
		Mel-Chen (author): Sure. I will document the whole approach and update the summary tomorrow. To quickly explain the function of `UserRecurPhi`: its purpose is to allow the recurrence to be used inside the loop (loop-internal use) and to ensure that the user is also a recurrence. `UserRecurPhi` records the candidate user recurrence phi, and `UserRecurKind` records the expected user recurrence kind. Currently I'm limiting candidates to one, but it should be possible to have more than one.
		Mel-Chen (author): Too busy today. The document will be available next week.

		RecurKind UserRecurKind = RecurKind::None;

		RecurrenceDescriptor *DependMinMaxRecDes = nullptr;
};		};

/// A struct for saving information about induction variables.		/// A struct for saving information about induction variables.
class InductionDescriptor {		class InductionDescriptor {
public:		public:
/// This enum represents the kinds of inductions that we support.		/// This enum represents the kinds of inductions that we support.
enum InductionKind {		enum InductionKind {
IK_NoInduction, ///< Not an induction variable.		IK_NoInduction, ///< Not an induction variable.
▲ Show 20 Lines • Show All 97 Lines • Show Last 20 Lines
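
The header above adds `RecurKind::MinMaxFirstIdx` and `RecurKind::MinMaxLastIdx` without showing what separates them at source level. A minimal scalar sketch follows, assuming that a strict comparison corresponds to the first-index kind and a non-strict comparison to the last-index kind. That mapping is an assumption drawn from the kind names, and the function names `smaxFirstIdx` and `smaxLastIdx` are hypothetical, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper: signed max plus the index of its FIRST occurrence.
// The index is only updated when the max is updated, so the index
// recurrence is an in-loop user of the max recurrence.
std::pair<int64_t, int64_t> smaxFirstIdx(const std::vector<int64_t> &A,
                                         int64_t MaxStart, int64_t IdxStart) {
  int64_t Max = MaxStart, Idx = IdxStart;
  for (size_t I = 0; I < A.size(); ++I) {
    if (Max < A[I]) { // strict: later equal values do not move the index
      Max = A[I];
      Idx = static_cast<int64_t>(I);
    }
  }
  return {Max, Idx};
}

// Hypothetical helper: signed max plus the index of its LAST occurrence.
std::pair<int64_t, int64_t> smaxLastIdx(const std::vector<int64_t> &A,
                                        int64_t MaxStart, int64_t IdxStart) {
  int64_t Max = MaxStart, Idx = IdxStart;
  for (size_t I = 0; I < A.size(); ++I) {
    if (Max <= A[I]) { // non-strict: equal values move the index forward
      Max = A[I];
      Idx = static_cast<int64_t>(I);
    }
  }
  return {Max, Idx};
}
```

For the input `{3, 9, 2, 9}` with a minimal start value, both helpers return a maximum of 9; the first-index variant reports index 1 and the last-index variant reports index 3.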

llvm/include/llvm/Transforms/Utils/LoopUtils.h

	Show First 20 Lines • Show All 353 Lines • ▼ Show 20 Lines

	/// See RecurrenceDescriptor::isSelectCmpPattern for a description of the			/// See RecurrenceDescriptor::isSelectCmpPattern for a description of the
	/// pattern we are trying to match. In this pattern we are only ever selecting			/// pattern we are trying to match. In this pattern we are only ever selecting
	/// between two values: 1) an initial PHI start value, and 2) a loop invariant			/// between two values: 1) an initial PHI start value, and 2) a loop invariant
	/// value. This function uses \p LoopExitInst to determine 2), which we then use			/// value. This function uses \p LoopExitInst to determine 2), which we then use
	/// to select between \p Left and \p Right. Any lane value in \p Left that			/// to select between \p Left and \p Right. Any lane value in \p Left that
	/// matches 2) will be merged into \p Right.			/// matches 2) will be merged into \p Right.
	Value *createSelectCmpOp(IRBuilderBase &Builder, Value *StartVal, RecurKind RK,			Value *createSelectCmpOp(IRBuilderBase &Builder, Value *StartVal, RecurKind RK,
	Value *Left, Value *Right);			Value *Left, Value *Right, Value *SrcCmp = nullptr);

	/// Returns a Min/Max operation corresponding to MinMaxRecurrenceKind.			/// Returns a Min/Max operation corresponding to MinMaxRecurrenceKind.
	/// The Builder's fast-math-flags must be set to propagate the expected values.			/// The Builder's fast-math-flags must be set to propagate the expected values.
	Value *createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value *Left,			Value *createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value *Left,
	Value *Right);			Value *Right);

	/// Generates an ordered vector reduction using extracts to reduce the value.			/// Generates an ordered vector reduction using extracts to reduce the value.
	Value *getOrderedReduction(IRBuilderBase &Builder, Value *Acc, Value *Src,			Value *getOrderedReduction(IRBuilderBase &Builder, Value *Acc, Value *Src,
	Show All 12 Lines
	/// Fast-math-flags are propagated using the IRBuilder's setting.			/// Fast-math-flags are propagated using the IRBuilder's setting.
	Value *createSimpleTargetReduction(IRBuilderBase &B,			Value *createSimpleTargetReduction(IRBuilderBase &B,
	const TargetTransformInfo *TTI, Value *Src,			const TargetTransformInfo *TTI, Value *Src,
	RecurKind RdxKind);			RecurKind RdxKind);

	/// Create a target reduction of the given vector \p Src for a reduction of the			/// Create a target reduction of the given vector \p Src for a reduction of the
	/// kind RecurKind::SelectICmp or RecurKind::SelectFCmp. The reduction operation			/// kind RecurKind::SelectICmp or RecurKind::SelectFCmp. The reduction operation
	/// is described by \p Desc.			/// is described by \p Desc.
	Value *createSelectCmpTargetReduction(IRBuilderBase &B,			Value *createInvariantSelectCmpTargetReduction(IRBuilderBase &B,
	const TargetTransformInfo *TTI,			const TargetTransformInfo *TTI,
	Value *Src,			Value *Src,
	const RecurrenceDescriptor &Desc,			const RecurrenceDescriptor &Desc,
	PHINode *OrigPhi);			PHINode *OrigPhi);

				Value *createMMISelectCmpTargetReduction(IRBuilderBase &Builder,
				const TargetTransformInfo *TTI,
				Value *Src,
				const RecurrenceDescriptor &Desc,
				PHINode *OrigPhi, Value *SrcMask);

				/// Create a target reduction of the given vector \p Src for a reduction of the
				/// kind RecurKind::SelectICmp or RecurKind::SelectFCmp. The reduction operation
				/// is described by \p Desc.
				Value *
				createSelectCmpTargetReduction(IRBuilderBase &B, const TargetTransformInfo *TTI,
				Value *Src, const RecurrenceDescriptor &Desc,
				PHINode *OrigPhi, Value *SrcMask = nullptr);

	/// Create a generic target reduction using a recurrence descriptor \p Desc			/// Create a generic target reduction using a recurrence descriptor \p Desc
	/// The target is queried to determine if intrinsics or shuffle sequences are			/// The target is queried to determine if intrinsics or shuffle sequences are
	/// required to implement the reduction.			/// required to implement the reduction.
	/// Fast-math-flags are propagated using the RecurrenceDescriptor.			/// Fast-math-flags are propagated using the RecurrenceDescriptor.
	Value *createTargetReduction(IRBuilderBase &B, const TargetTransformInfo *TTI,			Value *createTargetReduction(IRBuilderBase &B, const TargetTransformInfo *TTI,
	const RecurrenceDescriptor &Desc, Value *Src,			const RecurrenceDescriptor &Desc, Value *Src,
	PHINode *OrigPhi = nullptr);			PHINode *OrigPhi = nullptr,
				Value *SrcMask = nullptr);

	/// Create an ordered reduction intrinsic using the given recurrence			/// Create an ordered reduction intrinsic using the given recurrence
	/// descriptor \p Desc.			/// descriptor \p Desc.
	Value *createOrderedReduction(IRBuilderBase &B,			Value *createOrderedReduction(IRBuilderBase &B,
	const RecurrenceDescriptor &Desc, Value *Src,			const RecurrenceDescriptor &Desc, Value *Src,
	Value *Start);			Value *Start);

				Value *createSentinelValueHandling(IRBuilderBase &Builder,
				const TargetTransformInfo *TTI,
				const RecurrenceDescriptor &Desc,
				Value *Rdx);

	/// Get the intersection (logical and) of all of the potential IR flags			/// Get the intersection (logical and) of all of the potential IR flags
	/// of each scalar operation (VL) that will be converted into a vector (I).			/// of each scalar operation (VL) that will be converted into a vector (I).
	/// If OpValue is non-null, we only consider operations similar to OpValue			/// If OpValue is non-null, we only consider operations similar to OpValue
	/// when intersecting.			/// when intersecting.
	/// Flag set: NSW, NUW (if IncludeWrapFlags is true), exact, and all of			/// Flag set: NSW, NUW (if IncludeWrapFlags is true), exact, and all of
	/// fast-math.			/// fast-math.
	void propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue = nullptr,			void propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue = nullptr,
	bool IncludeWrapFlags = true);			bool IncludeWrapFlags = true);
	▲ Show 20 Lines • Show All 132 Lines • Show Last 20 Lines
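
The declarations above add a `SrcMask` parameter and a `createSentinelValueHandling` helper, but their bodies are not part of this excerpt. The following is only a plausible sketch of how a final cross-lane reduction for "signed max with first index" could use a mask and a sentinel; it is not the patch's implementation. The function `reduceSMaxFirstIdx`, the constant `IdxSentinel`, and the choice of the maximum `int64_t` as sentinel are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical sentinel: marks a lane whose index was never updated in the loop.
constexpr int64_t IdxSentinel = std::numeric_limits<int64_t>::max();

// LaneMax[L] / LaneIdx[L] model the per-lane partial results after the
// vector loop. IdxStart is the scalar start value of the index recurrence.
std::pair<int64_t, int64_t>
reduceSMaxFirstIdx(const std::vector<int64_t> &LaneMax,
                   const std::vector<int64_t> &LaneIdx, int64_t IdxStart) {
  assert(!LaneMax.empty() && LaneMax.size() == LaneIdx.size());

  // Step 1: ordinary horizontal max reduction.
  int64_t Max = *std::max_element(LaneMax.begin(), LaneMax.end());

  // Step 2: mask the lanes that hold the final max and take the smallest
  // index among them, which is the first occurrence overall.
  int64_t Idx = IdxSentinel;
  for (size_t L = 0; L < LaneMax.size(); ++L)
    if (LaneMax[L] == Max)
      Idx = std::min(Idx, LaneIdx[L]);

  // Step 3: sentinel handling. If no lane ever updated its index, fall
  // back to the start value of the index recurrence.
  if (Idx == IdxSentinel)
    Idx = IdxStart;
  return {Max, Idx};
}
```

With lane maxima `{5, 9, 2, 9}` and lane indices `{4, 1, 2, 3}`, lanes 1 and 3 hold the final maximum, and the smaller of their indices, 1, is selected.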

llvm/lib/Analysis/IVDescriptors.cpp

Show First 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	bool RecurrenceDescriptor::isIntegerRecurrenceKind(RecurKind Kind) {
case RecurKind::And:		case RecurKind::And:
case RecurKind::Xor:		case RecurKind::Xor:
case RecurKind::SMax:		case RecurKind::SMax:
case RecurKind::SMin:		case RecurKind::SMin:
case RecurKind::UMax:		case RecurKind::UMax:
case RecurKind::UMin:		case RecurKind::UMin:
case RecurKind::SelectICmp:		case RecurKind::SelectICmp:
case RecurKind::SelectFCmp:		case RecurKind::SelectFCmp:
		case RecurKind::SelectIVICmp:
		case RecurKind::SelectIVFCmp:
		case RecurKind::MinMaxFirstIdx:
		case RecurKind::MinMaxLastIdx:
return true;		return true;
}		}
return false;		return false;
}		}

bool RecurrenceDescriptor::isFloatingPointRecurrenceKind(RecurKind Kind) {		bool RecurrenceDescriptor::isFloatingPointRecurrenceKind(RecurKind Kind) {
return (Kind != RecurKind::None) && !isIntegerRecurrenceKind(Kind);		return (Kind != RecurKind::None) && !isIntegerRecurrenceKind(Kind);
}		}

		bool RecurrenceDescriptor::fixUserRecurrence(RecurrenceDescriptor &UserRedDes) {
		RecurKind UserCurrKind = UserRedDes.getRecurrenceKind();
		assert((UserCurrKind != RecurKind::None) && "Unexpected recurrence kind.");

		if (isMinMaxRecurrenceKind(Kind))
		if (UserCurrKind == RecurKind::SelectIVICmp ||
		UserCurrKind == RecurKind::SelectIVFCmp) {
		UserRedDes.setRecurKind(UserRecurKind);
		UserRedDes.setDependMinMaxRecDes(this);
		return true;
		}

		return false;
		}

/// Determines if Phi may have been type-promoted. If Phi has a single user		/// Determines if Phi may have been type-promoted. If Phi has a single user
/// that ANDs the Phi with a type mask, return the user. RT is updated to		/// that ANDs the Phi with a type mask, return the user. RT is updated to
/// account for the narrower bit width represented by the mask, and the AND		/// account for the narrower bit width represented by the mask, and the AND
/// instruction is added to CI.		/// instruction is added to CI.
static Instruction *lookThroughAnd(PHINode *Phi, Type *&RT,		static Instruction *lookThroughAnd(PHINode *Phi, Type *&RT,
SmallPtrSetImpl<Instruction *> &Visited,		SmallPtrSetImpl<Instruction *> &Visited,
SmallPtrSetImpl<Instruction *> &CI) {		SmallPtrSetImpl<Instruction *> &CI) {
if (!Phi->hasOneUse())		if (!Phi->hasOneUse())
▲ Show 20 Lines • Show All 168 Lines • ▼ Show 20 Lines	bool RecurrenceDescriptor::AddReductionVar(
bool FoundReduxOp = false;		bool FoundReduxOp = false;

// We start with the PHI node and scan for all of the users of this		// We start with the PHI node and scan for all of the users of this
// instruction. All users must be instructions that can be used as reduction		// instruction. All users must be instructions that can be used as reduction
// variables (such as ADD). We must have a single out-of-block user. The cycle		// variables (such as ADD). We must have a single out-of-block user. The cycle
// must include the original PHI.		// must include the original PHI.
bool FoundStartPHI = false;		bool FoundStartPHI = false;

		PHINode *UserRecurPHI = nullptr;
		RecurKind UserRecurKind = RecurKind::None;
		Instruction *UserRecurInstr = nullptr;

// To recognize min/max patterns formed by a icmp select sequence, we store		// To recognize min/max patterns formed by a icmp select sequence, we store
// the number of instruction we saw from the recognized min/max pattern,		// the number of instruction we saw from the recognized min/max pattern,
// to make sure we only see exactly the two instructions.		// to make sure we only see exactly the two instructions.
unsigned NumCmpSelectPatternInst = 0;		unsigned NumCmpSelectPatternInst = 0;
		Instruction *MinMaxRecurOperation = nullptr;
InstDesc ReduxDesc(false, nullptr);		InstDesc ReduxDesc(false, nullptr);

// Data used for determining if the recurrence has been type-promoted.		// Data used for determining if the recurrence has been type-promoted.
Type *RecurrenceType = Phi->getType();		Type *RecurrenceType = Phi->getType();
SmallPtrSet<Instruction *, 4> CastInsts;		SmallPtrSet<Instruction *, 4> CastInsts;
unsigned MinWidthCastToRecurrenceType;		unsigned MinWidthCastToRecurrenceType;
Instruction *Start = Phi;		Instruction *Start = Phi;
bool IsSigned = false;		bool IsSigned = false;
▲ Show 20 Lines • Show All 108 Lines • ▼ Show 20 Lines	if (!Cur->isCommutative() && !IsAPhi && !isa<SelectInst>(Cur) &&
!VisitedInsts.count(dyn_cast<Instruction>(Cur->getOperand(0))))		!VisitedInsts.count(dyn_cast<Instruction>(Cur->getOperand(0))))
return false;		return false;

// Any reduction instruction must be of one of the allowed kinds. We ignore		// Any reduction instruction must be of one of the allowed kinds. We ignore
// the starting value (the Phi or an AND instruction if the Phi has been		// the starting value (the Phi or an AND instruction if the Phi has been
// type-promoted).		// type-promoted).
if (Cur != Start) {		if (Cur != Start) {
ReduxDesc =		ReduxDesc =
isRecurrenceInstr(TheLoop, Phi, Cur, Kind, ReduxDesc, FuncFMF);		isRecurrenceInstr(TheLoop, Phi, Cur, Kind, ReduxDesc, FuncFMF, SE);
ExactFPMathInst = ExactFPMathInst == nullptr		ExactFPMathInst = ExactFPMathInst == nullptr
? ReduxDesc.getExactFPMathInst()		? ReduxDesc.getExactFPMathInst()
: ExactFPMathInst;		: ExactFPMathInst;
if (!ReduxDesc.isRecurrence())		if (!ReduxDesc.isRecurrence()) {
		if (!ReduxDesc.isCandidateUser())
		return false;

		// TODO: Only allow one user recurrence now.
		if (UserRecurPHI)
return false;		return false;

		UserRecurPHI = ReduxDesc.getCandUserRecurPhi();
		UserRecurKind = ReduxDesc.getCandUserRecurKind();
		UserRecurInstr = Cur;
		// TODO: Call AddReductionVar here?
		artagnon: Why?

		// Fix NumCmpSelectPatternInst
		if (auto *SI = dyn_cast<SelectInst>(Cur)) {
		auto *CI = dyn_cast<CmpInst>(SI->getCondition());
		if (CI->hasOneUse())
		--NumCmpSelectPatternInst;
		}
		// Stop visiting the users of current instruction if it contains user
		// recurrence.
		continue;
		}
		artagnon: If you separate out the MinMaxIdx pattern into its own function, we can check `NumCmpSelectPatternInst` for it separately.
// FIXME: FMF is allowed on phi, but propagation is not handled correctly.		// FIXME: FMF is allowed on phi, but propagation is not handled correctly.
if (isa<FPMathOperator>(ReduxDesc.getPatternInst()) && !IsAPhi) {		if (isa<FPMathOperator>(ReduxDesc.getPatternInst()) && !IsAPhi) {
FastMathFlags CurFMF = ReduxDesc.getPatternInst()->getFastMathFlags();		FastMathFlags CurFMF = ReduxDesc.getPatternInst()->getFastMathFlags();
if (auto *Sel = dyn_cast<SelectInst>(ReduxDesc.getPatternInst())) {		if (auto *Sel = dyn_cast<SelectInst>(ReduxDesc.getPatternInst())) {
// Accept FMF on either fcmp or select of a min/max idiom.		// Accept FMF on either fcmp or select of a min/max idiom.
// TODO: This is a hack to work-around the fact that FMF may not be		// TODO: This is a hack to work-around the fact that FMF may not be
// assigned/propagated correctly. If that problem is fixed or we		// assigned/propagated correctly. If that problem is fixed or we
// standardize on fmin/fmax via intrinsics, this can be removed.		// standardize on fmin/fmax via intrinsics, this can be removed.
Show All 22 Lines	if (!IsAPhi && !IsASelect && !isMinMaxRecurrenceKind(Kind) &&
!isSelectCmpRecurrenceKind(Kind) &&		!isSelectCmpRecurrenceKind(Kind) &&
hasMultipleUsesOf(Cur, VisitedInsts, 1))		hasMultipleUsesOf(Cur, VisitedInsts, 1))
return false;		return false;

// All inputs to a PHI node must be a reduction value.		// All inputs to a PHI node must be a reduction value.
if (IsAPhi && Cur != Phi && !areAllUsesIn(Cur, VisitedInsts))		if (IsAPhi && Cur != Phi && !areAllUsesIn(Cur, VisitedInsts))
return false;		return false;

if ((isIntMinMaxRecurrenceKind(Kind) || Kind == RecurKind::SelectICmp) &&		if ((isIntMinMaxRecurrenceKind(Kind) || Kind == RecurKind::SelectICmp ||
		Kind == RecurKind::SelectIVICmp) &&
(isa<ICmpInst>(Cur) || isa<SelectInst>(Cur)))		(isa<ICmpInst>(Cur) || isa<SelectInst>(Cur)))
++NumCmpSelectPatternInst;		++NumCmpSelectPatternInst;
if ((isFPMinMaxRecurrenceKind(Kind) || Kind == RecurKind::SelectFCmp) &&		if ((isFPMinMaxRecurrenceKind(Kind) || Kind == RecurKind::SelectFCmp ||
		Kind == RecurKind::SelectIVFCmp) &&
(isa<FCmpInst>(Cur) || isa<SelectInst>(Cur)))		(isa<FCmpInst>(Cur) || isa<SelectInst>(Cur)))
++NumCmpSelectPatternInst;		++NumCmpSelectPatternInst;

		// Save the main operation of the min/max recurrence, which may be an
		// intrinsic call or a select instruction.
		if (isMinMaxRecurrenceKind(Kind) &&
		(isa<SelectInst>(Cur) || isa<IntrinsicInst>(Cur)))
		MinMaxRecurOperation = Cur;

// Check whether we found a reduction operator.		// Check whether we found a reduction operator.
FoundReduxOp |= !IsAPhi && Cur != Start;		FoundReduxOp |= !IsAPhi && Cur != Start;

// Process users of current instruction. Push non-PHI nodes after PHI nodes		// Process users of current instruction. Push non-PHI nodes after PHI nodes
// onto the stack. This way we are going to have seen all inputs to PHI		// onto the stack. This way we are going to have seen all inputs to PHI
// nodes once we get to them.		// nodes once we get to them.
SmallVector<Instruction *, 8> NonPHIs;		SmallVector<Instruction *, 8> NonPHIs;
SmallVector<Instruction *, 8> PHIs;		SmallVector<Instruction *, 8> PHIs;
▲ Show 20 Lines • Show All 46 Lines • ▼ Show 20 Lines	for (User *U : Cur->users()) {
return false;		return false;
}		}
NonPHIs.push_back(UI);		NonPHIs.push_back(UI);
}		}
} else if (!isa<PHINode>(UI) &&		} else if (!isa<PHINode>(UI) &&
((!isa<FCmpInst>(UI) && !isa<ICmpInst>(UI) &&		((!isa<FCmpInst>(UI) && !isa<ICmpInst>(UI) &&
!isa<SelectInst>(UI)) ||		!isa<SelectInst>(UI)) ||
(!isConditionalRdxPattern(Kind, UI).isRecurrence() &&		(!isConditionalRdxPattern(Kind, UI).isRecurrence() &&
!isSelectCmpPattern(TheLoop, Phi, UI, IgnoredVal)		!isSelectCmpPattern(TheLoop, Phi, UI, IgnoredVal, SE)
.isRecurrence() &&		.isRecurrence() &&
!isMinMaxPattern(UI, Kind, IgnoredVal).isRecurrence())))		!isMinMaxPattern(UI, Kind, IgnoredVal, TheLoop, Phi, SE)
		.isRecurrence())))
return false;		return false;

// Remember that we completed the cycle.		// Remember that we completed the cycle.
if (UI == Phi)		if (UI == Phi)
FoundStartPHI = true;		FoundStartPHI = true;
}		}
Worklist.append(PHIs.begin(), PHIs.end());		Worklist.append(PHIs.begin(), PHIs.end());
Worklist.append(NonPHIs.begin(), NonPHIs.end());		Worklist.append(NonPHIs.begin(), NonPHIs.end());
}		}

// This means we have seen one but not the other instruction of the		// This means we have seen one but not the other instruction of the
// pattern or more than just a select and cmp. Zero implies that we saw a		// pattern or more than just a select and cmp. Zero implies that we saw a
// llvm.min/max intrinsic, which is always OK.		// llvm.min/max intrinsic, which is always OK.
if (isMinMaxRecurrenceKind(Kind) && NumCmpSelectPatternInst != 2 &&		if (isMinMaxRecurrenceKind(Kind) && NumCmpSelectPatternInst != 2 &&
NumCmpSelectPatternInst != 0)		NumCmpSelectPatternInst != 0)
return false;		return false;

if (isSelectCmpRecurrenceKind(Kind) && NumCmpSelectPatternInst != 1)		if (isSelectCmpRecurrenceKind(Kind) && NumCmpSelectPatternInst != 1)
return false;		return false;

		if (isMinMaxRecurrenceKind(Kind) && UserRecurPHI) {
		assert(isa<SelectInst>(UserRecurInstr) &&
		"Unexpected instruction of user recurrence for min/max recurrence");
		auto *UserRecurSI = dyn_cast<SelectInst>(UserRecurInstr);
		if (auto *MinMaxSI = dyn_cast<SelectInst>(MinMaxRecurOperation)) {
		// TODO: As long as the operands are the same, it is not limited to the
		// same cmp instruction.
		if (UserRecurSI->getCondition() != MinMaxSI->getCondition())
		return false;
		} else if (auto *MinMaxII = dyn_cast<IntrinsicInst>(MinMaxRecurOperation)) {
		auto *UserRecurCI = dyn_cast<CmpInst>(UserRecurSI->getCondition());
		// Match smax(%maxphi, %0), icmp(pred, %maxphi, %0) or
		// smax(%maxphi, %0), icmp(inverted_pred, %0, %maxphi)
		if (!(UserRecurCI->getOperand(0) == MinMaxII->getOperand(0) &&
		UserRecurCI->getOperand(1) == MinMaxII->getOperand(1)) &&
		!(UserRecurCI->getOperand(0) == MinMaxII->getOperand(1) &&
		UserRecurCI->getOperand(1) == MinMaxII->getOperand(0)))
		return false;
		}
		}

if (IntermediateStore) {		if (IntermediateStore) {
// Check that stored value goes to the phi node again. This way we make sure		// Check that stored value goes to the phi node again. This way we make sure
// that the value stored in IntermediateStore is indeed the final reduction		// that the value stored in IntermediateStore is indeed the final reduction
// value.		// value.
if (!is_contained(Phi->operands(), IntermediateStore->getValueOperand())) {		if (!is_contained(Phi->operands(), IntermediateStore->getValueOperand())) {
LLVM_DEBUG(dbgs() << "Not a final reduction value stored: "		LLVM_DEBUG(dbgs() << "Not a final reduction value stored: "
<< *IntermediateStore << '\n');		<< *IntermediateStore << '\n');
return false;		return false;
▲ Show 20 Lines • Show All 73 Lines • ▼ Show 20 Lines	bool RecurrenceDescriptor::AddReductionVar(
// only have a single instruction with out-of-loop users.		// only have a single instruction with out-of-loop users.

// The ExitInstruction(Instruction which is allowed to have out-of-loop users)		// The ExitInstruction(Instruction which is allowed to have out-of-loop users)
// is saved as part of the RecurrenceDescriptor.		// is saved as part of the RecurrenceDescriptor.

// Save the description of this reduction variable.		// Save the description of this reduction variable.
RecurrenceDescriptor RD(RdxStart, ExitInstruction, IntermediateStore, Kind,		RecurrenceDescriptor RD(RdxStart, ExitInstruction, IntermediateStore, Kind,
FMF, ExactFPMathInst, RecurrenceType, IsSigned,		FMF, ExactFPMathInst, RecurrenceType, IsSigned,
IsOrdered, CastInsts, MinWidthCastToRecurrenceType);		IsOrdered, CastInsts, MinWidthCastToRecurrenceType,
		UserRecurPHI, UserRecurKind);
RedDes = RD;		RedDes = RD;

return true;		return true;
}		}

// We are looking for loops that do something like this:		// We are looking for loops that do something like this:
// int r = 0;		// int r = 0;
// for (int i = 0; i < n; i++) {		// for (int i = 0; i < n; i++) {
Show All 12 Lines
// any two non-constants, provided they are loop invariant. The only thing		// any two non-constants, provided they are loop invariant. The only thing
// we actually care about at the end of the loop is whether or not any lane		// we actually care about at the end of the loop is whether or not any lane
// in the selected vector is different from the start value. The final		// in the selected vector is different from the start value. The final
// across-vector reduction after the loop simply involves choosing the start		// across-vector reduction after the loop simply involves choosing the start
// value if nothing changed (0 in the example above) or the other selected		// value if nothing changed (0 in the example above) or the other selected
// value (3 in the example above).		// value (3 in the example above).
RecurrenceDescriptor::InstDesc		RecurrenceDescriptor::InstDesc
RecurrenceDescriptor::isSelectCmpPattern(Loop *Loop, PHINode *OrigPhi,		RecurrenceDescriptor::isSelectCmpPattern(Loop *Loop, PHINode *OrigPhi,
Instruction *I, InstDesc &Prev) {		Instruction *I, InstDesc &Prev,
		ScalarEvolution *SE) {
// We must handle the select(cmp(),x,y) as a single instruction. Advance to		// We must handle the select(cmp(),x,y) as a single instruction. Advance to
// the select.		// the select.
CmpInst::Predicate Pred;		CmpInst::Predicate Pred;
if (match(I, m_OneUse(m_Cmp(Pred, m_Value(), m_Value())))) {		if (match(I, m_OneUse(m_Cmp(Pred, m_Value(), m_Value())))) {
if (auto *Select = dyn_cast<SelectInst>(*I->user_begin()))		if (auto *Select = dyn_cast<SelectInst>(*I->user_begin()))
return InstDesc(Select, Prev.getRecKind());		return InstDesc(Select, Prev.getRecKind());
}		}

// Only match select with single use cmp condition.		// Only match select with single use cmp condition.
if (!match(I, m_Select(m_OneUse(m_Cmp(Pred, m_Value(), m_Value())), m_Value(),		if (!match(I, m_Select(m_OneUse(m_Cmp(Pred, m_Value(), m_Value())), m_Value(),
m_Value())))		m_Value())))
return InstDesc(false, I);		return InstDesc(false, I);

SelectInst *SI = cast<SelectInst>(I);		SelectInst *SI = cast<SelectInst>(I);
Value *NonPhi = nullptr;		Value *NonPhi = nullptr;

if (OrigPhi == dyn_cast<PHINode>(SI->getTrueValue()))		if (OrigPhi == dyn_cast<PHINode>(SI->getTrueValue()))
NonPhi = SI->getFalseValue();		NonPhi = SI->getFalseValue();
else if (OrigPhi == dyn_cast<PHINode>(SI->getFalseValue()))		else if (OrigPhi == dyn_cast<PHINode>(SI->getFalseValue()))
NonPhi = SI->getTrueValue();		NonPhi = SI->getTrueValue();
else		else
return InstDesc(false, I);		return InstDesc(false, I);

		auto isIncreasingLoopInduction = [&SE, &Loop](Value *V) {
		fhahn: nit: Variables should start with upper case. Also, move definition to use?
		if (!SE)
		return false;

		auto *Phi = dyn_cast<PHINode>(V);
		if (!Phi)
		return false;

		auto LB = Loop::LoopBounds::getBounds(*Loop, *Phi, *SE);
		fhahn: Using this API seems unnecessarily strict; we don't need the bounds (and getBounds may fail if it cannot identify them). We just need to check the direction of the IV, which can be done by checking whether it is an induction PHI and using `SE.getMonotonicPredicateType`.
		if (!LB)
		return false;

		auto Direction = LB->getDirection();
		return Direction == Loop::LoopBounds::Direction::Increasing;
		};

// We are looking for selects of the form:		// We are looking for selects of the form:
// select(cmp(), phi, loop_invariant) or		// select(cmp(), phi, loop_invariant) or
// select(cmp(), loop_invariant, phi)		// select(cmp(), loop_invariant, phi)
if (!Loop->isLoopInvariant(NonPhi))		if (Loop->isLoopInvariant(NonPhi))
return InstDesc(false, I);

return InstDesc(I, isa<ICmpInst>(I->getOperand(0)) ? RecurKind::SelectICmp		return InstDesc(I, isa<ICmpInst>(I->getOperand(0)) ? RecurKind::SelectICmp
: RecurKind::SelectFCmp);		: RecurKind::SelectFCmp);
		// or
		// select(cmp(), phi, loop_induction) or
		// select(cmp(), loop_induction, phi)
		if (isIncreasingLoopInduction(NonPhi))
		fhahn: The naming here is a bit confusing now: `NonPhi` can be an increasing loop induction? In that case it would be a phi, right?
		Mel-Chen (author): Yes, it's a little confusing here. It might be better to rename `NonPhi` to `NonRecurPhi`. By the way, are you interested in supporting a fully functional SelectCmp pattern? I think the min/max with index pattern really needs to depend on SelectCmp to be safe.
		return InstDesc(I, isa<ICmpInst>(I->getOperand(0))
		? RecurKind::SelectIVICmp
		: RecurKind::SelectIVFCmp);

		return InstDesc(false, I);
}		}

RecurrenceDescriptor::InstDesc		RecurrenceDescriptor::InstDesc
RecurrenceDescriptor::isMinMaxPattern(Instruction *I, RecurKind Kind,		RecurrenceDescriptor::isMinMaxPattern(Instruction *I, RecurKind Kind,
const InstDesc &Prev) {		const InstDesc &Prev, Loop *Loop,
		PHINode *OrigPhi, ScalarEvolution *SE) {
assert((isa<CmpInst>(I) || isa<SelectInst>(I) || isa<CallInst>(I)) &&		assert((isa<CmpInst>(I) || isa<SelectInst>(I) || isa<CallInst>(I)) &&
"Expected a cmp or select or call instruction");		"Expected a cmp or select or call instruction");
if (!isMinMaxRecurrenceKind(Kind))		if (!isMinMaxRecurrenceKind(Kind))
return InstDesc(false, I);		return InstDesc(false, I);

// We must handle the select(cmp()) as a single instruction. Advance to the		// We must handle the select(cmp()) as a single instruction. Advance to the
// select.		// select.
CmpInst::Predicate Pred;		CmpInst::Predicate Pred;
if (match(I, m_OneUse(m_Cmp(Pred, m_Value(), m_Value())))) {		if (match(I, m_OneUse(m_Cmp(Pred, m_Value(), m_Value())))) {
if (auto *Select = dyn_cast<SelectInst>(*I->user_begin()))		if (auto *Select = dyn_cast<SelectInst>(*I->user_begin()))
return InstDesc(Select, Prev.getRecKind());		return InstDesc(Select, Prev.getRecKind());
}		}

// Only match select with single use cmp condition, or a min/max intrinsic.		// Only match select with single use cmp condition, or a min/max intrinsic.
if (!isa<IntrinsicInst>(I) &&		if (!isa<IntrinsicInst>(I) &&
!match(I, m_Select(m_OneUse(m_Cmp(Pred, m_Value(), m_Value())), m_Value(),		!match(I, m_Select(m_OneUse(m_Cmp(Pred, m_Value(), m_Value())), m_Value(),
m_Value())))		m_Value())))
return InstDesc(false, I);		return InstDesc(false, I);

		RecurKind MMRK = isMinMaxOperation(I);
		if (MMRK != RecurKind::None)
		return InstDesc(Kind == MMRK, I);

		if (isa<SelectInst>(I))
		return isMinMaxIdxPattern(Loop, I, OrigPhi, Kind, SE);

		return InstDesc(false, I);
		}

		RecurKind RecurrenceDescriptor::isMinMaxOperation(Instruction *I) {
// Look for a min/max pattern.		// Look for a min/max pattern.
if (match(I, m_UMin(m_Value(), m_Value())))		if (match(I, m_UMin(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::UMin, I);		return RecurKind::UMin;
if (match(I, m_UMax(m_Value(), m_Value())))		if (match(I, m_UMax(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::UMax, I);		return RecurKind::UMax;
if (match(I, m_SMax(m_Value(), m_Value())))		if (match(I, m_SMax(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::SMax, I);		return RecurKind::SMax;
if (match(I, m_SMin(m_Value(), m_Value())))		if (match(I, m_SMin(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::SMin, I);		return RecurKind::SMin;
if (match(I, m_OrdFMin(m_Value(), m_Value())))		if (match(I, m_OrdFMin(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::FMin, I);		return RecurKind::FMin;
if (match(I, m_OrdFMax(m_Value(), m_Value())))		if (match(I, m_OrdFMax(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::FMax, I);		return RecurKind::FMax;
if (match(I, m_UnordFMin(m_Value(), m_Value())))		if (match(I, m_UnordFMin(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::FMin, I);		return RecurKind::FMin;
if (match(I, m_UnordFMax(m_Value(), m_Value())))		if (match(I, m_UnordFMax(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::FMax, I);		return RecurKind::FMax;
if (match(I, m_Intrinsic<Intrinsic::minnum>(m_Value(), m_Value())))		if (match(I, m_Intrinsic<Intrinsic::minnum>(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::FMin, I);		return RecurKind::FMin;
if (match(I, m_Intrinsic<Intrinsic::maxnum>(m_Value(), m_Value())))		if (match(I, m_Intrinsic<Intrinsic::maxnum>(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::FMax, I);		return RecurKind::FMax;

		return RecurKind::None;
		}

		RecurrenceDescriptor::InstDesc RecurrenceDescriptor::isMinMaxIdxPattern(
		Loop *Loop, Instruction *I, PHINode *MinMaxPhi, RecurKind MinMaxKind,
		ScalarEvolution *SE) {
		assert(isa<SelectInst>(I) && "Expected a select instruction");
		// TODO: FP MinMax
		if (!isIntMinMaxRecurrenceKind(MinMaxKind))
		return InstDesc(false, I);

		// Requires SCEV to check the index part
		if (!SE) {
		LLVM_DEBUG(dbgs() << "MinMaxIdx patterns are not recognized without "
		<< "Scalar Evolution Analysis\n");
		return InstDesc(false, I);
		}

		// Check the index select
		auto *SI = dyn_cast<SelectInst>(I);
		auto *CI = dyn_cast<CmpInst>(SI->getCondition());
		Value *LHS = CI->getOperand(0), *RHS = CI->getOperand(1);

		// %cmp = icmp pred, %mmphi, %0
		// %select = select %cmp, %update, %idxphi
		// Check if cmp used min/max phi
		bool IsLHSPhi;
		if (MinMaxPhi == dyn_cast<PHINode>(LHS))
		IsLHSPhi = true;
		else if (MinMaxPhi == dyn_cast<PHINode>(RHS))
		IsLHSPhi = false;
		else
		return InstDesc(false, I);

		// Normalize the predicate, and get which side the select should update idx
		// TODO: Need to consider commutable.
		CmpInst::Predicate NormPred =
		IsLHSPhi ? CI->getPredicate() : CI->getSwappedPredicate();
		bool UpdateSide;
		RecurKind ExpectedIdxRK;
		switch (NormPred) {
		case CmpInst::ICMP_SLT:
		case CmpInst::ICMP_ULT:
		// %mmphi < %0
		UpdateSide = isMaxRecurrenceKind(MinMaxKind);
		ExpectedIdxRK = isMaxRecurrenceKind(MinMaxKind) ? RecurKind::MinMaxFirstIdx
		: RecurKind::MinMaxLastIdx;
		break;
		case CmpInst::ICMP_SLE:
		case CmpInst::ICMP_ULE:
		// %mmphi <= %0
		UpdateSide = isMaxRecurrenceKind(MinMaxKind);
		ExpectedIdxRK = isMaxRecurrenceKind(MinMaxKind) ? RecurKind::MinMaxLastIdx
		: RecurKind::MinMaxFirstIdx;
		break;
		case CmpInst::ICMP_SGT:
		case CmpInst::ICMP_UGT:
		// %mmphi > %0
		UpdateSide = !isMaxRecurrenceKind(MinMaxKind);
		ExpectedIdxRK = isMaxRecurrenceKind(MinMaxKind) ? RecurKind::MinMaxLastIdx
		: RecurKind::MinMaxFirstIdx;
		break;
		case CmpInst::ICMP_SGE:
		case CmpInst::ICMP_UGE:
		// %mmphi >= %0
		UpdateSide = !isMaxRecurrenceKind(MinMaxKind);
		ExpectedIdxRK = isMaxRecurrenceKind(MinMaxKind) ? RecurKind::MinMaxFirstIdx
		: RecurKind::MinMaxLastIdx;
		break;
		artagnon: This is a bit cryptic: would you consider adding more `RecurKind`s to make this less cryptic?
		default:
		return InstDesc(false, I);
		}

		// Get the reduction phi of index select
		Value *IdxUpdateV = UpdateSide ? SI->getTrueValue() : SI->getFalseValue();
		Value *IdxReduxV = UpdateSide ? SI->getFalseValue() : SI->getTrueValue();
		// The update operand of the index select may have been cast; look through
		// the cast.
		if (auto *Cast = dyn_cast<CastInst>(IdxUpdateV))
		IdxUpdateV = Cast->getOperand(0);

		auto *IdxUpdatePhi = dyn_cast<PHINode>(IdxUpdateV);
		auto *IdxReduxPhi = dyn_cast<PHINode>(IdxReduxV);
		if (!IdxUpdatePhi || !IdxReduxPhi)
		return InstDesc(false, I);

		// Check update side is a loop induction variable
		InductionDescriptor ID;
		artagnon: Can we avoid the expensive call to `isInductionPHI()` by checking that the `SCEVAddRec` is a `SCEVConstant`?
		if (!InductionDescriptor::isInductionPHI(IdxUpdatePhi, Loop, SE, ID))
return InstDesc(false, I);		return InstDesc(false, I);

		// The reduction phi of the index select and the reduction phi of the
		// min/max must not be the same.
		if (IdxReduxPhi == MinMaxPhi)
		return InstDesc(false, I);

		return InstDesc(false, I, IdxReduxPhi, ExpectedIdxRK);
}		}

/// Returns true if the select instruction has users in the compare-and-add		/// Returns true if the select instruction has users in the compare-and-add
/// reduction pattern below. The select instruction argument is the last one		/// reduction pattern below. The select instruction argument is the last one
/// in the sequence.		/// in the sequence.
///		///
/// %sum.1 = phi ...		/// %sum.1 = phi ...
/// ...		/// ...
Show All 38 Lines	RecurrenceDescriptor::isConditionalRdxPattern(RecurKind Kind, Instruction *I) {
Instruction *IPhi = isa<PHINode>(*Op1) ? dyn_cast<Instruction>(Op1)		Instruction *IPhi = isa<PHINode>(*Op1) ? dyn_cast<Instruction>(Op1)
: dyn_cast<Instruction>(Op2);		: dyn_cast<Instruction>(Op2);
if (!IPhi || IPhi != FalseVal)		if (!IPhi || IPhi != FalseVal)
return InstDesc(false, I);		return InstDesc(false, I);

return InstDesc(true, SI);		return InstDesc(true, SI);
}		}

RecurrenceDescriptor::InstDesc		RecurrenceDescriptor::InstDesc RecurrenceDescriptor::isRecurrenceInstr(
RecurrenceDescriptor::isRecurrenceInstr(Loop *L, PHINode *OrigPhi,		Loop *L, PHINode *OrigPhi, Instruction *I, RecurKind Kind, InstDesc &Prev,
Instruction *I, RecurKind Kind,		FastMathFlags FuncFMF, ScalarEvolution *SE) {
InstDesc &Prev, FastMathFlags FuncFMF) {
assert(Prev.getRecKind() == RecurKind::None || Prev.getRecKind() == Kind);		assert(Prev.getRecKind() == RecurKind::None || Prev.getRecKind() == Kind);
switch (I->getOpcode()) {		switch (I->getOpcode()) {
default:		default:
return InstDesc(false, I);		return InstDesc(false, I);
case Instruction::PHI:		case Instruction::PHI:
return InstDesc(I, Prev.getRecKind(), Prev.getExactFPMathInst());		return InstDesc(I, Prev.getRecKind(), Prev.getExactFPMathInst());
case Instruction::Sub:		case Instruction::Sub:
case Instruction::Add:		case Instruction::Add:
Show All 18 Lines	case Instruction::Select:
if (Kind == RecurKind::FAdd || Kind == RecurKind::FMul ||		if (Kind == RecurKind::FAdd || Kind == RecurKind::FMul ||
Kind == RecurKind::Add || Kind == RecurKind::Mul)		Kind == RecurKind::Add || Kind == RecurKind::Mul)
return isConditionalRdxPattern(Kind, I);		return isConditionalRdxPattern(Kind, I);
[[fallthrough]];		[[fallthrough]];
case Instruction::FCmp:		case Instruction::FCmp:
case Instruction::ICmp:		case Instruction::ICmp:
case Instruction::Call:		case Instruction::Call:
if (isSelectCmpRecurrenceKind(Kind))		if (isSelectCmpRecurrenceKind(Kind))
return isSelectCmpPattern(L, OrigPhi, I, Prev);		return isSelectCmpPattern(L, OrigPhi, I, Prev, SE);
if (isIntMinMaxRecurrenceKind(Kind) ||		if (isIntMinMaxRecurrenceKind(Kind) ||
(((FuncFMF.noNaNs() && FuncFMF.noSignedZeros()) ||		(((FuncFMF.noNaNs() && FuncFMF.noSignedZeros()) ||
(isa<FPMathOperator>(I) && I->hasNoNaNs() &&		(isa<FPMathOperator>(I) && I->hasNoNaNs() &&
I->hasNoSignedZeros())) &&		I->hasNoSignedZeros())) &&
isFPMinMaxRecurrenceKind(Kind)))		isFPMinMaxRecurrenceKind(Kind)))
return isMinMaxPattern(I, Kind, Prev);		return isMinMaxPattern(I, Kind, Prev, L, OrigPhi, SE);
else if (isFMulAddIntrinsic(I))		else if (isFMulAddIntrinsic(I))
return InstDesc(Kind == RecurKind::FMulAdd, I,		return InstDesc(Kind == RecurKind::FMulAdd, I,
I->hasAllowReassoc() ? nullptr : I);		I->hasAllowReassoc() ? nullptr : I);
return InstDesc(false, I);		return InstDesc(false, I);
}		}
}		}

bool RecurrenceDescriptor::hasMultipleUsesOf(		bool RecurrenceDescriptor::hasMultipleUsesOf(
▲ Show 20 Lines • Show All 304 Lines • ▼ Show 20 Lines	case RecurKind::FMin:
return ConstantFP::getInfinity(Tp, false /*Negative*/);		return ConstantFP::getInfinity(Tp, false /*Negative*/);
case RecurKind::FMax:		case RecurKind::FMax:
assert((FMF.noNaNs() && FMF.noSignedZeros()) &&		assert((FMF.noNaNs() && FMF.noSignedZeros()) &&
"nnan, nsz is expected to be set for FP max reduction.");		"nnan, nsz is expected to be set for FP max reduction.");
return ConstantFP::getInfinity(Tp, true /*Negative*/);		return ConstantFP::getInfinity(Tp, true /*Negative*/);
case RecurKind::SelectICmp:		case RecurKind::SelectICmp:
case RecurKind::SelectFCmp:		case RecurKind::SelectFCmp:
return getRecurrenceStartValue();		return getRecurrenceStartValue();
break;		break;
		case RecurKind::SelectIVICmp:
		case RecurKind::SelectIVFCmp:
		case RecurKind::MinMaxFirstIdx:
		case RecurKind::MinMaxLastIdx:
		// FIXME: SMax or UMax, I'm not sure which one is correct.
		return getRecurrenceIdentity(RecurKind::SMax, Tp, FMF);
default:		default:
		artagnon: Why not merge this with the `RecurKind::SMax` case?
llvm_unreachable("Unknown recurrence kind");		llvm_unreachable("Unknown recurrence kind");
}		}
}		}

unsigned RecurrenceDescriptor::getOpcode(RecurKind Kind) {		unsigned RecurrenceDescriptor::getOpcode(RecurKind Kind) {
switch (Kind) {		switch (Kind) {
case RecurKind::Add:		case RecurKind::Add:
return Instruction::Add;		return Instruction::Add;
Show All 10 Lines	unsigned RecurrenceDescriptor::getOpcode(RecurKind Kind) {
case RecurKind::FMulAdd:		case RecurKind::FMulAdd:
case RecurKind::FAdd:		case RecurKind::FAdd:
return Instruction::FAdd;		return Instruction::FAdd;
case RecurKind::SMax:		case RecurKind::SMax:
case RecurKind::SMin:		case RecurKind::SMin:
case RecurKind::UMax:		case RecurKind::UMax:
case RecurKind::UMin:		case RecurKind::UMin:
case RecurKind::SelectICmp:		case RecurKind::SelectICmp:
		case RecurKind::SelectIVICmp:
		// TODO: maybe new FMinMaxFirstIdx/ FMinMaxLastIdx
		case RecurKind::MinMaxFirstIdx:
		artagnon: Rename these to `IMinMaxFirstIdx` and `IMinMaxLastIdx`?
		case RecurKind::MinMaxLastIdx:
return Instruction::ICmp;		return Instruction::ICmp;
case RecurKind::FMax:		case RecurKind::FMax:
case RecurKind::FMin:		case RecurKind::FMin:
case RecurKind::SelectFCmp:		case RecurKind::SelectFCmp:
		case RecurKind::SelectIVFCmp:
return Instruction::FCmp;		return Instruction::FCmp;
default:		default:
llvm_unreachable("Unknown recurrence operation");		llvm_unreachable("Unknown recurrence operation");
}		}
}		}

SmallVector<Instruction *, 4>		SmallVector<Instruction *, 4>
RecurrenceDescriptor::getReductionOpChain(PHINode *Phi, Loop *L) const {		RecurrenceDescriptor::getReductionOpChain(PHINode *Phi, Loop *L) const {
▲ Show 20 Lines • Show All 434 Lines • Show Last 20 Lines
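
The predicate switch in `isMinMaxIdxPattern` above is dense, so a standalone restatement may help when reviewing it. The sketch below mirrors the four cases of that switch using hypothetical stand-in enums (`Pred`, `MinMax`, `IdxKind`) and a hypothetical `classify` helper; none of these names exist in LLVM, and signedness is deliberately ignored. It assumes the predicate has already been normalized so that the min/max phi is the left operand.

```cpp
#include <cassert>

// Hypothetical stand-ins for CmpInst::Predicate and RecurKind. These are not
// the LLVM types; they only carry enough structure to mirror the switch.
enum class Pred { LT, LE, GT, GE };
enum class MinMax { Min, Max };
enum class IdxKind { FirstIdx, LastIdx };

struct IdxSelectInfo {
  bool UpdateOnTrue; // true if the select's true operand is the new index
  IdxKind Kind;      // which of several equal extrema the loop keeps
};

// P is the predicate of `cmp P, %mmphi, %x`, with the min/max phi on the left.
inline IdxSelectInfo classify(Pred P, MinMax MM) {
  const bool IsMax = MM == MinMax::Max;
  switch (P) {
  case Pred::LT: // %mmphi < %x
    return {IsMax, IsMax ? IdxKind::FirstIdx : IdxKind::LastIdx};
  case Pred::LE: // %mmphi <= %x
    return {IsMax, IsMax ? IdxKind::LastIdx : IdxKind::FirstIdx};
  case Pred::GT: // %mmphi > %x
    return {!IsMax, IsMax ? IdxKind::LastIdx : IdxKind::FirstIdx};
  case Pred::GE: // %mmphi >= %x
    return {!IsMax, IsMax ? IdxKind::FirstIdx : IdxKind::LastIdx};
  }
  return {false, IdxKind::FirstIdx}; // not reached for valid enum values
}

// The loop in the summary uses `if (max < x)`, an smax recurrence: the index
// is updated on the true side and the first maximal element wins.
inline void exampleFromSummary() {
  const IdxSelectInfo Info = classify(Pred::LT, MinMax::Max);
  assert(Info.UpdateOnTrue && Info.Kind == IdxKind::FirstIdx);
  (void)Info;
}
```

Read this way, a strict comparison against the running extremum keeps the first matching index, and a non-strict one keeps the last; `UpdateOnTrue` only records which select operand carries the induction variable.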

llvm/lib/Transforms/Utils/LoopUtils.cpp

Show First 20 Lines • Show All 894 Lines • ▼ Show 20 Lines	CmpInst::Predicate llvm::getMinMaxReductionPredicate(RecurKind RK) {
case RecurKind::FMin:		case RecurKind::FMin:
return CmpInst::FCMP_OLT;		return CmpInst::FCMP_OLT;
case RecurKind::FMax:		case RecurKind::FMax:
return CmpInst::FCMP_OGT;		return CmpInst::FCMP_OGT;
}		}
}		}

Value *llvm::createSelectCmpOp(IRBuilderBase &Builder, Value *StartVal,		Value *llvm::createSelectCmpOp(IRBuilderBase &Builder, Value *StartVal,
RecurKind RK, Value *Left, Value *Right) {		RecurKind RK, Value *Left, Value *Right,
		Value *SrcCmp) {
		switch (RK) {
		case RecurKind::SelectICmp:
		case RecurKind::SelectFCmp: {
if (auto VTy = dyn_cast<VectorType>(Left->getType()))		if (auto VTy = dyn_cast<VectorType>(Left->getType()))
StartVal = Builder.CreateVectorSplat(VTy->getElementCount(), StartVal);		StartVal = Builder.CreateVectorSplat(VTy->getElementCount(), StartVal);
Value *Cmp =		Value *Cmp =
Builder.CreateCmp(CmpInst::ICMP_NE, Left, StartVal, "rdx.select.cmp");		Builder.CreateCmp(CmpInst::ICMP_NE, Left, StartVal, "rdx.select.cmp");
return Builder.CreateSelect(Cmp, Left, Right, "rdx.select");		return Builder.CreateSelect(Cmp, Left, Right, "rdx.select");
}		}
		case RecurKind::SelectIVICmp:
		case RecurKind::SelectIVFCmp:
		// TODO: SMax or UMax?
		return createMinMaxOp(Builder, RecurKind::SMax, Left, Right);
		case RecurKind::MinMaxFirstIdx: {
		assert((SrcCmp && isa<CmpInst>(SrcCmp)) &&
		"SrcCmp should not be nullptr when MinMaxFirstIdx recurrence");
		auto *SrcCI = dyn_cast<CmpInst>(SrcCmp);
		CmpInst::Predicate Pred = SrcCI->getNonStrictPredicate();
		Value *Cmp = Builder.CreateCmp(Pred, SrcCI->getOperand(0),
		SrcCI->getOperand(1), "rdx.select.cmp");
		return Builder.CreateSelect(Cmp, Left, Right, "rdx.select");
		}
		case RecurKind::MinMaxLastIdx:
		assert((SrcCmp && isa<CmpInst>(SrcCmp)) &&
		"SrcCmp should not be nullptr when MinMaxLastIdx recurrence");
		return Builder.CreateSelect(SrcCmp, Left, Right, "rdx.select");
		default:
		llvm_unreachable("Unknown SelectCmp recurrence kind");
		}
		}

Value *llvm::createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value *Left,		Value *llvm::createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value *Left,
Value *Right) {		Value *Right) {
CmpInst::Predicate Pred = getMinMaxReductionPredicate(RK);		CmpInst::Predicate Pred = getMinMaxReductionPredicate(RK);
Value *Cmp = Builder.CreateCmp(Pred, Left, Right, "rdx.minmax.cmp");		Value *Cmp = Builder.CreateCmp(Pred, Left, Right, "rdx.minmax.cmp");
Value *Select = Builder.CreateSelect(Cmp, Left, Right, "rdx.minmax.select");		Value *Select = Builder.CreateSelect(Cmp, Left, Right, "rdx.minmax.select");
return Select;		return Select;
}		}
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	if (Op != Instruction::ICmp && Op != Instruction::FCmp) {
"Invalid min/max");		"Invalid min/max");
TmpVec = createMinMaxOp(Builder, RdxKind, TmpVec, Shuf);		TmpVec = createMinMaxOp(Builder, RdxKind, TmpVec, Shuf);
}		}
}		}
// The result is in the first element of the vector.		// The result is in the first element of the vector.
return Builder.CreateExtractElement(TmpVec, Builder.getInt32(0));		return Builder.CreateExtractElement(TmpVec, Builder.getInt32(0));
}		}

Value *llvm::createSelectCmpTargetReduction(IRBuilderBase &Builder,		Value *llvm::createInvariantSelectCmpTargetReduction(
const TargetTransformInfo *TTI,		IRBuilderBase &Builder, const TargetTransformInfo *TTI, Value *Src,
Value *Src,		const RecurrenceDescriptor &Desc, PHINode *OrigPhi) {
const RecurrenceDescriptor &Desc,		assert((Desc.getRecurrenceKind() == RecurKind::SelectICmp ||
PHINode *OrigPhi) {		Desc.getRecurrenceKind() == RecurKind::SelectFCmp) &&
assert(RecurrenceDescriptor::isSelectCmpRecurrenceKind(
Desc.getRecurrenceKind()) &&
"Unexpected reduction kind");		"Unexpected reduction kind");
Value *InitVal = Desc.getRecurrenceStartValue();		Value *InitVal = Desc.getRecurrenceStartValue();
Value *NewVal = nullptr;		Value *NewVal = nullptr;

// First use the original phi to determine the new value we're trying to		// First use the original phi to determine the new value we're trying to
// select from in the loop.		// select from in the loop.
SelectInst *SI = nullptr;		SelectInst *SI = nullptr;
for (auto *U : OrigPhi->users()) {		for (auto *U : OrigPhi->users()) {
Show All 17 Lines	Value *llvm::createInvariantSelectCmpTargetReduction(
Value *Cmp =		Value *Cmp =
Builder.CreateCmp(CmpInst::ICMP_NE, Src, Right, "rdx.select.cmp");		Builder.CreateCmp(CmpInst::ICMP_NE, Src, Right, "rdx.select.cmp");

// If any predicate is true it means that we want to select the new value.		// If any predicate is true it means that we want to select the new value.
Cmp = Builder.CreateOrReduce(Cmp);		Cmp = Builder.CreateOrReduce(Cmp);
return Builder.CreateSelect(Cmp, NewVal, InitVal, "rdx.select");		return Builder.CreateSelect(Cmp, NewVal, InitVal, "rdx.select");
}		}

		Value *llvm::createMMISelectCmpTargetReduction(
		IRBuilderBase &Builder, const TargetTransformInfo *TTI, Value *Src,
		const RecurrenceDescriptor &Desc, PHINode *OrigPhi, Value *SrcMask) {
		assert(RecurrenceDescriptor::isMinMaxIdxRecurrenceKind(
		Desc.getRecurrenceKind()) &&
		"Unexpected reduction kind");
		RecurKind Kind = Desc.getRecurrenceKind();
		// FIXME: UMax/SMax or UMin/UMax?
		RecurKind RdxExtractK =
		Kind == RecurKind::MinMaxFirstIdx ? RecurKind::SMin : RecurKind::SMax;

		assert(SrcMask && "MinMaxIdx recurrence requests mask");
		// TODO: If vp reduction intrinsic is supported, there is no need to generate
		// additional select here.
		auto *SrcVecEltTy = cast<VectorType>(Src->getType())->getElementType();
		Value *RdxOpIden = Desc.getRecurrenceIdentity(RdxExtractK, SrcVecEltTy,
		Desc.getFastMathFlags());
		ElementCount EC = cast<VectorType>(Src->getType())->getElementCount();
		RdxOpIden = Builder.CreateVectorSplat(EC, RdxOpIden);
		Value *NewVal = Builder.CreateSelect(SrcMask, Src, RdxOpIden, "mask.select");

		return createSimpleTargetReduction(Builder, TTI, NewVal, RdxExtractK);
		}

		Value *llvm::createSelectCmpTargetReduction(IRBuilderBase &Builder,
		const TargetTransformInfo *TTI,
		Value *Src,
		const RecurrenceDescriptor &Desc,
		PHINode *OrigPhi, Value *SrcMask) {
		assert(RecurrenceDescriptor::isSelectCmpRecurrenceKind(
		Desc.getRecurrenceKind()) &&
		"Unexpected reduction kind");
		RecurKind RdxKind = Desc.getRecurrenceKind();
		switch (RdxKind) {
		case RecurKind::SelectICmp:
		case RecurKind::SelectFCmp:
		return createInvariantSelectCmpTargetReduction(Builder, TTI, Src, Desc,
		OrigPhi);
		case RecurKind::SelectIVICmp:
		case RecurKind::SelectIVFCmp:
		// FIXME: SMax or UMax?
		// TODO: Decreasing inductions need a fix here.
		return Builder.CreateIntMaxReduce(Src, true);
		case RecurKind::MinMaxFirstIdx:
		case RecurKind::MinMaxLastIdx:
		return createMMISelectCmpTargetReduction(Builder, TTI, Src, Desc, OrigPhi,
		SrcMask);
		default:
		llvm_unreachable("Unknown SelectCmp recurrence kind");
		}
		}

Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder,		Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder,
const TargetTransformInfo *TTI,		const TargetTransformInfo *TTI,
Value *Src, RecurKind RdxKind) {		Value *Src, RecurKind RdxKind) {
auto *SrcVecEltTy = cast<VectorType>(Src->getType())->getElementType();		auto *SrcVecEltTy = cast<VectorType>(Src->getType())->getElementType();
switch (RdxKind) {		switch (RdxKind) {
case RecurKind::Add:		case RecurKind::Add:
return Builder.CreateAddReduce(Src);		return Builder.CreateAddReduce(Src);
case RecurKind::Mul:		case RecurKind::Mul:
Show All 25 Lines	Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder,
default:		default:
llvm_unreachable("Unhandled opcode");		llvm_unreachable("Unhandled opcode");
}		}
}		}

Value *llvm::createTargetReduction(IRBuilderBase &B,		Value *llvm::createTargetReduction(IRBuilderBase &B,
const TargetTransformInfo *TTI,		const TargetTransformInfo *TTI,
const RecurrenceDescriptor &Desc, Value *Src,		const RecurrenceDescriptor &Desc, Value *Src,
PHINode *OrigPhi) {		PHINode *OrigPhi, Value *SrcMask) {
// TODO: Support in-order reductions based on the recurrence descriptor.		// TODO: Support in-order reductions based on the recurrence descriptor.
// All ops in the reduction inherit fast-math-flags from the recurrence		// All ops in the reduction inherit fast-math-flags from the recurrence
// descriptor.		// descriptor.
IRBuilderBase::FastMathFlagGuard FMFGuard(B);		IRBuilderBase::FastMathFlagGuard FMFGuard(B);
B.setFastMathFlags(Desc.getFastMathFlags());		B.setFastMathFlags(Desc.getFastMathFlags());

RecurKind RK = Desc.getRecurrenceKind();		RecurKind RK = Desc.getRecurrenceKind();
if (RecurrenceDescriptor::isSelectCmpRecurrenceKind(RK))		if (RecurrenceDescriptor::isSelectCmpRecurrenceKind(RK))
return createSelectCmpTargetReduction(B, TTI, Src, Desc, OrigPhi);		return createSelectCmpTargetReduction(B, TTI, Src, Desc, OrigPhi, SrcMask);

return createSimpleTargetReduction(B, TTI, Src, RK);		return createSimpleTargetReduction(B, TTI, Src, RK);
}		}

Value *llvm::createOrderedReduction(IRBuilderBase &B,		Value *llvm::createOrderedReduction(IRBuilderBase &B,
const RecurrenceDescriptor &Desc,		const RecurrenceDescriptor &Desc,
Value *Src, Value *Start) {		Value *Src, Value *Start) {
assert((Desc.getRecurrenceKind() == RecurKind::FAdd ||		assert((Desc.getRecurrenceKind() == RecurKind::FAdd ||
Desc.getRecurrenceKind() == RecurKind::FMulAdd) &&		Desc.getRecurrenceKind() == RecurKind::FMulAdd) &&
"Unexpected reduction kind");		"Unexpected reduction kind");
assert(Src->getType()->isVectorTy() && "Expected a vector type");		assert(Src->getType()->isVectorTy() && "Expected a vector type");
assert(!Start->getType()->isVectorTy() && "Expected a scalar type");		assert(!Start->getType()->isVectorTy() && "Expected a scalar type");

return B.CreateFAddReduce(Start, Src);		return B.CreateFAddReduce(Start, Src);
}		}

		Value *llvm::createSentinelValueHandling(IRBuilderBase &Builder,
		const TargetTransformInfo *TTI,
		const RecurrenceDescriptor &Desc,
		Value *Rdx) {
		Value *InitVal = Desc.getRecurrenceStartValue();
		Value *Iden = Desc.getRecurrenceIdentity(
		Desc.getRecurrenceKind(), Rdx->getType(), Desc.getFastMathFlags());
		Value *Cmp = Builder.CreateCmp(CmpInst::ICMP_NE, Rdx, Iden, "rdx.select.cmp");
		return Builder.CreateSelect(Cmp, Rdx, InitVal, "rdx.select");
		}

void llvm::propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue,		void llvm::propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue,
bool IncludeWrapFlags) {		bool IncludeWrapFlags) {
auto *VecOp = dyn_cast<Instruction>(I);		auto *VecOp = dyn_cast<Instruction>(I);
if (!VecOp)		if (!VecOp)
return;		return;
auto *Intersection = (OpValue == nullptr) ? dyn_cast<Instruction>(VL[0])		auto *Intersection = (OpValue == nullptr) ? dyn_cast<Instruction>(VL[0])
: dyn_cast<Instruction>(OpValue);		: dyn_cast<Instruction>(OpValue);
if (!Intersection)		if (!Intersection)
[...]

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

[...]	for (Instruction &I : *BB) {
reportVectorizationFailure("Value cannot be used outside the loop",		reportVectorizationFailure("Value cannot be used outside the loop",
"value cannot be used outside the loop",		"value cannot be used outside the loop",
"ValueUsedOutsideLoop", ORE, TheLoop, &I);		"ValueUsedOutsideLoop", ORE, TheLoop, &I);
return false;		return false;
}		}
} // next instr.		} // next instr.
}		}

		// Second comfirm the incomplete reductions
		artagnon: Typo: comfirm.
		for (auto R : Reductions) {
		RecurrenceDescriptor &RedDes = Reductions.find(R.first)->second;
		if (!RedDes.hasUserRecurrence())
		continue;

		PHINode *UserPhi = RedDes.getUserRecurPhi();
		if (!isReductionVariable(UserPhi))
		return false;

		RecurrenceDescriptor &UserRedDes = Reductions.find(UserPhi)->second;
		if (!RedDes.fixUserRecurrence(UserRedDes))
		shiva0217: Instead of using fixUserRecurrence to call setDependMinMaxRecurDes and change the user RecurKind, is it possible to call setDependMinMaxRecurDes when isReductionPHI returns true? If we are able to propagate the parent (dependent) RecurDes to isReductionPHI, perhaps we can create the reduction as follows:
		    RecurKind ParentKind = RedDes.getRecurrenceKind();
		    if (ParentKind == RecurKind::SMax) {
		      if (AddReductionVar(Phi, RecurKind::MinMaxFirstIdx, TheLoop, FMF, RedDes, DB, AC, DT, SE)) {
		        LLVM_DEBUG(dbgs() << "Found an MinMaxFirstIdx reduction PHI." << Phi << "\n");
		        return true;
		      }
		    }
		The dependency for the RecurKind could then be explicit, which avoids the user RecurKind fixup.
		return false;
		}

if (!PrimaryInduction) {		if (!PrimaryInduction) {
if (Inductions.empty()) {		if (Inductions.empty()) {
reportVectorizationFailure("Did not find one integer induction var",		reportVectorizationFailure("Did not find one integer induction var",
"loop induction variable could not be identified",		"loop induction variable could not be identified",
"NoInductionVariable", ORE, TheLoop);		"NoInductionVariable", ORE, TheLoop);
return false;		return false;
} else if (!WidestIndTy) {		} else if (!WidestIndTy) {
reportVectorizationFailure("Did not find one integer induction var",		reportVectorizationFailure("Did not find one integer induction var",
[...]

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[...]	public:
/// this is needed because each iteration in the loop corresponds to a SIMD		/// this is needed because each iteration in the loop corresponds to a SIMD
/// element.		/// element.
virtual Value *getBroadcastInstrs(Value *V);		virtual Value *getBroadcastInstrs(Value *V);

// Returns the resume value (bc.merge.rdx) for a reduction as		// Returns the resume value (bc.merge.rdx) for a reduction as
// generated by fixReduction.		// generated by fixReduction.
PHINode *getReductionResumeValue(const RecurrenceDescriptor &RdxDesc);		PHINode *getReductionResumeValue(const RecurrenceDescriptor &RdxDesc);

		// Returns the recurrence mask (mask.cmp) for a recurrence as generated by
		// fixReduction.
		std::pair<Value *, VectorParts>
		getDependRecurrenceMask(const RecurrenceDescriptor &RdxDesc);

/// Create a new phi node for the induction variable \p OrigPhi to resume		/// Create a new phi node for the induction variable \p OrigPhi to resume
/// iteration count in the scalar epilogue, from where the vectorized loop		/// iteration count in the scalar epilogue, from where the vectorized loop
/// left off. In cases where the loop skeleton is more complicated (eg.		/// left off. In cases where the loop skeleton is more complicated (eg.
/// epilogue vectorization) and the resume values can come from an additional		/// epilogue vectorization) and the resume values can come from an additional
/// bypass block, the \p AdditionalBypass pair provides information about the		/// bypass block, the \p AdditionalBypass pair provides information about the
/// bypass block and the end value on the edge from bypass to this loop.		/// bypass block and the end value on the edge from bypass to this loop.
PHINode *createInductionResumeValue(		PHINode *createInductionResumeValue(
PHINode *OrigPhi, const InductionDescriptor &ID,		PHINode *OrigPhi, const InductionDescriptor &ID,
[...]	protected:
/// Structure to hold information about generated runtime checks, responsible		/// Structure to hold information about generated runtime checks, responsible
/// for cleaning the checks, if vectorization turns out unprofitable.		/// for cleaning the checks, if vectorization turns out unprofitable.
GeneratedRTChecks &RTChecks;		GeneratedRTChecks &RTChecks;

// Holds the resume values for reductions in the loops, used to set the		// Holds the resume values for reductions in the loops, used to set the
// correct start value of reduction PHIs when vectorizing the epilogue.		// correct start value of reduction PHIs when vectorizing the epilogue.
SmallMapVector<const RecurrenceDescriptor *, PHINode *, 4>		SmallMapVector<const RecurrenceDescriptor *, PHINode *, 4>
ReductionResumeValues;		ReductionResumeValues;

		// Holds the masks for recurrences in the loops, be used for reduction when
		// there is a reduction that depends on the recurrence.
		SmallMapVector<const RecurrenceDescriptor *, std::pair<Value *, VectorParts>,
		4>
		DependRecurrenceMasks;
		fhahn: We are in the process of removing those kinds of global maps that are used to carry information used during codegen and later. Ideally the combination of values would be modeled explicitly in the exit block of the plan, but we are not there yet. This is the main reason for D132063 doing things the way it does.
		Mel-Chen (author): I see, but `DependRecurrenceMasks` exists for a reason. Consider the following case:
		    int idx = ii;
		    int foo = jj;
		    int max = mm;
		    for (int i = 0; i < n; ++i) {
		      int x = a[i];
		      if (max < x) {
		        max = x;
		        idx = i;
		        foo = b[i];
		      }
		    }
		That mask has the chance to be reused, and I am trying to keep that flexibility. Of course, we could recalculate the mask for each reduction that needs one, but using the global map to preserve the mask is the simplest method I could think of for now. I have heard that VPlan is going to be extended to other blocks; could you share the relevant discussion links?
};		};

class InnerLoopUnroller : public InnerLoopVectorizer {		class InnerLoopUnroller : public InnerLoopVectorizer {
public:		public:
InnerLoopUnroller(Loop *OrigLoop, PredicatedScalarEvolution &PSE,		InnerLoopUnroller(Loop *OrigLoop, PredicatedScalarEvolution &PSE,
LoopInfo *LI, DominatorTree *DT,		LoopInfo *LI, DominatorTree *DT,
const TargetLibraryInfo *TLI,		const TargetLibraryInfo *TLI,
const TargetTransformInfo *TTI, AssumptionCache *AC,		const TargetTransformInfo *TTI, AssumptionCache *AC,
[...]
PHINode *InnerLoopVectorizer::getReductionResumeValue(		PHINode *InnerLoopVectorizer::getReductionResumeValue(
const RecurrenceDescriptor &RdxDesc) {		const RecurrenceDescriptor &RdxDesc) {
auto It = ReductionResumeValues.find(&RdxDesc);		auto It = ReductionResumeValues.find(&RdxDesc);
assert(It != ReductionResumeValues.end() &&		assert(It != ReductionResumeValues.end() &&
"Expected to find a resume value for the reduction.");		"Expected to find a resume value for the reduction.");
return It->second;		return It->second;
}		}

		std::pair<Value *, InnerLoopVectorizer::VectorParts>
		InnerLoopVectorizer::getDependRecurrenceMask(
		const RecurrenceDescriptor &RdxDesc) {
		auto It = DependRecurrenceMasks.find(&RdxDesc);
		assert(It != DependRecurrenceMasks.end() &&
		"Expected to find a dependence mask for the recurrence.");
		return It->second;
		}

namespace llvm {		namespace llvm {

// Loop vectorization cost-model hints how the scalar epilogue loop should be		// Loop vectorization cost-model hints how the scalar epilogue loop should be
// lowered.		// lowered.
enum ScalarEpilogueLowering {		enum ScalarEpilogueLowering {

// The default: allowing scalar epilogues.		// The default: allowing scalar epilogues.
CM_ScalarEpilogueAllowed,		CM_ScalarEpilogueAllowed,
[...]	void InnerLoopVectorizer::fixCrossIterationPHIs(VPTransformState &State) {
// In order to support recurrences we need to be able to vectorize Phi nodes.		// In order to support recurrences we need to be able to vectorize Phi nodes.
// Phi nodes have cycles, so we need to vectorize them in two stages. This is		// Phi nodes have cycles, so we need to vectorize them in two stages. This is
// stage #2: We now need to fix the recurrences by adding incoming edges to		// stage #2: We now need to fix the recurrences by adding incoming edges to
// the currently empty PHI nodes. At this point every instruction in the		// the currently empty PHI nodes. At this point every instruction in the
// original loop is widened to a vector form so we can use them to construct		// original loop is widened to a vector form so we can use them to construct
// the incoming edges.		// the incoming edges.
VPBasicBlock *Header =		VPBasicBlock *Header =
State.Plan->getVectorLoopRegion()->getEntryBasicBlock();		State.Plan->getVectorLoopRegion()->getEntryBasicBlock();
for (VPRecipeBase &R : Header->phis()) {		// FIXME: Maybe I should not choose std::queue...
if (auto *ReductionPhi = dyn_cast<VPReductionPHIRecipe>(&R))		std::queue<VPRecipeBase *> Worklist;
		for (VPRecipeBase &R : Header->phis())
		Worklist.push(&R);

		while (!Worklist.empty()) {
		fhahn: This would need documenting.
		Mel-Chen (author): Sure. Quick explanation: the min/max recurrence must be fixed before the min/max index recurrence, because the index recurrence depends on the mask produced by the min/max recurrence. The worklist ensures that the recurrences are processed in dependency order.
		shiva0217: Perhaps we could sort the reductions by dependency before calling fixReduction, which may be similar to https://reviews.llvm.org/D157631.
		VPRecipeBase &R = *(Worklist.front());
		Worklist.pop();
		if (auto *ReductionPhi = dyn_cast<VPReductionPHIRecipe>(&R)) {
		const RecurrenceDescriptor &RecDesc =
		ReductionPhi->getRecurrenceDescriptor();
		RecurrenceDescriptor *DependRecDesc = RecDesc.getDependMinMaxRecDes();
		if (DependRecDesc && !DependRecurrenceMasks.count(DependRecDesc)) {
		Worklist.push(&R);
		continue;
		}
fixReduction(ReductionPhi, State);		fixReduction(ReductionPhi, State);
else if (auto *FOR = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&R))		} else if (auto *FOR = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&R))
fixFixedOrderRecurrence(FOR, State);		fixFixedOrderRecurrence(FOR, State);
}		}
}		}
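
The deferral logic of the worklist above can be tried out in isolation. The following is a minimal sketch under stated assumptions, not the patch's code: the Reduction struct, the string names, and processInDependencyOrder are hypothetical, and the `done` set plays the role that DependRecurrenceMasks plays in the patch (an entry exists once the parent reduction has been fixed).

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a reduction PHI; dependsOn is empty when the
// reduction does not need a mask from another reduction.
struct Reduction {
  std::string name;
  std::string dependsOn;
};

// Returns the order in which the reductions get fixed. A reduction whose
// dependency has not been fixed yet is pushed back to the end of the queue.
// Assumption: every named dependency is present and there are no cycles,
// otherwise this loop would not terminate.
std::vector<std::string>
processInDependencyOrder(const std::vector<Reduction> &phis) {
  std::queue<const Reduction *> worklist;
  for (const Reduction &r : phis)
    worklist.push(&r);

  std::set<std::string> done;
  std::vector<std::string> order;
  while (!worklist.empty()) {
    const Reduction *r = worklist.front();
    worklist.pop();
    if (!r->dependsOn.empty() && !done.count(r->dependsOn)) {
      worklist.push(r); // dependency not fixed yet: retry later
      continue;
    }
    done.insert(r->name);
    order.push_back(r->name);
  }
  return order;
}
```

With the input `{{"idx", "max"}, {"max", ""}}`, the index reduction is deferred once and the resulting order is `max`, then `idx`. Sorting the reductions by dependency up front, as suggested in the review, would avoid the repeated requeueing.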

void InnerLoopVectorizer::fixFixedOrderRecurrence(		void InnerLoopVectorizer::fixFixedOrderRecurrence(
VPFirstOrderRecurrencePHIRecipe *PhiR, VPTransformState &State) {		VPFirstOrderRecurrencePHIRecipe *PhiR, VPTransformState &State) {
// This is the second phase of vectorizing first-order recurrences. An		// This is the second phase of vectorizing first-order recurrences. An
// overview of the transformation is described below. Suppose we have the		// overview of the transformation is described below. Suppose we have the
[...]	for (unsigned Part = 0; Part < UF; ++Part) {
State.reset(LoopExitInstDef, RdxParts[Part], Part);		State.reset(LoopExitInstDef, RdxParts[Part], Part);
}		}
}		}

// Reduce all of the unrolled parts into a single vector.		// Reduce all of the unrolled parts into a single vector.
Value *ReducedPartRdx = State.get(LoopExitInstDef, 0);		Value *ReducedPartRdx = State.get(LoopExitInstDef, 0);
unsigned Op = RecurrenceDescriptor::getOpcode(RK);		unsigned Op = RecurrenceDescriptor::getOpcode(RK);

		// Get the reduction mask if the reduction depend on another one.
		RecurrenceDescriptor *DependDesc = RdxDesc.getDependMinMaxRecDes();
		Value *DependRdxMask = nullptr;
		VectorParts DependPartMasks;
		if (DependDesc) {
		Builder.SetInsertPoint(&*LoopMiddleBlock->getTerminator());
		std::tie(DependRdxMask, DependPartMasks) =
		getDependRecurrenceMask(*DependDesc);
		}

		Value *NewRdxMask = nullptr;
		VectorParts NewPartMasks(UF);

// The middle block terminator has already been assigned a DebugLoc here (the		// The middle block terminator has already been assigned a DebugLoc here (the
// OrigLoop's single latch terminator). We want the whole middle block to		// OrigLoop's single latch terminator). We want the whole middle block to
// appear to execute on this line because: (a) it is all compiler generated,		// appear to execute on this line because: (a) it is all compiler generated,
// (b) these instructions are always executed after evaluating the latch		// (b) these instructions are always executed after evaluating the latch
// conditional branch, and (c) other passes may add new predecessors which		// conditional branch, and (c) other passes may add new predecessors which
// terminate on this line. This is the easiest way to ensure we don't		// terminate on this line. This is the easiest way to ensure we don't
// accidentally cause an extra step back into the loop while debugging.		// accidentally cause an extra step back into the loop while debugging.
State.setDebugLocFromInst(LoopMiddleBlock->getTerminator());		State.setDebugLocFromInst(LoopMiddleBlock->getTerminator());
if (PhiR->isOrdered())		if (PhiR->isOrdered())
ReducedPartRdx = State.get(LoopExitInstDef, UF - 1);		ReducedPartRdx = State.get(LoopExitInstDef, UF - 1);
else {		else {
// Floating-point operations should have some FMF to enable the reduction.		// Floating-point operations should have some FMF to enable the reduction.
IRBuilderBase::FastMathFlagGuard FMFG(Builder);		IRBuilderBase::FastMathFlagGuard FMFG(Builder);
Builder.setFastMathFlags(RdxDesc.getFastMathFlags());		Builder.setFastMathFlags(RdxDesc.getFastMathFlags());
for (unsigned Part = 1; Part < UF; ++Part) {		for (unsigned Part = 1; Part < UF; ++Part) {
Value *RdxPart = State.get(LoopExitInstDef, Part);		Value *RdxPart = State.get(LoopExitInstDef, Part);
		Value *PartMask = DependDesc ? DependPartMasks[Part] : nullptr;
if (Op != Instruction::ICmp && Op != Instruction::FCmp) {		if (Op != Instruction::ICmp && Op != Instruction::FCmp) {
ReducedPartRdx = Builder.CreateBinOp(		ReducedPartRdx = Builder.CreateBinOp(
(Instruction::BinaryOps)Op, RdxPart, ReducedPartRdx, "bin.rdx");		(Instruction::BinaryOps)Op, RdxPart, ReducedPartRdx, "bin.rdx");
} else if (RecurrenceDescriptor::isSelectCmpRecurrenceKind(RK))		} else if (RecurrenceDescriptor::isSelectCmpRecurrenceKind(RK))
ReducedPartRdx = createSelectCmpOp(Builder, ReductionStartValue, RK,		ReducedPartRdx = createSelectCmpOp(Builder, ReductionStartValue, RK,
ReducedPartRdx, RdxPart);		ReducedPartRdx, RdxPart, PartMask);
else		else {
ReducedPartRdx = createMinMaxOp(Builder, RK, ReducedPartRdx, RdxPart);		ReducedPartRdx = createMinMaxOp(Builder, RK, ReducedPartRdx, RdxPart);
		// Keep the part mask on demand.
		if (RdxDesc.hasUserRecurrence()) {
		shiva0217: Should we check isMinMaxRecurrenceKind(Kind), and isMinMaxIdxRecurrenceKind for the user kind? Although this is currently the only dependency, the check would make it explicit for the reader and avoid unexpected codegen in the future.
		auto *SI = dyn_cast<SelectInst>(ReducedPartRdx);
		auto *CI = dyn_cast<CmpInst>(SI->getCondition());
		shiva0217: Could we encapsulate the mask generation in createMinMaxIdxMaskOp, or another name you prefer?
		NewPartMasks[Part] = CI;
		}
		}
}		}
}		}

// Create the reduction after the loop. Note that inloop reductions create the		// Create the reduction after the loop. Note that inloop reductions create the
// target reduction in the loop using a Reduction recipe.		// target reduction in the loop using a Reduction recipe.
if (VF.isVector() && !PhiR->isInLoop()) {		if (VF.isVector() && !PhiR->isInLoop()) {
ReducedPartRdx =		Value *ReducedPart = ReducedPartRdx;
createTargetReduction(Builder, TTI, RdxDesc, ReducedPartRdx, OrigPhi);		ReducedPartRdx = createTargetReduction(
		Builder, TTI, RdxDesc, ReducedPartRdx, OrigPhi, DependRdxMask);
// If the reduction can be performed in a smaller type, we need to extend		// If the reduction can be performed in a smaller type, we need to extend
// the reduction to the wider type before we branch to the original loop.		// the reduction to the wider type before we branch to the original loop.
if (PhiTy != RdxDesc.getRecurrenceType())		if (PhiTy != RdxDesc.getRecurrenceType())
ReducedPartRdx = RdxDesc.isSigned()		ReducedPartRdx = RdxDesc.isSigned()
? Builder.CreateSExt(ReducedPartRdx, PhiTy)		? Builder.CreateSExt(ReducedPartRdx, PhiTy)
: Builder.CreateZExt(ReducedPartRdx, PhiTy);		: Builder.CreateZExt(ReducedPartRdx, PhiTy);

		// Create depend recurrence mask on demand.
		if (RdxDesc.hasUserRecurrence()) {
		shiva0217: Should we check isMinMaxRecurrenceKind(Kind), and isMinMaxIdxRecurrenceKind for the user kind?
		ElementCount EC =
		shiva0217: Could we encapsulate the mask generation in createMinMaxIdxMask or similar?
		cast<VectorType>(ReducedPart->getType())->getElementCount();
		Value *RdxSplat = Builder.CreateVectorSplat(EC, ReducedPartRdx);
		// FIXME: Not sure use FCMP_OEQ is right or not.
		CmpInst::Predicate MaskPred =
		(ReducedPartRdx->getType()->isFloatingPointTy()) ? CmpInst::FCMP_OEQ
		: CmpInst::ICMP_EQ;
		artagnon: `RdxDesc.isOrdered()` can help you pick between `FCMP_OEQ` and `FCMP_UEQ`.
		NewRdxMask =
		Builder.CreateCmp(MaskPred, RdxSplat, ReducedPart, "mask.cmp");
		}
}		}

		if (RecurrenceDescriptor::isMinMaxIdxRecurrenceKind(RK) ||
		(RK == RecurKind::SelectIVICmp) || (RK == RecurKind::SelectIVFCmp))
		ReducedPartRdx =
		createSentinelValueHandling(Builder, TTI, RdxDesc, ReducedPartRdx);

		// Set the recurrence mask for this reduction on demand.
		if (RdxDesc.hasUserRecurrence())
		DependRecurrenceMasks.insert({&RdxDesc, {NewRdxMask, NewPartMasks}});

PHINode *ResumePhi =		PHINode *ResumePhi =
dyn_cast<PHINode>(PhiR->getStartValue()->getUnderlyingValue());		dyn_cast<PHINode>(PhiR->getStartValue()->getUnderlyingValue());

// Create a phi node that merges control-flow from the backedge-taken check		// Create a phi node that merges control-flow from the backedge-taken check
// block and the middle block.		// block and the middle block.
PHINode *BCBlockPhi = PHINode::Create(PhiTy, 2, "bc.merge.rdx",		PHINode *BCBlockPhi = PHINode::Create(PhiTy, 2, "bc.merge.rdx",
LoopScalarPreHeader->getTerminator());		LoopScalarPreHeader->getTerminator());

[...]

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

[...]	void VPReductionPHIRecipe::execute(VPTransformState &State) {
// Reductions do not have to start at zero. They can start with		// Reductions do not have to start at zero. They can start with
// any loop invariant values.		// any loop invariant values.
VPValue *StartVPV = getStartValue();		VPValue *StartVPV = getStartValue();
Value *StartV = StartVPV->getLiveInIRValue();		Value *StartV = StartVPV->getLiveInIRValue();

Value *Iden = nullptr;		Value *Iden = nullptr;
RecurKind RK = RdxDesc.getRecurrenceKind();		RecurKind RK = RdxDesc.getRecurrenceKind();
if (RecurrenceDescriptor::isMinMaxRecurrenceKind(RK) ||		if (RecurrenceDescriptor::isMinMaxRecurrenceKind(RK) ||
RecurrenceDescriptor::isSelectCmpRecurrenceKind(RK)) {		(RK == RecurKind::SelectICmp || RK == RecurKind::SelectFCmp)) {
// MinMax reduction have the start value as their identify.		// MinMax reduction have the start value as their identify.
if (ScalarPHI) {		if (ScalarPHI) {
Iden = StartV;		Iden = StartV;
} else {		} else {
IRBuilderBase::InsertPointGuard IPBuilder(Builder);		IRBuilderBase::InsertPointGuard IPBuilder(Builder);
Builder.SetInsertPoint(VectorPH->getTerminator());		Builder.SetInsertPoint(VectorPH->getTerminator());
StartV = Iden =		StartV = Iden =
Builder.CreateVectorSplat(State.VF, StartV, "minmax.ident");		Builder.CreateVectorSplat(State.VF, StartV, "minmax.ident");
}		}
		} else if (RecurrenceDescriptor::isMinMaxIdxRecurrenceKind(RK) ||
		(RK == RecurKind::SelectIVICmp || RK == RecurKind::SelectIVFCmp)) {
		StartV = Iden = RdxDesc.getRecurrenceIdentity(RK, VecTy->getScalarType(),
		RdxDesc.getFastMathFlags());

		if (!ScalarPHI) {
		IRBuilderBase::InsertPointGuard IPBuilder(Builder);
		Builder.SetInsertPoint(VectorPH->getTerminator());
		StartV = Iden = Builder.CreateVectorSplat(State.VF, Iden);
		}
} else {		} else {
Iden = RdxDesc.getRecurrenceIdentity(RK, VecTy->getScalarType(),		Iden = RdxDesc.getRecurrenceIdentity(RK, VecTy->getScalarType(),
RdxDesc.getFastMathFlags());		RdxDesc.getFastMathFlags());

if (!ScalarPHI) {		if (!ScalarPHI) {
Iden = Builder.CreateVectorSplat(State.VF, Iden);		Iden = Builder.CreateVectorSplat(State.VF, Iden);
IRBuilderBase::InsertPointGuard IPBuilder(Builder);		IRBuilderBase::InsertPointGuard IPBuilder(Builder);
Builder.SetInsertPoint(VectorPH->getTerminator());		Builder.SetInsertPoint(VectorPH->getTerminator());
[...]

llvm/test/Transforms/LoopVectorize/select-min-index.ll

; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -S %s | FileCheck %s		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=2 -S %s | FileCheck %s		; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -S %s | FileCheck %s --check-prefix=CHECK-VF4IC1 --check-prefix=CHECK
; RUN: opt -passes=loop-vectorize -force-vector-width=1 -force-vector-interleave=2 -S %s | FileCheck %s		; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=2 -S %s | FileCheck %s --check-prefix=CHECK-VF4IC2 --check-prefix=CHECK
		; RUN: opt -passes=loop-vectorize -force-vector-width=1 -force-vector-interleave=2 -S %s | FileCheck %s --check-prefix=CHECK-VF1IC2 --check-prefix=CHECK

; Test cases for selecting the index with the minimum value.		; Test cases for selecting the index with the minimum value.

define i64 @test_vectorize_select_umin_idx(ptr %src) {		define i64 @test_vectorize_select_umin_idx(ptr %src) {
; CHECK-LABEL: @test_vectorize_select_umin_idx(		; CHECK-LABEL: @test_vectorize_select_umin_idx(
; CHECK-NOT: vector.body:		; CHECK-NOT: vector.body:
;		;
entry:		entry:
[...]	loop:
br i1 %exitcond.not, label %exit, label %loop		br i1 %exitcond.not, label %exit, label %loop

exit:		exit:
%res = phi i64 [ %min.idx.next, %loop ]		%res = phi i64 [ %min.idx.next, %loop ]
ret i64 %res		ret i64 %res
}		}

define i64 @test_vectorize_select_umin_idx_all_exit_inst(ptr %src, ptr %umin) {		define i64 @test_vectorize_select_umin_idx_all_exit_inst(ptr %src, ptr %umin) {
; CHECK-LABEL: @test_vectorize_select_umin_idx_all_exit_inst(		; CHECK-VF4IC1-LABEL: @test_vectorize_select_umin_idx_all_exit_inst(
; CHECK-NOT: vector.body:		; CHECK-VF4IC1-NEXT: entry:
		; CHECK-VF4IC1-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC1: vector.ph:
		; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC1: vector.body:
		; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i64> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr i64, ptr [[SRC:%.*]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[TMP1]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 4
		; CHECK-VF4IC1-NEXT: [[TMP3:%.*]] = icmp ugt <4 x i64> [[VEC_PHI1]], [[WIDE_LOAD]]
		; CHECK-VF4IC1-NEXT: [[TMP4]] = call <4 x i64> @llvm.umin.v4i64(<4 x i64> [[VEC_PHI1]], <4 x i64> [[WIDE_LOAD]])
		; CHECK-VF4IC1-NEXT: [[TMP5]] = select <4 x i1> [[TMP3]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC1-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0
		; CHECK-VF4IC1-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
		; CHECK-VF4IC1: middle.block:
		; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.umin.v4i64(<4 x i64> [[TMP4]])
		; CHECK-VF4IC1-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP7]], i64 0
		; CHECK-VF4IC1-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC1-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[TMP4]]
		; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 0, 0
		; CHECK-VF4IC1-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[TMP5]], <4 x i64> <i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807>
		; CHECK-VF4IC1-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.smin.v4i64(<4 x i64> [[MASK_SELECT]])
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP8]], -9223372036854775808
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP8]], i64 0
		; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC1: scalar.ph:
		; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX2:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: br label [[LOOP:%.*]]
		; CHECK-VF4IC1: loop:
		; CHECK-VF4IC1-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC1-NEXT: [[MIN_IDX:%.*]] = phi i64 [ [[BC_MERGE_RDX2]], [[SCALAR_PH]] ], [ [[MIN_IDX_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC1-NEXT: [[MIN_VAL:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[MIN_VAL_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC1-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]
		; CHECK-VF4IC1-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4
		; CHECK-VF4IC1-NEXT: [[CMP:%.*]] = icmp ugt i64 [[MIN_VAL]], [[L]]
		; CHECK-VF4IC1-NEXT: [[MIN_VAL_NEXT]] = tail call i64 @llvm.umin.i64(i64 [[MIN_VAL]], i64 [[L]])
		; CHECK-VF4IC1-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]
		; CHECK-VF4IC1-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 0
		; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
		; CHECK-VF4IC1: exit:
		; CHECK-VF4IC1-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: [[RES_UMIN:%.*]] = phi i64 [ [[MIN_VAL_NEXT]], [[LOOP]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: store i64 [[RES_UMIN]], ptr [[UMIN:%.*]], align 4
		; CHECK-VF4IC1-NEXT: ret i64 [[RES]]
		;
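
The CHECK-VF4IC1 middle.block sequence above (reduce.umin, mask.cmp, mask.select, reduce.smin, rdx.select) can be followed more easily with a scalar model. The sketch below is an illustration only, assuming four lanes and the sentinel values that appear in the CHECK lines; the function name reduceMinIndex and its parameters are hypothetical and do not exist in the patch.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical scalar model of the VF=4, IC=1 middle block for an unsigned
// min-with-index reduction. minLanes models the final vector of minima,
// idxLanes the final vector of selected indices, and start the original
// start value of the index PHI.
int64_t reduceMinIndex(const std::array<uint64_t, 4> &minLanes,
                       const std::array<int64_t, 4> &idxLanes,
                       int64_t start) {
  constexpr int64_t sentinel = std::numeric_limits<int64_t>::min();
  constexpr int64_t masked = std::numeric_limits<int64_t>::max();

  // llvm.vector.reduce.umin: the overall minimum value.
  uint64_t best = *std::min_element(minLanes.begin(), minLanes.end());

  int64_t idx = masked;
  for (std::size_t lane = 0; lane < minLanes.size(); ++lane) {
    bool mask = (minLanes[lane] == best);               // mask.cmp
    int64_t candidate = mask ? idxLanes[lane] : masked; // mask.select
    idx = std::min(idx, candidate);                     // reduce.smin
  }
  // rdx.select: fall back to the start value if no lane was ever updated.
  return idx != sentinel ? idx : start;
}
```

For minima `{5, 2, 9, 2}` with indices `{4, 1, 6, 7}`, lanes 1 and 3 hold the minimum and the smaller index, 1, is returned. If every index lane still holds the sentinel, the start value is returned instead.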
		; CHECK-VF4IC2-LABEL: @test_vectorize_select_umin_idx_all_exit_inst(
		; CHECK-VF4IC2-NEXT: entry:
		; CHECK-VF4IC2-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC2: vector.ph:
		; CHECK-VF4IC2-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC2: vector.body:
		; CHECK-VF4IC2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP10:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VEC_PHI2:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP11:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VEC_PHI3:%.*]] = phi <4 x i64> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP8:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i64> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP9:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
		; CHECK-VF4IC2-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[SRC:%.*]], i64 [[TMP0]]
		; CHECK-VF4IC2-NEXT: [[TMP3:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[TMP1]]
		; CHECK-VF4IC2-NEXT: [[TMP4:%.*]] = getelementptr i64, ptr [[TMP2]], i32 0
		; CHECK-VF4IC2-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP4]], align 4
		; CHECK-VF4IC2-NEXT: [[TMP5:%.*]] = getelementptr i64, ptr [[TMP2]], i32 4
		; CHECK-VF4IC2-NEXT: [[WIDE_LOAD5:%.*]] = load <4 x i64>, ptr [[TMP5]], align 4
		; CHECK-VF4IC2-NEXT: [[TMP6:%.*]] = icmp ugt <4 x i64> [[VEC_PHI3]], [[WIDE_LOAD]]
		; CHECK-VF4IC2-NEXT: [[TMP7:%.*]] = icmp ugt <4 x i64> [[VEC_PHI4]], [[WIDE_LOAD5]]
		; CHECK-VF4IC2-NEXT: [[TMP8]] = call <4 x i64> @llvm.umin.v4i64(<4 x i64> [[VEC_PHI3]], <4 x i64> [[WIDE_LOAD]])
		; CHECK-VF4IC2-NEXT: [[TMP9]] = call <4 x i64> @llvm.umin.v4i64(<4 x i64> [[VEC_PHI4]], <4 x i64> [[WIDE_LOAD5]])
		; CHECK-VF4IC2-NEXT: [[TMP10]] = select <4 x i1> [[TMP6]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC2-NEXT: [[TMP11]] = select <4 x i1> [[TMP7]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI2]]
		; CHECK-VF4IC2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
		; CHECK-VF4IC2-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC2-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0
		; CHECK-VF4IC2-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
		; CHECK-VF4IC2: middle.block:
		; CHECK-VF4IC2-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp ult <4 x i64> [[TMP8]], [[TMP9]]
		; CHECK-VF4IC2-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i64> [[TMP8]], <4 x i64> [[TMP9]]
		; CHECK-VF4IC2-NEXT: [[TMP13:%.*]] = call i64 @llvm.vector.reduce.umin.v4i64(<4 x i64> [[RDX_MINMAX_SELECT]])
		; CHECK-VF4IC2-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP13]], i64 0
		; CHECK-VF4IC2-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC2-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[RDX_MINMAX_SELECT]]
		; CHECK-VF4IC2-NEXT: [[CMP_N:%.*]] = icmp eq i64 0, 0
		; CHECK-VF4IC2-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ule <4 x i64> [[TMP8]], [[TMP9]]
		; CHECK-VF4IC2-NEXT: [[RDX_SELECT:%.*]] = select <4 x i1> [[RDX_SELECT_CMP]], <4 x i64> [[TMP10]], <4 x i64> [[TMP11]]
		; CHECK-VF4IC2-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[RDX_SELECT]], <4 x i64> <i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807>
		; CHECK-VF4IC2-NEXT: [[TMP14:%.*]] = call i64 @llvm.vector.reduce.smin.v4i64(<4 x i64> [[MASK_SELECT]])
		; CHECK-VF4IC2-NEXT: [[RDX_SELECT_CMP6:%.*]] = icmp ne i64 [[TMP14]], -9223372036854775808
		; CHECK-VF4IC2-NEXT: [[RDX_SELECT7:%.*]] = select i1 [[RDX_SELECT_CMP6]], i64 [[TMP14]], i64 0
		; CHECK-VF4IC2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC2: scalar.ph:
		; CHECK-VF4IC2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC2-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[TMP13]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC2-NEXT: [[BC_MERGE_RDX8:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[RDX_SELECT7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC2-NEXT: br label [[LOOP:%.*]]
		; CHECK-VF4IC2: loop:
		; CHECK-VF4IC2-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC2-NEXT: [[MIN_IDX:%.*]] = phi i64 [ [[BC_MERGE_RDX8]], [[SCALAR_PH]] ], [ [[MIN_IDX_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC2-NEXT: [[MIN_VAL:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[MIN_VAL_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC2-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]
		; CHECK-VF4IC2-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4
		; CHECK-VF4IC2-NEXT: [[CMP:%.*]] = icmp ugt i64 [[MIN_VAL]], [[L]]
		; CHECK-VF4IC2-NEXT: [[MIN_VAL_NEXT]] = tail call i64 @llvm.umin.i64(i64 [[MIN_VAL]], i64 [[L]])
		; CHECK-VF4IC2-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]
		; CHECK-VF4IC2-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC2-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 0
		; CHECK-VF4IC2-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
		; CHECK-VF4IC2: exit:
		; CHECK-VF4IC2-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC2-NEXT: [[RES_UMIN:%.*]] = phi i64 [ [[MIN_VAL_NEXT]], [[LOOP]] ], [ [[TMP13]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC2-NEXT: store i64 [[RES_UMIN]], ptr [[UMIN:%.*]], align 4
		; CHECK-VF4IC2-NEXT: ret i64 [[RES]]
		;
		; CHECK-VF1IC2-LABEL: @test_vectorize_select_umin_idx_all_exit_inst(
		; CHECK-VF1IC2-NEXT: entry:
		; CHECK-VF1IC2-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF1IC2: vector.ph:
		; CHECK-VF1IC2-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF1IC2: vector.body:
		; CHECK-VF1IC2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[VEC_PHI:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP10:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP11:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[TMP8:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[TMP9:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF1IC2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
		; CHECK-VF1IC2-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[SRC:%.*]], i64 [[TMP0]]
		; CHECK-VF1IC2-NEXT: [[TMP3:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[TMP1]]
		; CHECK-VF1IC2-NEXT: [[TMP4:%.*]] = load i64, ptr [[TMP2]], align 4
		; CHECK-VF1IC2-NEXT: [[TMP5:%.*]] = load i64, ptr [[TMP3]], align 4
		; CHECK-VF1IC2-NEXT: [[TMP6:%.*]] = icmp ugt i64 [[VEC_PHI2]], [[TMP4]]
		; CHECK-VF1IC2-NEXT: [[TMP7:%.*]] = icmp ugt i64 [[VEC_PHI3]], [[TMP5]]
		; CHECK-VF1IC2-NEXT: [[TMP8]] = tail call i64 @llvm.umin.i64(i64 [[VEC_PHI2]], i64 [[TMP4]])
		; CHECK-VF1IC2-NEXT: [[TMP9]] = tail call i64 @llvm.umin.i64(i64 [[VEC_PHI3]], i64 [[TMP5]])
		; CHECK-VF1IC2-NEXT: [[TMP10]] = select i1 [[TMP6]], i64 [[TMP0]], i64 [[VEC_PHI]]
		; CHECK-VF1IC2-NEXT: [[TMP11]] = select i1 [[TMP7]], i64 [[TMP1]], i64 [[VEC_PHI1]]
		; CHECK-VF1IC2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
		; CHECK-VF1IC2-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0
		; CHECK-VF1IC2-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
		; CHECK-VF1IC2: middle.block:
		; CHECK-VF1IC2-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp ult i64 [[TMP8]], [[TMP9]]
		; CHECK-VF1IC2-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select i1 [[RDX_MINMAX_CMP]], i64 [[TMP8]], i64 [[TMP9]]
		; CHECK-VF1IC2-NEXT: [[CMP_N:%.*]] = icmp eq i64 0, 0
		; CHECK-VF1IC2-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ule i64 [[TMP8]], [[TMP9]]
		; CHECK-VF1IC2-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP10]], i64 [[TMP11]]
		; CHECK-VF1IC2-NEXT: [[RDX_SELECT_CMP4:%.*]] = icmp ne i64 [[RDX_SELECT]], -9223372036854775808
		; CHECK-VF1IC2-NEXT: [[RDX_SELECT5:%.*]] = select i1 [[RDX_SELECT_CMP4]], i64 [[RDX_SELECT]], i64 0
		; CHECK-VF1IC2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF1IC2: scalar.ph:
		; CHECK-VF1IC2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF1IC2-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[RDX_MINMAX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC2-NEXT: [[BC_MERGE_RDX6:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[RDX_SELECT5]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC2-NEXT: br label [[LOOP:%.*]]
		; CHECK-VF1IC2: loop:
		; CHECK-VF1IC2-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF1IC2-NEXT: [[MIN_IDX:%.*]] = phi i64 [ [[BC_MERGE_RDX6]], [[SCALAR_PH]] ], [ [[MIN_IDX_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF1IC2-NEXT: [[MIN_VAL:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[MIN_VAL_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF1IC2-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]
		; CHECK-VF1IC2-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4
		; CHECK-VF1IC2-NEXT: [[CMP:%.*]] = icmp ugt i64 [[MIN_VAL]], [[L]]
		; CHECK-VF1IC2-NEXT: [[MIN_VAL_NEXT]] = tail call i64 @llvm.umin.i64(i64 [[MIN_VAL]], i64 [[L]])
		; CHECK-VF1IC2-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]
		; CHECK-VF1IC2-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF1IC2-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 0
		; CHECK-VF1IC2-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
		; CHECK-VF1IC2: exit:
		; CHECK-VF1IC2-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT5]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC2-NEXT: [[RES_UMIN:%.*]] = phi i64 [ [[MIN_VAL_NEXT]], [[LOOP]] ], [ [[RDX_MINMAX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC2-NEXT: store i64 [[RES_UMIN]], ptr [[UMIN:%.*]], align 4
		; CHECK-VF1IC2-NEXT: ret i64 [[RES]]
;		;
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
%min.idx = phi i64 [ 0, %entry ], [ %min.idx.next, %loop ]		%min.idx = phi i64 [ 0, %entry ], [ %min.idx.next, %loop ]
%min.val = phi i64 [ 0, %entry ], [ %min.val.next, %loop ]		%min.val = phi i64 [ 0, %entry ], [ %min.val.next, %loop ]
%exitcond.not = icmp eq i64 %iv.next, 0		%exitcond.not = icmp eq i64 %iv.next, 0
br i1 %exitcond.not, label %exit, label %loop		br i1 %exitcond.not, label %exit, label %loop

exit:		exit:
%res = phi i64 [ %min.idx.next, %loop ]		%res = phi i64 [ %min.idx.next, %loop ]
ret i64 %res		ret i64 %res
}		}

define i64 @test_not_vectorize_select_no_min_reduction(ptr %src) {		define i64 @test_not_vectorize_select_no_min_reduction(ptr %src) {
; CHECK-LABEL: @test_not_vectorize_select_no_min_reduction(		; CHECK-VF4IC1-LABEL: @test_not_vectorize_select_no_min_reduction(
		fhahn (inline comment): Is this incorrectly vectorized or does the test name need fixing? It looks like `%min.val` isn't an actual minimum value phi?
; CHECK-NOT: vector.body:		; CHECK-VF4IC1-NEXT: entry:
		; CHECK-VF4IC1-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC1: vector.ph:
		; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC1: vector.body:
		; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i64> [ <i64 poison, i64 poison, i64 poison, i64 0>, [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr i64, ptr [[SRC:%.*]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[TMP1]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 4
		; CHECK-VF4IC1-NEXT: [[TMP3]] = add <4 x i64> [[WIDE_LOAD]], <i64 1, i64 1, i64 1, i64 1>
		; CHECK-VF4IC1-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[VECTOR_RECUR]], <4 x i64> [[TMP3]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
		; CHECK-VF4IC1-NEXT: [[TMP5:%.*]] = icmp ugt <4 x i64> [[TMP4]], [[WIDE_LOAD]]
		; CHECK-VF4IC1-NEXT: [[TMP6]] = select <4 x i1> [[TMP5]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0
		; CHECK-VF4IC1-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
		; CHECK-VF4IC1: middle.block:
		; CHECK-VF4IC1-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP6]])
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP8]], -9223372036854775808
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP8]], i64 0
		; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 0, 0
		; CHECK-VF4IC1-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i64> [[TMP3]], i32 3
		; CHECK-VF4IC1-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i64> [[TMP3]], i32 2
		; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC1: scalar.ph:
		; CHECK-VF4IC1-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: br label [[LOOP:%.*]]
		; CHECK-VF4IC1: loop:
		; CHECK-VF4IC1-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC1-NEXT: [[MIN_IDX:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[MIN_IDX_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC1-NEXT: [[SCALAR_RECUR:%.*]] = phi i64 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[MIN_VAL_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC1-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]
		; CHECK-VF4IC1-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4
		; CHECK-VF4IC1-NEXT: [[CMP:%.*]] = icmp ugt i64 [[SCALAR_RECUR]], [[L]]
		; CHECK-VF4IC1-NEXT: [[MIN_VAL_NEXT]] = add i64 [[L]], 1
		; CHECK-VF4IC1-NEXT: [[FOO:%.*]] = call i64 @llvm.umin.i64(i64 [[SCALAR_RECUR]], i64 [[L]])
		; CHECK-VF4IC1-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]
		; CHECK-VF4IC1-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 0
		; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP5:![0-9]+]]
		; CHECK-VF4IC1: exit:
		; CHECK-VF4IC1-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: ret i64 [[RES]]
		;
		; CHECK-VF4IC2-LABEL: @test_not_vectorize_select_no_min_reduction(
		; CHECK-VF4IC2-NEXT: entry:
		; CHECK-VF4IC2-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC2: vector.ph:
		; CHECK-VF4IC2-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC2: vector.body:
		; CHECK-VF4IC2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VEC_PHI2:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i64> [ <i64 poison, i64 poison, i64 poison, i64 0>, [[VECTOR_PH]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
		; CHECK-VF4IC2-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[SRC:%.*]], i64 [[TMP0]]
		; CHECK-VF4IC2-NEXT: [[TMP3:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[TMP1]]
		; CHECK-VF4IC2-NEXT: [[TMP4:%.*]] = getelementptr i64, ptr [[TMP2]], i32 0
		; CHECK-VF4IC2-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP4]], align 4
		; CHECK-VF4IC2-NEXT: [[TMP5:%.*]] = getelementptr i64, ptr [[TMP2]], i32 4
		; CHECK-VF4IC2-NEXT: [[WIDE_LOAD3:%.*]] = load <4 x i64>, ptr [[TMP5]], align 4
		; CHECK-VF4IC2-NEXT: [[TMP6:%.*]] = add <4 x i64> [[WIDE_LOAD]], <i64 1, i64 1, i64 1, i64 1>
		; CHECK-VF4IC2-NEXT: [[TMP7]] = add <4 x i64> [[WIDE_LOAD3]], <i64 1, i64 1, i64 1, i64 1>
		; CHECK-VF4IC2-NEXT: [[TMP8:%.*]] = shufflevector <4 x i64> [[VECTOR_RECUR]], <4 x i64> [[TMP6]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
		; CHECK-VF4IC2-NEXT: [[TMP9:%.*]] = shufflevector <4 x i64> [[TMP6]], <4 x i64> [[TMP7]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
		; CHECK-VF4IC2-NEXT: [[TMP10:%.*]] = icmp ugt <4 x i64> [[TMP8]], [[WIDE_LOAD]]
		; CHECK-VF4IC2-NEXT: [[TMP11:%.*]] = icmp ugt <4 x i64> [[TMP9]], [[WIDE_LOAD3]]
		; CHECK-VF4IC2-NEXT: [[TMP12]] = select <4 x i1> [[TMP10]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC2-NEXT: [[TMP13]] = select <4 x i1> [[TMP11]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI2]]
		; CHECK-VF4IC2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
		; CHECK-VF4IC2-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC2-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0
		; CHECK-VF4IC2-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
		; CHECK-VF4IC2: middle.block:
		; CHECK-VF4IC2-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i64> [[TMP12]], [[TMP13]]
		; CHECK-VF4IC2-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i64> [[TMP12]], <4 x i64> [[TMP13]]
		; CHECK-VF4IC2-NEXT: [[TMP15:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX_SELECT]])
		; CHECK-VF4IC2-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP15]], -9223372036854775808
		; CHECK-VF4IC2-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP15]], i64 0
		; CHECK-VF4IC2-NEXT: [[CMP_N:%.*]] = icmp eq i64 0, 0
		; CHECK-VF4IC2-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i64> [[TMP7]], i32 3
		; CHECK-VF4IC2-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i64> [[TMP7]], i32 2
		; CHECK-VF4IC2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC2: scalar.ph:
		; CHECK-VF4IC2-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
		; CHECK-VF4IC2-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC2-NEXT: br label [[LOOP:%.*]]
		; CHECK-VF4IC2: loop:
		; CHECK-VF4IC2-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC2-NEXT: [[MIN_IDX:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[MIN_IDX_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC2-NEXT: [[SCALAR_RECUR:%.*]] = phi i64 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[MIN_VAL_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC2-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]
		; CHECK-VF4IC2-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4
		; CHECK-VF4IC2-NEXT: [[CMP:%.*]] = icmp ugt i64 [[SCALAR_RECUR]], [[L]]
		; CHECK-VF4IC2-NEXT: [[MIN_VAL_NEXT]] = add i64 [[L]], 1
		; CHECK-VF4IC2-NEXT: [[FOO:%.*]] = call i64 @llvm.umin.i64(i64 [[SCALAR_RECUR]], i64 [[L]])
		; CHECK-VF4IC2-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]
		; CHECK-VF4IC2-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC2-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 0
		; CHECK-VF4IC2-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP5:![0-9]+]]
		; CHECK-VF4IC2: exit:
		; CHECK-VF4IC2-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC2-NEXT: ret i64 [[RES]]
		;
		; CHECK-VF1IC2-LABEL: @test_not_vectorize_select_no_min_reduction(
		; CHECK-VF1IC2-NEXT: entry:
		; CHECK-VF1IC2-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF1IC2: vector.ph:
		; CHECK-VF1IC2-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF1IC2: vector.body:
		; CHECK-VF1IC2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[VEC_PHI:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP10:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP11:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[VECTOR_RECUR:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF1IC2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
		; CHECK-VF1IC2-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[SRC:%.*]], i64 [[TMP0]]
		; CHECK-VF1IC2-NEXT: [[TMP3:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[TMP1]]
		; CHECK-VF1IC2-NEXT: [[TMP4:%.*]] = load i64, ptr [[TMP2]], align 4
		; CHECK-VF1IC2-NEXT: [[TMP5:%.*]] = load i64, ptr [[TMP3]], align 4
		; CHECK-VF1IC2-NEXT: [[TMP6:%.*]] = add i64 [[TMP4]], 1
		; CHECK-VF1IC2-NEXT: [[TMP7]] = add i64 [[TMP5]], 1
		; CHECK-VF1IC2-NEXT: [[TMP8:%.*]] = icmp ugt i64 [[VECTOR_RECUR]], [[TMP4]]
		; CHECK-VF1IC2-NEXT: [[TMP9:%.*]] = icmp ugt i64 [[TMP6]], [[TMP5]]
		; CHECK-VF1IC2-NEXT: [[TMP10]] = select i1 [[TMP8]], i64 [[TMP0]], i64 [[VEC_PHI]]
		; CHECK-VF1IC2-NEXT: [[TMP11]] = select i1 [[TMP9]], i64 [[TMP1]], i64 [[VEC_PHI1]]
		; CHECK-VF1IC2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
		; CHECK-VF1IC2-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0
		; CHECK-VF1IC2-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
		; CHECK-VF1IC2: middle.block:
		; CHECK-VF1IC2-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt i64 [[TMP10]], [[TMP11]]
		; CHECK-VF1IC2-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select i1 [[RDX_MINMAX_CMP]], i64 [[TMP10]], i64 [[TMP11]]
		; CHECK-VF1IC2-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[RDX_MINMAX_SELECT]], -9223372036854775808
		; CHECK-VF1IC2-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[RDX_MINMAX_SELECT]], i64 0
		; CHECK-VF1IC2-NEXT: [[CMP_N:%.*]] = icmp eq i64 0, 0
		; CHECK-VF1IC2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF1IC2: scalar.ph:
		; CHECK-VF1IC2-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
		; CHECK-VF1IC2-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC2-NEXT: br label [[LOOP:%.*]]
		; CHECK-VF1IC2: loop:
		; CHECK-VF1IC2-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF1IC2-NEXT: [[MIN_IDX:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[MIN_IDX_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF1IC2-NEXT: [[SCALAR_RECUR:%.*]] = phi i64 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[MIN_VAL_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF1IC2-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]
		; CHECK-VF1IC2-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4
		; CHECK-VF1IC2-NEXT: [[CMP:%.*]] = icmp ugt i64 [[SCALAR_RECUR]], [[L]]
		; CHECK-VF1IC2-NEXT: [[MIN_VAL_NEXT]] = add i64 [[L]], 1
		; CHECK-VF1IC2-NEXT: [[FOO:%.*]] = call i64 @llvm.umin.i64(i64 [[SCALAR_RECUR]], i64 [[L]])
		; CHECK-VF1IC2-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]
		; CHECK-VF1IC2-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF1IC2-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 0
		; CHECK-VF1IC2-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP5:![0-9]+]]
		; CHECK-VF1IC2: exit:
		; CHECK-VF1IC2-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC2-NEXT: ret i64 [[RES]]
;		;
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
%min.idx = phi i64 [ 0, %entry ], [ %min.idx.next, %loop ]		%min.idx = phi i64 [ 0, %entry ], [ %min.idx.next, %loop ]
%min.val = phi i64 [ 0, %entry ], [ %min.val.next, %loop ]		%min.val = phi i64 [ 0, %entry ], [ %min.val.next, %loop ]

llvm/test/Transforms/LoopVectorize/smax-idx.ll

This file was added.
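
The `middle.block` checks in this test (VF4IC1 variant) reduce the running maxima first and only then pick an index. As a reading aid, here is a minimal scalar sketch of that sequence in C, under stated assumptions: `reduce_smax_idx`, `LANES`, and the parameter names are hypothetical and do not appear in the patch, and the sketch models only what the CHECK lines show (a signed-max reduction, a mask against the reduced maximum, a signed-min reduction over the surviving indices, and a fallback to the start index when the `INT64_MIN` sentinel survives).

```c
#include <stdint.h>

#define LANES 4 /* assumed lane count, mirroring -force-vector-width=4 */

/* Hypothetical helper: scalar model of the checked middle.block sequence. */
static int64_t reduce_smax_idx(const int64_t max_lanes[LANES],
                               const int64_t idx_lanes[LANES],
                               int64_t start_idx, int64_t *max_out) {
  /* Step 1: horizontal signed max (models llvm.vector.reduce.smax). */
  int64_t best = max_lanes[0];
  for (int i = 1; i < LANES; ++i)
    if (max_lanes[i] > best)
      best = max_lanes[i];

  /* Step 2: lanes that do not hold the maximum contribute INT64_MAX,
     then take the signed min of what is left (models the mask select
     followed by llvm.vector.reduce.smin). */
  int64_t idx = INT64_MAX;
  for (int i = 0; i < LANES; ++i) {
    int64_t candidate = (max_lanes[i] == best) ? idx_lanes[i] : INT64_MAX;
    if (candidate < idx)
      idx = candidate;
  }

  /* Step 3: INT64_MIN is the initial value of the index lanes; if it
     survives, no lane recorded an index, so return the start index. */
  *max_out = best;
  return (idx != INT64_MIN) ? idx : start_idx;
}
```

With maxima `{5, 9, 9, 3}` and indices `{4, 1, 6, 7}`, the sketch keeps the smaller of the two indices tied on the maximum, which is the first occurrence in iteration order.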

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				fhahn (inline comment): Could you add new tests as a separate patch?
				Mel-Chen (author, inline reply): Of course. I will split an NFC patch tomorrow.
				; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -debug-only=loop-vectorize,iv-descriptors -S < %s 2>&1 | FileCheck %s --check-prefix=CHECK-VF4IC1 --check-prefix=CHECK
				; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=4 -debug-only=loop-vectorize,iv-descriptors -S < %s 2>&1 | FileCheck %s --check-prefix=CHECK-VF4IC4 --check-prefix=CHECK
				; RUN: opt -passes=loop-vectorize -force-vector-width=1 -force-vector-interleave=4 -debug-only=loop-vectorize,iv-descriptors -S < %s 2>&1 | FileCheck %s --check-prefix=CHECK-VF1IC4 --check-prefix=CHECK

				define i64 @smax_idx(ptr nocapture readonly %a, i64 %mm, i64 %ii, ptr nocapture writeonly %res_max, i64 %n) {
				; CHECK-VF4IC1-LABEL: @smax_idx(
				; CHECK-VF4IC1-NEXT: entry:
				; CHECK-VF4IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
				; CHECK-VF4IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK-VF4IC1: vector.ph:
				; CHECK-VF4IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
				; CHECK-VF4IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
				; CHECK-VF4IC1-NEXT: [[MINMAX_IDENT_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[MM:%.*]], i64 0
				; CHECK-VF4IC1-NEXT: [[MINMAX_IDENT_SPLAT:%.*]] = shufflevector <4 x i64> [[MINMAX_IDENT_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
				; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK-VF4IC1: vector.body:
				; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
				; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
				; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 4
				; CHECK-VF4IC1-NEXT: [[TMP3]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI]], <4 x i64> [[WIDE_LOAD]])
				; CHECK-VF4IC1-NEXT: [[TMP4:%.*]] = icmp slt <4 x i64> [[VEC_PHI]], [[WIDE_LOAD]]
				; CHECK-VF4IC1-NEXT: [[TMP5]] = select <4 x i1> [[TMP4]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI1]]
				; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC1-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-VF4IC1-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
				; CHECK-VF4IC1: middle.block:
				; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP3]])
				; CHECK-VF4IC1-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP7]], i64 0
				; CHECK-VF4IC1-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
				; CHECK-VF4IC1-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[TMP3]]
				; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
				; CHECK-VF4IC1-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[TMP5]], <4 x i64> <i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807>
				; CHECK-VF4IC1-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.smin.v4i64(<4 x i64> [[MASK_SELECT]])
				; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP8]], -9223372036854775808
				; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP8]], i64 [[II:%.*]]
				; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
				; CHECK-VF4IC1: scalar.ph:
				; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX2:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC1-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK-VF4IC1: for.body:
				; CHECK-VF4IC1-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP10:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX2]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
				; CHECK-VF4IC1-NEXT: [[TMP9:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
				; CHECK-VF4IC1-NEXT: [[TMP10]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP9]])
				; CHECK-VF4IC1-NEXT: [[CMP1:%.*]] = icmp slt i64 [[MAX_09]], [[TMP9]]
				; CHECK-VF4IC1-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
				; CHECK-VF4IC1-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
				; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
				; CHECK-VF4IC1: exit:
				; CHECK-VF4IC1-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP10]], [[FOR_BODY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC1-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC1-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
				; CHECK-VF4IC1-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
				;
				; CHECK-VF4IC4-LABEL: @smax_idx(
				; CHECK-VF4IC4-NEXT: entry:
				; CHECK-VF4IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 16
				; CHECK-VF4IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK-VF4IC4: vector.ph:
				; CHECK-VF4IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 16
				; CHECK-VF4IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
				; CHECK-VF4IC4-NEXT: [[MINMAX_IDENT_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[MM:%.*]], i64 0
				; CHECK-VF4IC4-NEXT: [[MINMAX_IDENT_SPLAT:%.*]] = shufflevector <4 x i64> [[MINMAX_IDENT_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
				; CHECK-VF4IC4-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK-VF4IC4: vector.body:
				; CHECK-VF4IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI7:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI8:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP21:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI9:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP22:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI10:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP23:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC4-NEXT: [[STEP_ADD1:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC4-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD1]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-VF4IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
				; CHECK-VF4IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
				; CHECK-VF4IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
				; CHECK-VF4IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
				; CHECK-VF4IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
				; CHECK-VF4IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
				; CHECK-VF4IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
				; CHECK-VF4IC4-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 0
				; CHECK-VF4IC4-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP8]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 4
				; CHECK-VF4IC4-NEXT: [[WIDE_LOAD11:%.*]] = load <4 x i64>, ptr [[TMP9]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 8
				; CHECK-VF4IC4-NEXT: [[WIDE_LOAD12:%.*]] = load <4 x i64>, ptr [[TMP10]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 12
				; CHECK-VF4IC4-NEXT: [[WIDE_LOAD13:%.*]] = load <4 x i64>, ptr [[TMP11]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP12]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI]], <4 x i64> [[WIDE_LOAD]])
				; CHECK-VF4IC4-NEXT: [[TMP13]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI4]], <4 x i64> [[WIDE_LOAD11]])
				; CHECK-VF4IC4-NEXT: [[TMP14]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI5]], <4 x i64> [[WIDE_LOAD12]])
				; CHECK-VF4IC4-NEXT: [[TMP15]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI6]], <4 x i64> [[WIDE_LOAD13]])
				; CHECK-VF4IC4-NEXT: [[TMP16:%.*]] = icmp slt <4 x i64> [[VEC_PHI]], [[WIDE_LOAD]]
				; CHECK-VF4IC4-NEXT: [[TMP17:%.*]] = icmp slt <4 x i64> [[VEC_PHI4]], [[WIDE_LOAD11]]
				; CHECK-VF4IC4-NEXT: [[TMP18:%.*]] = icmp slt <4 x i64> [[VEC_PHI5]], [[WIDE_LOAD12]]
				; CHECK-VF4IC4-NEXT: [[TMP19:%.*]] = icmp slt <4 x i64> [[VEC_PHI6]], [[WIDE_LOAD13]]
				; CHECK-VF4IC4-NEXT: [[TMP20]] = select <4 x i1> [[TMP16]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI7]]
				; CHECK-VF4IC4-NEXT: [[TMP21]] = select <4 x i1> [[TMP17]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI8]]
				; CHECK-VF4IC4-NEXT: [[TMP22]] = select <4 x i1> [[TMP18]], <4 x i64> [[STEP_ADD1]], <4 x i64> [[VEC_PHI9]]
				; CHECK-VF4IC4-NEXT: [[TMP23]] = select <4 x i1> [[TMP19]], <4 x i64> [[STEP_ADD2]], <4 x i64> [[VEC_PHI10]]
				; CHECK-VF4IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
				; CHECK-VF4IC4-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC4-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-VF4IC4-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
				; CHECK-VF4IC4: middle.block:
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i64> [[TMP12]], [[TMP13]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i64> [[TMP12]], <4 x i64> [[TMP13]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP14:%.*]] = icmp sgt <4 x i64> [[RDX_MINMAX_SELECT]], [[TMP14]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT15:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP14]], <4 x i64> [[RDX_MINMAX_SELECT]], <4 x i64> [[TMP14]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP16:%.*]] = icmp sgt <4 x i64> [[RDX_MINMAX_SELECT15]], [[TMP15]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT17:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP16]], <4 x i64> [[RDX_MINMAX_SELECT15]], <4 x i64> [[TMP15]]
				; CHECK-VF4IC4-NEXT: [[TMP25:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX_SELECT17]])
				; CHECK-VF4IC4-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP25]], i64 0
				; CHECK-VF4IC4-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
				; CHECK-VF4IC4-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[RDX_MINMAX_SELECT17]]
				; CHECK-VF4IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp sge <4 x i64> [[TMP12]], [[TMP13]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT:%.*]] = select <4 x i1> [[RDX_SELECT_CMP]], <4 x i64> [[TMP20]], <4 x i64> [[TMP21]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP18:%.*]] = icmp sge <4 x i64> [[RDX_MINMAX_SELECT]], [[TMP14]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT19:%.*]] = select <4 x i1> [[RDX_SELECT_CMP18]], <4 x i64> [[RDX_SELECT]], <4 x i64> [[TMP22]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP20:%.*]] = icmp sge <4 x i64> [[RDX_MINMAX_SELECT15]], [[TMP15]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT21:%.*]] = select <4 x i1> [[RDX_SELECT_CMP20]], <4 x i64> [[RDX_SELECT19]], <4 x i64> [[TMP23]]
				; CHECK-VF4IC4-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[RDX_SELECT21]], <4 x i64> <i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807>
				; CHECK-VF4IC4-NEXT: [[TMP26:%.*]] = call i64 @llvm.vector.reduce.smin.v4i64(<4 x i64> [[MASK_SELECT]])
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP22:%.*]] = icmp ne i64 [[TMP26]], -9223372036854775808
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT23:%.*]] = select i1 [[RDX_SELECT_CMP22]], i64 [[TMP26]], i64 [[II:%.*]]
				; CHECK-VF4IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
				; CHECK-VF4IC4: scalar.ph:
				; CHECK-VF4IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[TMP25]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX24:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT23]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC4-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK-VF4IC4: for.body:
				; CHECK-VF4IC4-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP28:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX24]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
				; CHECK-VF4IC4-NEXT: [[TMP27:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP28]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP27]])
				; CHECK-VF4IC4-NEXT: [[CMP1:%.*]] = icmp slt i64 [[MAX_09]], [[TMP27]]
				; CHECK-VF4IC4-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
				; CHECK-VF4IC4-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-VF4IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
				; CHECK-VF4IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
				; CHECK-VF4IC4: exit:
				; CHECK-VF4IC4-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP28]], [[FOR_BODY]] ], [ [[TMP25]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC4-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT23]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC4-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
				; CHECK-VF4IC4-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
				;
				; CHECK-VF1IC4-LABEL: @smax_idx(
				; CHECK-VF1IC4-NEXT: entry:
				; CHECK-VF1IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
				; CHECK-VF1IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK-VF1IC4: vector.ph:
				; CHECK-VF1IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
				; CHECK-VF1IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
				; CHECK-VF1IC4-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK-VF1IC4: vector.body:
				; CHECK-VF1IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI:%.*]] = phi i64 [ [[MM:%.*]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI4:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI5:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP21:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI6:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP22:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI7:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP23:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-VF1IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-VF1IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-VF1IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-VF1IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
				; CHECK-VF1IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
				; CHECK-VF1IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
				; CHECK-VF1IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
				; CHECK-VF1IC4-NEXT: [[TMP8:%.*]] = load i64, ptr [[TMP4]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP9:%.*]] = load i64, ptr [[TMP5]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP10:%.*]] = load i64, ptr [[TMP6]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP11:%.*]] = load i64, ptr [[TMP7]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP12]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI]], i64 [[TMP8]])
				; CHECK-VF1IC4-NEXT: [[TMP13]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI1]], i64 [[TMP9]])
				; CHECK-VF1IC4-NEXT: [[TMP14]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI2]], i64 [[TMP10]])
				; CHECK-VF1IC4-NEXT: [[TMP15]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI3]], i64 [[TMP11]])
				; CHECK-VF1IC4-NEXT: [[TMP16:%.*]] = icmp slt i64 [[VEC_PHI]], [[TMP8]]
				; CHECK-VF1IC4-NEXT: [[TMP17:%.*]] = icmp slt i64 [[VEC_PHI1]], [[TMP9]]
				; CHECK-VF1IC4-NEXT: [[TMP18:%.*]] = icmp slt i64 [[VEC_PHI2]], [[TMP10]]
				; CHECK-VF1IC4-NEXT: [[TMP19:%.*]] = icmp slt i64 [[VEC_PHI3]], [[TMP11]]
				; CHECK-VF1IC4-NEXT: [[TMP20]] = select i1 [[TMP16]], i64 [[TMP0]], i64 [[VEC_PHI4]]
				; CHECK-VF1IC4-NEXT: [[TMP21]] = select i1 [[TMP17]], i64 [[TMP1]], i64 [[VEC_PHI5]]
				; CHECK-VF1IC4-NEXT: [[TMP22]] = select i1 [[TMP18]], i64 [[TMP2]], i64 [[VEC_PHI6]]
				; CHECK-VF1IC4-NEXT: [[TMP23]] = select i1 [[TMP19]], i64 [[TMP3]], i64 [[VEC_PHI7]]
				; CHECK-VF1IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-VF1IC4-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-VF1IC4-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
				; CHECK-VF1IC4: middle.block:
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt i64 [[TMP12]], [[TMP13]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select i1 [[RDX_MINMAX_CMP]], i64 [[TMP12]], i64 [[TMP13]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP8:%.*]] = icmp sgt i64 [[RDX_MINMAX_SELECT]], [[TMP14]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select i1 [[RDX_MINMAX_CMP8]], i64 [[RDX_MINMAX_SELECT]], i64 [[TMP14]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP10:%.*]] = icmp sgt i64 [[RDX_MINMAX_SELECT9]], [[TMP15]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT11:%.*]] = select i1 [[RDX_MINMAX_CMP10]], i64 [[RDX_MINMAX_SELECT9]], i64 [[TMP15]]
				; CHECK-VF1IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp sge i64 [[TMP12]], [[TMP13]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP20]], i64 [[TMP21]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP12:%.*]] = icmp sge i64 [[RDX_MINMAX_SELECT]], [[TMP14]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT13:%.*]] = select i1 [[RDX_SELECT_CMP12]], i64 [[RDX_SELECT]], i64 [[TMP22]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP14:%.*]] = icmp sge i64 [[RDX_MINMAX_SELECT9]], [[TMP15]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT15:%.*]] = select i1 [[RDX_SELECT_CMP14]], i64 [[RDX_SELECT13]], i64 [[TMP23]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP16:%.*]] = icmp ne i64 [[RDX_SELECT15]], -9223372036854775808
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT17:%.*]] = select i1 [[RDX_SELECT_CMP16]], i64 [[RDX_SELECT15]], i64 [[II:%.*]]
				; CHECK-VF1IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
				; CHECK-VF1IC4: scalar.ph:
				; CHECK-VF1IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[RDX_MINMAX_SELECT11]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX18:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT17]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF1IC4-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK-VF1IC4: for.body:
				; CHECK-VF1IC4-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP26:%.*]], [[FOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX18]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
				; CHECK-VF1IC4-NEXT: [[TMP25:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP26]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP25]])
				; CHECK-VF1IC4-NEXT: [[CMP1:%.*]] = icmp slt i64 [[MAX_09]], [[TMP25]]
				; CHECK-VF1IC4-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
				; CHECK-VF1IC4-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-VF1IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
				; CHECK-VF1IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
				; CHECK-VF1IC4: exit:
				; CHECK-VF1IC4-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP26]], [[FOR_BODY]] ], [ [[RDX_MINMAX_SELECT11]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF1IC4-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT17]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF1IC4-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
				; CHECK-VF1IC4-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
				;
				entry:
				br label %for.body

				for.body:
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]
				%idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]
				%arrayidx = getelementptr inbounds i64, ptr %a, i64 %indvars.iv
				%0 = load i64, ptr %arrayidx
				%1 = tail call i64 @llvm.smax.i64(i64 %max.09, i64 %0)
				%cmp1 = icmp slt i64 %max.09, %0
				%spec.select7 = select i1 %cmp1, i64 %indvars.iv, i64 %idx.011
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %n
				br i1 %exitcond.not, label %exit, label %for.body

				exit:
				store i64 %1, ptr %res_max
				ret i64 %spec.select7
				}

				;
				; Check the different order of reduction phis.
				;
				define i64 @smax_idx_inverted_phi(ptr nocapture readonly %a, i64 %mm, i64 %ii, ptr nocapture writeonly %res_max, i64 %n) {
				; CHECK-VF4IC1-LABEL: @smax_idx_inverted_phi(
				; CHECK-VF4IC1-NEXT: entry:
				; CHECK-VF4IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
				; CHECK-VF4IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK-VF4IC1: vector.ph:
				; CHECK-VF4IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
				; CHECK-VF4IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
				; CHECK-VF4IC1-NEXT: [[MINMAX_IDENT_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[MM:%.*]], i64 0
				; CHECK-VF4IC1-NEXT: [[MINMAX_IDENT_SPLAT:%.*]] = shufflevector <4 x i64> [[MINMAX_IDENT_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
				; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK-VF4IC1: vector.body:
				; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
				; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
				; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 4
				; CHECK-VF4IC1-NEXT: [[TMP3]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI1]], <4 x i64> [[WIDE_LOAD]])
				; CHECK-VF4IC1-NEXT: [[TMP4:%.*]] = icmp slt <4 x i64> [[VEC_PHI1]], [[WIDE_LOAD]]
				; CHECK-VF4IC1-NEXT: [[TMP5]] = select <4 x i1> [[TMP4]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
				; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC1-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-VF4IC1-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
				; CHECK-VF4IC1: middle.block:
				; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP3]])
				; CHECK-VF4IC1-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP7]], i64 0
				; CHECK-VF4IC1-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
				; CHECK-VF4IC1-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[TMP3]]
				; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
				; CHECK-VF4IC1-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[TMP5]], <4 x i64> <i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807>
				; CHECK-VF4IC1-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.smin.v4i64(<4 x i64> [[MASK_SELECT]])
				; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP8]], -9223372036854775808
				; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP8]], i64 [[II:%.*]]
				; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
				; CHECK-VF4IC1: scalar.ph:
				; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX2:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC1-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK-VF4IC1: for.body:
				; CHECK-VF4IC1-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX2]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP10:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
				; CHECK-VF4IC1-NEXT: [[TMP9:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
				; CHECK-VF4IC1-NEXT: [[TMP10]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP9]])
				; CHECK-VF4IC1-NEXT: [[CMP1:%.*]] = icmp slt i64 [[MAX_09]], [[TMP9]]
				; CHECK-VF4IC1-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
				; CHECK-VF4IC1-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
				; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
				; CHECK-VF4IC1: exit:
				; CHECK-VF4IC1-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP10]], [[FOR_BODY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC1-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC1-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
				; CHECK-VF4IC1-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
				;
				; CHECK-VF4IC4-LABEL: @smax_idx_inverted_phi(
				; CHECK-VF4IC4-NEXT: entry:
				; CHECK-VF4IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 16
				; CHECK-VF4IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK-VF4IC4: vector.ph:
				; CHECK-VF4IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 16
				; CHECK-VF4IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
				; CHECK-VF4IC4-NEXT: [[MINMAX_IDENT_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[MM:%.*]], i64 0
				; CHECK-VF4IC4-NEXT: [[MINMAX_IDENT_SPLAT:%.*]] = shufflevector <4 x i64> [[MINMAX_IDENT_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
				; CHECK-VF4IC4-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK-VF4IC4: vector.body:
				; CHECK-VF4IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP21:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP22:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP23:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI7:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI8:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI9:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI10:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC4-NEXT: [[STEP_ADD1:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC4-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD1]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-VF4IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
				; CHECK-VF4IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
				; CHECK-VF4IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
				; CHECK-VF4IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
				; CHECK-VF4IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
				; CHECK-VF4IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
				; CHECK-VF4IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
				; CHECK-VF4IC4-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 0
				; CHECK-VF4IC4-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP8]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 4
				; CHECK-VF4IC4-NEXT: [[WIDE_LOAD11:%.*]] = load <4 x i64>, ptr [[TMP9]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 8
				; CHECK-VF4IC4-NEXT: [[WIDE_LOAD12:%.*]] = load <4 x i64>, ptr [[TMP10]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 12
				; CHECK-VF4IC4-NEXT: [[WIDE_LOAD13:%.*]] = load <4 x i64>, ptr [[TMP11]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP12]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI7]], <4 x i64> [[WIDE_LOAD]])
				; CHECK-VF4IC4-NEXT: [[TMP13]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI8]], <4 x i64> [[WIDE_LOAD11]])
				; CHECK-VF4IC4-NEXT: [[TMP14]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI9]], <4 x i64> [[WIDE_LOAD12]])
				; CHECK-VF4IC4-NEXT: [[TMP15]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI10]], <4 x i64> [[WIDE_LOAD13]])
				; CHECK-VF4IC4-NEXT: [[TMP16:%.*]] = icmp slt <4 x i64> [[VEC_PHI7]], [[WIDE_LOAD]]
				; CHECK-VF4IC4-NEXT: [[TMP17:%.*]] = icmp slt <4 x i64> [[VEC_PHI8]], [[WIDE_LOAD11]]
				; CHECK-VF4IC4-NEXT: [[TMP18:%.*]] = icmp slt <4 x i64> [[VEC_PHI9]], [[WIDE_LOAD12]]
				; CHECK-VF4IC4-NEXT: [[TMP19:%.*]] = icmp slt <4 x i64> [[VEC_PHI10]], [[WIDE_LOAD13]]
				; CHECK-VF4IC4-NEXT: [[TMP20]] = select <4 x i1> [[TMP16]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
				; CHECK-VF4IC4-NEXT: [[TMP21]] = select <4 x i1> [[TMP17]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI4]]
				; CHECK-VF4IC4-NEXT: [[TMP22]] = select <4 x i1> [[TMP18]], <4 x i64> [[STEP_ADD1]], <4 x i64> [[VEC_PHI5]]
				; CHECK-VF4IC4-NEXT: [[TMP23]] = select <4 x i1> [[TMP19]], <4 x i64> [[STEP_ADD2]], <4 x i64> [[VEC_PHI6]]
				; CHECK-VF4IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
				; CHECK-VF4IC4-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC4-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-VF4IC4-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
				; CHECK-VF4IC4: middle.block:
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i64> [[TMP12]], [[TMP13]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i64> [[TMP12]], <4 x i64> [[TMP13]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP14:%.*]] = icmp sgt <4 x i64> [[RDX_MINMAX_SELECT]], [[TMP14]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT15:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP14]], <4 x i64> [[RDX_MINMAX_SELECT]], <4 x i64> [[TMP14]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP16:%.*]] = icmp sgt <4 x i64> [[RDX_MINMAX_SELECT15]], [[TMP15]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT17:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP16]], <4 x i64> [[RDX_MINMAX_SELECT15]], <4 x i64> [[TMP15]]
				; CHECK-VF4IC4-NEXT: [[TMP25:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX_SELECT17]])
				; CHECK-VF4IC4-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP25]], i64 0
				; CHECK-VF4IC4-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
				; CHECK-VF4IC4-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[RDX_MINMAX_SELECT17]]
				; CHECK-VF4IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp sge <4 x i64> [[TMP12]], [[TMP13]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT:%.*]] = select <4 x i1> [[RDX_SELECT_CMP]], <4 x i64> [[TMP20]], <4 x i64> [[TMP21]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP18:%.*]] = icmp sge <4 x i64> [[RDX_MINMAX_SELECT]], [[TMP14]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT19:%.*]] = select <4 x i1> [[RDX_SELECT_CMP18]], <4 x i64> [[RDX_SELECT]], <4 x i64> [[TMP22]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP20:%.*]] = icmp sge <4 x i64> [[RDX_MINMAX_SELECT15]], [[TMP15]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT21:%.*]] = select <4 x i1> [[RDX_SELECT_CMP20]], <4 x i64> [[RDX_SELECT19]], <4 x i64> [[TMP23]]
				; CHECK-VF4IC4-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[RDX_SELECT21]], <4 x i64> <i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807>
				; CHECK-VF4IC4-NEXT: [[TMP26:%.*]] = call i64 @llvm.vector.reduce.smin.v4i64(<4 x i64> [[MASK_SELECT]])
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP22:%.*]] = icmp ne i64 [[TMP26]], -9223372036854775808
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT23:%.*]] = select i1 [[RDX_SELECT_CMP22]], i64 [[TMP26]], i64 [[II:%.*]]
				; CHECK-VF4IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
				; CHECK-VF4IC4: scalar.ph:
				; CHECK-VF4IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[TMP25]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX24:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT23]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC4-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK-VF4IC4: for.body:
				; CHECK-VF4IC4-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX24]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP28:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
				; CHECK-VF4IC4-NEXT: [[TMP27:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP28]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP27]])
				; CHECK-VF4IC4-NEXT: [[CMP1:%.*]] = icmp slt i64 [[MAX_09]], [[TMP27]]
				; CHECK-VF4IC4-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
				; CHECK-VF4IC4-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-VF4IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
				; CHECK-VF4IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
				; CHECK-VF4IC4: exit:
				; CHECK-VF4IC4-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP28]], [[FOR_BODY]] ], [ [[TMP25]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC4-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT23]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC4-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
				; CHECK-VF4IC4-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
				;
				; CHECK-VF1IC4-LABEL: @smax_idx_inverted_phi(
				; CHECK-VF1IC4-NEXT: entry:
				; CHECK-VF1IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
				; CHECK-VF1IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK-VF1IC4: vector.ph:
				; CHECK-VF1IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
				; CHECK-VF1IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
				; CHECK-VF1IC4-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK-VF1IC4: vector.body:
				; CHECK-VF1IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP21:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP22:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP23:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI4:%.*]] = phi i64 [ [[MM:%.*]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI5:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI6:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI7:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-VF1IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-VF1IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-VF1IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-VF1IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
				; CHECK-VF1IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
				; CHECK-VF1IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
				; CHECK-VF1IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
				; CHECK-VF1IC4-NEXT: [[TMP8:%.*]] = load i64, ptr [[TMP4]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP9:%.*]] = load i64, ptr [[TMP5]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP10:%.*]] = load i64, ptr [[TMP6]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP11:%.*]] = load i64, ptr [[TMP7]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP12]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI4]], i64 [[TMP8]])
				; CHECK-VF1IC4-NEXT: [[TMP13]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI5]], i64 [[TMP9]])
				; CHECK-VF1IC4-NEXT: [[TMP14]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI6]], i64 [[TMP10]])
				; CHECK-VF1IC4-NEXT: [[TMP15]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI7]], i64 [[TMP11]])
				; CHECK-VF1IC4-NEXT: [[TMP16:%.*]] = icmp slt i64 [[VEC_PHI4]], [[TMP8]]
				; CHECK-VF1IC4-NEXT: [[TMP17:%.*]] = icmp slt i64 [[VEC_PHI5]], [[TMP9]]
				; CHECK-VF1IC4-NEXT: [[TMP18:%.*]] = icmp slt i64 [[VEC_PHI6]], [[TMP10]]
				; CHECK-VF1IC4-NEXT: [[TMP19:%.*]] = icmp slt i64 [[VEC_PHI7]], [[TMP11]]
				; CHECK-VF1IC4-NEXT: [[TMP20]] = select i1 [[TMP16]], i64 [[TMP0]], i64 [[VEC_PHI]]
				; CHECK-VF1IC4-NEXT: [[TMP21]] = select i1 [[TMP17]], i64 [[TMP1]], i64 [[VEC_PHI1]]
				; CHECK-VF1IC4-NEXT: [[TMP22]] = select i1 [[TMP18]], i64 [[TMP2]], i64 [[VEC_PHI2]]
				; CHECK-VF1IC4-NEXT: [[TMP23]] = select i1 [[TMP19]], i64 [[TMP3]], i64 [[VEC_PHI3]]
				; CHECK-VF1IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-VF1IC4-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-VF1IC4-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
				; CHECK-VF1IC4: middle.block:
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt i64 [[TMP12]], [[TMP13]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select i1 [[RDX_MINMAX_CMP]], i64 [[TMP12]], i64 [[TMP13]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP8:%.*]] = icmp sgt i64 [[RDX_MINMAX_SELECT]], [[TMP14]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select i1 [[RDX_MINMAX_CMP8]], i64 [[RDX_MINMAX_SELECT]], i64 [[TMP14]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP10:%.*]] = icmp sgt i64 [[RDX_MINMAX_SELECT9]], [[TMP15]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT11:%.*]] = select i1 [[RDX_MINMAX_CMP10]], i64 [[RDX_MINMAX_SELECT9]], i64 [[TMP15]]
				; CHECK-VF1IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp sge i64 [[TMP12]], [[TMP13]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP20]], i64 [[TMP21]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP12:%.*]] = icmp sge i64 [[RDX_MINMAX_SELECT]], [[TMP14]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT13:%.*]] = select i1 [[RDX_SELECT_CMP12]], i64 [[RDX_SELECT]], i64 [[TMP22]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP14:%.*]] = icmp sge i64 [[RDX_MINMAX_SELECT9]], [[TMP15]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT15:%.*]] = select i1 [[RDX_SELECT_CMP14]], i64 [[RDX_SELECT13]], i64 [[TMP23]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP16:%.*]] = icmp ne i64 [[RDX_SELECT15]], -9223372036854775808
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT17:%.*]] = select i1 [[RDX_SELECT_CMP16]], i64 [[RDX_SELECT15]], i64 [[II:%.*]]
				; CHECK-VF1IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
				; CHECK-VF1IC4: scalar.ph:
				; CHECK-VF1IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[RDX_MINMAX_SELECT11]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX18:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT17]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF1IC4-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK-VF1IC4: for.body:
				; CHECK-VF1IC4-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX18]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP26:%.*]], [[FOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
				; CHECK-VF1IC4-NEXT: [[TMP25:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP26]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP25]])
				; CHECK-VF1IC4-NEXT: [[CMP1:%.*]] = icmp slt i64 [[MAX_09]], [[TMP25]]
				; CHECK-VF1IC4-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
				; CHECK-VF1IC4-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-VF1IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
				; CHECK-VF1IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
				; CHECK-VF1IC4: exit:
				; CHECK-VF1IC4-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP26]], [[FOR_BODY]] ], [ [[RDX_MINMAX_SELECT11]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF1IC4-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT17]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF1IC4-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
				; CHECK-VF1IC4-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
				;
				entry:
				br label %for.body

				for.body:
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ] ;;
				%max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ] ;;
				%arrayidx = getelementptr inbounds i64, ptr %a, i64 %indvars.iv
				%0 = load i64, ptr %arrayidx
				%1 = tail call i64 @llvm.smax.i64(i64 %max.09, i64 %0)
				%cmp1 = icmp slt i64 %max.09, %0
				%spec.select7 = select i1 %cmp1, i64 %indvars.iv, i64 %idx.011
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %n
				br i1 %exitcond.not, label %exit, label %for.body

				exit:
				store i64 %1, ptr %res_max
				ret i64 %spec.select7
				}

				; Check whether the loop is recognized as MMI when smax is not used outside the loop.
				;
				; Currently, the final check requires smax to have an exitInstruction.
				; In principle, MMI should be able to use the exitInstruction of
				; SelectICmp as its exitInstruction.
				;
				define i64 @smax_idx_max_no_exit_user(ptr nocapture readonly %a, i64 %mm, i64 %ii, i64 %n) {
				; CHECK-LABEL: @smax_idx_max_no_exit_user(
				; CHECK-NOT: vector.body:
				;
				entry:
				br label %for.body

				for.body:
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]
				%idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]
				%arrayidx = getelementptr inbounds i64, ptr %a, i64 %indvars.iv
				%0 = load i64, ptr %arrayidx
				%1 = tail call i64 @llvm.smax.i64(i64 %max.09, i64 %0)
				%cmp1 = icmp slt i64 %max.09, %0
				%spec.select7 = select i1 %cmp1, i64 %indvars.iv, i64 %idx.011
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %n
				br i1 %exitcond.not, label %exit, label %for.body

				exit:
				; %1 has no external users
				ret i64 %spec.select7
				}

				; Check smax implemented in terms of select(cmp()).
				;
				; Currently, SelectICmp does not support an icmp with multiple users.
				; It may be possible to reuse some of the methods in Combination pass to check
				; whether the icmp can be copied.
				;
				define i64 @smax_idx_select_cmp(ptr nocapture readonly %a, i64 %mm, i64 %ii, ptr nocapture writeonly %res_max, i64 %n) {
				; CHECK-LABEL: @smax_idx_select_cmp(
				; CHECK-NOT: vector.body:
				;
				entry:
				br label %for.body

				for.body:
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%max.09 = phi i64 [ %mm, %entry ], [ %spec.select, %for.body ]
				%idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]
				%arrayidx = getelementptr inbounds i64, ptr %a, i64 %indvars.iv
				%0 = load i64, ptr %arrayidx
				%cmp1 = icmp slt i64 %max.09, %0 ;;
				%spec.select = select i1 %cmp1, i64 %0, i64 %max.09 ;;
				%spec.select7 = select i1 %cmp1, i64 %indvars.iv, i64 %idx.011
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %n
				br i1 %exitcond.not, label %exit, label %for.body

				exit:
				store i64 %spec.select, ptr %res_max
				ret i64 %spec.select7
				}

				;
				; Check sge case.
				;
				define i64 @smax_idx_inverted_pred(ptr nocapture readonly %a, i64 %mm, i64 %ii, ptr nocapture writeonly %res_max, i64 %n) {
				; CHECK-VF4IC1-LABEL: @smax_idx_inverted_pred(
				; CHECK-VF4IC1-NEXT: entry:
				; CHECK-VF4IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
				; CHECK-VF4IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK-VF4IC1: vector.ph:
				; CHECK-VF4IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
				; CHECK-VF4IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
				; CHECK-VF4IC1-NEXT: [[MINMAX_IDENT_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[MM:%.*]], i64 0
				; CHECK-VF4IC1-NEXT: [[MINMAX_IDENT_SPLAT:%.*]] = shufflevector <4 x i64> [[MINMAX_IDENT_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
				; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK-VF4IC1: vector.body:
				; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
				; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
				; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 4
				; CHECK-VF4IC1-NEXT: [[TMP3]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI]], <4 x i64> [[WIDE_LOAD]])
				; CHECK-VF4IC1-NEXT: [[TMP4:%.*]] = icmp sge <4 x i64> [[WIDE_LOAD]], [[VEC_PHI]]
				; CHECK-VF4IC1-NEXT: [[TMP5]] = select <4 x i1> [[TMP4]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI1]]
				; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC1-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-VF4IC1-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
				; CHECK-VF4IC1: middle.block:
				; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP3]])
				; CHECK-VF4IC1-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP7]], i64 0
				; CHECK-VF4IC1-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
				; CHECK-VF4IC1-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[TMP3]]
				; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
				; CHECK-VF4IC1-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[TMP5]], <4 x i64> <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>
				; CHECK-VF4IC1-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[MASK_SELECT]])
				; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP8]], -9223372036854775808
				; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP8]], i64 [[II:%.*]]
				; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
				; CHECK-VF4IC1: scalar.ph:
				; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX2:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC1-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK-VF4IC1: for.body:
				; CHECK-VF4IC1-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP10:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX2]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
				; CHECK-VF4IC1-NEXT: [[TMP9:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
				; CHECK-VF4IC1-NEXT: [[TMP10]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP9]])
				; CHECK-VF4IC1-NEXT: [[CMP1:%.*]] = icmp sge i64 [[TMP9]], [[MAX_09]]
				; CHECK-VF4IC1-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
				; CHECK-VF4IC1-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
				; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
				; CHECK-VF4IC1: exit:
				; CHECK-VF4IC1-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP10]], [[FOR_BODY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC1-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC1-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
				; CHECK-VF4IC1-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
				;
				; CHECK-VF4IC4-LABEL: @smax_idx_inverted_pred(
				; CHECK-VF4IC4-NEXT: entry:
				; CHECK-VF4IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 16
				; CHECK-VF4IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK-VF4IC4: vector.ph:
				; CHECK-VF4IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 16
				; CHECK-VF4IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
				; CHECK-VF4IC4-NEXT: [[MINMAX_IDENT_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[MM:%.*]], i64 0
				; CHECK-VF4IC4-NEXT: [[MINMAX_IDENT_SPLAT:%.*]] = shufflevector <4 x i64> [[MINMAX_IDENT_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
				; CHECK-VF4IC4-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK-VF4IC4: vector.body:
				; CHECK-VF4IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI7:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI8:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP21:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI9:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP22:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI10:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP23:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC4-NEXT: [[STEP_ADD1:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC4-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD1]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-VF4IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
				; CHECK-VF4IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
				; CHECK-VF4IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
				; CHECK-VF4IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
				; CHECK-VF4IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
				; CHECK-VF4IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
				; CHECK-VF4IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
				; CHECK-VF4IC4-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 0
				; CHECK-VF4IC4-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP8]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 4
				; CHECK-VF4IC4-NEXT: [[WIDE_LOAD11:%.*]] = load <4 x i64>, ptr [[TMP9]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 8
				; CHECK-VF4IC4-NEXT: [[WIDE_LOAD12:%.*]] = load <4 x i64>, ptr [[TMP10]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 12
				; CHECK-VF4IC4-NEXT: [[WIDE_LOAD13:%.*]] = load <4 x i64>, ptr [[TMP11]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP12]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI]], <4 x i64> [[WIDE_LOAD]])
				; CHECK-VF4IC4-NEXT: [[TMP13]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI4]], <4 x i64> [[WIDE_LOAD11]])
				; CHECK-VF4IC4-NEXT: [[TMP14]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI5]], <4 x i64> [[WIDE_LOAD12]])
				; CHECK-VF4IC4-NEXT: [[TMP15]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI6]], <4 x i64> [[WIDE_LOAD13]])
				; CHECK-VF4IC4-NEXT: [[TMP16:%.*]] = icmp sge <4 x i64> [[WIDE_LOAD]], [[VEC_PHI]]
				; CHECK-VF4IC4-NEXT: [[TMP17:%.*]] = icmp sge <4 x i64> [[WIDE_LOAD11]], [[VEC_PHI4]]
				; CHECK-VF4IC4-NEXT: [[TMP18:%.*]] = icmp sge <4 x i64> [[WIDE_LOAD12]], [[VEC_PHI5]]
				; CHECK-VF4IC4-NEXT: [[TMP19:%.*]] = icmp sge <4 x i64> [[WIDE_LOAD13]], [[VEC_PHI6]]
				; CHECK-VF4IC4-NEXT: [[TMP20]] = select <4 x i1> [[TMP16]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI7]]
				; CHECK-VF4IC4-NEXT: [[TMP21]] = select <4 x i1> [[TMP17]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI8]]
				; CHECK-VF4IC4-NEXT: [[TMP22]] = select <4 x i1> [[TMP18]], <4 x i64> [[STEP_ADD1]], <4 x i64> [[VEC_PHI9]]
				; CHECK-VF4IC4-NEXT: [[TMP23]] = select <4 x i1> [[TMP19]], <4 x i64> [[STEP_ADD2]], <4 x i64> [[VEC_PHI10]]
				; CHECK-VF4IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
				; CHECK-VF4IC4-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC4-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-VF4IC4-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
				; CHECK-VF4IC4: middle.block:
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i64> [[TMP12]], [[TMP13]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i64> [[TMP12]], <4 x i64> [[TMP13]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP14:%.*]] = icmp sgt <4 x i64> [[RDX_MINMAX_SELECT]], [[TMP14]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT15:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP14]], <4 x i64> [[RDX_MINMAX_SELECT]], <4 x i64> [[TMP14]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP16:%.*]] = icmp sgt <4 x i64> [[RDX_MINMAX_SELECT15]], [[TMP15]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT17:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP16]], <4 x i64> [[RDX_MINMAX_SELECT15]], <4 x i64> [[TMP15]]
				; CHECK-VF4IC4-NEXT: [[TMP25:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX_SELECT17]])
				; CHECK-VF4IC4-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP25]], i64 0
				; CHECK-VF4IC4-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
				; CHECK-VF4IC4-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[RDX_MINMAX_SELECT17]]
				; CHECK-VF4IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i64> [[TMP20]], <4 x i64> [[TMP21]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT18:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP14]], <4 x i64> [[RDX_SELECT]], <4 x i64> [[TMP22]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT19:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP16]], <4 x i64> [[RDX_SELECT18]], <4 x i64> [[TMP23]]
				; CHECK-VF4IC4-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[RDX_SELECT19]], <4 x i64> <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>
				; CHECK-VF4IC4-NEXT: [[TMP26:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[MASK_SELECT]])
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP26]], -9223372036854775808
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT20:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP26]], i64 [[II:%.*]]
				; CHECK-VF4IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
				; CHECK-VF4IC4: scalar.ph:
				; CHECK-VF4IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[TMP25]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX21:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT20]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC4-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK-VF4IC4: for.body:
				; CHECK-VF4IC4-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP28:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX21]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
				; CHECK-VF4IC4-NEXT: [[TMP27:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP28]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP27]])
				; CHECK-VF4IC4-NEXT: [[CMP1:%.*]] = icmp sge i64 [[TMP27]], [[MAX_09]]
				; CHECK-VF4IC4-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
				; CHECK-VF4IC4-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-VF4IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
				; CHECK-VF4IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
				; CHECK-VF4IC4: exit:
				; CHECK-VF4IC4-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP28]], [[FOR_BODY]] ], [ [[TMP25]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC4-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT20]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC4-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
				; CHECK-VF4IC4-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
				;
				; CHECK-VF1IC4-LABEL: @smax_idx_inverted_pred(
				; CHECK-VF1IC4-NEXT: entry:
				; CHECK-VF1IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
				; CHECK-VF1IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK-VF1IC4: vector.ph:
				; CHECK-VF1IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
				; CHECK-VF1IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
				; CHECK-VF1IC4-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK-VF1IC4: vector.body:
				; CHECK-VF1IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI:%.*]] = phi i64 [ [[MM:%.*]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI4:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI5:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP21:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI6:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP22:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI7:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP23:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-VF1IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-VF1IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-VF1IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-VF1IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
				; CHECK-VF1IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
				; CHECK-VF1IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
				; CHECK-VF1IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
				; CHECK-VF1IC4-NEXT: [[TMP8:%.*]] = load i64, ptr [[TMP4]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP9:%.*]] = load i64, ptr [[TMP5]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP10:%.*]] = load i64, ptr [[TMP6]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP11:%.*]] = load i64, ptr [[TMP7]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP12]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI]], i64 [[TMP8]])
				; CHECK-VF1IC4-NEXT: [[TMP13]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI1]], i64 [[TMP9]])
				; CHECK-VF1IC4-NEXT: [[TMP14]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI2]], i64 [[TMP10]])
				; CHECK-VF1IC4-NEXT: [[TMP15]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI3]], i64 [[TMP11]])
				; CHECK-VF1IC4-NEXT: [[TMP16:%.*]] = icmp sge i64 [[TMP8]], [[VEC_PHI]]
				; CHECK-VF1IC4-NEXT: [[TMP17:%.*]] = icmp sge i64 [[TMP9]], [[VEC_PHI1]]
				; CHECK-VF1IC4-NEXT: [[TMP18:%.*]] = icmp sge i64 [[TMP10]], [[VEC_PHI2]]
				; CHECK-VF1IC4-NEXT: [[TMP19:%.*]] = icmp sge i64 [[TMP11]], [[VEC_PHI3]]
				; CHECK-VF1IC4-NEXT: [[TMP20]] = select i1 [[TMP16]], i64 [[TMP0]], i64 [[VEC_PHI4]]
				; CHECK-VF1IC4-NEXT: [[TMP21]] = select i1 [[TMP17]], i64 [[TMP1]], i64 [[VEC_PHI5]]
				; CHECK-VF1IC4-NEXT: [[TMP22]] = select i1 [[TMP18]], i64 [[TMP2]], i64 [[VEC_PHI6]]
				; CHECK-VF1IC4-NEXT: [[TMP23]] = select i1 [[TMP19]], i64 [[TMP3]], i64 [[VEC_PHI7]]
				; CHECK-VF1IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-VF1IC4-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-VF1IC4-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
				; CHECK-VF1IC4: middle.block:
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt i64 [[TMP12]], [[TMP13]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select i1 [[RDX_MINMAX_CMP]], i64 [[TMP12]], i64 [[TMP13]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP8:%.*]] = icmp sgt i64 [[RDX_MINMAX_SELECT]], [[TMP14]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select i1 [[RDX_MINMAX_CMP8]], i64 [[RDX_MINMAX_SELECT]], i64 [[TMP14]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP10:%.*]] = icmp sgt i64 [[RDX_MINMAX_SELECT9]], [[TMP15]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT11:%.*]] = select i1 [[RDX_MINMAX_CMP10]], i64 [[RDX_MINMAX_SELECT9]], i64 [[TMP15]]
				; CHECK-VF1IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_MINMAX_CMP]], i64 [[TMP20]], i64 [[TMP21]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT12:%.*]] = select i1 [[RDX_MINMAX_CMP8]], i64 [[RDX_SELECT]], i64 [[TMP22]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT13:%.*]] = select i1 [[RDX_MINMAX_CMP10]], i64 [[RDX_SELECT12]], i64 [[TMP23]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[RDX_SELECT13]], -9223372036854775808
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT14:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[RDX_SELECT13]], i64 [[II:%.*]]
				; CHECK-VF1IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
				; CHECK-VF1IC4: scalar.ph:
				; CHECK-VF1IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[RDX_MINMAX_SELECT11]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX15:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT14]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF1IC4-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK-VF1IC4: for.body:
				; CHECK-VF1IC4-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP26:%.*]], [[FOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX15]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
				; CHECK-VF1IC4-NEXT: [[TMP25:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP26]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP25]])
				; CHECK-VF1IC4-NEXT: [[CMP1:%.*]] = icmp sge i64 [[TMP25]], [[MAX_09]]
				; CHECK-VF1IC4-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
				; CHECK-VF1IC4-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-VF1IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
				; CHECK-VF1IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
				; CHECK-VF1IC4: exit:
				; CHECK-VF1IC4-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP26]], [[FOR_BODY]] ], [ [[RDX_MINMAX_SELECT11]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF1IC4-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT14]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF1IC4-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
				; CHECK-VF1IC4-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
				;
				entry:
				br label %for.body

				for.body:
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]
				%idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]
				%arrayidx = getelementptr inbounds i64, ptr %a, i64 %indvars.iv
				%0 = load i64, ptr %arrayidx
				%1 = tail call i64 @llvm.smax.i64(i64 %max.09, i64 %0)
				%cmp1 = icmp sge i64 %0, %max.09 ;;
				%spec.select7 = select i1 %cmp1, i64 %indvars.iv, i64 %idx.011
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %n
				br i1 %exitcond.not, label %exit, label %for.body

				exit:
				store i64 %1, ptr %res_max
				ret i64 %spec.select7
				}

				;
				; In such cases, the last index should be extracted.
				;
				define i64 @smax_idx_extract_last(ptr nocapture readonly %a, i64 %mm, i64 %ii, ptr nocapture writeonly %res_max, i64 %n) {
				; CHECK-VF4IC1-LABEL: @smax_idx_extract_last(
				; CHECK-VF4IC1-NEXT: entry:
				; CHECK-VF4IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
				; CHECK-VF4IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK-VF4IC1: vector.ph:
				; CHECK-VF4IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
				; CHECK-VF4IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
				; CHECK-VF4IC1-NEXT: [[MINMAX_IDENT_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[MM:%.*]], i64 0
				; CHECK-VF4IC1-NEXT: [[MINMAX_IDENT_SPLAT:%.*]] = shufflevector <4 x i64> [[MINMAX_IDENT_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
				; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK-VF4IC1: vector.body:
				; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
				; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
				; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 4
				; CHECK-VF4IC1-NEXT: [[TMP3]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI]], <4 x i64> [[WIDE_LOAD]])
				; CHECK-VF4IC1-NEXT: [[TMP4:%.*]] = icmp sgt <4 x i64> [[VEC_PHI]], [[WIDE_LOAD]]
				; CHECK-VF4IC1-NEXT: [[TMP5]] = select <4 x i1> [[TMP4]], <4 x i64> [[VEC_PHI1]], <4 x i64> [[VEC_IND]]
				; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC1-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-VF4IC1-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
				; CHECK-VF4IC1: middle.block:
				; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP3]])
				; CHECK-VF4IC1-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP7]], i64 0
				; CHECK-VF4IC1-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
				; CHECK-VF4IC1-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[TMP3]]
				; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
				; CHECK-VF4IC1-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[TMP5]], <4 x i64> <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>
				; CHECK-VF4IC1-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[MASK_SELECT]])
				; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP8]], -9223372036854775808
				; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP8]], i64 [[II:%.*]]
				; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
				; CHECK-VF4IC1: scalar.ph:
				; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX2:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC1-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK-VF4IC1: for.body:
				; CHECK-VF4IC1-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP10:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX2]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
				; CHECK-VF4IC1-NEXT: [[TMP9:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
				; CHECK-VF4IC1-NEXT: [[TMP10]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP9]])
				; CHECK-VF4IC1-NEXT: [[CMP1_NOT:%.*]] = icmp sgt i64 [[MAX_09]], [[TMP9]]
				; CHECK-VF4IC1-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1_NOT]], i64 [[IDX_011]], i64 [[INDVARS_IV]]
				; CHECK-VF4IC1-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
				; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
				; CHECK-VF4IC1: exit:
				; CHECK-VF4IC1-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP10]], [[FOR_BODY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC1-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC1-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
				; CHECK-VF4IC1-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
				;
				; CHECK-VF4IC4-LABEL: @smax_idx_extract_last(
				; CHECK-VF4IC4-NEXT: entry:
				; CHECK-VF4IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 16
				; CHECK-VF4IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK-VF4IC4: vector.ph:
				; CHECK-VF4IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 16
				; CHECK-VF4IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
				; CHECK-VF4IC4-NEXT: [[MINMAX_IDENT_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[MM:%.*]], i64 0
				; CHECK-VF4IC4-NEXT: [[MINMAX_IDENT_SPLAT:%.*]] = shufflevector <4 x i64> [[MINMAX_IDENT_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
				; CHECK-VF4IC4-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK-VF4IC4: vector.body:
				; CHECK-VF4IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI7:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI8:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP21:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI9:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP22:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[VEC_PHI10:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP23:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC4-NEXT: [[STEP_ADD1:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC4-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD1]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-VF4IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
				; CHECK-VF4IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
				; CHECK-VF4IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
				; CHECK-VF4IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
				; CHECK-VF4IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
				; CHECK-VF4IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
				; CHECK-VF4IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
				; CHECK-VF4IC4-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 0
				; CHECK-VF4IC4-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP8]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 4
				; CHECK-VF4IC4-NEXT: [[WIDE_LOAD11:%.*]] = load <4 x i64>, ptr [[TMP9]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 8
				; CHECK-VF4IC4-NEXT: [[WIDE_LOAD12:%.*]] = load <4 x i64>, ptr [[TMP10]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 12
				; CHECK-VF4IC4-NEXT: [[WIDE_LOAD13:%.*]] = load <4 x i64>, ptr [[TMP11]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP12]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI]], <4 x i64> [[WIDE_LOAD]])
				; CHECK-VF4IC4-NEXT: [[TMP13]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI4]], <4 x i64> [[WIDE_LOAD11]])
				; CHECK-VF4IC4-NEXT: [[TMP14]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI5]], <4 x i64> [[WIDE_LOAD12]])
				; CHECK-VF4IC4-NEXT: [[TMP15]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI6]], <4 x i64> [[WIDE_LOAD13]])
				; CHECK-VF4IC4-NEXT: [[TMP16:%.*]] = icmp sgt <4 x i64> [[VEC_PHI]], [[WIDE_LOAD]]
				; CHECK-VF4IC4-NEXT: [[TMP17:%.*]] = icmp sgt <4 x i64> [[VEC_PHI4]], [[WIDE_LOAD11]]
				; CHECK-VF4IC4-NEXT: [[TMP18:%.*]] = icmp sgt <4 x i64> [[VEC_PHI5]], [[WIDE_LOAD12]]
				; CHECK-VF4IC4-NEXT: [[TMP19:%.*]] = icmp sgt <4 x i64> [[VEC_PHI6]], [[WIDE_LOAD13]]
				; CHECK-VF4IC4-NEXT: [[TMP20]] = select <4 x i1> [[TMP16]], <4 x i64> [[VEC_PHI7]], <4 x i64> [[VEC_IND]]
				; CHECK-VF4IC4-NEXT: [[TMP21]] = select <4 x i1> [[TMP17]], <4 x i64> [[VEC_PHI8]], <4 x i64> [[STEP_ADD]]
				; CHECK-VF4IC4-NEXT: [[TMP22]] = select <4 x i1> [[TMP18]], <4 x i64> [[VEC_PHI9]], <4 x i64> [[STEP_ADD1]]
				; CHECK-VF4IC4-NEXT: [[TMP23]] = select <4 x i1> [[TMP19]], <4 x i64> [[VEC_PHI10]], <4 x i64> [[STEP_ADD2]]
				; CHECK-VF4IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
				; CHECK-VF4IC4-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-VF4IC4-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-VF4IC4-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
				; CHECK-VF4IC4: middle.block:
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i64> [[TMP12]], [[TMP13]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i64> [[TMP12]], <4 x i64> [[TMP13]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP14:%.*]] = icmp sgt <4 x i64> [[RDX_MINMAX_SELECT]], [[TMP14]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT15:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP14]], <4 x i64> [[RDX_MINMAX_SELECT]], <4 x i64> [[TMP14]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP16:%.*]] = icmp sgt <4 x i64> [[RDX_MINMAX_SELECT15]], [[TMP15]]
				; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT17:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP16]], <4 x i64> [[RDX_MINMAX_SELECT15]], <4 x i64> [[TMP15]]
				; CHECK-VF4IC4-NEXT: [[TMP25:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX_SELECT17]])
				; CHECK-VF4IC4-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP25]], i64 0
				; CHECK-VF4IC4-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
				; CHECK-VF4IC4-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[RDX_MINMAX_SELECT17]]
				; CHECK-VF4IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i64> [[TMP20]], <4 x i64> [[TMP21]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT18:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP14]], <4 x i64> [[RDX_SELECT]], <4 x i64> [[TMP22]]
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT19:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP16]], <4 x i64> [[RDX_SELECT18]], <4 x i64> [[TMP23]]
				; CHECK-VF4IC4-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[RDX_SELECT19]], <4 x i64> <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>
				; CHECK-VF4IC4-NEXT: [[TMP26:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[MASK_SELECT]])
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP26]], -9223372036854775808
				; CHECK-VF4IC4-NEXT: [[RDX_SELECT20:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP26]], i64 [[II:%.*]]
				; CHECK-VF4IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
				; CHECK-VF4IC4: scalar.ph:
				; CHECK-VF4IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[TMP25]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX21:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT20]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC4-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK-VF4IC4: for.body:
				; CHECK-VF4IC4-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP28:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX21]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
				; CHECK-VF4IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
				; CHECK-VF4IC4-NEXT: [[TMP27:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
				; CHECK-VF4IC4-NEXT: [[TMP28]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP27]])
				; CHECK-VF4IC4-NEXT: [[CMP1_NOT:%.*]] = icmp sgt i64 [[MAX_09]], [[TMP27]]
				; CHECK-VF4IC4-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1_NOT]], i64 [[IDX_011]], i64 [[INDVARS_IV]]
				; CHECK-VF4IC4-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-VF4IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
				; CHECK-VF4IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
				; CHECK-VF4IC4: exit:
				; CHECK-VF4IC4-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP28]], [[FOR_BODY]] ], [ [[TMP25]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC4-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT20]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF4IC4-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
				; CHECK-VF4IC4-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
				;
				; CHECK-VF1IC4-LABEL: @smax_idx_extract_last(
				; CHECK-VF1IC4-NEXT: entry:
				; CHECK-VF1IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
				; CHECK-VF1IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK-VF1IC4: vector.ph:
				; CHECK-VF1IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
				; CHECK-VF1IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
				; CHECK-VF1IC4-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK-VF1IC4: vector.body:
				; CHECK-VF1IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI:%.*]] = phi i64 [ [[MM:%.*]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI4:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI5:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP21:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI6:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP22:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[VEC_PHI7:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP23:%.*]], [[VECTOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-VF1IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-VF1IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-VF1IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-VF1IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
				; CHECK-VF1IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
				; CHECK-VF1IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
				; CHECK-VF1IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
				; CHECK-VF1IC4-NEXT: [[TMP8:%.*]] = load i64, ptr [[TMP4]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP9:%.*]] = load i64, ptr [[TMP5]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP10:%.*]] = load i64, ptr [[TMP6]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP11:%.*]] = load i64, ptr [[TMP7]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP12]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI]], i64 [[TMP8]])
				; CHECK-VF1IC4-NEXT: [[TMP13]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI1]], i64 [[TMP9]])
				; CHECK-VF1IC4-NEXT: [[TMP14]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI2]], i64 [[TMP10]])
				; CHECK-VF1IC4-NEXT: [[TMP15]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI3]], i64 [[TMP11]])
				; CHECK-VF1IC4-NEXT: [[TMP16:%.*]] = icmp sgt i64 [[VEC_PHI]], [[TMP8]]
				; CHECK-VF1IC4-NEXT: [[TMP17:%.*]] = icmp sgt i64 [[VEC_PHI1]], [[TMP9]]
				; CHECK-VF1IC4-NEXT: [[TMP18:%.*]] = icmp sgt i64 [[VEC_PHI2]], [[TMP10]]
				; CHECK-VF1IC4-NEXT: [[TMP19:%.*]] = icmp sgt i64 [[VEC_PHI3]], [[TMP11]]
				; CHECK-VF1IC4-NEXT: [[TMP20]] = select i1 [[TMP16]], i64 [[VEC_PHI4]], i64 [[TMP0]]
				; CHECK-VF1IC4-NEXT: [[TMP21]] = select i1 [[TMP17]], i64 [[VEC_PHI5]], i64 [[TMP1]]
				; CHECK-VF1IC4-NEXT: [[TMP22]] = select i1 [[TMP18]], i64 [[VEC_PHI6]], i64 [[TMP2]]
				; CHECK-VF1IC4-NEXT: [[TMP23]] = select i1 [[TMP19]], i64 [[VEC_PHI7]], i64 [[TMP3]]
				; CHECK-VF1IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-VF1IC4-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-VF1IC4-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
				; CHECK-VF1IC4: middle.block:
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt i64 [[TMP12]], [[TMP13]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select i1 [[RDX_MINMAX_CMP]], i64 [[TMP12]], i64 [[TMP13]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP8:%.*]] = icmp sgt i64 [[RDX_MINMAX_SELECT]], [[TMP14]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select i1 [[RDX_MINMAX_CMP8]], i64 [[RDX_MINMAX_SELECT]], i64 [[TMP14]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP10:%.*]] = icmp sgt i64 [[RDX_MINMAX_SELECT9]], [[TMP15]]
				; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT11:%.*]] = select i1 [[RDX_MINMAX_CMP10]], i64 [[RDX_MINMAX_SELECT9]], i64 [[TMP15]]
				; CHECK-VF1IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_MINMAX_CMP]], i64 [[TMP20]], i64 [[TMP21]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT12:%.*]] = select i1 [[RDX_MINMAX_CMP8]], i64 [[RDX_SELECT]], i64 [[TMP22]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT13:%.*]] = select i1 [[RDX_MINMAX_CMP10]], i64 [[RDX_SELECT12]], i64 [[TMP23]]
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[RDX_SELECT13]], -9223372036854775808
				; CHECK-VF1IC4-NEXT: [[RDX_SELECT14:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[RDX_SELECT13]], i64 [[II:%.*]]
				; CHECK-VF1IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
				; CHECK-VF1IC4: scalar.ph:
				; CHECK-VF1IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[RDX_MINMAX_SELECT11]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX15:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT14]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF1IC4-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK-VF1IC4: for.body:
				; CHECK-VF1IC4-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP26:%.*]], [[FOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX15]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
				; CHECK-VF1IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
				; CHECK-VF1IC4-NEXT: [[TMP25:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
				; CHECK-VF1IC4-NEXT: [[TMP26]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP25]])
				; CHECK-VF1IC4-NEXT: [[CMP1_NOT:%.*]] = icmp sgt i64 [[MAX_09]], [[TMP25]]
				; CHECK-VF1IC4-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1_NOT]], i64 [[IDX_011]], i64 [[INDVARS_IV]]
				; CHECK-VF1IC4-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-VF1IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
				; CHECK-VF1IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
				; CHECK-VF1IC4: exit:
				; CHECK-VF1IC4-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP26]], [[FOR_BODY]] ], [ [[RDX_MINMAX_SELECT11]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF1IC4-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT14]], [[MIDDLE_BLOCK]] ]
				; CHECK-VF1IC4-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
				; CHECK-VF1IC4-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
				;
				entry:
				br label %for.body

				for.body:
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]
				%idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]
				%arrayidx = getelementptr inbounds i64, ptr %a, i64 %indvars.iv
				%0 = load i64, ptr %arrayidx
				%1 = tail call i64 @llvm.smax.i64(i64 %max.09, i64 %0)
				%cmp1.not = icmp sgt i64 %max.09, %0
				%spec.select7 = select i1 %cmp1.not, i64 %idx.011, i64 %indvars.iv
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %n
				br i1 %exitcond.not, label %exit, label %for.body

				exit:
				store i64 %1, ptr %res_max
				ret i64 %spec.select7
				}

				;
				; The smax intrinsic and the icmp use different operands, so this pattern must not be recognized as MMI.
				;
				; FIXME: This case should not be vectorized. We have to check that the intrinsic and the icmp use the same operands.
				define i64 @smax_idx_not_vec_1(ptr nocapture readonly %a, ptr nocapture readonly %b, i64 %mm, i64 %ii, ptr nocapture writeonly %res_max, i64 %n) {
				; CHECK-LABEL: @smax_idx_not_vec_1(
				; CHECK-NOT: vector.body:
				;
				entry:
				br label %for.body

				for.body:
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%max.09 = phi i64 [ %mm, %entry ], [ %2, %for.body ]
				%idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]
				%arrayidx = getelementptr inbounds i64, ptr %a, i64 %indvars.iv
				%0 = load i64, ptr %arrayidx
				%arrayidx.01 = getelementptr inbounds i64, ptr %b, i64 %indvars.iv
				%1 = load i64, ptr %arrayidx.01
				%2 = tail call i64 @llvm.smax.i64(i64 %max.09, i64 %0)
				%cmp1 = icmp slt i64 %max.09, %1 ;;
				%spec.select7 = select i1 %cmp1, i64 %indvars.iv, i64 %idx.011
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %n
				br i1 %exitcond.not, label %exit, label %for.body

				exit:
				store i64 %2, ptr %res_max
				ret i64 %spec.select7
				}

				;
				; It cannot be recognized as MMI when the operand of index select is not an induction variable.
				;
				define i64 @smax_idx_not_vec_2(ptr nocapture readonly %a, i64 %mm, i64 %ii, ptr nocapture writeonly %res_max, i64 %n) {
				; CHECK-LABEL: @smax_idx_not_vec_2(
				; CHECK-NOT: vector.body:
				;
				entry:
				br label %for.body

				for.body:
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]
				%idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]
				%arrayidx = getelementptr inbounds i64, ptr %a, i64 %indvars.iv
				%0 = load i64, ptr %arrayidx
				%1 = tail call i64 @llvm.smax.i64(i64 %max.09, i64 %0)
				%cmp1 = icmp slt i64 %max.09, %0
				%spec.select7 = select i1 %cmp1, i64 123, i64 %idx.011 ;;
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %n
				br i1 %exitcond.not, label %exit, label %for.body

				exit:
				store i64 %1, ptr %res_max
				ret i64 %spec.select7
				}

				declare i64 @llvm.smax.i64(i64, i64)
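
For readers following the smax_idx_extract_last checks, the C sketch below emulates what the VF4IC1 output appears to compute. It is an illustration only, not code from this patch: the names smax_idx_last_scalar, smax_idx_last_vf4 and VF are hypothetical, and it assumes n >= 1 and non-negative indices so that INT64_MIN can act as the "no index selected yet" sentinel, as in the CHECK lines.

```c
#include <stdint.h>

#define VF 4 /* assumed vectorization factor, matching the VF4IC1 checks */

/* Scalar reference for @smax_idx_extract_last. The old index is kept only
   while max > x, so a tie overwrites it and the LAST index of the maximum
   wins. */
int64_t smax_idx_last_scalar(const int64_t *a, int64_t n, int64_t mm,
                             int64_t ii, int64_t *res_max) {
  int64_t max = mm, idx = ii;
  for (int64_t i = 0; i < n; ++i) {
    int64_t x = a[i];
    int keep = max > x;   /* %cmp1.not */
    idx = keep ? idx : i; /* %spec.select7 */
    max = keep ? max : x; /* llvm.smax */
  }
  *res_max = max;
  return idx;
}

/* Lane-by-lane emulation of vector.body, middle.block and the scalar tail. */
int64_t smax_idx_last_vf4(const int64_t *a, int64_t n, int64_t mm,
                          int64_t ii, int64_t *res_max) {
  int64_t vmax[VF], vidx[VF];
  for (int l = 0; l < VF; ++l) {
    vmax[l] = mm;        /* splat of the start value */
    vidx[l] = INT64_MIN; /* sentinel: no index selected in this lane */
  }

  int64_t n_vec = n - n % VF;
  for (int64_t i = 0; i < n_vec; i += VF) {
    for (int l = 0; l < VF; ++l) {
      int64_t x = a[i + l];
      int keep = vmax[l] > x;
      vidx[l] = keep ? vidx[l] : i + l;
      vmax[l] = keep ? vmax[l] : x;
    }
  }

  /* middle.block: reduce the maxima, mask out lanes that do not hold the
     global maximum, then take the largest surviving index. */
  int64_t max = vmax[0];
  for (int l = 1; l < VF; ++l)
    if (vmax[l] > max) max = vmax[l];

  int64_t idx = INT64_MIN;
  for (int l = 0; l < VF; ++l) {
    int64_t cand = (vmax[l] == max) ? vidx[l] : INT64_MIN;
    if (cand > idx) idx = cand;
  }
  if (idx == INT64_MIN) idx = ii; /* nothing selected: fall back to %ii */

  /* Scalar remainder loop, resuming from the reduced values. */
  for (int64_t i = n_vec; i < n; ++i) {
    int64_t x = a[i];
    int keep = max > x;
    idx = keep ? idx : i;
    max = keep ? max : x;
  }
  *res_max = max;
  return idx;
}
```

With a = {3, 9, 1, 9, 2, 9, 0, 4, 9, 5}, mm = -1 and ii = -1, both functions return index 8 and store 9 as the maximum: the vector part selects index 5 from the first eight elements, and the scalar tail then moves it to 8.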


Diff 504121