This is an archive of the discontinued LLVM Phabricator instance.

Differential D143465

[LoopVectorize] Vectorize the reduction pattern of integer min/max with index.
Needs Review · Public

Authored by Mel-Chen on Feb 6 2023, 11:09 PM.

Details

Reviewers

ABataev
fhahn

Summary

The Concept and Approach

Here is an example of the min/max with index pattern:

int idx = ii;
int max = mm;
for (int i = 0; i < n; ++i) {
  int x = a[i];
  if (max < x) {
    max = x;
    idx = i;
  }
}

After lowering to LLVM IR, it looks like this:

define i64 @smax_idx(ptr nocapture readonly %a, i64 %mm, i64 %ii, ptr nocapture writeonly %res_max, i64 %n) {
entry:
  br label %for.body

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]
  %idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]
  %arrayidx = getelementptr inbounds i64, ptr %a, i64 %indvars.iv
  %0 = load i64, ptr %arrayidx
  %1 = tail call i64 @llvm.smax.i64(i64 %max.09, i64 %0)
  %cmp1 = icmp slt i64 %max.09, %0
  %spec.select7 = select i1 %cmp1, i64 %indvars.iv, i64 %idx.011
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, %n
  br i1 %exitcond.not, label %exit, label %for.body

exit:
  store i64 %1, ptr %res_max
  ret i64 %spec.select7
}

Then we'll make a def-use graph for illustration (focus on for.body):

                   ┌──────────────────┐
                   ▼                  │
          %indvars.iv                 │
         /         │ \                │
        ▼          │  ▼               │
%arrayidx          │%indvars.iv.next  │
     │             │      │        └──┘
     ▼             │      ▼
    %0             │   %exitcond.not
┌─┐ /└───────────┐ │      │
│ ▼▼             │ │      ▼
│ %1             │ │      br
│  │             │ │
│  ▼             │ │
│phi_max:%max.09 │ │
└──┘  \    ┌─────┘ │
       ▼   ▼       │ 
       %cmp1       │
      ┌─┐ \        │
      │ ▼  ▼       ▼
      │ %spec.select7
      │       │
      │       ▼
      │    phi_idx:%idx.011 
      └───────┘

When recognizing a reduction pattern, we generally start the traversal from a phi in the loop header block, that is, phi_max and phi_idx in the graph. Taking a simple max reduction as an example, we start at phi_max and traverse the def-use graph depth-first to check whether we can get back to phi_max, which means a cycle has been formed. Two further conditions must hold. First, at least one reduction operation (an operation in the cycle) must be used outside the loop; if there is no external user, there is nothing for the vectorizer to vectorize. Second, a reduction operation must not have users inside the loop unless those users are themselves reduction operations. This second condition is one of the issues this patch needs to handle.
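
The cycle check described above can be illustrated on a toy def-use graph. This is only a sketch and not LLVM's recognition code: the DefUseGraph type and the helpers reaches, formsCycle and smaxIdxGraph are hypothetical names, and the graph encodes the for.body edges of the smax_idx example by value name.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy def-use graph: each value maps to the values that use it.
using DefUseGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first walk over users; returns true if Target is reachable from Cur.
static bool reaches(const DefUseGraph &G, const std::string &Cur,
                    const std::string &Target, std::set<std::string> &Seen) {
  auto It = G.find(Cur);
  if (It == G.end())
    return false;
  for (const std::string &User : It->second) {
    if (User == Target)
      return true;
    if (Seen.insert(User).second && reaches(G, User, Target, Seen))
      return true;
  }
  return false;
}

// A header phi is a reduction candidate only if its users lead back to it.
bool formsCycle(const DefUseGraph &G, const std::string &Phi) {
  std::set<std::string> Seen;
  return reaches(G, Phi, Phi, Seen);
}

// The def-use edges of for.body from the smax_idx example (br omitted).
DefUseGraph smaxIdxGraph() {
  return {
      {"%indvars.iv", {"%arrayidx", "%spec.select7", "%indvars.iv.next"}},
      {"%indvars.iv.next", {"%indvars.iv", "%exitcond.not"}},
      {"%arrayidx", {"%0"}},
      {"%0", {"%1", "%cmp1"}},
      {"%1", {"%max.09"}},
      {"%max.09", {"%1", "%cmp1"}},
      {"%cmp1", {"%spec.select7"}},
      {"%spec.select7", {"%idx.011"}},
      {"%idx.011", {"%spec.select7"}},
  };
}
```

Calling formsCycle on "%max.09" and "%idx.011" finds a cycle for each phi, while "%0" does not form one. Note that "%max.09" also feeds "%cmp1", which lies outside its own cycle; that edge is the loop-internal user this patch has to accept.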

Let's go back to the def-use graph. Finding both cycles in a single traversal is difficult: besides complicating the algorithm, we would have to deal with the ordering of phi_max and phi_idx, because we cannot control the order of the phis in the input IR. A better way is to find the cycles of phi_max and phi_idx separately with the original algorithm, and then perform a second stage that combines phi_max and phi_idx. This makes recognition independent of the phi order, so both of the following orderings are handled:

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]
  %idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]
…

And 

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]
  %max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]
…

Next, let's see what kind of cycles phi_max and phi_idx form. Ignoring the edge (phi_max, %cmp1), phi_max is an ordinary max reduction and phi_idx is a select-cmp reduction. In other words, the first stage of reduction recognition must recognize a max reduction and a select-cmp reduction. The second stage then uses the relationship between the two reductions found in the first stage to combine them.

After recognition, the next step is code generation. First, let's look at the code generated when there is no dependency between the max reduction and the select-cmp reduction (that is, when it is not a min/max with index pattern):

/* Normal two independent reductions*/
int idx = ii;
int max = mm;
for (int i = 0; i < n; ++i) {
  int x = a[i];
  if (max < x) 
    max = x; 
  if (b[i] < a[i])
    idx = i;
}

vec_max = broadcast(mm)
vec_step = iota
vec_idx = broadcast(MIN_VALUE(DType))
for (int i = 0; i < n; i += vf) {
  vec_a = load(a, i, vf)
  vec_b = load(b, i, vf)
  vec_cmp = vec_b < vec_a
  vec_max = max(vec_max, vec_a)
  vec_idx = select(vec_cmp, vec_step, vec_idx)
  vec_step += vf;
}
red_max = reduce_max(vec_max)
red_idx_candidate = reduce_max(vec_idx)
red_idx = red_idx_candidate == MIN_VALUE(DType) ? ii : red_idx_candidate

And when there is a dependency between the max reduction and the select-cmp reduction (that is, it is a min/max with index pattern), how does code generation change?

/* Two dependent reductions, the max with first index pattern */
int idx = ii;
int max = mm;
for (int i = 0; i < n; ++i) {
  int x = a[i];
  if (max < x) {  // strict
    max = x; 
    idx = i;
  }
}

vec_max = broadcast(mm)
vec_step = iota
vec_idx = broadcast(MIN_VALUE(DType))
for (int i = 0; i < n; i += vf) {
  vec_a = load(a, i, vf)
  vec_cmp = vec_max < vec_a
  vec_max = max(vec_max, vec_a)
  vec_idx = select(vec_cmp, vec_step, vec_idx)
  vec_step += vf;
}
red_max = reduce_max(vec_max)
vec_all_max = broadcast(red_max)
mask = (vec_max == vec_all_max)
red_idx_candidate = reduce_min(vec_idx, mask)  // since this case is strict max reduction
red_idx = red_idx_candidate == MIN_VALUE(DType) ? ii : red_idx_candidate

The biggest difference is whether a mask must be created between the max reduction and the select-cmp reduction in the exit block (the middle block in the LLVM vectorizer). In addition, whether the min/max is strict or non-strict decides whether the minimum or the maximum index is used: a strict comparison keeps the first index (reduce_min), and a non-strict comparison keeps the last index (reduce_max). Therefore, as long as the correct reduction dependency is established in the recognition stage and code generation follows that dependency, the min/max with index pattern can be vectorized.
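
The masked middle-block reduction can be checked against the scalar loop with a lane-by-lane emulation. The sketch below is illustrative only: it assumes a vectorization factor of 4, models vectors as std::array, and the function names maxFirstIdxScalar and maxFirstIdxVector are hypothetical. It covers the strict (first index) case from the pseudocode above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

constexpr std::size_t VF = 4; // assumed vectorization factor
constexpr int64_t Sentinel = std::numeric_limits<int64_t>::min();

// Scalar reference: strict max, so the first index of the maximum wins.
std::pair<int64_t, int64_t> maxFirstIdxScalar(const std::vector<int64_t> &A,
                                              int64_t MM, int64_t II) {
  int64_t Max = MM, Idx = II;
  for (std::size_t I = 0; I < A.size(); ++I)
    if (Max < A[I]) {
      Max = A[I];
      Idx = static_cast<int64_t>(I);
    }
  return {Max, Idx};
}

// Lane-by-lane emulation of the vector loop plus the middle-block fix.
std::pair<int64_t, int64_t> maxFirstIdxVector(const std::vector<int64_t> &A,
                                              int64_t MM, int64_t II) {
  std::array<int64_t, VF> VecMax, VecIdx;
  VecMax.fill(MM);
  VecIdx.fill(Sentinel);

  const std::size_t NVec = A.size() - A.size() % VF;
  for (std::size_t I = 0; I < NVec; I += VF)
    for (std::size_t L = 0; L < VF; ++L) {
      const int64_t X = A[I + L];
      if (VecMax[L] < X) // vec_cmp, evaluated before vec_max is updated
        VecIdx[L] = static_cast<int64_t>(I + L);
      VecMax[L] = std::max(VecMax[L], X);
    }

  // Middle block: reduce the max, then take the minimum index over only
  // those lanes that hold the overall max (the mask).
  const int64_t RedMax = *std::max_element(VecMax.begin(), VecMax.end());
  int64_t Cand = std::numeric_limits<int64_t>::max();
  for (std::size_t L = 0; L < VF; ++L)
    if (VecMax[L] == RedMax)
      Cand = std::min(Cand, VecIdx[L]);
  int64_t Max = RedMax;
  int64_t Idx = (Cand == Sentinel) ? II : Cand;

  // Scalar epilogue for the remaining A.size() % VF elements.
  for (std::size_t I = NVec; I < A.size(); ++I)
    if (Max < A[I]) {
      Max = A[I];
      Idx = static_cast<int64_t>(I);
    }
  return {Max, Idx};
}
```

Without the mask, a plain reduce_min over all lanes could return an index from a lane whose local maximum is smaller than the overall maximum, which is why the two reductions must be fixed together.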

The Implementation in LLVM

As described in the previous section, the vectorizer must first be able to recognize the select-cmp reduction. Second, we need to solve the internal-user issue, so that a min/max reduction can accept users inside the loop during the first recognition stage. Third, the min/max reduction and the select-cmp reduction must be combined. Finally, based on the relationship between the select-cmp reduction and the min/max reduction, a mask is generated in the middle block and the reduction fix is performed.

Select-Cmp Reduction

LLVM already has a select-cmp implementation, developed by david-arm. However, the current implementation restricts the select operand that is not the reduction phi to be loop invariant, which does not meet our needs. We therefore extend this feature with SelectIVICmp/SelectIVFCmp.
Taking SelectIVICmp as an example, the result of vectorization is as follows:

/* A SelectIVICmp example */
#include <stdint.h>

int64_t idx_scalar(int64_t *a, int64_t *b, int64_t ii, int64_t n) {
  int64_t idx = ii;
  for (int64_t i = 0; i < n; ++i)
    idx = (a[i] > b[i]) ? i : idx;

  return idx;
}

/* LLVM IR for vectorized SelectIVICmp reduction */
define dso_local i64 @idx_scalar(ptr nocapture noundef readonly %a, ptr nocapture noundef readonly %b, i64 noundef %ii, i64 noundef %n) local_unnamed_addr #0 {
…

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %vec.ind = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, %vector.ph ], [ %vec.ind.next, %vector.body ]
  %vec.phi = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, %vector.ph ], [ %6, %vector.body ]
  %0 = add i64 %index, 0
  %1 = getelementptr inbounds i64, ptr %a, i64 %0
  %2 = getelementptr inbounds i64, ptr %1, i32 0
  %wide.load = load <4 x i64>, ptr %2, align 8, !tbaa !4
  %3 = getelementptr inbounds i64, ptr %b, i64 %0
  %4 = getelementptr inbounds i64, ptr %3, i32 0
  %wide.load1 = load <4 x i64>, ptr %4, align 8, !tbaa !4
  %5 = icmp sgt <4 x i64> %wide.load, %wide.load1
  %6 = select <4 x i1> %5, <4 x i64> %vec.ind, <4 x i64> %vec.phi
  %index.next = add nuw i64 %index, 4
  %vec.ind.next = add <4 x i64> %vec.ind, <i64 4, i64 4, i64 4, i64 4>
  %7 = icmp eq i64 %index.next, %n.vec
  br i1 %7, label %middle.block, label %vector.body, !llvm.loop !8

middle.block:                                     ; preds = %vector.body
  %8 = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> %6)
  %rdx.select.cmp = icmp ne i64 %8, -9223372036854775808
  %rdx.select = select i1 %rdx.select.cmp, i64 %8, i64 %ii
  %cmp.n = icmp eq i64 %n, %n.vec
  br i1 %cmp.n, label %for.exit.loopexit, label %scalar.ph
…
}

Assuming the induction variable i has the form {start, +, step}, SelectIVICmp should use start - step as its identity. However, to keep the implementation simple, I use MIN_VALUE(DType) directly as the identity (-9223372036854775808 in the example above). The identity acts as a sentinel value that stands for the start value of SelectIVICmp (%ii in the example above): if the result of reduce_max is the identity, the reduction result is fixed up to the start value.

Note that this is a temporary implementation. Unexpected errors may occur when the start of the induction variable is MIN_VALUE(DType), or when the maximum value of the induction variable exceeds SignedMax(DType). In general, it should be implemented with two reductions.
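
The role of the sentinel can be seen in a small emulation of the vectorized SelectIVICmp loop. This is a sketch under stated assumptions, not the generated code: VF is assumed to be 4, vectors are modelled as std::array, and idxScalar and idxVector are hypothetical names that mirror the idx_scalar example above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

constexpr std::size_t VF = 4; // assumed vectorization factor
constexpr int64_t Sentinel = std::numeric_limits<int64_t>::min();

// Scalar reference: the last index I with A[I] > B[I], or II if none.
int64_t idxScalar(const std::vector<int64_t> &A, const std::vector<int64_t> &B,
                  int64_t II) {
  assert(A.size() == B.size());
  int64_t Idx = II;
  for (std::size_t I = 0; I < A.size(); ++I)
    Idx = (A[I] > B[I]) ? static_cast<int64_t>(I) : Idx;
  return Idx;
}

// Emulation of vector.body and middle.block with a sentinel start value.
int64_t idxVector(const std::vector<int64_t> &A, const std::vector<int64_t> &B,
                  int64_t II) {
  assert(A.size() == B.size());
  std::array<int64_t, VF> VecIdx;
  VecIdx.fill(Sentinel); // corresponds to the splat of INT64_MIN in %vec.phi

  const std::size_t NVec = A.size() - A.size() % VF;
  for (std::size_t I = 0; I < NVec; I += VF)
    for (std::size_t L = 0; L < VF; ++L)
      if (A[I + L] > B[I + L])
        VecIdx[L] = static_cast<int64_t>(I + L); // select(cmp, vec.ind, phi)

  // middle.block: reduce.smax, then map the sentinel back to the start value.
  const int64_t Red = *std::max_element(VecIdx.begin(), VecIdx.end());
  int64_t Idx = (Red != Sentinel) ? Red : II;

  // Scalar epilogue for the remaining elements.
  for (std::size_t I = NVec; I < A.size(); ++I)
    Idx = (A[I] > B[I]) ? static_cast<int64_t>(I) : Idx;
  return Idx;
}
```

Because every real index here is non-negative, it always compares greater than the sentinel, so reduce_max returns the last matching index. The limitation noted above shows up as soon as a legitimate index can equal the sentinel.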

Internal User Issue: UserRecurPhi and UserRecurKind

I modified the function isMinMaxPattern so that, while identifying a min/max reduction, it can also identify an index pattern that may depend on the min/max reduction.

isMinMaxIdxPattern sets UserRecurPhi and UserRecurKind according to the select and cmp instructions being traversed and the kind of min/max reduction currently being recognized. Please refer to the table below:

Take the first example:
  %1 = tail call i64 @llvm.smax.i64(i64 %max.09, i64 %0)
  %cmp1 = icmp slt i64 %max.09, %0
  %spec.select7 = select i1 %cmp1, i64 %indvars.iv, i64 %idx.011

Each cell gives UserRecurPhi / UserRecurKind.

cmp format        UMax, SMax                               UMin, SMin
%max.09 < %0      select.getFalseValue() / MinMaxFirstIdx  select.getTrueValue() / MinMaxLastIdx
%max.09 <= %0     select.getFalseValue() / MinMaxLastIdx   select.getTrueValue() / MinMaxFirstIdx
%max.09 > %0      select.getTrueValue() / MinMaxLastIdx    select.getFalseValue() / MinMaxFirstIdx
%max.09 >= %0     select.getTrueValue() / MinMaxFirstIdx   select.getFalseValue() / MinMaxLastIdx
%0 < %max.09      select.getTrueValue() / MinMaxLastIdx    select.getFalseValue() / MinMaxFirstIdx
%0 <= %max.09     select.getTrueValue() / MinMaxFirstIdx   select.getFalseValue() / MinMaxLastIdx
%0 > %max.09      select.getFalseValue() / MinMaxFirstIdx  select.getTrueValue() / MinMaxLastIdx
%0 >= %max.09     select.getFalseValue() / MinMaxLastIdx   select.getTrueValue() / MinMaxFirstIdx

According to this table, UserRecurPhi is set to %idx.011 (the false value of the select) and UserRecurKind to MinMaxFirstIdx.

Once UserRecurPhi is set, the select is assumed to belong to another, not yet recognized reduction: UserRecurPhi should be the phi of that reduction, and UserRecurKind is its expected reduction kind. Both UserRecurPhi and UserRecurKind are used in the second phase of recognition.
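
The table can be condensed into one rule: normalize the compare so that the min/max phi is on the left, work out the condition under which the min/max is updated, and check whether that condition is strict. The sketch below encodes this reading of the table; the enum and function names (Pred, PhiArm, IdxKind, classify) are hypothetical and do not come from the patch.

```cpp
#include <cassert>

enum class Pred { LT, LE, GT, GE };
enum class PhiArm { TrueValue, FalseValue }; // select operand holding UserRecurPhi
enum class IdxKind { MinMaxFirstIdx, MinMaxLastIdx };

struct UserRecur {
  PhiArm Arm;
  IdxKind Kind;
};

// a P b  <=>  b swapOperands(P) a
static Pred swapOperands(Pred P) {
  switch (P) {
  case Pred::LT: return Pred::GT;
  case Pred::LE: return Pred::GE;
  case Pred::GT: return Pred::LT;
  case Pred::GE: return Pred::LE;
  }
  return P;
}

// !(a P b)  <=>  a invert(P) b
static Pred invert(Pred P) {
  switch (P) {
  case Pred::LT: return Pred::GE;
  case Pred::LE: return Pred::GT;
  case Pred::GT: return Pred::LE;
  case Pred::GE: return Pred::LT;
  }
  return P;
}

// IsMax: UMax/SMax (true) or UMin/SMin (false).
// RecurOnLHS: true for "%max.09 P %0", false for "%0 P %max.09".
UserRecur classify(bool IsMax, Pred P, bool RecurOnLHS) {
  if (!RecurOnLHS)
    P = swapOperands(P);
  // A max is updated when phi < x or phi <= x; a min when phi > x or phi >= x.
  const bool UpdateOnTrue = IsMax ? (P == Pred::LT || P == Pred::LE)
                                  : (P == Pred::GT || P == Pred::GE);
  const Pred UpdatePred = UpdateOnTrue ? P : invert(P);
  const bool Strict = UpdatePred == Pred::LT || UpdatePred == Pred::GT;
  // If the compare being true means "update", the new index sits in the true
  // operand and the recurrence phi in the false operand, and vice versa.
  return {UpdateOnTrue ? PhiArm::FalseValue : PhiArm::TrueValue,
          Strict ? IdxKind::MinMaxFirstIdx : IdxKind::MinMaxLastIdx};
}
```

For the first example, classify(true, Pred::LT, true) yields the false operand and MinMaxFirstIdx, which matches the "%max.09 < %0" row for SMax.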

The Second Phase of Recognition

At the end of the function canVectorizeInstrs, reductions that have a UserRecurPhi are checked a second time.
By this stage all reductions should have been found, so the vectorizer only needs to check whether UserRecurPhi is a reduction phi. The function fixUserRecurrence then converts the SelectIVICmp into MinMaxFirstIdx or MinMaxLastIdx according to UserRecurKind.
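
The second phase can be pictured as a lookup followed by a kind conversion. The following is a minimal sketch with hypothetical types (Kind, Desc) and a hypothetical function secondPhase; it keys reductions by phi name instead of PHINode pointers, and the check that the user was first recognized as SelectIVICmp is an assumption of the sketch rather than something stated in the summary.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Kind { None, SMax, SelectIVICmp, MinMaxFirstIdx, MinMaxLastIdx };

struct Desc {
  Kind K = Kind::None;
  std::string UserRecurPhi; // empty when no candidate user was recorded
  Kind UserRecurKind = Kind::None;
};

// Returns false if a recorded UserRecurPhi is not a recognized reduction phi.
bool secondPhase(std::map<std::string, Desc> &Reductions) {
  for (auto &Entry : Reductions) {
    const Desc &D = Entry.second;
    if (D.UserRecurPhi.empty())
      continue;
    auto User = Reductions.find(D.UserRecurPhi);
    // Assumption of this sketch: the user must have been found as a
    // SelectIVICmp reduction in the first phase.
    if (User == Reductions.end() || User->second.K != Kind::SelectIVICmp)
      return false;
    // Convert the user to the kind expected by the min/max reduction.
    User->second.K = D.UserRecurKind;
  }
  return true;
}
```

With "%max.09" recorded as SMax with UserRecurPhi "%idx.011" and UserRecurKind MinMaxFirstIdx, the entry for "%idx.011" changes from SelectIVICmp to MinMaxFirstIdx.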

Code Generation and Reduction Fix

Vectorizing min/max with index requires no changes to the contents of vector.body, but it does require changes in the function fixReduction.

We must ensure that the min/max reduction is fixed before the index reduction, because the index reduction uses the mask generated by the min/max reduction. We achieve this by modifying the function fixCrossIterationPHIs. The map DependRecurrenceMasks keeps the masks generated by min/max reductions. If the mask required by an index reduction is not ready, the fix of that index reduction is postponed until the mask is ready.
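
The postponement can be sketched as a simple work queue. This is not the code in fixCrossIterationPHIs: the Reduction struct and the fixOrder function are hypothetical, reductions are identified by name, and a std::set stands in for DependRecurrenceMasks.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <string>
#include <vector>

struct Reduction {
  std::string Name;
  std::string DependsOn; // empty if no mask from another reduction is needed
};

// Returns the order in which the reductions get fixed.
std::vector<std::string> fixOrder(const std::vector<Reduction> &Reds) {
  std::set<std::string> MaskReady; // stands in for DependRecurrenceMasks
  std::deque<Reduction> Work(Reds.begin(), Reds.end());
  std::vector<std::string> Order;
  std::size_t Deferred = 0; // consecutive postponements, detects a stuck queue
  while (!Work.empty() && Deferred < Work.size()) {
    Reduction R = Work.front();
    Work.pop_front();
    if (!R.DependsOn.empty() && MaskReady.count(R.DependsOn) == 0) {
      Work.push_back(R); // required mask not ready yet: postpone this fix
      ++Deferred;
      continue;
    }
    MaskReady.insert(R.Name); // fixing R makes its mask available
    Order.push_back(R.Name);
    Deferred = 0;
  }
  return Order;
}
```

Even when the index reduction is visited first, it is fixed after the max reduction it depends on. A reduction whose dependency never becomes ready is simply left unfixed in this sketch.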

Consider the following loops:

int idx = ii;
int max = mm;
for (int i = 0; i < n; ++i) {
  int x = a[i];
  if (max < x) {
    max = x;
    idx = i;
  }
}

and

int idx = ii;
int max = mm;
for (int i = 0; i < n; ++i) {
  int x = a[i];
  if (max <= x) {
    max = x;
    idx = i;
  }
}

Changes:

New recurrence kinds: MinMaxFirstIdx and MinMaxLastIdx. These kinds are not generated directly by the function AddReductionVar, but converted from SelectIVICmp/SelectIVFCmp.

TODOs:

Min/max recurrences without an exit instruction are not supported yet. Refer to test case smax_idx_max_no_exit_user.
Support the min/max recurrence in select(cmp()). Refer to test case smax_idx_select_cmp.
Support FP min/max recurrence.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Mel-Chen created this revision. Feb 6 2023, 11:09 PM

Herald added a project: Restricted Project. Feb 6 2023, 11:09 PM

Herald added subscribers: shiva0217, arphaman, rogfer01, hiraditya.

Mel-Chen requested review of this revision. Feb 6 2023, 11:09 PM

Herald added a project: Restricted Project. Feb 6 2023, 11:09 PM

Herald added subscribers: llvm-commits, pcwang-thead, vkmr.

Mel-Chen added reviewers: ABataev, fhahn. Feb 6 2023, 11:56 PM

Herald added a subscriber: StephenFan. Feb 6 2023, 11:56 PM

Mel-Chen mentioned this in D132063: [LV] Support vectorizing 'select index of minimum element' idiom. (WIP). Feb 7 2023, 12:01 AM

Mel-Chen retitled this revision from [LoopVectorize] Vectorize the reduction pattern of integer min/max with index. to [LoopVectorize] Vectorize the reduction pattern of integer min/max with index. (WIP). Feb 7 2023, 12:16 AM

Harbormaster completed remote builds in B212285: Diff 495388. Feb 7 2023, 12:18 AM

Mel-Chen edited the summary of this revision. Feb 7 2023, 12:21 AM

rui.zhang added a subscriber: rui.zhang. Feb 7 2023, 10:06 AM

Rebase and update the command in test case.

Harbormaster completed remote builds in B212749: Diff 496034. Feb 9 2023, 12:22 AM

Changes:

Fix interleave code generation for SelectIVICmp and SelectIVFCmp.
Fix the internal compiler error for MinMaxFirstIdx and MinMaxLastIdx when -force-vector-width=1.
Update test cases. Add more run command lines.

Harbormaster completed remote builds in B213366: Diff 496879. Feb 13 2023, 3:24 AM

Changes:

Rebase
Split the patch of test case and implementation
Update FIXME

Harbormaster completed remote builds in B213407: Diff 496947. Feb 13 2023, 5:52 AM

Mel-Chen added a parent revision: D143905: [LV] Harden the test of the minmax with index pattern. (NFC). Feb 13 2023, 5:53 AM

Mel-Chen edited the summary of this revision. Feb 13 2023, 5:57 AM

huntergr added a subscriber: huntergr. Feb 13 2023, 6:34 AM

Changes:

Remove function createInductionSelectCmpTargetReduction
Add function createSentinelValueHandling
Fix the start value of SelectCmp and MinMaxIdx

I don't yet know why all the FIXMEs appear to have been fixed; I will confirm.

Harbormaster completed remote builds in B213658: Diff 497320. Feb 14 2023, 9:12 AM

Rebase and fix the check prefix in test cases.

Harbormaster completed remote builds in B213805: Diff 497548. Feb 14 2023, 11:06 PM

Changes:

Confirm the operands of intrinsic and cmp. Fixed test case @smax_idx_not_vec_1.
Format code.

Mel-Chen retitled this revision from [LoopVectorize] Vectorize the reduction pattern of integer min/max with index. (WIP) to [LoopVectorize] Vectorize the reduction pattern of integer min/max with index. Feb 15 2023, 3:24 AM

Mel-Chen edited the summary of this revision.

Harbormaster completed remote builds in B213854: Diff 497615. Feb 15 2023, 5:06 AM

Rebase.

Harbormaster completed remote builds in B213881: Diff 497652. Feb 15 2023, 6:46 AM

@fhahn Ping. What do you think of my approach? I am looking forward to your reply.

Changes:

Rebase
Fix the bug of predicate normalization

Harbormaster completed remote builds in B215445: Diff 499753. Feb 23 2023, 1:32 AM

Rebase and update test case result.

Harbormaster completed remote builds in B218666: Diff 504121. Mar 10 2023, 7:43 AM

fhahn added inline comments. Mar 19 2023, 2:15 PM

llvm/include/llvm/Analysis/IVDescriptors.h
396	It would be helpful to document how the new system of recurrences depending on other recurrences would work I think, possibly also with an explanation of the whole approach in the patch description.
llvm/lib/Analysis/IVDescriptors.cpp
743	nit: Variables should start with an upper-case letter. Also, move the definition to its use?
776	The naming here is a bit confusing now, `NonPhi` can be an increasing loop induction? In that case it would be a phi, right?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
786	We are in the process of removing those kinds of global maps that are used to carry information used during codegen and later. Ideally the combination of values would be modeled explicitly in the exit block of the plan, but we are not there yet. This is the main reason for D132063 doing things the way it does.
3759	this would need documenting.
llvm/test/Transforms/LoopVectorize/smax-idx.ll
1	Could you add new tests as a separate patch?

Mel-Chen added inline comments. Mar 22 2023, 7:31 AM

llvm/include/llvm/Analysis/IVDescriptors.h
396	Sure. I will document the whole approach and update the summary tomorrow. To quickly explain `UserRecurPhi`: its purpose is to allow the recurrence to be used inside the loop (loop-internal use) and to ensure that the user is also a recurrence. `UserRecurPhi` records the candidate user recurrence phi, and `UserRecurKind` records the expected user recurrence kind. Currently I'm limiting candidates to one, but it should be possible to have more than one.
llvm/lib/Analysis/IVDescriptors.cpp
776	Yes, it's a little confusing here. It would be better to replace `NonPhi` with `NonRecurPhi`. By the way, are you interested in supporting a fully functional SelectCmp pattern? I think the min/max with index pattern really needs to depend on SelectCmp to be safe.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
786	I see, but `DependRecurrenceMasks` exists for a reason. Consider the following case:
  int idx = ii;
  int foo = jj;
  int max = mm;
  for (int i = 0; i < n; ++i) {
    int x = a[i];
    if (max < x) {
      max = x;
      idx = i;
      foo = b[i];
    }
  }
That mask can be reused, and I am trying to keep that flexibility. Of course, we could recalculate the mask for each reduction that needs one, but using a global map to preserve the mask is the simplest method I have come up with so far. I have heard that VPlan is going to be extended to other blocks; could you share links to the relevant discussion?
3759	Sure. Quick explanation: the min/max recurrence should be fixed earlier than the min/max idx recurrence, because the idx recurrence depends on the mask produced by the min/max recurrence. This code ensures that the recurrence dependencies are handled in the correct order.
llvm/test/Transforms/LoopVectorize/smax-idx.ll
1	Of course. I will split out an NFC patch tomorrow.

Mel-Chen added a parent revision: D146718: [LV] Add tests for integer min max with index reduction pattern. (NFC). Mar 23 2023, 6:03 AM

Split test case into parent revision.

Mel-Chen added inline comments. Mar 23 2023, 6:13 AM

llvm/include/llvm/Analysis/IVDescriptors.h
396	Too busy today. The documentation will be available next week.

Harbormaster completed remote builds in B221296: Diff 507716. Mar 23 2023, 7:03 AM

Mel-Chen edited the summary of this revision. Mar 30 2023, 11:14 PM

Herald added subscribers: jeroen.dobbelaere, kosarev, kristof.beyls. Mar 30 2023, 11:14 PM

@fhahn Updated my approach introduction in the summary.
If you have any questions, please contact me. Looking forward to discussing with you again. Thank you.

@fhahn Ping. I'd be glad to discuss this patch with you.

fhahn added inline comments. Apr 23 2023, 2:44 PM

llvm/lib/Analysis/IVDescriptors.cpp
751	Using this API seems unnecessarily strict; we don't need the bounds (and getBounds may fail if it cannot identify them). We just need to check the direction of the IV, which can be done by checking that it is an induction PHI and using `SE.getMonotonicPredicateType`.
llvm/test/Transforms/LoopVectorize/select-min-index.ll
264–265	Is this incorrectly vectorized or does the test name need fixing? It looks like `%min.val` isn't an actual minimum value phi?

Herald added a subscriber: hoy. Apr 23 2023, 2:44 PM

Rebased this patch and created revision D149731 to preserve the min/max operation in select-cmp form.

Mel-Chen added a parent revision: D149731: [IR] New function llvm::createMinMaxSelectCmpOp for creating min/max operation in select-cmp form. May 3 2023, 1:15 AM

Harbormaster completed remote builds in B229636: Diff 519006. May 3 2023, 2:21 AM

Changes:

Split SelectIVICmp and SelectIVFCmp out
Add Comment
Minor refine the code

Harbormaster completed remote builds in B232802: Diff 523302. May 18 2023, 2:47 AM

Mel-Chen added a parent revision: D150851: [LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable. May 18 2023, 3:01 AM

Mel-Chen edited the summary of this revision.

RKSimon edited the summary of this revision. Jun 5 2023, 2:31 AM

Changes:

Format and minor fix

Harbormaster completed remote builds in B236626: Diff 528430. Jun 5 2023, 7:17 AM

artagnon added a subscriber: artagnon. Jun 14 2023, 7:16 AM

Matt added a subscriber: Matt. Jun 14 2023, 3:35 PM

artagnon added inline comments. Jun 15 2023, 3:52 AM

llvm/lib/Analysis/IVDescriptors.cpp
424	Why?
433–435	If you separate out the MinMaxIdx pattern into its own function, we can check `NumCmpSelectPatternInst` for it separately.
879–908	This is a bit cryptic: would you consider adding more `RecurKind`s to make this less cryptic?
925–926	Can we avoid the expensive call to `isInductionPHI()` by checking that the `SCEVAddRec` is a `SCEVConstant`?
1292–1296	Why not merge this with the `RecurKind::SMax` case?
1326–1327	Rename these to `IMinMaxFirstIdx` and `IMinMaxLastIdx`?
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1027	Typo: comfirm.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4064–4067	`RdxDesc.isOrdered()` can help you pick between `FCMP_OEQ` and `FCMP_UEQ`.

Mel-Chen mentioned this in D150851: [LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable. Jun 29 2023, 3:08 AM

shiva0217 added inline comments. Aug 17 2023, 1:59 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1038	Instead of using fixUserRecurrence to call setDependMinMaxRecurDes and change the user RecurKind, is it possible to call setDependMinMaxRecurDes when isReductionPHI returns true? If we are able to propagate the parent (dependent) RecurDes to isReductionPHI, perhaps we can create the reduction as follows:
  RecurKind ParentKind = RedDes.getRecurrenceKind();
  if (ParentKind == RecurKind::SMax) {
    if (AddReductionVar(Phi, RecurKind::MinMaxFirstIdx, TheLoop, FMF, RedDes, DB, AC, DT, SE)) {
      LLVM_DEBUG(dbgs() << "Found an MinMaxFirstIdx reduction PHI." << *Phi << "\n");
      return true;
    }
  }
The dependency for the RecurKind would then be explicit, avoiding the user RecurKind fixup.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3759	Perhaps we could do the sorting according to the reduction dependency before calling fixReduction which may be similar to https://reviews.llvm.org/D157631.
4033	Should we check isMinMaxRecurrenceKind(Kind) and isMinMaxIdxRecurrenceKind for user kind? Although it would be the only dependency currently, it might be explicit for the reader and avoid unexpected codegen in the future.
4035	Could we encapsulate the mask generation to createMinMaxIdxMaskOp or other name you prefer?
4060	Should we check isMinMaxRecurrenceKind(Kind) and isMinMaxIdxRecurrenceKind for user kind?
4061	Could we encapsulate the mask generation to createMinMaxIdxMask or similar?

Herald added a subscriber: wangpc. Aug 17 2023, 1:59 AM

Revision Contents

Path                                                           Size
llvm/include/llvm/Analysis/IVDescriptors.h                     101 lines
llvm/include/llvm/Transforms/Utils/LoopUtils.h                 25 lines
llvm/lib/Analysis/IVDescriptors.cpp                            226 lines
llvm/lib/Transforms/Utils/LoopUtils.cpp                        51 lines
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp    15 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp                95 lines
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp                 3 lines
llvm/test/Transforms/LoopVectorize/select-min-index.ll         199 lines
llvm/test/Transforms/LoopVectorize/smax-idx.ll                 987 lines

Diff 528430

llvm/include/llvm/Analysis/IVDescriptors.h

Show First 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	enum class RecurKind {
SelectICmp, ///< Integer select(icmp(),x,y) where one of (x,y) is loop		SelectICmp, ///< Integer select(icmp(),x,y) where one of (x,y) is loop
///< invariant		///< invariant
SelectFCmp, ///< Integer select(fcmp(),x,y) where one of (x,y) is loop		SelectFCmp, ///< Integer select(fcmp(),x,y) where one of (x,y) is loop
///< invariant		///< invariant
SelectIVICmp, ///< Integer select(icmp(),x,y) where one of (x,y) is increasing		SelectIVICmp, ///< Integer select(icmp(),x,y) where one of (x,y) is increasing
///< loop induction PHI		///< loop induction PHI
SelectIVFCmp, ///< Integer select(fcmp(),x,y) where one of (x,y) is increasing		SelectIVFCmp, ///< Integer select(fcmp(),x,y) where one of (x,y) is increasing
///< loop induction PHI		///< loop induction PHI
		MinMaxFirstIdx, ///< Min/Max with first index
		MinMaxLastIdx ///< Min/Max with last index
};		};

/// The RecurrenceDescriptor is used to identify recurrences variables in a		/// The RecurrenceDescriptor is used to identify recurrences variables in a
/// loop. Reduction is a special case of recurrence that has uses of the		/// loop. Reduction is a special case of recurrence that has uses of the
/// recurrence variable outside the loop. The method isReductionPHI identifies		/// recurrence variable outside the loop. The method isReductionPHI identifies
/// reductions that are basic recurrences.		/// reductions that are basic recurrences.
///		///
/// Basic recurrences are defined as the summation, product, OR, AND, XOR, min,		/// Basic recurrences are defined as the summation, product, OR, AND, XOR, min,
/// or max of a set of terms. For example: for(i=0; i<n; i++) { total +=		/// or max of a set of terms. For example: for(i=0; i<n; i++) { total +=
/// array[i]; } is a summation of array elements. Basic recurrences are a		/// array[i]; } is a summation of array elements. Basic recurrences are a
/// special case of chains of recurrences (CR). See ScalarEvolution for CR		/// special case of chains of recurrences (CR). See ScalarEvolution for CR
/// references.		/// references.

/// This struct holds information about recurrence variables.		/// This struct holds information about recurrence variables.
class RecurrenceDescriptor {		class RecurrenceDescriptor {
public:		public:
RecurrenceDescriptor() = default;		RecurrenceDescriptor() = default;

RecurrenceDescriptor(Value Start, Instruction Exit, StoreInst *Store,		RecurrenceDescriptor(Value Start, Instruction Exit, StoreInst *Store,
RecurKind K, FastMathFlags FMF, Instruction *ExactFP,		RecurKind K, FastMathFlags FMF, Instruction *ExactFP,
Type *RT, bool Signed, bool Ordered,		Type *RT, bool Signed, bool Ordered,
SmallPtrSetImpl<Instruction *> &CI,		SmallPtrSetImpl<Instruction *> &CI,
unsigned MinWidthCastToRecurTy)		unsigned MinWidthCastToRecurTy, PHINode *UserRecurPhi,
		RecurKind UserRecurKind)
: IntermediateStore(Store), StartValue(Start), LoopExitInstr(Exit),		: IntermediateStore(Store), StartValue(Start), LoopExitInstr(Exit),
Kind(K), FMF(FMF), ExactFPMathInst(ExactFP), RecurrenceType(RT),		Kind(K), FMF(FMF), ExactFPMathInst(ExactFP), RecurrenceType(RT),
IsSigned(Signed), IsOrdered(Ordered),		IsSigned(Signed), IsOrdered(Ordered),
MinWidthCastToRecurrenceType(MinWidthCastToRecurTy) {		MinWidthCastToRecurrenceType(MinWidthCastToRecurTy),
		UserRecurPhi(UserRecurPhi), UserRecurKind(UserRecurKind) {
CastInsts.insert(CI.begin(), CI.end());		CastInsts.insert(CI.begin(), CI.end());
}		}

/// This POD struct holds information about a potential recurrence operation.		/// This POD struct holds information about a potential recurrence operation.
class InstDesc {		class InstDesc {
public:		public:
InstDesc(bool IsRecur, Instruction I, Instruction ExactFP = nullptr)		InstDesc(bool IsRecur, Instruction I, Instruction ExactFP = nullptr)
: IsRecurrence(IsRecur), PatternLastInst(I),		: IsRecurrence(IsRecur), PatternLastInst(I),
RecKind(RecurKind::None), ExactFPMathInst(ExactFP) {}		RecKind(RecurKind::None), ExactFPMathInst(ExactFP) {}

InstDesc(Instruction I, RecurKind K, Instruction ExactFP = nullptr)		InstDesc(Instruction I, RecurKind K, Instruction ExactFP = nullptr)
: IsRecurrence(true), PatternLastInst(I), RecKind(K),		: IsRecurrence(true), PatternLastInst(I), RecKind(K),
ExactFPMathInst(ExactFP) {}		ExactFPMathInst(ExactFP) {}

		InstDesc(bool IsRecur, Instruction I, PHINode CandUserRecurPhi,
		RecurKind CandUserRecurKind, Instruction *ExactFP = nullptr)
		: IsRecurrence(IsRecur), PatternLastInst(I), RecKind(RecurKind::None),
		CandUserRecurPhi(CandUserRecurPhi),
		CandUserRecurKind(CandUserRecurKind), ExactFPMathInst(ExactFP) {}

bool isRecurrence() const { return IsRecurrence; }		bool isRecurrence() const { return IsRecurrence; }

bool needsExactFPMath() const { return ExactFPMathInst != nullptr; }		bool needsExactFPMath() const { return ExactFPMathInst != nullptr; }

Instruction *getExactFPMathInst() const { return ExactFPMathInst; }		Instruction *getExactFPMathInst() const { return ExactFPMathInst; }

RecurKind getRecKind() const { return RecKind; }		RecurKind getRecKind() const { return RecKind; }

Instruction *getPatternInst() const { return PatternLastInst; }		Instruction *getPatternInst() const { return PatternLastInst; }

		PHINode *getCandUserRecurPhi() const { return CandUserRecurPhi; }

		RecurKind getCandUserRecurKind() const { return CandUserRecurKind; }

		bool isCandidateUser() const {
		return getCandUserRecurPhi() && getCandUserRecurKind() != RecurKind::None;
		}

private:		private:
// Is this instruction a recurrence candidate.		// Is this instruction a recurrence candidate.
bool IsRecurrence;		bool IsRecurrence;
// The last instruction in a min/max pattern (select of the select(icmp())		// The last instruction in a min/max pattern (select of the select(icmp())
// pattern), or the current recurrence instruction otherwise.		// pattern), or the current recurrence instruction otherwise.
Instruction *PatternLastInst;		Instruction *PatternLastInst;
// If this is a min/max pattern.		// If this is a min/max pattern.
RecurKind RecKind;		RecurKind RecKind;
		// This instruction may be the operation of another recurrence.
		// Record potential recurrence phi.
		PHINode *CandUserRecurPhi = nullptr;
		// And expected recurrence kind.
		RecurKind CandUserRecurKind = RecurKind::None;
// Recurrence does not allow floating-point reassociation.		// Recurrence does not allow floating-point reassociation.
Instruction *ExactFPMathInst;		Instruction *ExactFPMathInst;
};		};

/// Returns a struct describing if the instruction 'I' can be a recurrence		/// Returns a struct describing if the instruction 'I' can be a recurrence
/// variable of type 'Kind' for a Loop \p L and reduction PHI \p Phi.		/// variable of type 'Kind' for a Loop \p L and reduction PHI \p Phi.
/// If the recurrence is a min/max pattern of select(icmp()) this function		/// If the recurrence is a min/max pattern of select(icmp()) this function
/// advances the instruction pointer 'I' from the compare instruction to the		/// advances the instruction pointer 'I' from the compare instruction to the
Show All 12 Lines	public:
static bool areAllUsesIn(Instruction *I, SmallPtrSetImpl<Instruction *> &Set);		static bool areAllUsesIn(Instruction *I, SmallPtrSetImpl<Instruction *> &Set);

/// Returns a struct describing if the instruction is a llvm.(s/u)(min/max),		/// Returns a struct describing if the instruction is a llvm.(s/u)(min/max),
/// llvm.minnum/maxnum or a Select(ICmp(X, Y), X, Y) pair of instructions		/// llvm.minnum/maxnum or a Select(ICmp(X, Y), X, Y) pair of instructions
/// corresponding to a min(X, Y) or max(X, Y), matching the recurrence kind \p		/// corresponding to a min(X, Y) or max(X, Y), matching the recurrence kind \p
/// Kind. \p Prev specifies the description of an already processed select		/// Kind. \p Prev specifies the description of an already processed select
/// instruction, so its corresponding cmp can be matched to it.		/// instruction, so its corresponding cmp can be matched to it.
static InstDesc isMinMaxPattern(Instruction *I, RecurKind Kind,		static InstDesc isMinMaxPattern(Instruction *I, RecurKind Kind,
const InstDesc &Prev);		const InstDesc &Prev, Loop *Loop,
		PHINode *OrigPhi, ScalarEvolution *SE);

		/// Returns the RecurKind of the min/max recurrence that instruction \p I
		/// belongs to, or RecurKind::None if \p I does not match any min/max
		/// recurrence kind. Unlike isMinMaxPattern, this function does not require
		/// the cmp value to have exactly one use.
		static RecurKind isMinMaxOperation(Instruction *I);

		/// Returns a struct describing if the instruction is
		/// Select(ICmp(A, B), X, Y)
		/// where one of (X, Y) is a loop induction variable and the other is an index
		/// reduction phi. A and B must be used by a min/max recurrence. A and B are
		/// checked in AddReductionVar, not in this function. \p MinMaxPhi
		/// specifies the phi of min max recurrence, and \p MinMaxKind indicates the
		/// kind of min max recurrence.
		static InstDesc isMinMaxIdxPattern(Loop *Loop, Instruction *I,
		PHINode *MinMaxPhi, RecurKind MinMaxKind,
		ScalarEvolution *SE);

/// Returns a struct describing whether the instruction is either a		/// Returns a struct describing whether the instruction is either a
/// Select(ICmp(A, B), X, Y), or		/// Select(ICmp(A, B), X, Y), or
/// Select(FCmp(A, B), X, Y)		/// Select(FCmp(A, B), X, Y)
/// where one of (X, Y) is a loop invariant integer or an increasing loop		/// where one of (X, Y) is a loop invariant integer or an increasing loop
/// induction variable and the other is a PHI value. \p Prev specifies the		/// induction variable and the other is a PHI value. \p Prev specifies the
/// description of an already processed select instruction, so its		/// description of an already processed select instruction, so its
/// corresponding cmp can be matched to it.		/// corresponding cmp can be matched to it.
▲ Show 20 Lines • Show All 77 Lines • ▼ Show 20 Lines	static bool isFPMinMaxRecurrenceKind(RecurKind Kind) {
return Kind == RecurKind::FMin || Kind == RecurKind::FMax;		return Kind == RecurKind::FMin || Kind == RecurKind::FMax;
}		}

/// Returns true if the recurrence kind is any min/max kind.		/// Returns true if the recurrence kind is any min/max kind.
static bool isMinMaxRecurrenceKind(RecurKind Kind) {		static bool isMinMaxRecurrenceKind(RecurKind Kind) {
return isIntMinMaxRecurrenceKind(Kind) || isFPMinMaxRecurrenceKind(Kind);		return isIntMinMaxRecurrenceKind(Kind) || isFPMinMaxRecurrenceKind(Kind);
}		}

		/// Returns true if the recurrence kind is a max kind.
		static bool isMaxRecurrenceKind(RecurKind Kind) {
		return Kind == RecurKind::UMax || Kind == RecurKind::SMax ||
		Kind == RecurKind::FMax;
		}

		/// Returns true if the recurrence kind is of the form
		/// select(icmp(a,b),x,y) where one of (x,y) is increasing loop induction
		/// variable, and icmp(a,b) depends on a min/max recurrence.
		static bool isMinMaxIdxRecurrenceKind(RecurKind Kind) {
		return Kind == RecurKind::MinMaxFirstIdx ||
		Kind == RecurKind::MinMaxLastIdx;
		}

/// Returns true if the recurrence kind is of the form		/// Returns true if the recurrence kind is of the form
/// select(cmp(),x,y) where one of (x,y) is loop invariant or increasing		/// select(cmp(),x,y) where one of (x,y) is loop invariant or increasing
/// loop induction.		/// loop induction.
static bool isSelectCmpRecurrenceKind(RecurKind Kind) {		static bool isSelectCmpRecurrenceKind(RecurKind Kind) {
return Kind == RecurKind::SelectICmp || Kind == RecurKind::SelectFCmp ||		return Kind == RecurKind::SelectICmp || Kind == RecurKind::SelectFCmp ||
Kind == RecurKind::SelectIVICmp || Kind == RecurKind::SelectIVFCmp;		Kind == RecurKind::SelectIVICmp || Kind == RecurKind::SelectIVFCmp ||
		isMinMaxIdxRecurrenceKind(Kind);
}		}

/// Returns the type of the recurrence. This type can be narrower than the		/// Returns the type of the recurrence. This type can be narrower than the
/// actual type of the Phi if the recurrence has been type-promoted.		/// actual type of the Phi if the recurrence has been type-promoted.
Type *getRecurrenceType() const { return RecurrenceType; }		Type *getRecurrenceType() const { return RecurrenceType; }

/// Returns a reference to the instructions used for type-promoting the		/// Returns a reference to the instructions used for type-promoting the
/// recurrence.		/// recurrence.
const SmallPtrSet<Instruction *, 8> &getCastInsts() const { return CastInsts; }		const SmallPtrSet<Instruction *, 8> &getCastInsts() const { return CastInsts; }

		/// Returns the PHI of another recurrence that uses this recurrence.
		PHINode *getUserRecurPhi() const { return UserRecurPhi; }

		/// Set the recurrence kind.
		void setRecurKind(RecurKind K) {
		assert(K != RecurKind::None && "Unexpected recurrence kind.");
		Kind = K;
		}

		/// Set the min/max recurrence that the recurrence depends on.
		void setDependMinMaxRecurDes(RecurrenceDescriptor *MMRD) {
		assert(isMinMaxRecurrenceKind(MMRD->getRecurrenceKind()) &&
		"DependMinMaxRecDes must be a min/max recurrence.");
		DependMinMaxRecDes = MMRD;
		}

		/// Returns the min/max recurrence that this recurrence depends on.
		RecurrenceDescriptor *getDependMinMaxRecDes() const {
		return DependMinMaxRecDes;
		}

		/// Returns true if the recurrence is used by another.
		bool hasUserRecurrence() const {
		return UserRecurPhi && UserRecurKind != RecurKind::None;
		}

		/// Converts \p UserRedDes to the correct recurrence kind, and completes the
		/// recurrence descriptor. Returns true if successful, otherwise returns
		/// false.
		bool fixUserRecurrence(RecurrenceDescriptor &UserRedDes);

/// Returns the minimum width used by the recurrence in bits.		/// Returns the minimum width used by the recurrence in bits.
unsigned getMinWidthCastToRecurrenceTypeInBits() const {		unsigned getMinWidthCastToRecurrenceTypeInBits() const {
return MinWidthCastToRecurrenceType;		return MinWidthCastToRecurrenceType;
}		}

/// Returns true if all source operands of the recurrence are SExtInsts.		/// Returns true if all source operands of the recurrence are SExtInsts.
bool isSigned() const { return IsSigned; }		bool isSigned() const { return IsSigned; }

Show All 36 Lines	private:
// True if this recurrence can be treated as an in-order reduction.		// True if this recurrence can be treated as an in-order reduction.
// Currently only a non-reassociative FAdd can be considered in-order,		// Currently only a non-reassociative FAdd can be considered in-order,
// if it is also the only FAdd in the PHI's use chain.		// if it is also the only FAdd in the PHI's use chain.
bool IsOrdered = false;		bool IsOrdered = false;
// Instructions used for type-promoting the recurrence.		// Instructions used for type-promoting the recurrence.
SmallPtrSet<Instruction *, 8> CastInsts;		SmallPtrSet<Instruction *, 8> CastInsts;
// The minimum width used by the recurrence.		// The minimum width used by the recurrence.
unsigned MinWidthCastToRecurrenceType;		unsigned MinWidthCastToRecurrenceType;
		// The PHI of another potential recurrence that uses this recurrence.
		PHINode *UserRecurPhi = nullptr;
		fhahn: It would be helpful to document how the new system of recurrences depending on other recurrences would work, I think, possibly also with an explanation of the whole approach in the patch description.
		Mel-Chen (author): Sure. I will document the whole approach and update the summary tomorrow. To quickly explain `UserRecurPhi`: its purpose is to allow the recurrence to be used inside the loop (a loop-internal use), and to ensure that the user is also a recurrence. `UserRecurPhi` records the candidate user recurrence phi, and `UserRecurKind` records the expected user recurrence kind. Currently I'm limiting candidates to one, but it should be possible to have more than one.
		Mel-Chen (author): Too busy today. The document will be available next week.
		// The kind of another potential recurrence that uses this recurrence.
		RecurKind UserRecurKind = RecurKind::None;
		// The min/max recurrence that this recurrence depends on.
		RecurrenceDescriptor *DependMinMaxRecDes = nullptr;
};		};

/// A struct for saving information about induction variables.		/// A struct for saving information about induction variables.
class InductionDescriptor {		class InductionDescriptor {
public:		public:
/// This enum represents the kinds of inductions that we support.		/// This enum represents the kinds of inductions that we support.
enum InductionKind {		enum InductionKind {
IK_NoInduction, ///< Not an induction variable.		IK_NoInduction, ///< Not an induction variable.
▲ Show 20 Lines • Show All 99 Lines • Show Last 20 Lines
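
For orientation, the scalar loop that the descriptors above are meant to recognize can be written as a small standalone function. This is only a sketch and not code from the patch: `smaxWithIndex` and its `Strict` parameter are hypothetical names, and it assumes that a strict compare (`max < x`) keeps the first index of the maximum (presumably `RecurKind::MinMaxFirstIdx`) while a non-strict compare (`max <= x`) keeps the last one (presumably `RecurKind::MinMaxLastIdx`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the patch: scalar reference semantics of a
// signed max reduction that also tracks the index of the maximum.
// Strict == true models `if (max < x)` and keeps the first index of the max.
// Strict == false models `if (max <= x)` and keeps the last index of the max.
std::pair<int64_t, int64_t> smaxWithIndex(const std::vector<int64_t> &A,
                                          int64_t StartMax, int64_t StartIdx,
                                          bool Strict) {
  int64_t Max = StartMax; // the min/max recurrence
  int64_t Idx = StartIdx; // the index recurrence that uses the same compare
  for (std::size_t I = 0; I < A.size(); ++I) {
    bool Update = Strict ? (Max < A[I]) : (Max <= A[I]);
    if (Update) {
      Max = A[I];
      Idx = static_cast<int64_t>(I);
    }
  }
  return {Max, Idx};
}
```

Both results depend on the same compare, which is why the index recurrence cannot be analyzed independently of the min/max recurrence. If the loop never updates, the start values are returned unchanged.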

llvm/include/llvm/Transforms/Utils/LoopUtils.h

	Show First 20 Lines • Show All 360 Lines • ▼ Show 20 Lines
	/// See RecurrenceDescriptor::isSelectCmpPattern for a description of the			/// See RecurrenceDescriptor::isSelectCmpPattern for a description of the
	/// pattern we are trying to match. In this pattern we are only ever selecting			/// pattern we are trying to match. In this pattern we are only ever selecting
	/// between two values: 1) an initial PHI start value, and 2) a loop invariant			/// between two values: 1) an initial PHI start value, and 2) a loop invariant
	/// value and increasing loop induction variable. This function uses \p			/// value and increasing loop induction variable. This function uses \p
	/// LoopExitInst to determine 2), which we then use to select between \p Left			/// LoopExitInst to determine 2), which we then use to select between \p Left
	/// and \p Right. Any lane value in \p Left that matches 2) will be merged into			/// and \p Right. Any lane value in \p Left that matches 2) will be merged into
	/// \p Right.			/// \p Right.
	Value *createSelectCmpOp(IRBuilderBase &Builder, Value *StartVal, RecurKind RK,			Value *createSelectCmpOp(IRBuilderBase &Builder, Value *StartVal, RecurKind RK,
	Value *Left, Value *Right);			Value *Left, Value *Right, Value *SrcCmp = nullptr);

	/// Returns a Min/Max operation in select-cmp form corresponding to			/// Returns a Min/Max operation in select-cmp form corresponding to
	/// MinMaxRecurrenceKind.			/// MinMaxRecurrenceKind.
	/// Select(Cmp(strict min max predicate, Left, Right), Left, Right)			/// Select(Cmp(strict min max predicate, Left, Right), Left, Right)
	/// The Builder's fast-math-flags must be set to propagate the expected values.			/// The Builder's fast-math-flags must be set to propagate the expected values.
	Value *createMinMaxSelectCmpOp(IRBuilderBase &Builder, RecurKind RK,			Value *createMinMaxSelectCmpOp(IRBuilderBase &Builder, RecurKind RK,
	Value *Left, Value *Right);			Value *Left, Value *Right);

	Show All 26 Lines
	/// is described by \p Desc.			/// is described by \p Desc.
	Value *createInvariantSelectCmpTargetReduction(IRBuilderBase &B,			Value *createInvariantSelectCmpTargetReduction(IRBuilderBase &B,
	const TargetTransformInfo *TTI,			const TargetTransformInfo *TTI,
	Value *Src,			Value *Src,
	const RecurrenceDescriptor &Desc,			const RecurrenceDescriptor &Desc,
	PHINode *OrigPhi);			PHINode *OrigPhi);

	/// Create a target reduction of the given vector \p Src for a reduction of the			/// Create a target reduction of the given vector \p Src for a reduction of the
	/// kind conforms to RecurrenceDescriptor::isSelectCmpPattern. The reduction			/// kind RecurKind::MinMaxLastIdx or RecurKind::MinMaxFirstIdx. The reduction
	/// operation is described by \p Desc.			/// operation is described by \p Desc. \p SrcMask is a mask generated by min/max
	Value *createSelectCmpTargetReduction(IRBuilderBase &B,			/// reduction, used to restrict the range of selectable \p Src for target
				/// reduction.
				Value *createMMISelectCmpTargetReduction(IRBuilderBase &Builder,
	const TargetTransformInfo *TTI,			const TargetTransformInfo *TTI,
	Value *Src,			Value *Src,
	const RecurrenceDescriptor &Desc,			const RecurrenceDescriptor &Desc,
	PHINode *OrigPhi = nullptr);			Value *SrcMask);

				/// Create a target reduction of the given vector \p Src for a reduction of the
				/// kind conforms to RecurrenceDescriptor::isSelectCmpPattern. The reduction
				/// operation is described by \p Desc.
				Value *
				createSelectCmpTargetReduction(IRBuilderBase &B, const TargetTransformInfo *TTI,
				Value *Src, const RecurrenceDescriptor &Desc,
				PHINode *OrigPhi, Value *SrcMask = nullptr);

	/// Create a generic target reduction using a recurrence descriptor \p Desc			/// Create a generic target reduction using a recurrence descriptor \p Desc
	/// The target is queried to determine if intrinsics or shuffle sequences are			/// The target is queried to determine if intrinsics or shuffle sequences are
	/// required to implement the reduction.			/// required to implement the reduction.
	/// Fast-math-flags are propagated using the RecurrenceDescriptor.			/// Fast-math-flags are propagated using the RecurrenceDescriptor.
	Value *createTargetReduction(IRBuilderBase &B, const TargetTransformInfo *TTI,			Value *createTargetReduction(IRBuilderBase &B, const TargetTransformInfo *TTI,
	const RecurrenceDescriptor &Desc, Value *Src,			const RecurrenceDescriptor &Desc, Value *Src,
	PHINode *OrigPhi = nullptr);			PHINode *OrigPhi = nullptr,
				Value *SrcMask = nullptr);

	/// Create an ordered reduction intrinsic using the given recurrence			/// Create an ordered reduction intrinsic using the given recurrence
	/// descriptor \p Desc.			/// descriptor \p Desc.
	Value *createOrderedReduction(IRBuilderBase &B,			Value *createOrderedReduction(IRBuilderBase &B,
	const RecurrenceDescriptor &Desc, Value *Src,			const RecurrenceDescriptor &Desc, Value *Src,
	Value *Start);			Value *Start);

	/// Returns a set of cmp and select instructions as shown below:			/// Returns a set of cmp and select instructions as shown below:
	▲ Show 20 Lines • Show All 157 Lines • Show Last 20 Lines
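
The role of `SrcMask` in the declarations above can be illustrated with plain arrays standing in for vector lanes. This is a sketch under assumptions, not the lowering the patch emits: `reduceMaxFirstIdx`, `MaxIdx`, and `VF` are hypothetical names, start-value handling is ignored, and the first-index variant is assumed to pick the smallest index among the lanes that hold the reduced maximum.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

constexpr std::size_t VF = 4; // hypothetical vectorization factor

struct MaxIdx {
  int64_t Max;
  int64_t Idx;
};

// Hypothetical helper, not part of the patch: reduce per-lane partial results
// of a "max with first index" loop to a single (max, index) pair.
MaxIdx reduceMaxFirstIdx(const std::array<int64_t, VF> &LaneMax,
                         const std::array<int64_t, VF> &LaneIdx) {
  // Step 1: ordinary max reduction across the lanes.
  int64_t Max = *std::max_element(LaneMax.begin(), LaneMax.end());

  // Step 2: index reduction restricted by a mask. Only lanes whose partial
  // max equals the reduced max may contribute their index.
  int64_t Idx = std::numeric_limits<int64_t>::max();
  for (std::size_t L = 0; L < VF; ++L) {
    bool Mask = (LaneMax[L] == Max); // plays the role of SrcMask
    if (Mask)
      Idx = std::min(Idx, LaneIdx[L]);
  }
  return {Max, Idx};
}
```

Without the mask, a lane with a smaller partial maximum could contribute its index and produce a wrong result, which is why the index reduction needs information from the min/max reduction.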

llvm/lib/Analysis/IVDescriptors.cpp

Show First 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	bool RecurrenceDescriptor::isIntegerRecurrenceKind(RecurKind Kind) {
case RecurKind::SMax:		case RecurKind::SMax:
case RecurKind::SMin:		case RecurKind::SMin:
case RecurKind::UMax:		case RecurKind::UMax:
case RecurKind::UMin:		case RecurKind::UMin:
case RecurKind::SelectICmp:		case RecurKind::SelectICmp:
case RecurKind::SelectFCmp:		case RecurKind::SelectFCmp:
case RecurKind::SelectIVICmp:		case RecurKind::SelectIVICmp:
case RecurKind::SelectIVFCmp:		case RecurKind::SelectIVFCmp:
		case RecurKind::MinMaxFirstIdx:
		case RecurKind::MinMaxLastIdx:
return true;		return true;
}		}
return false;		return false;
}		}

bool RecurrenceDescriptor::isFloatingPointRecurrenceKind(RecurKind Kind) {		bool RecurrenceDescriptor::isFloatingPointRecurrenceKind(RecurKind Kind) {
return (Kind != RecurKind::None) && !isIntegerRecurrenceKind(Kind);		return (Kind != RecurKind::None) && !isIntegerRecurrenceKind(Kind);
}		}

		bool RecurrenceDescriptor::fixUserRecurrence(RecurrenceDescriptor &UserRedDes) {
		RecurKind UserCurrKind = UserRedDes.getRecurrenceKind();
		assert(UserCurrKind != RecurKind::None && "Unexpected recurrence kind.");

		if (isMinMaxRecurrenceKind(Kind))
		if (UserCurrKind == RecurKind::SelectIVICmp ||
		UserCurrKind == RecurKind::SelectIVFCmp) {
		UserRedDes.setRecurKind(UserRecurKind);
		UserRedDes.setDependMinMaxRecurDes(this);
		return true;
		}

		return false;
		}

/// Determines if Phi may have been type-promoted. If Phi has a single user		/// Determines if Phi may have been type-promoted. If Phi has a single user
/// that ANDs the Phi with a type mask, return the user. RT is updated to		/// that ANDs the Phi with a type mask, return the user. RT is updated to
/// account for the narrower bit width represented by the mask, and the AND		/// account for the narrower bit width represented by the mask, and the AND
/// instruction is added to CI.		/// instruction is added to CI.
static Instruction *lookThroughAnd(PHINode *Phi, Type *&RT,		static Instruction *lookThroughAnd(PHINode *Phi, Type *&RT,
SmallPtrSetImpl<Instruction *> &Visited,		SmallPtrSetImpl<Instruction *> &Visited,
SmallPtrSetImpl<Instruction *> &CI) {		SmallPtrSetImpl<Instruction *> &CI) {
if (!Phi->hasOneUse())		if (!Phi->hasOneUse())
▲ Show 20 Lines • Show All 168 Lines • ▼ Show 20 Lines	bool RecurrenceDescriptor::AddReductionVar(
bool FoundReduxOp = false;		bool FoundReduxOp = false;

// We start with the PHI node and scan for all of the users of this		// We start with the PHI node and scan for all of the users of this
// instruction. All users must be instructions that can be used as reduction		// instruction. All users must be instructions that can be used as reduction
// variables (such as ADD). We must have a single out-of-block user. The cycle		// variables (such as ADD). We must have a single out-of-block user. The cycle
// must include the original PHI.		// must include the original PHI.
bool FoundStartPHI = false;		bool FoundStartPHI = false;

		// UserRecurPHI refers to the starting PHI of another recurrence that may use
		// this reduction operation. It is used to recognize the min/max with index
		// pattern.
		// TODO: So far only one user is allowed, but ideally, multiple user
		// recurrences should be supported.
		PHINode *UserRecurPHI = nullptr;
		// UserRecurKind refers to the expected kind of user recurrence.
		RecurKind UserRecurKind = RecurKind::None;
		// UserRecurInstr refers to the ExitInstruction of a user recurrence.
		// FIXME: Should rename to UserRecurExit
		Instruction *UserRecurInstr = nullptr;

// To recognize min/max patterns formed by a icmp select sequence, we store		// To recognize min/max patterns formed by a icmp select sequence, we store
// the number of instruction we saw from the recognized min/max pattern,		// the number of instruction we saw from the recognized min/max pattern,
// to make sure we only see exactly the two instructions.		// to make sure we only see exactly the two instructions.
unsigned NumCmpSelectPatternInst = 0;		unsigned NumCmpSelectPatternInst = 0;
InstDesc ReduxDesc(false, nullptr);		InstDesc ReduxDesc(false, nullptr);

// Data used for determining if the recurrence has been type-promoted.		// Data used for determining if the recurrence has been type-promoted.
Type *RecurrenceType = Phi->getType();		Type *RecurrenceType = Phi->getType();
▲ Show 20 Lines • Show All 116 Lines • ▼ Show 20 Lines	while (!Worklist.empty()) {
// the starting value (the Phi or an AND instruction if the Phi has been		// the starting value (the Phi or an AND instruction if the Phi has been
// type-promoted).		// type-promoted).
if (Cur != Start) {		if (Cur != Start) {
ReduxDesc =		ReduxDesc =
isRecurrenceInstr(TheLoop, Phi, Cur, Kind, ReduxDesc, FuncFMF, SE);		isRecurrenceInstr(TheLoop, Phi, Cur, Kind, ReduxDesc, FuncFMF, SE);
ExactFPMathInst = ExactFPMathInst == nullptr		ExactFPMathInst = ExactFPMathInst == nullptr
? ReduxDesc.getExactFPMathInst()		? ReduxDesc.getExactFPMathInst()
: ExactFPMathInst;		: ExactFPMathInst;
if (!ReduxDesc.isRecurrence())		if (!ReduxDesc.isRecurrence()) {
		if (!ReduxDesc.isCandidateUser())
return false;		return false;

		// TODO: Only allow one user recurrence now.
		if (UserRecurPHI)
		return false;

		UserRecurPHI = ReduxDesc.getCandUserRecurPhi();
		UserRecurKind = ReduxDesc.getCandUserRecurKind();
		UserRecurInstr = Cur;
		// TODO: Call AddReductionVar here?
		artagnon: Why?

		// Fix NumCmpSelectPatternInst.
		// When searching for the min/max with index pattern, the cmp belonging to
		// the index reduction is mistaken for a cmp belonging to the min/max
		// reduction. Left uncorrected, the resulting miscount in
		// NumCmpSelectPatternInst makes the min/max reduction unrecognizable.
		// FIXME: There may be a better way to handle the NumCmpSelectPatternInst
		// issue.
		if (match(UserRecurInstr,
		m_Select(m_OneUse(m_Cmp()), m_Value(), m_Value())))
		--NumCmpSelectPatternInst;
		artagnon: If you separate out the MinMaxIdx pattern into its own function, we can check `NumCmpSelectPatternInst` for it separately.

		// Stop visiting the users of current instruction if it contains user
		// recurrence.
		continue;
		}
// FIXME: FMF is allowed on phi, but propagation is not handled correctly.		// FIXME: FMF is allowed on phi, but propagation is not handled correctly.
if (isa<FPMathOperator>(ReduxDesc.getPatternInst()) && !IsAPhi) {		if (isa<FPMathOperator>(ReduxDesc.getPatternInst()) && !IsAPhi) {
FastMathFlags CurFMF = ReduxDesc.getPatternInst()->getFastMathFlags();		FastMathFlags CurFMF = ReduxDesc.getPatternInst()->getFastMathFlags();
if (auto *Sel = dyn_cast<SelectInst>(ReduxDesc.getPatternInst())) {		if (auto *Sel = dyn_cast<SelectInst>(ReduxDesc.getPatternInst())) {
// Accept FMF on either fcmp or select of a min/max idiom.		// Accept FMF on either fcmp or select of a min/max idiom.
// TODO: This is a hack to work-around the fact that FMF may not be		// TODO: This is a hack to work-around the fact that FMF may not be
// assigned/propagated correctly. If that problem is fixed or we		// assigned/propagated correctly. If that problem is fixed or we
// standardize on fmin/fmax via intrinsics, this can be removed.		// standardize on fmin/fmax via intrinsics, this can be removed.
▲ Show 20 Lines • Show All 95 Lines • ▼ Show 20 Lines	for (User *U : Cur->users()) {
NonPHIs.push_back(UI);		NonPHIs.push_back(UI);
}		}
} else if (!isa<PHINode>(UI) &&		} else if (!isa<PHINode>(UI) &&
((!isa<FCmpInst>(UI) && !isa<ICmpInst>(UI) &&		((!isa<FCmpInst>(UI) && !isa<ICmpInst>(UI) &&
!isa<SelectInst>(UI)) ||		!isa<SelectInst>(UI)) ||
(!isConditionalRdxPattern(Kind, UI).isRecurrence() &&		(!isConditionalRdxPattern(Kind, UI).isRecurrence() &&
!isSelectCmpPattern(TheLoop, Phi, UI, IgnoredVal, SE)		!isSelectCmpPattern(TheLoop, Phi, UI, IgnoredVal, SE)
.isRecurrence() &&		.isRecurrence() &&
!isMinMaxPattern(UI, Kind, IgnoredVal).isRecurrence())))		!isMinMaxPattern(UI, Kind, IgnoredVal, TheLoop, Phi, SE)
		.isRecurrence())))
return false;		return false;

// Remember that we completed the cycle.		// Remember that we completed the cycle.
if (UI == Phi)		if (UI == Phi)
FoundStartPHI = true;		FoundStartPHI = true;
}		}
Worklist.append(PHIs.begin(), PHIs.end());		Worklist.append(PHIs.begin(), PHIs.end());
Worklist.append(NonPHIs.begin(), NonPHIs.end());		Worklist.append(NonPHIs.begin(), NonPHIs.end());
Show All 33 Lines	if (IntermediateStore) {
// reduction value after the loop will be the one used in the last store.		// reduction value after the loop will be the one used in the last store.
if (!ExitInstruction)		if (!ExitInstruction)
ExitInstruction = cast<Instruction>(IntermediateStore->getValueOperand());		ExitInstruction = cast<Instruction>(IntermediateStore->getValueOperand());
}		}

if (!FoundStartPHI || !FoundReduxOp || !ExitInstruction)		if (!FoundStartPHI || !FoundReduxOp || !ExitInstruction)
return false;		return false;

		// Check for the min/max with index pattern. Check that the operands used by
		// the cmp instruction of UserRecurInstr are the same as the operands used by
		// the min/max recurrence.
		if (isMinMaxRecurrenceKind(Kind) && UserRecurPHI) {
		auto *UserRecurSI = cast<SelectInst>(UserRecurInstr);
		Value *UserRecurCond = UserRecurSI->getCondition();
		if (auto *MinMaxSI = dyn_cast<SelectInst>(ExitInstruction)) {
		// TODO: As long as the operands are the same, it is not limited to the
		// same cmp instruction.
		if (UserRecurCond != MinMaxSI->getCondition())
		return false;
		} else if (auto *MinMaxII = dyn_cast<IntrinsicInst>(ExitInstruction)) {
		// Match smax(%maxphi, %0), icmp(pred, %maxphi, %0) or
		// smax(%maxphi, %0), icmp(swapped_pred, %0, %maxphi)
		Value *MinMaxOp0 = MinMaxII->getOperand(0);
		Value *MinMaxOp1 = MinMaxII->getOperand(1);
		CmpInst::Predicate Pred;
		if (!match(UserRecurCond,
		m_Cmp(Pred, m_Specific(MinMaxOp0), m_Specific(MinMaxOp1))) &&
		!match(UserRecurCond,
		m_Cmp(Pred, m_Specific(MinMaxOp1), m_Specific(MinMaxOp0))))
		return false;
		}
		}

const bool IsOrdered =		const bool IsOrdered =
checkOrderedReduction(Kind, ExactFPMathInst, ExitInstruction, Phi);		checkOrderedReduction(Kind, ExactFPMathInst, ExitInstruction, Phi);

if (Start != Phi) {		if (Start != Phi) {
// If the starting value is not the same as the phi node, we speculatively		// If the starting value is not the same as the phi node, we speculatively
// looked through an 'and' instruction when evaluating a potential		// looked through an 'and' instruction when evaluating a potential
// arithmetic reduction to determine if it may have been type-promoted.		// arithmetic reduction to determine if it may have been type-promoted.
//		//
▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	bool RecurrenceDescriptor::AddReductionVar(
// only have a single instruction with out-of-loop users.		// only have a single instruction with out-of-loop users.

// The ExitInstruction(Instruction which is allowed to have out-of-loop users)		// The ExitInstruction(Instruction which is allowed to have out-of-loop users)
// is saved as part of the RecurrenceDescriptor.		// is saved as part of the RecurrenceDescriptor.

// Save the description of this reduction variable.		// Save the description of this reduction variable.
RecurrenceDescriptor RD(RdxStart, ExitInstruction, IntermediateStore, Kind,		RecurrenceDescriptor RD(RdxStart, ExitInstruction, IntermediateStore, Kind,
FMF, ExactFPMathInst, RecurrenceType, IsSigned,		FMF, ExactFPMathInst, RecurrenceType, IsSigned,
IsOrdered, CastInsts, MinWidthCastToRecurrenceType);		IsOrdered, CastInsts, MinWidthCastToRecurrenceType,
		UserRecurPHI, UserRecurKind);
RedDes = RD;		RedDes = RD;

return true;		return true;
}		}

// We are looking for loops that do something like this:		// We are looking for loops that do something like this:
// int r = 0;		// int r = 0;
// for (int i = 0; i < n; i++) {		// for (int i = 0; i < n; i++) {
Show All 37 Lines	RecurrenceDescriptor::isSelectCmpPattern(Loop *Loop, PHINode *OrigPhi,

if (OrigPhi == dyn_cast<PHINode>(SI->getTrueValue()))		if (OrigPhi == dyn_cast<PHINode>(SI->getTrueValue()))
NonPhi = SI->getFalseValue();		NonPhi = SI->getFalseValue();
else if (OrigPhi == dyn_cast<PHINode>(SI->getFalseValue()))		else if (OrigPhi == dyn_cast<PHINode>(SI->getFalseValue()))
NonPhi = SI->getTrueValue();		NonPhi = SI->getTrueValue();
else		else
return InstDesc(false, I);		return InstDesc(false, I);

auto IsIncreasingLoopInduction = [&SE, &Loop](Value *V) {		auto IsIncreasingLoopInduction = [&SE, &Loop](Value *V) {
		fhahn: nit: Variables should start with upper case. Also, move definition to use?
auto *Phi = dyn_cast<PHINode>(V);		auto *Phi = dyn_cast<PHINode>(V);
if (!Phi)		if (!Phi)
return false;		return false;

if (!SE)		if (!SE)
return false;		return false;

InductionDescriptor ID;		InductionDescriptor ID;
		fhahn: Using this API seems unnecessarily strict; we don't need the bounds (and getBounds may fail if it cannot identify them). We just need to check the direction of the IV, which can be done by checking whether it is an induction PHI and using `SE.getMonotonicPredicateType`.
if (!InductionDescriptor::isInductionPHI(Phi, Loop, SE, ID))		if (!InductionDescriptor::isInductionPHI(Phi, Loop, SE, ID))
return false;		return false;

const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(SE->getSCEV(Phi));		const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(SE->getSCEV(Phi));
if (!AR->hasNoSignedWrap())		if (!AR->hasNoSignedWrap())
return false;		return false;

ConstantInt *IVStartValue = dyn_cast<ConstantInt>(ID.getStartValue());		ConstantInt *IVStartValue = dyn_cast<ConstantInt>(ID.getStartValue());
if (!IVStartValue || IVStartValue->isMinSignedValue())		if (!IVStartValue || IVStartValue->isMinSignedValue())
return false;		return false;

const SCEV *Step = ID.getStep();		const SCEV *Step = ID.getStep();
return SE->isKnownPositive(Step);		return SE->isKnownPositive(Step);
};		};

// We are looking for selects of the form:		// We are looking for selects of the form:
// select(cmp(), phi, loop_invariant) or		// select(cmp(), phi, loop_invariant) or
// select(cmp(), loop_invariant, phi)		// select(cmp(), loop_invariant, phi)
if (Loop->isLoopInvariant(NonPhi))		if (Loop->isLoopInvariant(NonPhi))
return InstDesc(I, isa<ICmpInst>(I->getOperand(0)) ? RecurKind::SelectICmp		return InstDesc(I, isa<ICmpInst>(I->getOperand(0)) ? RecurKind::SelectICmp
: RecurKind::SelectFCmp);		: RecurKind::SelectFCmp);
// or		// or
// select(cmp(), phi, loop_induction) or		// select(cmp(), phi, loop_induction) or
// select(cmp(), loop_induction, phi)		// select(cmp(), loop_induction, phi)
if (IsIncreasingLoopInduction(NonPhi))		if (IsIncreasingLoopInduction(NonPhi))
		fhahn: The naming here is a bit confusing now: `NonPhi` can be an increasing loop induction? In that case it would be a phi, right?
		Mel-Chen (Author): Yes, it's a little confusing here. It might be better to rename `NonPhi` to `NonRecurPhi`. By the way, are you interested in supporting the fully functional SelectCmp pattern? I think the min/max with index pattern really needs to depend on SelectCmp to be safe.
return InstDesc(I, isa<ICmpInst>(I->getOperand(0))		return InstDesc(I, isa<ICmpInst>(I->getOperand(0))
? RecurKind::SelectIVICmp		? RecurKind::SelectIVICmp
: RecurKind::SelectIVFCmp);		: RecurKind::SelectIVFCmp);

return InstDesc(false, I);		return InstDesc(false, I);
}		}

RecurrenceDescriptor::InstDesc		RecurrenceDescriptor::InstDesc
RecurrenceDescriptor::isMinMaxPattern(Instruction *I, RecurKind Kind,		RecurrenceDescriptor::isMinMaxPattern(Instruction *I, RecurKind Kind,
const InstDesc &Prev) {		const InstDesc &Prev, Loop *Loop,
		PHINode *OrigPhi, ScalarEvolution *SE) {
assert((isa<CmpInst>(I) || isa<SelectInst>(I) || isa<CallInst>(I)) &&		assert((isa<CmpInst>(I) || isa<SelectInst>(I) || isa<CallInst>(I)) &&
"Expected a cmp or select or call instruction");		"Expected a cmp or select or call instruction");
if (!isMinMaxRecurrenceKind(Kind))		if (!isMinMaxRecurrenceKind(Kind))
return InstDesc(false, I);		return InstDesc(false, I);

// We must handle the select(cmp()) as a single instruction. Advance to the		// We must handle the select(cmp()) as a single instruction. Advance to the
// select.		// select.
CmpInst::Predicate Pred;		CmpInst::Predicate Pred;
if (match(I, m_OneUse(m_Cmp(Pred, m_Value(), m_Value())))) {		if (match(I, m_OneUse(m_Cmp(Pred, m_Value(), m_Value())))) {
if (auto *Select = dyn_cast<SelectInst>(*I->user_begin()))		if (auto *Select = dyn_cast<SelectInst>(*I->user_begin()))
return InstDesc(Select, Prev.getRecKind());		return InstDesc(Select, Prev.getRecKind());
}		}

// Only match select with single use cmp condition, or a min/max intrinsic.		// Only match select with single use cmp condition, or a min/max intrinsic.
if (!isa<IntrinsicInst>(I) &&		if (!isa<IntrinsicInst>(I) &&
!match(I, m_Select(m_OneUse(m_Cmp(Pred, m_Value(), m_Value())), m_Value(),		!match(I, m_Select(m_OneUse(m_Cmp(Pred, m_Value(), m_Value())), m_Value(),
m_Value())))		m_Value())))
return InstDesc(false, I);		return InstDesc(false, I);

		RecurKind MMRK = isMinMaxOperation(I);
		if (MMRK != RecurKind::None)
		return InstDesc(Kind == MMRK, I);

		if (isa<SelectInst>(I))
		return isMinMaxIdxPattern(Loop, I, OrigPhi, Kind, SE);

		return InstDesc(false, I);
		}

		RecurKind RecurrenceDescriptor::isMinMaxOperation(Instruction *I) {
// Look for a min/max pattern.		// Look for a min/max pattern.
if (match(I, m_UMin(m_Value(), m_Value())))		if (match(I, m_UMin(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::UMin, I);		return RecurKind::UMin;
if (match(I, m_UMax(m_Value(), m_Value())))		if (match(I, m_UMax(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::UMax, I);		return RecurKind::UMax;
if (match(I, m_SMax(m_Value(), m_Value())))		if (match(I, m_SMax(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::SMax, I);		return RecurKind::SMax;
if (match(I, m_SMin(m_Value(), m_Value())))		if (match(I, m_SMin(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::SMin, I);		return RecurKind::SMin;
if (match(I, m_OrdFMin(m_Value(), m_Value())))		if (match(I, m_OrdFMin(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::FMin, I);		return RecurKind::FMin;
if (match(I, m_OrdFMax(m_Value(), m_Value())))		if (match(I, m_OrdFMax(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::FMax, I);		return RecurKind::FMax;
if (match(I, m_UnordFMin(m_Value(), m_Value())))		if (match(I, m_UnordFMin(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::FMin, I);		return RecurKind::FMin;
if (match(I, m_UnordFMax(m_Value(), m_Value())))		if (match(I, m_UnordFMax(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::FMax, I);		return RecurKind::FMax;
if (match(I, m_Intrinsic<Intrinsic::minnum>(m_Value(), m_Value())))		if (match(I, m_Intrinsic<Intrinsic::minnum>(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::FMin, I);		return RecurKind::FMin;
if (match(I, m_Intrinsic<Intrinsic::maxnum>(m_Value(), m_Value())))		if (match(I, m_Intrinsic<Intrinsic::maxnum>(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::FMax, I);		return RecurKind::FMax;

		return RecurKind::None;
		}

		RecurrenceDescriptor::InstDesc RecurrenceDescriptor::isMinMaxIdxPattern(
		Loop *Loop, Instruction *I, PHINode *MinMaxPhi, RecurKind MinMaxKind,
		ScalarEvolution *SE) {
		assert(isa<SelectInst>(I) && "Expected a select instruction");
		// TODO: FP MinMax
		if (!isIntMinMaxRecurrenceKind(MinMaxKind))
		return InstDesc(false, I);

		// Requires SCEV to check the index part
		if (!SE) {
		LLVM_DEBUG(dbgs() << "MinMaxIdx patterns are not recognized without "
		<< "Scalar Evolution Analysis\n");
		return InstDesc(false, I);
		}

		// Check the index select
		auto *SI = cast<SelectInst>(I);
		Value *Cond = SI->getCondition();
		CmpInst::Predicate Pred;
		CmpInst::Predicate NormPred;

		// %cmp = icmp pred, %mmphi, %0
		// %select = select %cmp, %update, %idxphi
		// Check if cmp used min/max phi
		if (match(Cond, m_Cmp(Pred, m_Specific(MinMaxPhi), m_Value())))
		NormPred = Pred;
		else if (match(Cond, m_Cmp(Pred, m_Value(), m_Specific(MinMaxPhi))))
		// Normalize the predicate, and get which side the select should update idx
		// TODO: Need to consider commutable.
		NormPred = CmpInst::getSwappedPredicate(Pred);
		else
		return InstDesc(false, I);

		bool UpdateSide;
		RecurKind ExpectedIdxRK;
		switch (NormPred) {
		case CmpInst::ICMP_SLT:
		case CmpInst::ICMP_ULT:
		// %mmphi < %0
		UpdateSide = isMaxRecurrenceKind(MinMaxKind);
		ExpectedIdxRK = isMaxRecurrenceKind(MinMaxKind) ? RecurKind::MinMaxFirstIdx
		: RecurKind::MinMaxLastIdx;
		break;
		case CmpInst::ICMP_SLE:
		case CmpInst::ICMP_ULE:
		// %mmphi <= %0
		UpdateSide = isMaxRecurrenceKind(MinMaxKind);
		ExpectedIdxRK = isMaxRecurrenceKind(MinMaxKind) ? RecurKind::MinMaxLastIdx
		: RecurKind::MinMaxFirstIdx;
		break;
		case CmpInst::ICMP_SGT:
		case CmpInst::ICMP_UGT:
		// %mmphi > %0
		UpdateSide = !isMaxRecurrenceKind(MinMaxKind);
		ExpectedIdxRK = isMaxRecurrenceKind(MinMaxKind) ? RecurKind::MinMaxLastIdx
		: RecurKind::MinMaxFirstIdx;
		break;
		case CmpInst::ICMP_SGE:
		case CmpInst::ICMP_UGE:
		// %mmphi >= %0
		UpdateSide = !isMaxRecurrenceKind(MinMaxKind);
		ExpectedIdxRK = isMaxRecurrenceKind(MinMaxKind) ? RecurKind::MinMaxFirstIdx
		: RecurKind::MinMaxLastIdx;
		break;
		default:
		return InstDesc(false, I);
		artagnon: This is a bit cryptic: would you consider adding more `RecurKind`s to make this less cryptic?
		}

		// Get the reduction phi of index select
		Value *IdxUpdateV = UpdateSide ? SI->getTrueValue() : SI->getFalseValue();
		Value *IdxReduxV = UpdateSide ? SI->getFalseValue() : SI->getTrueValue();
		// Handle the operand of index select may have been casted.
		if (auto *Cast = dyn_cast<CastInst>(IdxUpdateV))
		IdxUpdateV = Cast->getOperand(0);

		auto *IdxUpdatePhi = dyn_cast<PHINode>(IdxUpdateV);
		auto *IdxReduxPhi = dyn_cast<PHINode>(IdxReduxV);
		if (!IdxUpdatePhi || !IdxReduxPhi)
return InstDesc(false, I);		return InstDesc(false, I);

		// Check update side is a loop induction variable
		InductionDescriptor ID;
		if (!InductionDescriptor::isInductionPHI(IdxUpdatePhi, Loop, SE, ID))
		return InstDesc(false, I);
		artagnon: Can we avoid the expensive call to `isInductionPHI()` by checking that the `SCEVAddRec` is a `SCEVConstant`?

		// The reduction phi of index select and reduction phi of min/max must not the
		// same
		if (IdxReduxPhi == MinMaxPhi)
		return InstDesc(false, I);

		return InstDesc(false, I, IdxReduxPhi, ExpectedIdxRK);
}		}
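
The predicate switch in `isMinMaxIdxPattern` above can be read as a small lookup table: once the compare is normalized so that the min/max phi is on the left, the predicate and the reduction direction together decide which select operand carries the induction variable and whether the first or the last matching index is kept. The following is a minimal standalone sketch of that table, not the patch's code; `Pred`, `IdxKind`, `IdxSelect`, and `classifyIdxSelect` are hypothetical stand-ins for `CmpInst::Predicate`, the new `RecurKind` values, and the `UpdateSide`/`ExpectedIdxRK` pair, and signed and unsigned predicates are collapsed into one enumerator each.

```cpp
#include <cassert>

// Hypothetical stand-ins for the LLVM types used in the patch.
enum class Pred { LT, LE, GT, GE, EQ }; // compare with the min/max phi as LHS
enum class IdxKind { FirstIdx, LastIdx };

struct IdxSelect {
  bool Valid;        // false if the predicate is not an ordering compare
  bool UpdateOnTrue; // true: the select's true operand is the induction value
  IdxKind Kind;      // which matching index the scalar loop ends up with
};

// Mirrors the switch over NormPred; IsMax says whether the value reduction
// is a max (as opposed to a min).
IdxSelect classifyIdxSelect(Pred P, bool IsMax) {
  switch (P) {
  case Pred::LT: // phi < x
    return {true, IsMax, IsMax ? IdxKind::FirstIdx : IdxKind::LastIdx};
  case Pred::LE: // phi <= x
    return {true, IsMax, IsMax ? IdxKind::LastIdx : IdxKind::FirstIdx};
  case Pred::GT: // phi > x
    return {true, !IsMax, IsMax ? IdxKind::LastIdx : IdxKind::FirstIdx};
  case Pred::GE: // phi >= x
    return {true, !IsMax, IsMax ? IdxKind::FirstIdx : IdxKind::LastIdx};
  default:
    return {false, false, IdxKind::FirstIdx};
  }
}
```

For the `smax_idx` example in the summary (`icmp slt %max.09, %0` with an smax reduction), this corresponds to `classifyIdxSelect(Pred::LT, true)`: the true operand of the select is the induction variable and the first index of the maximum is kept.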

/// Returns true if the select instruction has users in the compare-and-add		/// Returns true if the select instruction has users in the compare-and-add
/// reduction pattern below. The select instruction argument is the last one		/// reduction pattern below. The select instruction argument is the last one
/// in the sequence.		/// in the sequence.
///		///
/// %sum.1 = phi ...		/// %sum.1 = phi ...
/// ...		/// ...
[81 lines not shown]	RecurrenceDescriptor::InstDesc RecurrenceDescriptor::isRecurrenceInstr(
case Instruction::Call:		case Instruction::Call:
if (isSelectCmpRecurrenceKind(Kind))		if (isSelectCmpRecurrenceKind(Kind))
return isSelectCmpPattern(L, OrigPhi, I, Prev, SE);		return isSelectCmpPattern(L, OrigPhi, I, Prev, SE);
if (isIntMinMaxRecurrenceKind(Kind) ||		if (isIntMinMaxRecurrenceKind(Kind) ||
(((FuncFMF.noNaNs() && FuncFMF.noSignedZeros()) ||		(((FuncFMF.noNaNs() && FuncFMF.noSignedZeros()) ||
(isa<FPMathOperator>(I) && I->hasNoNaNs() &&		(isa<FPMathOperator>(I) && I->hasNoNaNs() &&
I->hasNoSignedZeros())) &&		I->hasNoSignedZeros())) &&
isFPMinMaxRecurrenceKind(Kind)))		isFPMinMaxRecurrenceKind(Kind)))
return isMinMaxPattern(I, Kind, Prev);		return isMinMaxPattern(I, Kind, Prev, L, OrigPhi, SE);
else if (isFMulAddIntrinsic(I))		else if (isFMulAddIntrinsic(I))
return InstDesc(Kind == RecurKind::FMulAdd, I,		return InstDesc(Kind == RecurKind::FMulAdd, I,
I->hasAllowReassoc() ? nullptr : I);		I->hasAllowReassoc() ? nullptr : I);
return InstDesc(false, I);		return InstDesc(false, I);
}		}
}		}

bool RecurrenceDescriptor::hasMultipleUsesOf(		bool RecurrenceDescriptor::hasMultipleUsesOf(
[244 lines not shown]	Value *RecurrenceDescriptor::getRecurrenceIdentity(RecurKind K, Type *Tp,
case RecurKind::FMax:		case RecurKind::FMax:
assert((FMF.noNaNs() && FMF.noSignedZeros()) &&		assert((FMF.noNaNs() && FMF.noSignedZeros()) &&
"nnan, nsz is expected to be set for FP max reduction.");		"nnan, nsz is expected to be set for FP max reduction.");
return ConstantFP::getInfinity(Tp, true /*Negative*/);		return ConstantFP::getInfinity(Tp, true /*Negative*/);
case RecurKind::SelectICmp:		case RecurKind::SelectICmp:
case RecurKind::SelectFCmp:		case RecurKind::SelectFCmp:
return getRecurrenceStartValue();		return getRecurrenceStartValue();
break;		break;
case RecurKind::SelectIVICmp:		case RecurKind::SelectIVICmp:
case RecurKind::SelectIVFCmp:		case RecurKind::SelectIVFCmp:
		case RecurKind::MinMaxFirstIdx:
		case RecurKind::MinMaxLastIdx:
return getRecurrenceIdentity(RecurKind::SMax, Tp, FMF);		return getRecurrenceIdentity(RecurKind::SMax, Tp, FMF);
		artagnon: Why not merge this with the `RecurKind::SMax` case?
default:		default:
llvm_unreachable("Unknown recurrence kind");		llvm_unreachable("Unknown recurrence kind");
}		}
}		}

unsigned RecurrenceDescriptor::getOpcode(RecurKind Kind) {		unsigned RecurrenceDescriptor::getOpcode(RecurKind Kind) {
switch (Kind) {		switch (Kind) {
case RecurKind::Add:		case RecurKind::Add:
[12 lines not shown]	unsigned RecurrenceDescriptor::getOpcode(RecurKind Kind) {
case RecurKind::FAdd:		case RecurKind::FAdd:
return Instruction::FAdd;		return Instruction::FAdd;
case RecurKind::SMax:		case RecurKind::SMax:
case RecurKind::SMin:		case RecurKind::SMin:
case RecurKind::UMax:		case RecurKind::UMax:
case RecurKind::UMin:		case RecurKind::UMin:
case RecurKind::SelectICmp:		case RecurKind::SelectICmp:
case RecurKind::SelectIVICmp:		case RecurKind::SelectIVICmp:
		// TODO: maybe new FMinMaxFirstIdx/ FMinMaxLastIdx
		case RecurKind::MinMaxFirstIdx:
		case RecurKind::MinMaxLastIdx:
		artagnon: Rename these to `IMinMaxFirstIdx` and `IMinMaxLastIdx`?
return Instruction::ICmp;		return Instruction::ICmp;
case RecurKind::FMax:		case RecurKind::FMax:
case RecurKind::FMin:		case RecurKind::FMin:
case RecurKind::SelectFCmp:		case RecurKind::SelectFCmp:
case RecurKind::SelectIVFCmp:		case RecurKind::SelectIVFCmp:
return Instruction::FCmp;		return Instruction::FCmp;
default:		default:
llvm_unreachable("Unknown recurrence operation");		llvm_unreachable("Unknown recurrence operation");
[453 lines not shown]

llvm/lib/Transforms/Utils/LoopUtils.cpp

[926 lines not shown]	CmpInst::Predicate llvm::getMinMaxReductionPredicate(RecurKind RK) {
case RecurKind::FMin:		case RecurKind::FMin:
return CmpInst::FCMP_OLT;		return CmpInst::FCMP_OLT;
case RecurKind::FMax:		case RecurKind::FMax:
return CmpInst::FCMP_OGT;		return CmpInst::FCMP_OGT;
}		}
}		}

Value *llvm::createSelectCmpOp(IRBuilderBase &Builder, Value *StartVal,		Value *llvm::createSelectCmpOp(IRBuilderBase &Builder, Value *StartVal,
RecurKind RK, Value *Left, Value *Right) {		RecurKind RK, Value *Left, Value *Right,
		Value *SrcCmp) {
switch (RK) {		switch (RK) {
case RecurKind::SelectICmp:		case RecurKind::SelectICmp:
case RecurKind::SelectFCmp: {		case RecurKind::SelectFCmp: {
if (auto *VTy = dyn_cast<VectorType>(Left->getType()))		if (auto *VTy = dyn_cast<VectorType>(Left->getType()))
StartVal = Builder.CreateVectorSplat(VTy->getElementCount(), StartVal);		StartVal = Builder.CreateVectorSplat(VTy->getElementCount(), StartVal);
Value *Cmp =		Value *Cmp =
Builder.CreateCmp(CmpInst::ICMP_NE, Left, StartVal, "rdx.select.cmp");		Builder.CreateCmp(CmpInst::ICMP_NE, Left, StartVal, "rdx.select.cmp");
return Builder.CreateSelect(Cmp, Left, Right, "rdx.select");		return Builder.CreateSelect(Cmp, Left, Right, "rdx.select");
}		}
case RecurKind::SelectIVICmp:		case RecurKind::SelectIVICmp:
case RecurKind::SelectIVFCmp:		case RecurKind::SelectIVFCmp:
return createMinMaxOp(Builder, RecurKind::SMax, Left, Right);		return createMinMaxOp(Builder, RecurKind::SMax, Left, Right);
		case RecurKind::MinMaxFirstIdx: {
		assert(isa_and_nonnull<CmpInst>(SrcCmp) &&
		"SrcCmp should not be nullptr when MinMaxFirstIdx recurrence");
		auto *SrcCI = cast<CmpInst>(SrcCmp);
		CmpInst::Predicate Pred = SrcCI->getNonStrictPredicate();
		Value *Cmp = Builder.CreateCmp(Pred, SrcCI->getOperand(0),
		SrcCI->getOperand(1), "rdx.select.cmp");
		return Builder.CreateSelect(Cmp, Left, Right, "rdx.select");
		}
		case RecurKind::MinMaxLastIdx:
		assert(isa_and_nonnull<CmpInst>(SrcCmp) &&
		"SrcCmp should not be nullptr when MinMaxLastIdx recurrence");
		return Builder.CreateSelect(SrcCmp, Left, Right, "rdx.select");
default:		default:
llvm_unreachable("Unknown SelectCmp recurrence kind");		llvm_unreachable("Unknown SelectCmp recurrence kind");
}		}
}		}

Value *llvm::createMinMaxSelectCmpOp(IRBuilderBase &Builder, RecurKind RK,		Value *llvm::createMinMaxSelectCmpOp(IRBuilderBase &Builder, RecurKind RK,
Value *Left, Value *Right) {		Value *Left, Value *Right) {
CmpInst::Predicate Pred = getMinMaxReductionPredicate(RK);		CmpInst::Predicate Pred = getMinMaxReductionPredicate(RK);
[113 lines not shown]	Value *llvm::createInvariantSelectCmpTargetReduction(
Value *Cmp =		Value *Cmp =
Builder.CreateCmp(CmpInst::ICMP_NE, Src, Right, "rdx.select.cmp");		Builder.CreateCmp(CmpInst::ICMP_NE, Src, Right, "rdx.select.cmp");

// If any predicate is true it means that we want to select the new value.		// If any predicate is true it means that we want to select the new value.
Cmp = Builder.CreateOrReduce(Cmp);		Cmp = Builder.CreateOrReduce(Cmp);
return Builder.CreateSelect(Cmp, NewVal, InitVal, "rdx.select");		return Builder.CreateSelect(Cmp, NewVal, InitVal, "rdx.select");
}		}

		Value *llvm::createMMISelectCmpTargetReduction(IRBuilderBase &Builder,
		const TargetTransformInfo *TTI,
		Value *Src,
		const RecurrenceDescriptor &Desc,
		Value *SrcMask) {
		assert(RecurrenceDescriptor::isMinMaxIdxRecurrenceKind(
		Desc.getRecurrenceKind()) &&
		"Unexpected reduction kind");
		RecurKind Kind = Desc.getRecurrenceKind();
		// FIXME: UMax/SMax or UMin/UMax?
		RecurKind RdxExtractK =
		Kind == RecurKind::MinMaxFirstIdx ? RecurKind::SMin : RecurKind::SMax;

		assert(SrcMask && "MinMaxIdx recurrence requests mask");
		// TODO: If vp reduction intrinsic is supported, there is no need to generate
		// additional select here.
		auto *SrcVecEltTy = cast<VectorType>(Src->getType())->getElementType();
		Value *RdxOpIden = Desc.getRecurrenceIdentity(RdxExtractK, SrcVecEltTy,
		Desc.getFastMathFlags());
		ElementCount EC = cast<VectorType>(Src->getType())->getElementCount();
		RdxOpIden = Builder.CreateVectorSplat(EC, RdxOpIden);
		Value *NewVal = Builder.CreateSelect(SrcMask, Src, RdxOpIden, "mask.select");

		return createSimpleTargetReduction(Builder, TTI, NewVal, RdxExtractK);
		}
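
The final index reduction in `createMMISelectCmpTargetReduction` above first replaces every lane outside the mask with the identity of the extracting reduction (signed max for `SMin`, signed min for `SMax`) and then reduces across lanes. A minimal scalar model of that step is sketched below, assuming vector lanes are represented as `std::vector` elements; `reduceMaskedIndex` is a hypothetical name and does not appear in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical scalar model of the masked index reduction.
// Mask[i] is true where lane i holds the reduced min/max value.
// FirstIdx selects an smin reduction over the indices, otherwise smax.
int64_t reduceMaskedIndex(const std::vector<int64_t> &Idx,
                          const std::vector<bool> &Mask, bool FirstIdx) {
  assert(Idx.size() == Mask.size() && "one mask bit per lane");
  // Identity of the extracting reduction: INT64_MAX for smin, INT64_MIN for
  // smax, so that masked-off lanes can never win.
  const int64_t Identity = FirstIdx ? std::numeric_limits<int64_t>::max()
                                    : std::numeric_limits<int64_t>::min();
  int64_t Acc = Identity;
  for (std::size_t L = 0; L < Idx.size(); ++L) {
    // Models the "mask.select": keep the lane's index only where the mask is set.
    int64_t Lane = Mask[L] ? Idx[L] : Identity;
    Acc = FirstIdx ? std::min(Acc, Lane) : std::max(Acc, Lane);
  }
  return Acc;
}
```

With lane indices `{4, 1, 6, 3}` and mask `{true, false, true, false}`, the first-index variant yields 4 and the last-index variant yields 6. If no lane is selected, the sketch returns the identity itself; how the patch handles that case is not shown in this section.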

Value *llvm::createSelectCmpTargetReduction(IRBuilderBase &Builder,		Value *llvm::createSelectCmpTargetReduction(IRBuilderBase &Builder,
const TargetTransformInfo *TTI,		const TargetTransformInfo *TTI,
Value *Src,		Value *Src,
const RecurrenceDescriptor &Desc,		const RecurrenceDescriptor &Desc,
PHINode *OrigPhi) {		PHINode *OrigPhi, Value *SrcMask) {
assert(RecurrenceDescriptor::isSelectCmpRecurrenceKind(		assert(RecurrenceDescriptor::isSelectCmpRecurrenceKind(
Desc.getRecurrenceKind()) &&		Desc.getRecurrenceKind()) &&
"Unexpected reduction kind");		"Unexpected reduction kind");
RecurKind RdxKind = Desc.getRecurrenceKind();		RecurKind RdxKind = Desc.getRecurrenceKind();
switch (RdxKind) {		switch (RdxKind) {
case RecurKind::SelectICmp:		case RecurKind::SelectICmp:
case RecurKind::SelectFCmp:		case RecurKind::SelectFCmp:
return createInvariantSelectCmpTargetReduction(Builder, TTI, Src, Desc,		return createInvariantSelectCmpTargetReduction(Builder, TTI, Src, Desc,
OrigPhi);		OrigPhi);
case RecurKind::SelectIVICmp:		case RecurKind::SelectIVICmp:
case RecurKind::SelectIVFCmp:		case RecurKind::SelectIVFCmp:
// TODO: Decreasing induction need fix here		// TODO: Decreasing induction need fix here
return Builder.CreateIntMaxReduce(Src, true);		return Builder.CreateIntMaxReduce(Src, true);
		case RecurKind::MinMaxFirstIdx:
		case RecurKind::MinMaxLastIdx:
		return createMMISelectCmpTargetReduction(Builder, TTI, Src, Desc, SrcMask);
default:		default:
llvm_unreachable("Unknown SelectCmp recurrence kind");		llvm_unreachable("Unknown SelectCmp recurrence kind");
}		}
}		}

Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder,		Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder,
const TargetTransformInfo *TTI,		const TargetTransformInfo *TTI,
Value *Src, RecurKind RdxKind) {		Value *Src, RecurKind RdxKind) {
[30 lines not shown]	Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder,
default:		default:
llvm_unreachable("Unhandled opcode");		llvm_unreachable("Unhandled opcode");
}		}
}		}

Value *llvm::createTargetReduction(IRBuilderBase &B,		Value *llvm::createTargetReduction(IRBuilderBase &B,
const TargetTransformInfo *TTI,		const TargetTransformInfo *TTI,
const RecurrenceDescriptor &Desc, Value *Src,		const RecurrenceDescriptor &Desc, Value *Src,
PHINode *OrigPhi) {		PHINode *OrigPhi, Value *SrcMask) {
// TODO: Support in-order reductions based on the recurrence descriptor.		// TODO: Support in-order reductions based on the recurrence descriptor.
// All ops in the reduction inherit fast-math-flags from the recurrence		// All ops in the reduction inherit fast-math-flags from the recurrence
// descriptor.		// descriptor.
IRBuilderBase::FastMathFlagGuard FMFGuard(B);		IRBuilderBase::FastMathFlagGuard FMFGuard(B);
B.setFastMathFlags(Desc.getFastMathFlags());		B.setFastMathFlags(Desc.getFastMathFlags());

RecurKind RK = Desc.getRecurrenceKind();		RecurKind RK = Desc.getRecurrenceKind();
if (RecurrenceDescriptor::isSelectCmpRecurrenceKind(RK))		if (RecurrenceDescriptor::isSelectCmpRecurrenceKind(RK))
return createSelectCmpTargetReduction(B, TTI, Src, Desc, OrigPhi);		return createSelectCmpTargetReduction(B, TTI, Src, Desc, OrigPhi, SrcMask);

return createSimpleTargetReduction(B, TTI, Src, RK);		return createSimpleTargetReduction(B, TTI, Src, RK);
}		}

Value *llvm::createOrderedReduction(IRBuilderBase &B,		Value *llvm::createOrderedReduction(IRBuilderBase &B,
const RecurrenceDescriptor &Desc,		const RecurrenceDescriptor &Desc,
Value *Src, Value *Start) {		Value *Src, Value *Start) {
assert((Desc.getRecurrenceKind() == RecurKind::FAdd ||		assert((Desc.getRecurrenceKind() == RecurKind::FAdd ||
[819 lines not shown]

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

[1,018 lines not shown]	for (Instruction &I : *BB) {
reportVectorizationFailure("Value cannot be used outside the loop",		reportVectorizationFailure("Value cannot be used outside the loop",
"value cannot be used outside the loop",		"value cannot be used outside the loop",
"ValueUsedOutsideLoop", ORE, TheLoop, &I);		"ValueUsedOutsideLoop", ORE, TheLoop, &I);
return false;		return false;
}		}
} // next instr.		} // next instr.
}		}

		// Second comfirm the incomplete reductions
		artagnon: Typo: comfirm.
		for (auto R : Reductions) {
		RecurrenceDescriptor &RedDes = Reductions.find(R.first)->second;
		if (!RedDes.hasUserRecurrence())
		continue;

		PHINode *UserPhi = RedDes.getUserRecurPhi();
		if (!isReductionVariable(UserPhi))
		return false;

		RecurrenceDescriptor &UserRedDes = Reductions.find(UserPhi)->second;
		if (!RedDes.fixUserRecurrence(UserRedDes))
		shiva0217: Instead of using fixUserRecurrence to call setDependMinMaxRecurDes and change the user RecurKind, is it possible to call setDependMinMaxRecurDes when isReductionPHI returns true? If we are able to propagate the parent (dependent) RecurDes to isReductionPHI, perhaps we can create the reduction as follows:
		    RecurKind ParentKind = RedDes.getRecurrenceKind();
		    if (ParentKind == RecurKind::SMax) {
		      if (AddReductionVar(Phi, RecurKind::MinMaxFirstIdx, TheLoop, FMF, RedDes, DB, AC, DT, SE)) {
		        LLVM_DEBUG(dbgs() << "Found an MinMaxFirstIdx reduction PHI." << *Phi << "\n");
		        return true;
		      }
		    }
		The dependency for the RecurKind could then be explicit, which avoids the user RecurKind fixup.
		return false;
		}

if (!PrimaryInduction) {		if (!PrimaryInduction) {
if (Inductions.empty()) {		if (Inductions.empty()) {
reportVectorizationFailure("Did not find one integer induction var",		reportVectorizationFailure("Did not find one integer induction var",
"loop induction variable could not be identified",		"loop induction variable could not be identified",
"NoInductionVariable", ORE, TheLoop);		"NoInductionVariable", ORE, TheLoop);
return false;		return false;
} else if (!WidestIndTy) {		} else if (!WidestIndTy) {
reportVectorizationFailure("Did not find one integer induction var",		reportVectorizationFailure("Did not find one integer induction var",
[545 lines not shown]

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


[551 lines not shown]	public:
/// this is needed because each iteration in the loop corresponds to a SIMD		/// this is needed because each iteration in the loop corresponds to a SIMD
/// element.		/// element.
virtual Value *getBroadcastInstrs(Value *V);		virtual Value *getBroadcastInstrs(Value *V);

// Returns the resume value (bc.merge.rdx) for a reduction as		// Returns the resume value (bc.merge.rdx) for a reduction as
// generated by fixReduction.		// generated by fixReduction.
PHINode *getReductionResumeValue(const RecurrenceDescriptor &RdxDesc);		PHINode *getReductionResumeValue(const RecurrenceDescriptor &RdxDesc);

		// Returns the recurrence mask (mask.cmp) for a recurrence as generated by
		// fixReduction.
		std::pair<Value *, VectorParts>
		getDependRecurrenceMask(const RecurrenceDescriptor &RdxDesc);

/// Create a new phi node for the induction variable \p OrigPhi to resume		/// Create a new phi node for the induction variable \p OrigPhi to resume
/// iteration count in the scalar epilogue, from where the vectorized loop		/// iteration count in the scalar epilogue, from where the vectorized loop
/// left off. \p Step is the SCEV-expanded induction step to use. In cases		/// left off. \p Step is the SCEV-expanded induction step to use. In cases
/// where the loop skeleton is more complicated (i.e., epilogue vectorization)		/// where the loop skeleton is more complicated (i.e., epilogue vectorization)
/// and the resume values can come from an additional bypass block, the \p		/// and the resume values can come from an additional bypass block, the \p
/// AdditionalBypass pair provides information about the bypass block and the		/// AdditionalBypass pair provides information about the bypass block and the
/// end value on the edge from bypass to this loop.		/// end value on the edge from bypass to this loop.
PHINode *createInductionResumeValue(		PHINode *createInductionResumeValue(
[200 lines not shown]	protected:
/// Structure to hold information about generated runtime checks, responsible		/// Structure to hold information about generated runtime checks, responsible
/// for cleaning the checks, if vectorization turns out unprofitable.		/// for cleaning the checks, if vectorization turns out unprofitable.
GeneratedRTChecks &RTChecks;		GeneratedRTChecks &RTChecks;

// Holds the resume values for reductions in the loops, used to set the		// Holds the resume values for reductions in the loops, used to set the
// correct start value of reduction PHIs when vectorizing the epilogue.		// correct start value of reduction PHIs when vectorizing the epilogue.
SmallMapVector<const RecurrenceDescriptor *, PHINode *, 4>		SmallMapVector<const RecurrenceDescriptor *, PHINode *, 4>
ReductionResumeValues;		ReductionResumeValues;

		// Holds the masks for recurrences in the loops, be used for reduction when
		// there is a reduction that depends on the recurrence.
		SmallMapVector<const RecurrenceDescriptor *, std::pair<Value *, VectorParts>,
		4>
		ReductionDependMasks;
		fhahn: We are in the process of removing those kinds of global maps that are used to carry information used during codegen and later. Ideally the combination of values would be modeled explicitly in the exit block of the plan, but we are not there yet. This is the main reason for D132063 doing things the way it does.
		Mel-Chen (Author): I see, but `DependRecurrenceMasks` exists for a reason. Consider the following case:
		    int idx = ii;
		    int foo = jj;
		    int max = mm;
		    for (int i = 0; i < n; ++i) {
		      int x = a[i];
		      if (max < x) {
		        max = x;
		        idx = i;
		        foo = b[i];
		      }
		    }
		That mask has the chance to be reused, and I am trying to keep that flexibility. Of course, we could recalculate the mask for each reduction that needs one, but using a global map to preserve the mask is the simplest method I can think of for now. I have heard that VPlan is going to be extended to other blocks; could you share links to the relevant discussion?
};		};

class InnerLoopUnroller : public InnerLoopVectorizer {		class InnerLoopUnroller : public InnerLoopVectorizer {
public:		public:
InnerLoopUnroller(Loop *OrigLoop, PredicatedScalarEvolution &PSE,		InnerLoopUnroller(Loop *OrigLoop, PredicatedScalarEvolution &PSE,
LoopInfo *LI, DominatorTree *DT,		LoopInfo *LI, DominatorTree *DT,
const TargetLibraryInfo *TLI,		const TargetLibraryInfo *TLI,
const TargetTransformInfo *TTI, AssumptionCache *AC,		const TargetTransformInfo *TTI, AssumptionCache *AC,
[355 lines not shown]
PHINode *InnerLoopVectorizer::getReductionResumeValue(		PHINode *InnerLoopVectorizer::getReductionResumeValue(
const RecurrenceDescriptor &RdxDesc) {		const RecurrenceDescriptor &RdxDesc) {
auto It = ReductionResumeValues.find(&RdxDesc);		auto It = ReductionResumeValues.find(&RdxDesc);
assert(It != ReductionResumeValues.end() &&		assert(It != ReductionResumeValues.end() &&
"Expected to find a resume value for the reduction.");		"Expected to find a resume value for the reduction.");
return It->second;		return It->second;
}		}

		std::pair<Value *, InnerLoopVectorizer::VectorParts>
		InnerLoopVectorizer::getDependRecurrenceMask(
		const RecurrenceDescriptor &RdxDesc) {
		auto It = ReductionDependMasks.find(&RdxDesc);
		assert(It != ReductionDependMasks.end() &&
		"Expected to find a dependence mask for the recurrence.");
		return It->second;
		}

namespace llvm {		namespace llvm {

// Loop vectorization cost-model hints how the scalar epilogue loop should be		// Loop vectorization cost-model hints how the scalar epilogue loop should be
// lowered.		// lowered.
enum ScalarEpilogueLowering {		enum ScalarEpilogueLowering {

// The default: allowing scalar epilogues.		// The default: allowing scalar epilogues.
CM_ScalarEpilogueAllowed,		CM_ScalarEpilogueAllowed,
[2,571 lines not shown]	void InnerLoopVectorizer::fixCrossIterationPHIs(VPTransformState &State) {
// In order to support recurrences we need to be able to vectorize Phi nodes.		// In order to support recurrences we need to be able to vectorize Phi nodes.
// Phi nodes have cycles, so we need to vectorize them in two stages. This is		// Phi nodes have cycles, so we need to vectorize them in two stages. This is
// stage #2: We now need to fix the recurrences by adding incoming edges to		// stage #2: We now need to fix the recurrences by adding incoming edges to
// the currently empty PHI nodes. At this point every instruction in the		// the currently empty PHI nodes. At this point every instruction in the
// original loop is widened to a vector form so we can use them to construct		// original loop is widened to a vector form so we can use them to construct
// the incoming edges.		// the incoming edges.
VPBasicBlock *Header =		VPBasicBlock *Header =
State.Plan->getVectorLoopRegion()->getEntryBasicBlock();		State.Plan->getVectorLoopRegion()->getEntryBasicBlock();
for (VPRecipeBase &R : Header->phis()) {		// FIXME: Maybe I should not choose std::queue...
if (auto *ReductionPhi = dyn_cast<VPReductionPHIRecipe>(&R))		std::queue<VPRecipeBase *> Worklist;
		for (VPRecipeBase &R : Header->phis())
		Worklist.push(&R);

		while (!Worklist.empty()) {
		fhahn: This would need documenting.
		Mel-Chen (Author): Sure. Quick explanation: the min/max recurrence should be fixed before the min/max index recurrence, because the index recurrence depends on the mask produced by the min/max recurrence. The worklist ensures that these recurrence dependencies are handled in the correct order.
		shiva0217: Perhaps we could sort according to the reduction dependency before calling fixReduction, which may be similar to https://reviews.llvm.org/D157631.
		VPRecipeBase &R = *(Worklist.front());
		Worklist.pop();
		if (auto *ReductionPhi = dyn_cast<VPReductionPHIRecipe>(&R)) {
		const RecurrenceDescriptor &RecDesc =
		ReductionPhi->getRecurrenceDescriptor();
		RecurrenceDescriptor *DependRecDesc = RecDesc.getDependMinMaxRecDes();
		if (DependRecDesc && !ReductionDependMasks.count(DependRecDesc)) {
		Worklist.push(&R);
		continue;
		}
fixReduction(ReductionPhi, State);		fixReduction(ReductionPhi, State);
else if (auto *FOR = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&R))		} else if (auto *FOR = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&R)) {
fixFixedOrderRecurrence(FOR, State);		fixFixedOrderRecurrence(FOR, State);
}		}
}		}
		}
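
A minimal, self-contained sketch of the ordering that the worklist above is meant to enforce, as described in the inline discussion: a reduction that needs another reduction's mask is pushed back until its dependency has been fixed. `ToyReduction` and `fixInDependencyOrder` are hypothetical names used only for illustration; they are not part of the patch or of LLVM, and the sketch assumes the dependencies are acyclic.

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <vector>

// Hypothetical stand-in for a reduction phi. Id names the reduction and
// DependsOn is the Id of the reduction whose mask it needs (-1 if none).
struct ToyReduction {
  int Id;
  int DependsOn;
};

// Returns the order in which the reductions get "fixed". An item whose
// dependency has not been fixed yet goes back to the end of the queue.
// Assumption: dependencies are acyclic and refer to Ids present in Phis;
// otherwise this loop would not terminate.
std::vector<int> fixInDependencyOrder(const std::vector<ToyReduction> &Phis) {
  std::queue<ToyReduction> Worklist;
  for (const ToyReduction &R : Phis)
    Worklist.push(R);

  std::set<int> Fixed;
  std::vector<int> Order;
  while (!Worklist.empty()) {
    ToyReduction R = Worklist.front();
    Worklist.pop();
    if (R.DependsOn >= 0 && !Fixed.count(R.DependsOn)) {
      Worklist.push(R); // Dependency not ready yet: retry later.
      continue;
    }
    Fixed.insert(R.Id);
    Order.push_back(R.Id);
  }
  return Order;
}
```

With the input `{{0, 1}, {1, -1}, {2, -1}}`, item 0 is deferred once and the resulting order is 1, 2, 0. Sorting the phis by dependency up front, as suggested in the review, would give the same guarantee without re-queuing.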

void InnerLoopVectorizer::fixFixedOrderRecurrence(		void InnerLoopVectorizer::fixFixedOrderRecurrence(
VPFirstOrderRecurrencePHIRecipe *PhiR, VPTransformState &State) {		VPFirstOrderRecurrencePHIRecipe *PhiR, VPTransformState &State) {
// This is the second phase of vectorizing first-order recurrences. An		// This is the second phase of vectorizing first-order recurrences. An
// overview of the transformation is described below. Suppose we have the		// overview of the transformation is described below. Suppose we have the
// following loop.		// following loop.
//		//
// for (int i = 0; i < n; ++i)		// for (int i = 0; i < n; ++i)
[…]	for (unsigned Part = 0; Part < UF; ++Part) {
State.reset(LoopExitInstDef, RdxParts[Part], Part);		State.reset(LoopExitInstDef, RdxParts[Part], Part);
}		}
}		}

// Reduce all of the unrolled parts into a single vector.		// Reduce all of the unrolled parts into a single vector.
Value *ReducedPartRdx = State.get(LoopExitInstDef, 0);		Value *ReducedPartRdx = State.get(LoopExitInstDef, 0);
unsigned Op = RecurrenceDescriptor::getOpcode(RK);		unsigned Op = RecurrenceDescriptor::getOpcode(RK);

		// Get the reduction mask if the reduction depend on another one.
		RecurrenceDescriptor *DependDesc = RdxDesc.getDependMinMaxRecDes();
		Value *DependRdxMask = nullptr;
		VectorParts DependPartMasks;
		if (DependDesc) {
		Builder.SetInsertPoint(&*LoopMiddleBlock->getTerminator());
		std::tie(DependRdxMask, DependPartMasks) =
		getDependRecurrenceMask(*DependDesc);
		}

		Value *NewRdxMask = nullptr;
		VectorParts NewPartMasks(UF);

// The middle block terminator has already been assigned a DebugLoc here (the		// The middle block terminator has already been assigned a DebugLoc here (the
// OrigLoop's single latch terminator). We want the whole middle block to		// OrigLoop's single latch terminator). We want the whole middle block to
// appear to execute on this line because: (a) it is all compiler generated,		// appear to execute on this line because: (a) it is all compiler generated,
// (b) these instructions are always executed after evaluating the latch		// (b) these instructions are always executed after evaluating the latch
// conditional branch, and (c) other passes may add new predecessors which		// conditional branch, and (c) other passes may add new predecessors which
// terminate on this line. This is the easiest way to ensure we don't		// terminate on this line. This is the easiest way to ensure we don't
// accidentally cause an extra step back into the loop while debugging.		// accidentally cause an extra step back into the loop while debugging.
State.setDebugLocFromInst(LoopMiddleBlock->getTerminator());		State.setDebugLocFromInst(LoopMiddleBlock->getTerminator());
if (PhiR->isOrdered())		if (PhiR->isOrdered())
ReducedPartRdx = State.get(LoopExitInstDef, UF - 1);		ReducedPartRdx = State.get(LoopExitInstDef, UF - 1);
else {		else {
// Floating-point operations should have some FMF to enable the reduction.		// Floating-point operations should have some FMF to enable the reduction.
IRBuilderBase::FastMathFlagGuard FMFG(Builder);		IRBuilderBase::FastMathFlagGuard FMFG(Builder);
Builder.setFastMathFlags(RdxDesc.getFastMathFlags());		Builder.setFastMathFlags(RdxDesc.getFastMathFlags());
for (unsigned Part = 1; Part < UF; ++Part) {		for (unsigned Part = 1; Part < UF; ++Part) {
Value *RdxPart = State.get(LoopExitInstDef, Part);		Value *RdxPart = State.get(LoopExitInstDef, Part);
		Value *PartMask = DependDesc ? DependPartMasks[Part] : nullptr;
if (Op != Instruction::ICmp && Op != Instruction::FCmp) {		if (Op != Instruction::ICmp && Op != Instruction::FCmp) {
ReducedPartRdx = Builder.CreateBinOp(		ReducedPartRdx = Builder.CreateBinOp(
(Instruction::BinaryOps)Op, RdxPart, ReducedPartRdx, "bin.rdx");		(Instruction::BinaryOps)Op, RdxPart, ReducedPartRdx, "bin.rdx");
} else if (RecurrenceDescriptor::isSelectCmpRecurrenceKind(RK))		} else if (RecurrenceDescriptor::isSelectCmpRecurrenceKind(RK))
ReducedPartRdx = createSelectCmpOp(Builder, ReductionStartValue, RK,		ReducedPartRdx = createSelectCmpOp(Builder, ReductionStartValue, RK,
ReducedPartRdx, RdxPart);		ReducedPartRdx, RdxPart, PartMask);
else		else {
		if (RdxDesc.hasUserRecurrence()) {
		shiva0217: Should we check isMinMaxRecurrenceKind(Kind) here, and isMinMaxIdxRecurrenceKind for the user's kind? Although this is currently the only dependency, an explicit check would be clearer for the reader and would avoid unexpected codegen in the future.
		ReducedPartRdx =
		createMinMaxSelectCmpOp(Builder, RK, ReducedPartRdx, RdxPart);
		shiva0217: Could we encapsulate the mask generation in a helper such as createMinMaxIdxMaskOp, or another name you prefer?
		// Keep the part mask on demand.
		Value *Cond = cast<SelectInst>(ReducedPartRdx)->getCondition();
		NewPartMasks[Part] = Cond;
		} else {
ReducedPartRdx = createMinMaxOp(Builder, RK, ReducedPartRdx, RdxPart);		ReducedPartRdx = createMinMaxOp(Builder, RK, ReducedPartRdx, RdxPart);
}		}
}		}
		}
		}
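
To make the part-merging loop above concrete, here is a lane-wise model in plain C++ of what the CHECK-VF4IC2 lines in select-min-index.ll show for an unsigned-min reduction with two unrolled parts: the values are merged with a strict `ult` compare (`rdx.minmax.cmp`), and the indices with an `ule` compare (`rdx.select.cmp`), so a tie keeps the index from part 0. `PartState` and `mergeParts` are hypothetical names; this is a sketch of the observable behaviour, not of how createMinMaxSelectCmpOp or createSelectCmpOp are implemented.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of one unrolled part with VF = 4: the running minimum
// per lane and the index recorded for it.
struct PartState {
  std::array<std::uint64_t, 4> Min;
  std::array<std::int64_t, 4> Idx;
};

// Merge part 1 into part 0, lane by lane.
PartState mergeParts(const PartState &P0, const PartState &P1) {
  PartState Out{};
  for (std::size_t L = 0; L < 4; ++L) {
    // rdx.minmax.select: keep the smaller value (unsigned compare).
    Out.Min[L] = P0.Min[L] < P1.Min[L] ? P0.Min[L] : P1.Min[L];
    // rdx.select: on a tie, keep part 0's index, which was recorded in an
    // earlier iteration for that lane.
    Out.Idx[L] = P0.Min[L] <= P1.Min[L] ? P0.Idx[L] : P1.Idx[L];
  }
  return Out;
}
```

For example, merging minima `{5, 2, 8, 1}` with indices `{0, 1, 2, 3}` and minima `{4, 2, 9, 0}` with indices `{4, 5, 6, 7}` gives minima `{4, 2, 8, 0}` and indices `{4, 1, 2, 7}`; the tie in lane 1 keeps index 1.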

// Create the reduction after the loop. Note that inloop reductions create the		// Create the reduction after the loop. Note that inloop reductions create the
// target reduction in the loop using a Reduction recipe.		// target reduction in the loop using a Reduction recipe.
if (VF.isVector() && !PhiR->isInLoop()) {		if (VF.isVector() && !PhiR->isInLoop()) {
ReducedPartRdx =		Value *ReducedPart = ReducedPartRdx;
createTargetReduction(Builder, TTI, RdxDesc, ReducedPartRdx, OrigPhi);		ReducedPartRdx = createTargetReduction(
		Builder, TTI, RdxDesc, ReducedPartRdx, OrigPhi, DependRdxMask);
// If the reduction can be performed in a smaller type, we need to extend		// If the reduction can be performed in a smaller type, we need to extend
// the reduction to the wider type before we branch to the original loop.		// the reduction to the wider type before we branch to the original loop.
if (PhiTy != RdxDesc.getRecurrenceType())		if (PhiTy != RdxDesc.getRecurrenceType())
ReducedPartRdx = RdxDesc.isSigned()		ReducedPartRdx = RdxDesc.isSigned()
? Builder.CreateSExt(ReducedPartRdx, PhiTy)		? Builder.CreateSExt(ReducedPartRdx, PhiTy)
: Builder.CreateZExt(ReducedPartRdx, PhiTy);		: Builder.CreateZExt(ReducedPartRdx, PhiTy);

		// Create depend recurrence mask on demand.
		if (RdxDesc.hasUserRecurrence()) {
		shiva0217: Should we check isMinMaxRecurrenceKind(Kind) and isMinMaxIdxRecurrenceKind for the user kind?
		ElementCount EC =
		shiva0217: Could we encapsulate the mask generation in createMinMaxIdxMask or similar?
		cast<VectorType>(ReducedPart->getType())->getElementCount();
		Value *RdxSplat = Builder.CreateVectorSplat(EC, ReducedPartRdx);
		// FIXME: Not sure use FCMP_OEQ is right or not.
		CmpInst::Predicate MaskPred =
		(ReducedPartRdx->getType()->isFloatingPointTy()) ? CmpInst::FCMP_OEQ
		: CmpInst::ICMP_EQ;
		artagnon: `RdxDesc.isOrdered()` can help you pick between `FCMP_OEQ` and `FCMP_UEQ`.
		NewRdxMask =
		Builder.CreateCmp(MaskPred, RdxSplat, ReducedPart, "mask.cmp");
		}
}		}
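
The FIXME and the inline suggestion above both turn on how the two predicates treat NaN. As a sketch in plain C++ (the helper names `fcmpOEQ` and `fcmpUEQ` are made up here and only mirror the documented semantics of the LLVM predicates): an ordered-equal compare is false whenever either operand is NaN, while an unordered-equal compare is true in that case. Whether `isOrdered()` is the right switch between them is left to the review.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Model of FCMP_OEQ: true only if neither operand is NaN and they are equal.
// The C++ == operator on doubles already behaves this way.
bool fcmpOEQ(double A, double B) { return A == B; }

// Model of FCMP_UEQ: true if either operand is NaN, or if they are equal.
bool fcmpUEQ(double A, double B) {
  return std::isnan(A) || std::isnan(B) || A == B;
}
```

If the reduced value were NaN, a mask built with the ordered predicate would be all-false, and one built with the unordered predicate would be all-true.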

if (RK == RecurKind::SelectIVICmp || RK == RecurKind::SelectIVFCmp)		if (RK == RecurKind::SelectIVICmp || RK == RecurKind::SelectIVFCmp ||
		RecurrenceDescriptor::isMinMaxIdxRecurrenceKind(RK))
ReducedPartRdx =		ReducedPartRdx =
createSentinelValueHandling(Builder, TTI, RdxDesc, ReducedPartRdx);		createSentinelValueHandling(Builder, TTI, RdxDesc, ReducedPartRdx);

		// Set the recurrence mask for this reduction on demand.
		if (RdxDesc.hasUserRecurrence())
		ReductionDependMasks.insert({&RdxDesc, {NewRdxMask, NewPartMasks}});

PHINode *ResumePhi =		PHINode *ResumePhi =
dyn_cast<PHINode>(PhiR->getStartValue()->getUnderlyingValue());		dyn_cast<PHINode>(PhiR->getStartValue()->getUnderlyingValue());

// Create a phi node that merges control-flow from the backedge-taken check		// Create a phi node that merges control-flow from the backedge-taken check
// block and the middle block.		// block and the middle block.
PHINode *BCBlockPhi = PHINode::Create(PhiTy, 2, "bc.merge.rdx",		PHINode *BCBlockPhi = PHINode::Create(PhiTy, 2, "bc.merge.rdx",
LoopScalarPreHeader->getTerminator());		LoopScalarPreHeader->getTerminator());
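
The middle-block sequence that the hunk above generates can be modelled in scalar C++. The sketch follows the CHECK-VF4IC1 lines in select-min-index.ll for an unsigned-min reduction with VF = 4: reduce the minima, compare the splat of the result against each lane (`mask.cmp`), replace the indices of non-matching lanes with INT64_MAX (`mask.select`), take the signed minimum of the remaining indices, and finally map the INT64_MIN sentinel back to the start index (`rdx.select`). `MinIdxResult` and `reduceUMinWithIndex` are hypothetical names, and the constants are taken from those CHECK lines rather than from the helper implementations.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

struct MinIdxResult {
  std::uint64_t Min;
  std::int64_t Idx;
};

// Scalar model of the middle block for VF = 4.
MinIdxResult reduceUMinWithIndex(const std::array<std::uint64_t, 4> &MinLanes,
                                 const std::array<std::int64_t, 4> &IdxLanes,
                                 std::int64_t StartIdx) {
  // A lane whose index was never updated still holds this sentinel.
  const std::int64_t Sentinel = std::numeric_limits<std::int64_t>::min();
  // Lanes that do not hold the final minimum are neutralised with this value.
  const std::int64_t MaskedOut = std::numeric_limits<std::int64_t>::max();

  // llvm.vector.reduce.umin over the value lanes.
  const std::uint64_t Min = *std::min_element(MinLanes.begin(), MinLanes.end());

  std::int64_t Idx = MaskedOut;
  for (std::size_t L = 0; L < 4; ++L) {
    const bool Mask = MinLanes[L] == Min;                          // mask.cmp
    const std::int64_t Cand = Mask ? IdxLanes[L] : MaskedOut;      // mask.select
    Idx = std::min(Idx, Cand);                                     // reduce.smin
  }

  // rdx.select: no lane ever updated its index, so fall back to the start.
  if (Idx == Sentinel)
    Idx = StartIdx;
  return {Min, Idx};
}
```

With minima `{7, 3, 9, 3}` and indices `{4, 5, 2, 3}`, lanes 1 and 3 match the minimum 3 and the smaller index, 3, is returned. If every index lane still holds the sentinel, the start index is returned instead.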


llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

[…]	if (RecurrenceDescriptor::isMinMaxRecurrenceKind(RK) ||
if (ScalarPHI) {		if (ScalarPHI) {
Iden = StartV;		Iden = StartV;
} else {		} else {
IRBuilderBase::InsertPointGuard IPBuilder(Builder);		IRBuilderBase::InsertPointGuard IPBuilder(Builder);
Builder.SetInsertPoint(VectorPH->getTerminator());		Builder.SetInsertPoint(VectorPH->getTerminator());
StartV = Iden =		StartV = Iden =
Builder.CreateVectorSplat(State.VF, StartV, "minmax.ident");		Builder.CreateVectorSplat(State.VF, StartV, "minmax.ident");
}		}
} else if (RK == RecurKind::SelectIVICmp || RK == RecurKind::SelectIVFCmp) {		} else if (RK == RecurKind::SelectIVICmp || RK == RecurKind::SelectIVFCmp ||
		RecurrenceDescriptor::isMinMaxIdxRecurrenceKind(RK)) {
StartV = Iden = RdxDesc.getRecurrenceIdentity(RK, VecTy->getScalarType(),		StartV = Iden = RdxDesc.getRecurrenceIdentity(RK, VecTy->getScalarType(),
RdxDesc.getFastMathFlags());		RdxDesc.getFastMathFlags());
if (!ScalarPHI) {		if (!ScalarPHI) {
IRBuilderBase::InsertPointGuard IPBuilder(Builder);		IRBuilderBase::InsertPointGuard IPBuilder(Builder);
Builder.SetInsertPoint(VectorPH->getTerminator());		Builder.SetInsertPoint(VectorPH->getTerminator());
StartV = Iden = Builder.CreateVectorSplat(State.VF, Iden);		StartV = Iden = Builder.CreateVectorSplat(State.VF, Iden);
}		}
} else {		} else {

llvm/test/Transforms/LoopVectorize/select-min-index.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function test_not_vectorize_select_no_min_reduction --version 2		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -S %s | FileCheck %s --check-prefix=CHECK-VF4IC1 --check-prefix=CHECK		; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -S %s | FileCheck %s --check-prefix=CHECK-VF4IC1 --check-prefix=CHECK
; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=2 -S %s | FileCheck %s --check-prefix=CHECK-VF4IC2 --check-prefix=CHECK		; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=2 -S %s | FileCheck %s --check-prefix=CHECK-VF4IC2 --check-prefix=CHECK
; RUN: opt -passes=loop-vectorize -force-vector-width=1 -force-vector-interleave=2 -S %s | FileCheck %s --check-prefix=CHECK-VF1IC2 --check-prefix=CHECK		; RUN: opt -passes=loop-vectorize -force-vector-width=1 -force-vector-interleave=2 -S %s | FileCheck %s --check-prefix=CHECK-VF1IC2 --check-prefix=CHECK

; Test cases for selecting the index with the minimum value.		; Test cases for selecting the index with the minimum value.

define i64 @test_vectorize_select_umin_idx(ptr %src) {		define i64 @test_vectorize_select_umin_idx(ptr %src) {
; CHECK-LABEL: @test_vectorize_select_umin_idx(		; CHECK-LABEL: @test_vectorize_select_umin_idx(
[…]	loop:
br i1 %exitcond.not, label %exit, label %loop		br i1 %exitcond.not, label %exit, label %loop

exit:		exit:
%res = phi i64 [ %min.idx.next, %loop ]		%res = phi i64 [ %min.idx.next, %loop ]
ret i64 %res		ret i64 %res
}		}

define i64 @test_vectorize_select_umin_idx_all_exit_inst(ptr %src, ptr %umin) {		define i64 @test_vectorize_select_umin_idx_all_exit_inst(ptr %src, ptr %umin) {
; CHECK-LABEL: @test_vectorize_select_umin_idx_all_exit_inst(		; CHECK-VF4IC1-LABEL: @test_vectorize_select_umin_idx_all_exit_inst(
; CHECK-NOT: vector.body:		; CHECK-VF4IC1-NEXT: entry:
		; CHECK-VF4IC1-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC1: vector.ph:
		; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC1: vector.body:
		; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i64> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr i64, ptr [[SRC:%.*]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[TMP1]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 4
		; CHECK-VF4IC1-NEXT: [[TMP3:%.*]] = icmp ugt <4 x i64> [[VEC_PHI1]], [[WIDE_LOAD]]
		; CHECK-VF4IC1-NEXT: [[TMP4]] = call <4 x i64> @llvm.umin.v4i64(<4 x i64> [[VEC_PHI1]], <4 x i64> [[WIDE_LOAD]])
		; CHECK-VF4IC1-NEXT: [[TMP5]] = select <4 x i1> [[TMP3]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC1-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0
		; CHECK-VF4IC1-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
		; CHECK-VF4IC1: middle.block:
		; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.umin.v4i64(<4 x i64> [[TMP4]])
		; CHECK-VF4IC1-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP7]], i64 0
		; CHECK-VF4IC1-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC1-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[TMP4]]
		; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 0, 0
		; CHECK-VF4IC1-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[TMP5]], <4 x i64> <i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807>
		; CHECK-VF4IC1-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.smin.v4i64(<4 x i64> [[MASK_SELECT]])
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP8]], -9223372036854775808
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP8]], i64 0
		; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC1: scalar.ph:
		; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX2:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: br label [[LOOP:%.*]]
		; CHECK-VF4IC1: loop:
		; CHECK-VF4IC1-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC1-NEXT: [[MIN_IDX:%.*]] = phi i64 [ [[BC_MERGE_RDX2]], [[SCALAR_PH]] ], [ [[MIN_IDX_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC1-NEXT: [[MIN_VAL:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[MIN_VAL_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC1-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]
		; CHECK-VF4IC1-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4
		; CHECK-VF4IC1-NEXT: [[CMP:%.*]] = icmp ugt i64 [[MIN_VAL]], [[L]]
		; CHECK-VF4IC1-NEXT: [[MIN_VAL_NEXT]] = tail call i64 @llvm.umin.i64(i64 [[MIN_VAL]], i64 [[L]])
		; CHECK-VF4IC1-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]
		; CHECK-VF4IC1-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 0
		; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
		; CHECK-VF4IC1: exit:
		; CHECK-VF4IC1-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: [[RES_UMIN:%.*]] = phi i64 [ [[MIN_VAL_NEXT]], [[LOOP]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: store i64 [[RES_UMIN]], ptr [[UMIN:%.*]], align 4
		; CHECK-VF4IC1-NEXT: ret i64 [[RES]]
		;
		; CHECK-VF4IC2-LABEL: @test_vectorize_select_umin_idx_all_exit_inst(
		; CHECK-VF4IC2-NEXT: entry:
		; CHECK-VF4IC2-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC2: vector.ph:
		; CHECK-VF4IC2-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC2: vector.body:
		; CHECK-VF4IC2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP10:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VEC_PHI2:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP11:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VEC_PHI3:%.*]] = phi <4 x i64> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP8:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i64> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP9:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC2-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
		; CHECK-VF4IC2-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[SRC:%.*]], i64 [[TMP0]]
		; CHECK-VF4IC2-NEXT: [[TMP3:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[TMP1]]
		; CHECK-VF4IC2-NEXT: [[TMP4:%.*]] = getelementptr i64, ptr [[TMP2]], i32 0
		; CHECK-VF4IC2-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP4]], align 4
		; CHECK-VF4IC2-NEXT: [[TMP5:%.*]] = getelementptr i64, ptr [[TMP2]], i32 4
		; CHECK-VF4IC2-NEXT: [[WIDE_LOAD5:%.*]] = load <4 x i64>, ptr [[TMP5]], align 4
		; CHECK-VF4IC2-NEXT: [[TMP6:%.*]] = icmp ugt <4 x i64> [[VEC_PHI3]], [[WIDE_LOAD]]
		; CHECK-VF4IC2-NEXT: [[TMP7:%.*]] = icmp ugt <4 x i64> [[VEC_PHI4]], [[WIDE_LOAD5]]
		; CHECK-VF4IC2-NEXT: [[TMP8]] = call <4 x i64> @llvm.umin.v4i64(<4 x i64> [[VEC_PHI3]], <4 x i64> [[WIDE_LOAD]])
		; CHECK-VF4IC2-NEXT: [[TMP9]] = call <4 x i64> @llvm.umin.v4i64(<4 x i64> [[VEC_PHI4]], <4 x i64> [[WIDE_LOAD5]])
		; CHECK-VF4IC2-NEXT: [[TMP10]] = select <4 x i1> [[TMP6]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC2-NEXT: [[TMP11]] = select <4 x i1> [[TMP7]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI2]]
		; CHECK-VF4IC2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
		; CHECK-VF4IC2-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC2-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0
		; CHECK-VF4IC2-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
		; CHECK-VF4IC2: middle.block:
		; CHECK-VF4IC2-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp ult <4 x i64> [[TMP8]], [[TMP9]]
		; CHECK-VF4IC2-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i64> [[TMP8]], <4 x i64> [[TMP9]]
		; CHECK-VF4IC2-NEXT: [[TMP13:%.*]] = call i64 @llvm.vector.reduce.umin.v4i64(<4 x i64> [[RDX_MINMAX_SELECT]])
		; CHECK-VF4IC2-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP13]], i64 0
		; CHECK-VF4IC2-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC2-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[RDX_MINMAX_SELECT]]
		; CHECK-VF4IC2-NEXT: [[CMP_N:%.*]] = icmp eq i64 0, 0
		; CHECK-VF4IC2-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ule <4 x i64> [[TMP8]], [[TMP9]]
		; CHECK-VF4IC2-NEXT: [[RDX_SELECT:%.*]] = select <4 x i1> [[RDX_SELECT_CMP]], <4 x i64> [[TMP10]], <4 x i64> [[TMP11]]
		; CHECK-VF4IC2-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[RDX_SELECT]], <4 x i64> <i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807>
		; CHECK-VF4IC2-NEXT: [[TMP14:%.*]] = call i64 @llvm.vector.reduce.smin.v4i64(<4 x i64> [[MASK_SELECT]])
		; CHECK-VF4IC2-NEXT: [[RDX_SELECT_CMP6:%.*]] = icmp ne i64 [[TMP14]], -9223372036854775808
		; CHECK-VF4IC2-NEXT: [[RDX_SELECT7:%.*]] = select i1 [[RDX_SELECT_CMP6]], i64 [[TMP14]], i64 0
		; CHECK-VF4IC2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC2: scalar.ph:
		; CHECK-VF4IC2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC2-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[TMP13]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC2-NEXT: [[BC_MERGE_RDX8:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[RDX_SELECT7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC2-NEXT: br label [[LOOP:%.*]]
		; CHECK-VF4IC2: loop:
		; CHECK-VF4IC2-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC2-NEXT: [[MIN_IDX:%.*]] = phi i64 [ [[BC_MERGE_RDX8]], [[SCALAR_PH]] ], [ [[MIN_IDX_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC2-NEXT: [[MIN_VAL:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[MIN_VAL_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF4IC2-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]
		; CHECK-VF4IC2-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4
		; CHECK-VF4IC2-NEXT: [[CMP:%.*]] = icmp ugt i64 [[MIN_VAL]], [[L]]
		; CHECK-VF4IC2-NEXT: [[MIN_VAL_NEXT]] = tail call i64 @llvm.umin.i64(i64 [[MIN_VAL]], i64 [[L]])
		; CHECK-VF4IC2-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]
		; CHECK-VF4IC2-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF4IC2-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 0
		; CHECK-VF4IC2-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
		; CHECK-VF4IC2: exit:
		; CHECK-VF4IC2-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC2-NEXT: [[RES_UMIN:%.*]] = phi i64 [ [[MIN_VAL_NEXT]], [[LOOP]] ], [ [[TMP13]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC2-NEXT: store i64 [[RES_UMIN]], ptr [[UMIN:%.*]], align 4
		; CHECK-VF4IC2-NEXT: ret i64 [[RES]]
		;
		; CHECK-VF1IC2-LABEL: @test_vectorize_select_umin_idx_all_exit_inst(
		; CHECK-VF1IC2-NEXT: entry:
		; CHECK-VF1IC2-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF1IC2: vector.ph:
		; CHECK-VF1IC2-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF1IC2: vector.body:
		; CHECK-VF1IC2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[VEC_PHI:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP10:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP11:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[TMP8:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[TMP9:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF1IC2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
		; CHECK-VF1IC2-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[SRC:%.*]], i64 [[TMP0]]
		; CHECK-VF1IC2-NEXT: [[TMP3:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[TMP1]]
		; CHECK-VF1IC2-NEXT: [[TMP4:%.*]] = load i64, ptr [[TMP2]], align 4
		; CHECK-VF1IC2-NEXT: [[TMP5:%.*]] = load i64, ptr [[TMP3]], align 4
		; CHECK-VF1IC2-NEXT: [[TMP6:%.*]] = icmp ugt i64 [[VEC_PHI2]], [[TMP4]]
		; CHECK-VF1IC2-NEXT: [[TMP7:%.*]] = icmp ugt i64 [[VEC_PHI3]], [[TMP5]]
		; CHECK-VF1IC2-NEXT: [[TMP8]] = tail call i64 @llvm.umin.i64(i64 [[VEC_PHI2]], i64 [[TMP4]])
		; CHECK-VF1IC2-NEXT: [[TMP9]] = tail call i64 @llvm.umin.i64(i64 [[VEC_PHI3]], i64 [[TMP5]])
		; CHECK-VF1IC2-NEXT: [[TMP10]] = select i1 [[TMP6]], i64 [[TMP0]], i64 [[VEC_PHI]]
		; CHECK-VF1IC2-NEXT: [[TMP11]] = select i1 [[TMP7]], i64 [[TMP1]], i64 [[VEC_PHI1]]
		; CHECK-VF1IC2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
		; CHECK-VF1IC2-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0
		; CHECK-VF1IC2-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
		; CHECK-VF1IC2: middle.block:
		; CHECK-VF1IC2-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp ult i64 [[TMP8]], [[TMP9]]
		; CHECK-VF1IC2-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select i1 [[RDX_MINMAX_CMP]], i64 [[TMP8]], i64 [[TMP9]]
		; CHECK-VF1IC2-NEXT: [[CMP_N:%.*]] = icmp eq i64 0, 0
		; CHECK-VF1IC2-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ule i64 [[TMP8]], [[TMP9]]
		; CHECK-VF1IC2-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP10]], i64 [[TMP11]]
		; CHECK-VF1IC2-NEXT: [[RDX_SELECT_CMP4:%.*]] = icmp ne i64 [[RDX_SELECT]], -9223372036854775808
		; CHECK-VF1IC2-NEXT: [[RDX_SELECT5:%.*]] = select i1 [[RDX_SELECT_CMP4]], i64 [[RDX_SELECT]], i64 0
		; CHECK-VF1IC2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF1IC2: scalar.ph:
		; CHECK-VF1IC2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF1IC2-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[RDX_MINMAX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC2-NEXT: [[BC_MERGE_RDX6:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[RDX_SELECT5]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC2-NEXT: br label [[LOOP:%.*]]
		; CHECK-VF1IC2: loop:
		; CHECK-VF1IC2-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF1IC2-NEXT: [[MIN_IDX:%.*]] = phi i64 [ [[BC_MERGE_RDX6]], [[SCALAR_PH]] ], [ [[MIN_IDX_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF1IC2-NEXT: [[MIN_VAL:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[MIN_VAL_NEXT:%.*]], [[LOOP]] ]
		; CHECK-VF1IC2-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]
		; CHECK-VF1IC2-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4
		; CHECK-VF1IC2-NEXT: [[CMP:%.*]] = icmp ugt i64 [[MIN_VAL]], [[L]]
		; CHECK-VF1IC2-NEXT: [[MIN_VAL_NEXT]] = tail call i64 @llvm.umin.i64(i64 [[MIN_VAL]], i64 [[L]])
		; CHECK-VF1IC2-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]
		; CHECK-VF1IC2-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
		; CHECK-VF1IC2-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 0
		; CHECK-VF1IC2-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
		; CHECK-VF1IC2: exit:
		; CHECK-VF1IC2-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT5]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC2-NEXT: [[RES_UMIN:%.*]] = phi i64 [ [[MIN_VAL_NEXT]], [[LOOP]] ], [ [[RDX_MINMAX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC2-NEXT: store i64 [[RES_UMIN]], ptr [[UMIN:%.*]], align 4
		; CHECK-VF1IC2-NEXT: ret i64 [[RES]]
;		;
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
%min.idx = phi i64 [ 0, %entry ], [ %min.idx.next, %loop ]		%min.idx = phi i64 [ 0, %entry ], [ %min.idx.next, %loop ]
%min.val = phi i64 [ 0, %entry ], [ %min.val.next, %loop ]		%min.val = phi i64 [ 0, %entry ], [ %min.val.next, %loop ]
[…]	loop:
%exitcond.not = icmp eq i64 %iv.next, 0		%exitcond.not = icmp eq i64 %iv.next, 0
br i1 %exitcond.not, label %exit, label %loop		br i1 %exitcond.not, label %exit, label %loop

exit:		exit:
%res = phi i64 [ %min.idx.next, %loop ]		%res = phi i64 [ %min.idx.next, %loop ]
ret i64 %res		ret i64 %res
}		}

define i64 @test_not_vectorize_select_no_min_reduction(ptr %src) {		define i64 @test_not_vectorize_select_no_min_reduction(ptr %src) {
; CHECK-VF4IC1-LABEL: define i64 @test_not_vectorize_select_no_min_reduction		; CHECK-VF4IC1-LABEL: define i64 @test_not_vectorize_select_no_min_reduction
		fhahn: Is this incorrectly vectorized or does the test name need fixing? It looks like `%min.val` isn't an actual minimum value phi?
; CHECK-VF4IC1-SAME: (ptr [[SRC:%.*]]) {		; CHECK-VF4IC1-SAME: (ptr [[SRC:%.*]]) {
; CHECK-VF4IC1-NEXT: entry:		; CHECK-VF4IC1-NEXT: entry:
; CHECK-VF4IC1-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]		; CHECK-VF4IC1-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK-VF4IC1: vector.ph:		; CHECK-VF4IC1: vector.ph:
; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK-VF4IC1: vector.body:		; CHECK-VF4IC1: vector.body:
; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
[…]
; CHECK-VF4IC1-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]		; CHECK-VF4IC1-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]
; CHECK-VF4IC1-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4		; CHECK-VF4IC1-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4
; CHECK-VF4IC1-NEXT: [[CMP:%.*]] = icmp ugt i64 [[SCALAR_RECUR]], [[L]]		; CHECK-VF4IC1-NEXT: [[CMP:%.*]] = icmp ugt i64 [[SCALAR_RECUR]], [[L]]
; CHECK-VF4IC1-NEXT: [[MIN_VAL_NEXT]] = add i64 [[L]], 1		; CHECK-VF4IC1-NEXT: [[MIN_VAL_NEXT]] = add i64 [[L]], 1
; CHECK-VF4IC1-NEXT: [[FOO:%.*]] = call i64 @llvm.umin.i64(i64 [[SCALAR_RECUR]], i64 [[L]])		; CHECK-VF4IC1-NEXT: [[FOO:%.*]] = call i64 @llvm.umin.i64(i64 [[SCALAR_RECUR]], i64 [[L]])
; CHECK-VF4IC1-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]		; CHECK-VF4IC1-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]
; CHECK-VF4IC1-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1		; CHECK-VF4IC1-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 0		; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 0
; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]		; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP5:![0-9]+]]
; CHECK-VF4IC1: exit:		; CHECK-VF4IC1: exit:
; CHECK-VF4IC1-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]		; CHECK-VF4IC1-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
; CHECK-VF4IC1-NEXT: ret i64 [[RES]]		; CHECK-VF4IC1-NEXT: ret i64 [[RES]]
;		;
; CHECK-VF4IC2-LABEL: define i64 @test_not_vectorize_select_no_min_reduction		; CHECK-VF4IC2-LABEL: define i64 @test_not_vectorize_select_no_min_reduction
; CHECK-VF4IC2-SAME: (ptr [[SRC:%.*]]) {		; CHECK-VF4IC2-SAME: (ptr [[SRC:%.*]]) {
; CHECK-VF4IC2-NEXT: entry:		; CHECK-VF4IC2-NEXT: entry:
; CHECK-VF4IC2-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]		; CHECK-VF4IC2-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK-VF4IC2: vector.ph:		; CHECK-VF4IC2: vector.ph:
; CHECK-VF4IC2-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-VF4IC2-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK-VF4IC2: vector.body:		; CHECK-VF4IC2: vector.body:
; CHECK-VF4IC2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-VF4IC2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-VF4IC2-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-VF4IC2-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-VF4IC2-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]		; CHECK-VF4IC2-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
; CHECK-VF4IC2-NEXT: [[VEC_PHI2:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]		; CHECK-VF4IC2-NEXT: [[VEC_PHI2:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
; CHECK-VF4IC2-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i64> [ <i64 poison, i64 poison, i64 poison, i64 0>, [[VECTOR_PH]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]		; CHECK-VF4IC2-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i64> [ <i64 poison, i64 poison, i64 poison, i64 0>, [[VECTOR_PH]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]
; CHECK-VF4IC2-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>		; CHECK-VF4IC2-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
; CHECK-VF4IC2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; CHECK-VF4IC2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; CHECK-VF4IC2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4		; CHECK-VF4IC2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
; CHECK-VF4IC2-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[TMP0]]		; CHECK-VF4IC2-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[SRC:%.*]], i64 [[TMP0]]
; CHECK-VF4IC2-NEXT: [[TMP3:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[TMP1]]		; CHECK-VF4IC2-NEXT: [[TMP3:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[TMP1]]
; CHECK-VF4IC2-NEXT: [[TMP4:%.*]] = getelementptr i64, ptr [[TMP2]], i32 0		; CHECK-VF4IC2-NEXT: [[TMP4:%.*]] = getelementptr i64, ptr [[TMP2]], i32 0
; CHECK-VF4IC2-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP4]], align 4		; CHECK-VF4IC2-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP4]], align 4
; CHECK-VF4IC2-NEXT: [[TMP5:%.*]] = getelementptr i64, ptr [[TMP2]], i32 4		; CHECK-VF4IC2-NEXT: [[TMP5:%.*]] = getelementptr i64, ptr [[TMP2]], i32 4
; CHECK-VF4IC2-NEXT: [[WIDE_LOAD3:%.*]] = load <4 x i64>, ptr [[TMP5]], align 4		; CHECK-VF4IC2-NEXT: [[WIDE_LOAD3:%.*]] = load <4 x i64>, ptr [[TMP5]], align 4
; CHECK-VF4IC2-NEXT: [[TMP6:%.*]] = add <4 x i64> [[WIDE_LOAD]], <i64 1, i64 1, i64 1, i64 1>		; CHECK-VF4IC2-NEXT: [[TMP6:%.*]] = add <4 x i64> [[WIDE_LOAD]], <i64 1, i64 1, i64 1, i64 1>
; CHECK-VF4IC2-NEXT: [[TMP7]] = add <4 x i64> [[WIDE_LOAD3]], <i64 1, i64 1, i64 1, i64 1>		; CHECK-VF4IC2-NEXT: [[TMP7]] = add <4 x i64> [[WIDE_LOAD3]], <i64 1, i64 1, i64 1, i64 1>
; CHECK-VF4IC2-NEXT: [[TMP8:%.*]] = shufflevector <4 x i64> [[VECTOR_RECUR]], <4 x i64> [[TMP6]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>		; CHECK-VF4IC2-NEXT: [[TMP8:%.*]] = shufflevector <4 x i64> [[VECTOR_RECUR]], <4 x i64> [[TMP6]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
; CHECK-VF4IC2-NEXT: [[TMP9:%.*]] = shufflevector <4 x i64> [[TMP6]], <4 x i64> [[TMP7]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>		; CHECK-VF4IC2-NEXT: [[TMP9:%.*]] = shufflevector <4 x i64> [[TMP6]], <4 x i64> [[TMP7]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
; CHECK-VF4IC2-NEXT: [[TMP10:%.*]] = icmp ugt <4 x i64> [[TMP8]], [[WIDE_LOAD]]		; CHECK-VF4IC2-NEXT: [[TMP10:%.*]] = icmp ugt <4 x i64> [[TMP8]], [[WIDE_LOAD]]
; CHECK-VF4IC2-NEXT: [[TMP11:%.*]] = icmp ugt <4 x i64> [[TMP9]], [[WIDE_LOAD3]]		; CHECK-VF4IC2-NEXT: [[TMP11:%.*]] = icmp ugt <4 x i64> [[TMP9]], [[WIDE_LOAD3]]
; CHECK-VF4IC2-NEXT: [[TMP12]] = select <4 x i1> [[TMP10]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]		; CHECK-VF4IC2-NEXT: [[TMP12]] = select <4 x i1> [[TMP10]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
; CHECK-VF4IC2-NEXT: [[TMP13]] = select <4 x i1> [[TMP11]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI2]]		; CHECK-VF4IC2-NEXT: [[TMP13]] = select <4 x i1> [[TMP11]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI2]]
; CHECK-VF4IC2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8		; CHECK-VF4IC2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
; CHECK-VF4IC2-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>		; CHECK-VF4IC2-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
; CHECK-VF4IC2-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0		; CHECK-VF4IC2-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0
; CHECK-VF4IC2-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]		; CHECK-VF4IC2-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK-VF4IC2: middle.block:		; CHECK-VF4IC2: middle.block:
; CHECK-VF4IC2-NEXT: [[RDX_MINMAX:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[TMP12]], <4 x i64> [[TMP13]])		; CHECK-VF4IC2-NEXT: [[RDX_MINMAX:%.*]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[TMP12]], <4 x i64> [[TMP13]])
; CHECK-VF4IC2-NEXT: [[TMP15:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX]])		; CHECK-VF4IC2-NEXT: [[TMP15:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX]])
; CHECK-VF4IC2-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP15]], -9223372036854775808		; CHECK-VF4IC2-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP15]], -9223372036854775808
; CHECK-VF4IC2-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP15]], i64 0		; CHECK-VF4IC2-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP15]], i64 0
; CHECK-VF4IC2-NEXT: [[CMP_N:%.*]] = icmp eq i64 0, 0		; CHECK-VF4IC2-NEXT: [[CMP_N:%.*]] = icmp eq i64 0, 0
; CHECK-VF4IC2-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i64> [[TMP7]], i32 3		; CHECK-VF4IC2-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i64> [[TMP7]], i32 3
; CHECK-VF4IC2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]		; CHECK-VF4IC2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; CHECK-VF4IC2-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]		; CHECK-VF4IC2-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]
; CHECK-VF4IC2-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4		; CHECK-VF4IC2-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4
; CHECK-VF4IC2-NEXT: [[CMP:%.*]] = icmp ugt i64 [[SCALAR_RECUR]], [[L]]		; CHECK-VF4IC2-NEXT: [[CMP:%.*]] = icmp ugt i64 [[SCALAR_RECUR]], [[L]]
; CHECK-VF4IC2-NEXT: [[MIN_VAL_NEXT]] = add i64 [[L]], 1		; CHECK-VF4IC2-NEXT: [[MIN_VAL_NEXT]] = add i64 [[L]], 1
; CHECK-VF4IC2-NEXT: [[FOO:%.*]] = call i64 @llvm.umin.i64(i64 [[SCALAR_RECUR]], i64 [[L]])		; CHECK-VF4IC2-NEXT: [[FOO:%.*]] = call i64 @llvm.umin.i64(i64 [[SCALAR_RECUR]], i64 [[L]])
; CHECK-VF4IC2-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]		; CHECK-VF4IC2-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]
; CHECK-VF4IC2-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1		; CHECK-VF4IC2-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
; CHECK-VF4IC2-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 0		; CHECK-VF4IC2-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 0
; CHECK-VF4IC2-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]		; CHECK-VF4IC2-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP5:![0-9]+]]
; CHECK-VF4IC2: exit:		; CHECK-VF4IC2: exit:
; CHECK-VF4IC2-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]		; CHECK-VF4IC2-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
; CHECK-VF4IC2-NEXT: ret i64 [[RES]]		; CHECK-VF4IC2-NEXT: ret i64 [[RES]]
;		;
; CHECK-VF1IC2-LABEL: define i64 @test_not_vectorize_select_no_min_reduction		; CHECK-VF1IC2-LABEL: define i64 @test_not_vectorize_select_no_min_reduction
; CHECK-VF1IC2-SAME: (ptr [[SRC:%.*]]) {		; CHECK-VF1IC2-SAME: (ptr [[SRC:%.*]]) {
; CHECK-VF1IC2-NEXT: entry:		; CHECK-VF1IC2-NEXT: entry:
; CHECK-VF1IC2-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]		; CHECK-VF1IC2-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK-VF1IC2: vector.ph:		; CHECK-VF1IC2: vector.ph:
; CHECK-VF1IC2-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-VF1IC2-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK-VF1IC2: vector.body:		; CHECK-VF1IC2: vector.body:
; CHECK-VF1IC2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-VF1IC2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-VF1IC2-NEXT: [[VEC_PHI:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP10:%.*]], [[VECTOR_BODY]] ]		; CHECK-VF1IC2-NEXT: [[VEC_PHI:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP10:%.*]], [[VECTOR_BODY]] ]
; CHECK-VF1IC2-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP11:%.*]], [[VECTOR_BODY]] ]		; CHECK-VF1IC2-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP11:%.*]], [[VECTOR_BODY]] ]
; CHECK-VF1IC2-NEXT: [[VECTOR_RECUR:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]		; CHECK-VF1IC2-NEXT: [[VECTOR_RECUR:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]
; CHECK-VF1IC2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; CHECK-VF1IC2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; CHECK-VF1IC2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1		; CHECK-VF1IC2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
; CHECK-VF1IC2-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[TMP0]]		; CHECK-VF1IC2-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[SRC:%.*]], i64 [[TMP0]]
; CHECK-VF1IC2-NEXT: [[TMP3:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[TMP1]]		; CHECK-VF1IC2-NEXT: [[TMP3:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[TMP1]]
; CHECK-VF1IC2-NEXT: [[TMP4:%.*]] = load i64, ptr [[TMP2]], align 4		; CHECK-VF1IC2-NEXT: [[TMP4:%.*]] = load i64, ptr [[TMP2]], align 4
; CHECK-VF1IC2-NEXT: [[TMP5:%.*]] = load i64, ptr [[TMP3]], align 4		; CHECK-VF1IC2-NEXT: [[TMP5:%.*]] = load i64, ptr [[TMP3]], align 4
; CHECK-VF1IC2-NEXT: [[TMP6:%.*]] = add i64 [[TMP4]], 1		; CHECK-VF1IC2-NEXT: [[TMP6:%.*]] = add i64 [[TMP4]], 1
; CHECK-VF1IC2-NEXT: [[TMP7]] = add i64 [[TMP5]], 1		; CHECK-VF1IC2-NEXT: [[TMP7]] = add i64 [[TMP5]], 1
; CHECK-VF1IC2-NEXT: [[TMP8:%.*]] = icmp ugt i64 [[VECTOR_RECUR]], [[TMP4]]		; CHECK-VF1IC2-NEXT: [[TMP8:%.*]] = icmp ugt i64 [[VECTOR_RECUR]], [[TMP4]]
; CHECK-VF1IC2-NEXT: [[TMP9:%.*]] = icmp ugt i64 [[TMP6]], [[TMP5]]		; CHECK-VF1IC2-NEXT: [[TMP9:%.*]] = icmp ugt i64 [[TMP6]], [[TMP5]]
; CHECK-VF1IC2-NEXT: [[TMP10]] = select i1 [[TMP8]], i64 [[TMP0]], i64 [[VEC_PHI]]		; CHECK-VF1IC2-NEXT: [[TMP10]] = select i1 [[TMP8]], i64 [[TMP0]], i64 [[VEC_PHI]]
; CHECK-VF1IC2-NEXT: [[TMP11]] = select i1 [[TMP9]], i64 [[TMP1]], i64 [[VEC_PHI1]]		; CHECK-VF1IC2-NEXT: [[TMP11]] = select i1 [[TMP9]], i64 [[TMP1]], i64 [[VEC_PHI1]]
; CHECK-VF1IC2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2		; CHECK-VF1IC2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; CHECK-VF1IC2-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0		; CHECK-VF1IC2-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0
; CHECK-VF1IC2-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]		; CHECK-VF1IC2-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK-VF1IC2: middle.block:		; CHECK-VF1IC2: middle.block:
; CHECK-VF1IC2-NEXT: [[RDX_MINMAX:%.*]] = call i64 @llvm.smax.i64(i64 [[TMP10]], i64 [[TMP11]])		; CHECK-VF1IC2-NEXT: [[RDX_MINMAX:%.*]] = call i64 @llvm.smax.i64(i64 [[TMP10]], i64 [[TMP11]])
; CHECK-VF1IC2-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[RDX_MINMAX]], -9223372036854775808		; CHECK-VF1IC2-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[RDX_MINMAX]], -9223372036854775808
; CHECK-VF1IC2-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[RDX_MINMAX]], i64 0		; CHECK-VF1IC2-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[RDX_MINMAX]], i64 0
; CHECK-VF1IC2-NEXT: [[CMP_N:%.*]] = icmp eq i64 0, 0		; CHECK-VF1IC2-NEXT: [[CMP_N:%.*]] = icmp eq i64 0, 0
; CHECK-VF1IC2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]		; CHECK-VF1IC2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; CHECK-VF1IC2: scalar.ph:		; CHECK-VF1IC2: scalar.ph:
; CHECK-VF1IC2-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]		; CHECK-VF1IC2-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
; CHECK-VF1IC2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]		; CHECK-VF1IC2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
; CHECK-VF1IC2-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]		; CHECK-VF1IC2-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
; CHECK-VF1IC2-NEXT: br label [[LOOP:%.*]]		; CHECK-VF1IC2-NEXT: br label [[LOOP:%.*]]
; CHECK-VF1IC2: loop:		; CHECK-VF1IC2: loop:
; CHECK-VF1IC2-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]		; CHECK-VF1IC2-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
; CHECK-VF1IC2-NEXT: [[MIN_IDX:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[MIN_IDX_NEXT:%.*]], [[LOOP]] ]		; CHECK-VF1IC2-NEXT: [[MIN_IDX:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[MIN_IDX_NEXT:%.*]], [[LOOP]] ]
; CHECK-VF1IC2-NEXT: [[SCALAR_RECUR:%.*]] = phi i64 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[MIN_VAL_NEXT:%.*]], [[LOOP]] ]		; CHECK-VF1IC2-NEXT: [[SCALAR_RECUR:%.*]] = phi i64 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[MIN_VAL_NEXT:%.*]], [[LOOP]] ]
; CHECK-VF1IC2-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]		; CHECK-VF1IC2-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[SRC]], i64 [[IV]]
; CHECK-VF1IC2-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4		; CHECK-VF1IC2-NEXT: [[L:%.*]] = load i64, ptr [[GEP]], align 4
; CHECK-VF1IC2-NEXT: [[CMP:%.*]] = icmp ugt i64 [[SCALAR_RECUR]], [[L]]		; CHECK-VF1IC2-NEXT: [[CMP:%.*]] = icmp ugt i64 [[SCALAR_RECUR]], [[L]]
; CHECK-VF1IC2-NEXT: [[MIN_VAL_NEXT]] = add i64 [[L]], 1		; CHECK-VF1IC2-NEXT: [[MIN_VAL_NEXT]] = add i64 [[L]], 1
; CHECK-VF1IC2-NEXT: [[FOO:%.*]] = call i64 @llvm.umin.i64(i64 [[SCALAR_RECUR]], i64 [[L]])		; CHECK-VF1IC2-NEXT: [[FOO:%.*]] = call i64 @llvm.umin.i64(i64 [[SCALAR_RECUR]], i64 [[L]])
; CHECK-VF1IC2-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]		; CHECK-VF1IC2-NEXT: [[MIN_IDX_NEXT]] = select i1 [[CMP]], i64 [[IV]], i64 [[MIN_IDX]]
; CHECK-VF1IC2-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1		; CHECK-VF1IC2-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
; CHECK-VF1IC2-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 0		; CHECK-VF1IC2-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 0
; CHECK-VF1IC2-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]		; CHECK-VF1IC2-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP5:![0-9]+]]
; CHECK-VF1IC2: exit:		; CHECK-VF1IC2: exit:
; CHECK-VF1IC2-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]		; CHECK-VF1IC2-NEXT: [[RES:%.*]] = phi i64 [ [[MIN_IDX_NEXT]], [[LOOP]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
; CHECK-VF1IC2-NEXT: ret i64 [[RES]]		; CHECK-VF1IC2-NEXT: ret i64 [[RES]]
;		;
entry:		entry:
br label %loop		br label %loop

loop:		loop:

llvm/test/Transforms/LoopVectorize/smax-idx.ll
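
The CHECK-VF4IC1 lines for this test expect the middle block to reduce the per-lane maxima with `llvm.vector.reduce.smax`, mask out the lanes whose maximum differs from the reduced value, take the signed minimum of the surviving lane indices, and fall back to `%ii` when that minimum is still the `INT64_MIN` sentinel. The following is a minimal scalar sketch of that sequence, assuming VF=4 and no interleaving. It is not code from the patch; the names `scalarSmaxIdx`, `modelSmaxIdxVF4`, `kVF`, and `kNoUpdate` are hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

using i64 = std::int64_t;
constexpr std::size_t kVF = 4;  // assumed vector width (VF4, IC1)
// Sentinel held by an index lane that was never updated (hypothetical name).
constexpr i64 kNoUpdate = std::numeric_limits<i64>::min();

// Reference: the source loop from the summary, returning {max, idx}.
std::pair<i64, i64> scalarSmaxIdx(const std::vector<i64> &a, i64 mm, i64 ii) {
  i64 max = mm, idx = ii;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (max < a[i]) {
      max = a[i];
      idx = static_cast<i64>(i);
    }
  return {max, idx};
}

// Lane-by-lane model of vector.body, middle.block and the scalar remainder.
std::pair<i64, i64> modelSmaxIdxVF4(const std::vector<i64> &a, i64 mm, i64 ii) {
  std::array<i64, kVF> maxLanes, idxLanes;
  maxLanes.fill(mm);         // splat of %mm
  idxLanes.fill(kNoUpdate);  // splat of INT64_MIN
  const std::size_t nVec = a.size() - a.size() % kVF;

  // vector.body: each lane keeps its own running max and the index of its
  // last strict improvement.
  for (std::size_t base = 0; base < nVec; base += kVF)
    for (std::size_t l = 0; l < kVF; ++l) {
      const i64 x = a[base + l];
      if (maxLanes[l] < x)
        idxLanes[l] = static_cast<i64>(base + l);
      maxLanes[l] = std::max(maxLanes[l], x);
    }

  // middle.block: reduce.smax, mask lanes equal to the max, then reduce.smin
  // over their indices (other lanes contribute INT64_MAX).
  i64 max = *std::max_element(maxLanes.begin(), maxLanes.end());
  i64 idx = std::numeric_limits<i64>::max();
  for (std::size_t l = 0; l < kVF; ++l)
    if (maxLanes[l] == max)
      idx = std::min(idx, idxLanes[l]);
  if (idx == kNoUpdate)  // no lane ever improved on %mm
    idx = ii;

  // Scalar remainder loop resumes from the merged values.
  for (std::size_t i = nVec; i < a.size(); ++i)
    if (max < a[i]) {
      max = a[i];
      idx = static_cast<i64>(i);
    }
  return {max, idx};
}
```

For the input `{1, 5, 3, 5, 2, 9, 9, 0, 7}` with `mm = 0`, lanes 1 and 2 both end with maximum 9 (at indices 5 and 6), and the masked minimum selects index 5, matching the scalar loop. The sketch does not model the interleaved (IC4) case, where the per-part vectors are first combined with the `icmp sgt`/`icmp sge` select chains visible in the CHECK-VF4IC4 lines.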

; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -S < %s | FileCheck %s --check-prefix=CHECK		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
		fhahn: Could you add new tests as a separate patch?
		Mel-Chen: Of course. I will split an NFC patch tomorrow.
; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=4 -S < %s | FileCheck %s --check-prefix=CHECK		; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -S < %s | FileCheck %s --check-prefix=CHECK-VF4IC1 --check-prefix=CHECK
; RUN: opt -passes=loop-vectorize -force-vector-width=1 -force-vector-interleave=4 -S < %s | FileCheck %s --check-prefix=CHECK		; RUN: opt -passes=loop-vectorize -force-vector-width=4 -force-vector-interleave=4 -S < %s | FileCheck %s --check-prefix=CHECK-VF4IC4 --check-prefix=CHECK
		; RUN: opt -passes=loop-vectorize -force-vector-width=1 -force-vector-interleave=4 -S < %s | FileCheck %s --check-prefix=CHECK-VF1IC4 --check-prefix=CHECK

define i64 @smax_idx(ptr nocapture readonly %a, i64 %mm, i64 %ii, ptr nocapture writeonly %res_max, i64 %n) {		define i64 @smax_idx(ptr nocapture readonly %a, i64 %mm, i64 %ii, ptr nocapture writeonly %res_max, i64 %n) {
; CHECK-LABEL: @smax_idx(		; CHECK-VF4IC1-LABEL: @smax_idx(
; CHECK-NOT: vector.body:		; CHECK-VF4IC1-NEXT: entry:
		; CHECK-VF4IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
		; CHECK-VF4IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC1: vector.ph:
		; CHECK-VF4IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC1-NEXT: [[MINMAX_IDENT_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[MM:%.*]], i64 0
		; CHECK-VF4IC1-NEXT: [[MINMAX_IDENT_SPLAT:%.*]] = shufflevector <4 x i64> [[MINMAX_IDENT_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC1: vector.body:
		; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 4
		; CHECK-VF4IC1-NEXT: [[TMP3]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI]], <4 x i64> [[WIDE_LOAD]])
		; CHECK-VF4IC1-NEXT: [[TMP4:%.*]] = icmp slt <4 x i64> [[VEC_PHI]], [[WIDE_LOAD]]
		; CHECK-VF4IC1-NEXT: [[TMP5]] = select <4 x i1> [[TMP4]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI1]]
		; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC1-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
		; CHECK-VF4IC1: middle.block:
		; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP3]])
		; CHECK-VF4IC1-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP7]], i64 0
		; CHECK-VF4IC1-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC1-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[TMP3]]
		; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[TMP5]], <4 x i64> <i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807>
		; CHECK-VF4IC1-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.smin.v4i64(<4 x i64> [[MASK_SELECT]])
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP8]], -9223372036854775808
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP8]], i64 [[II:%.*]]
		; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC1: scalar.ph:
		; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX2:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC1: for.body:
		; CHECK-VF4IC1-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP10:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX2]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
		; CHECK-VF4IC1-NEXT: [[TMP9:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
		; CHECK-VF4IC1-NEXT: [[TMP10]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP9]])
		; CHECK-VF4IC1-NEXT: [[CMP1:%.*]] = icmp slt i64 [[MAX_09]], [[TMP9]]
		; CHECK-VF4IC1-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
		; CHECK-VF4IC1-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
		; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
		; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
		; CHECK-VF4IC1: exit:
		; CHECK-VF4IC1-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP10]], [[FOR_BODY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
		; CHECK-VF4IC1-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
		;
		; CHECK-VF4IC4-LABEL: @smax_idx(
		; CHECK-VF4IC4-NEXT: entry:
		; CHECK-VF4IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 16
		; CHECK-VF4IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC4: vector.ph:
		; CHECK-VF4IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC4-NEXT: [[MINMAX_IDENT_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[MM:%.*]], i64 0
		; CHECK-VF4IC4-NEXT: [[MINMAX_IDENT_SPLAT:%.*]] = shufflevector <4 x i64> [[MINMAX_IDENT_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC4: vector.body:
		; CHECK-VF4IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI7:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI8:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP21:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI9:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP22:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI10:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP23:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD1:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD1]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
		; CHECK-VF4IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
		; CHECK-VF4IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
		; CHECK-VF4IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
		; CHECK-VF4IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF4IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF4IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF4IC4-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 0
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP8]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 4
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD11:%.*]] = load <4 x i64>, ptr [[TMP9]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 8
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD12:%.*]] = load <4 x i64>, ptr [[TMP10]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 12
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD13:%.*]] = load <4 x i64>, ptr [[TMP11]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP12]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI]], <4 x i64> [[WIDE_LOAD]])
		; CHECK-VF4IC4-NEXT: [[TMP13]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI4]], <4 x i64> [[WIDE_LOAD11]])
		; CHECK-VF4IC4-NEXT: [[TMP14]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI5]], <4 x i64> [[WIDE_LOAD12]])
		; CHECK-VF4IC4-NEXT: [[TMP15]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI6]], <4 x i64> [[WIDE_LOAD13]])
		; CHECK-VF4IC4-NEXT: [[TMP16:%.*]] = icmp slt <4 x i64> [[VEC_PHI]], [[WIDE_LOAD]]
		; CHECK-VF4IC4-NEXT: [[TMP17:%.*]] = icmp slt <4 x i64> [[VEC_PHI4]], [[WIDE_LOAD11]]
		; CHECK-VF4IC4-NEXT: [[TMP18:%.*]] = icmp slt <4 x i64> [[VEC_PHI5]], [[WIDE_LOAD12]]
		; CHECK-VF4IC4-NEXT: [[TMP19:%.*]] = icmp slt <4 x i64> [[VEC_PHI6]], [[WIDE_LOAD13]]
		; CHECK-VF4IC4-NEXT: [[TMP20]] = select <4 x i1> [[TMP16]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI7]]
		; CHECK-VF4IC4-NEXT: [[TMP21]] = select <4 x i1> [[TMP17]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI8]]
		; CHECK-VF4IC4-NEXT: [[TMP22]] = select <4 x i1> [[TMP18]], <4 x i64> [[STEP_ADD1]], <4 x i64> [[VEC_PHI9]]
		; CHECK-VF4IC4-NEXT: [[TMP23]] = select <4 x i1> [[TMP19]], <4 x i64> [[STEP_ADD2]], <4 x i64> [[VEC_PHI10]]
		; CHECK-VF4IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
		; CHECK-VF4IC4-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
		; CHECK-VF4IC4: middle.block:
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i64> [[TMP12]], [[TMP13]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i64> [[TMP12]], <4 x i64> [[TMP13]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP14:%.*]] = icmp sgt <4 x i64> [[RDX_MINMAX_SELECT]], [[TMP14]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT15:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP14]], <4 x i64> [[RDX_MINMAX_SELECT]], <4 x i64> [[TMP14]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP16:%.*]] = icmp sgt <4 x i64> [[RDX_MINMAX_SELECT15]], [[TMP15]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT17:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP16]], <4 x i64> [[RDX_MINMAX_SELECT15]], <4 x i64> [[TMP15]]
		; CHECK-VF4IC4-NEXT: [[TMP25:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX_SELECT17]])
		; CHECK-VF4IC4-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP25]], i64 0
		; CHECK-VF4IC4-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC4-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[RDX_MINMAX_SELECT17]]
		; CHECK-VF4IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp sge <4 x i64> [[TMP12]], [[TMP13]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT:%.*]] = select <4 x i1> [[RDX_SELECT_CMP]], <4 x i64> [[TMP20]], <4 x i64> [[TMP21]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP18:%.*]] = icmp sge <4 x i64> [[RDX_MINMAX_SELECT]], [[TMP14]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT19:%.*]] = select <4 x i1> [[RDX_SELECT_CMP18]], <4 x i64> [[RDX_SELECT]], <4 x i64> [[TMP22]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP20:%.*]] = icmp sge <4 x i64> [[RDX_MINMAX_SELECT15]], [[TMP15]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT21:%.*]] = select <4 x i1> [[RDX_SELECT_CMP20]], <4 x i64> [[RDX_SELECT19]], <4 x i64> [[TMP23]]
		; CHECK-VF4IC4-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[RDX_SELECT21]], <4 x i64> <i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807>
		; CHECK-VF4IC4-NEXT: [[TMP26:%.*]] = call i64 @llvm.vector.reduce.smin.v4i64(<4 x i64> [[MASK_SELECT]])
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP22:%.*]] = icmp ne i64 [[TMP26]], -9223372036854775808
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT23:%.*]] = select i1 [[RDX_SELECT_CMP22]], i64 [[TMP26]], i64 [[II:%.*]]
		; CHECK-VF4IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC4: scalar.ph:
		; CHECK-VF4IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[TMP25]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX24:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT23]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC4: for.body:
		; CHECK-VF4IC4-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP28:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX24]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
		; CHECK-VF4IC4-NEXT: [[TMP27:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP28]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP27]])
		; CHECK-VF4IC4-NEXT: [[CMP1:%.*]] = icmp slt i64 [[MAX_09]], [[TMP27]]
		; CHECK-VF4IC4-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
		; CHECK-VF4IC4-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
		; CHECK-VF4IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
		; CHECK-VF4IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
		; CHECK-VF4IC4: exit:
		; CHECK-VF4IC4-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP28]], [[FOR_BODY]] ], [ [[TMP25]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT23]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
		; CHECK-VF4IC4-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
		;
		; CHECK-VF1IC4-LABEL: @smax_idx(
		; CHECK-VF1IC4-NEXT: entry:
		; CHECK-VF1IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
		; CHECK-VF1IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF1IC4: vector.ph:
		; CHECK-VF1IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF1IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF1IC4: vector.body:
		; CHECK-VF1IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI:%.*]] = phi i64 [ [[MM:%.*]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI4:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI5:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP21:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI6:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP22:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI7:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP23:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF1IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
		; CHECK-VF1IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
		; CHECK-VF1IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
		; CHECK-VF1IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
		; CHECK-VF1IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF1IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF1IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF1IC4-NEXT: [[TMP8:%.*]] = load i64, ptr [[TMP4]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP9:%.*]] = load i64, ptr [[TMP5]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP10:%.*]] = load i64, ptr [[TMP6]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP11:%.*]] = load i64, ptr [[TMP7]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP12]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI]], i64 [[TMP8]])
		; CHECK-VF1IC4-NEXT: [[TMP13]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI1]], i64 [[TMP9]])
		; CHECK-VF1IC4-NEXT: [[TMP14]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI2]], i64 [[TMP10]])
		; CHECK-VF1IC4-NEXT: [[TMP15]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI3]], i64 [[TMP11]])
		; CHECK-VF1IC4-NEXT: [[TMP16:%.*]] = icmp slt i64 [[VEC_PHI]], [[TMP8]]
		; CHECK-VF1IC4-NEXT: [[TMP17:%.*]] = icmp slt i64 [[VEC_PHI1]], [[TMP9]]
		; CHECK-VF1IC4-NEXT: [[TMP18:%.*]] = icmp slt i64 [[VEC_PHI2]], [[TMP10]]
		; CHECK-VF1IC4-NEXT: [[TMP19:%.*]] = icmp slt i64 [[VEC_PHI3]], [[TMP11]]
		; CHECK-VF1IC4-NEXT: [[TMP20]] = select i1 [[TMP16]], i64 [[TMP0]], i64 [[VEC_PHI4]]
		; CHECK-VF1IC4-NEXT: [[TMP21]] = select i1 [[TMP17]], i64 [[TMP1]], i64 [[VEC_PHI5]]
		; CHECK-VF1IC4-NEXT: [[TMP22]] = select i1 [[TMP18]], i64 [[TMP2]], i64 [[VEC_PHI6]]
		; CHECK-VF1IC4-NEXT: [[TMP23]] = select i1 [[TMP19]], i64 [[TMP3]], i64 [[VEC_PHI7]]
		; CHECK-VF1IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF1IC4-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
		; CHECK-VF1IC4: middle.block:
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt i64 [[TMP12]], [[TMP13]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select i1 [[RDX_MINMAX_CMP]], i64 [[TMP12]], i64 [[TMP13]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP8:%.*]] = icmp sgt i64 [[RDX_MINMAX_SELECT]], [[TMP14]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select i1 [[RDX_MINMAX_CMP8]], i64 [[RDX_MINMAX_SELECT]], i64 [[TMP14]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP10:%.*]] = icmp sgt i64 [[RDX_MINMAX_SELECT9]], [[TMP15]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT11:%.*]] = select i1 [[RDX_MINMAX_CMP10]], i64 [[RDX_MINMAX_SELECT9]], i64 [[TMP15]]
		; CHECK-VF1IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp sge i64 [[TMP12]], [[TMP13]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP20]], i64 [[TMP21]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP12:%.*]] = icmp sge i64 [[RDX_MINMAX_SELECT]], [[TMP14]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT13:%.*]] = select i1 [[RDX_SELECT_CMP12]], i64 [[RDX_SELECT]], i64 [[TMP22]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP14:%.*]] = icmp sge i64 [[RDX_MINMAX_SELECT9]], [[TMP15]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT15:%.*]] = select i1 [[RDX_SELECT_CMP14]], i64 [[RDX_SELECT13]], i64 [[TMP23]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP16:%.*]] = icmp ne i64 [[RDX_SELECT15]], -9223372036854775808
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT17:%.*]] = select i1 [[RDX_SELECT_CMP16]], i64 [[RDX_SELECT15]], i64 [[II:%.*]]
		; CHECK-VF1IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF1IC4: scalar.ph:
		; CHECK-VF1IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[RDX_MINMAX_SELECT11]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX18:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT17]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF1IC4: for.body:
		; CHECK-VF1IC4-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP26:%.*]], [[FOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX18]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
		; CHECK-VF1IC4-NEXT: [[TMP25:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP26]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP25]])
		; CHECK-VF1IC4-NEXT: [[CMP1:%.*]] = icmp slt i64 [[MAX_09]], [[TMP25]]
		; CHECK-VF1IC4-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
		; CHECK-VF1IC4-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
		; CHECK-VF1IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
		; CHECK-VF1IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
		; CHECK-VF1IC4: exit:
		; CHECK-VF1IC4-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP26]], [[FOR_BODY]] ], [ [[RDX_MINMAX_SELECT11]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT17]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
		; CHECK-VF1IC4-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]		%max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]
%idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]		%idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]
; [10 lines collapsed in the diff view]		; [10 lines collapsed in the diff view]
exit:		exit:
store i64 %1, ptr %res_max		store i64 %1, ptr %res_max
ret i64 %spec.select7		ret i64 %spec.select7
}		}
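
The middle.block checks above (reduce.smax, MASK_CMP, MASK_SELECT, reduce.smin, RDX_SELECT) can be read as a small scalar routine. The following is only a minimal sketch of that sequence, assuming VF = 4 and modeling each vector as a plain array; the function name `reduce_max_with_index` and its parameters are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VF 4 /* assumed vectorization factor, as in the VF4IC1 checks */

/* Hypothetical scalar model of the middle.block sequence; not LLVM code.
 * max_lanes: per-lane running maxima (the smax reduction vector)
 * idx_lanes: per-lane recorded indices (INT64_MIN means "never updated")
 * ii:        original start value of the index reduction */
int64_t reduce_max_with_index(const int64_t max_lanes[VF],
                              const int64_t idx_lanes[VF],
                              int64_t ii, int64_t *res_max) {
    assert(res_max != NULL);

    /* llvm.vector.reduce.smax over the per-lane maxima */
    int64_t max = max_lanes[0];
    for (int l = 1; l < VF; ++l)
        if (max_lanes[l] > max)
            max = max_lanes[l];

    /* MASK_CMP + MASK_SELECT: keep indices only from lanes holding the
     * maximum, replace the rest with INT64_MAX, then take the signed
     * minimum (llvm.vector.reduce.smin). */
    int64_t idx = INT64_MAX;
    for (int l = 0; l < VF; ++l) {
        int64_t cand = (max_lanes[l] == max) ? idx_lanes[l] : INT64_MAX;
        if (cand < idx)
            idx = cand;
    }

    /* RDX_SELECT: the INT64_MIN sentinel falls back to the start value. */
    *res_max = max;
    return (idx != INT64_MIN) ? idx : ii;
}
```

For example, with lane maxima {3, 7, 7, 1} and lane indices {0, 5, 2, 3}, the sketch reports a maximum of 7 and picks index 2, the smaller of the two indices recorded in lanes that hold the maximum.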

;		;
; Check the different order of reduction phis.		; Check the different order of reduction phis.
;		;
define i64 @smax_idx_inverted_phi(ptr nocapture readonly %a, i64 %mm, i64 %ii, ptr nocapture writeonly %res_max, i64 %n) {		define i64 @smax_idx_inverted_phi(ptr nocapture readonly %a, i64 %mm, i64 %ii, ptr nocapture writeonly %res_max, i64 %n) {
; CHECK-LABEL: @smax_idx_inverted_phi(		; CHECK-VF4IC1-LABEL: @smax_idx_inverted_phi(
; CHECK-NOT: vector.body:		; CHECK-VF4IC1-NEXT: entry:
		; CHECK-VF4IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
		; CHECK-VF4IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC1: vector.ph:
		; CHECK-VF4IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC1-NEXT: [[MINMAX_IDENT_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[MM:%.*]], i64 0
		; CHECK-VF4IC1-NEXT: [[MINMAX_IDENT_SPLAT:%.*]] = shufflevector <4 x i64> [[MINMAX_IDENT_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC1: vector.body:
		; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 4
		; CHECK-VF4IC1-NEXT: [[TMP3]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI1]], <4 x i64> [[WIDE_LOAD]])
		; CHECK-VF4IC1-NEXT: [[TMP4:%.*]] = icmp slt <4 x i64> [[VEC_PHI1]], [[WIDE_LOAD]]
		; CHECK-VF4IC1-NEXT: [[TMP5]] = select <4 x i1> [[TMP4]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC1-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
		; CHECK-VF4IC1: middle.block:
		; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP3]])
		; CHECK-VF4IC1-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP7]], i64 0
		; CHECK-VF4IC1-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC1-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[TMP3]]
		; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[TMP5]], <4 x i64> <i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807>
		; CHECK-VF4IC1-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.smin.v4i64(<4 x i64> [[MASK_SELECT]])
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP8]], -9223372036854775808
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP8]], i64 [[II:%.*]]
		; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC1: scalar.ph:
		; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX2:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC1: for.body:
		; CHECK-VF4IC1-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX2]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP10:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
		; CHECK-VF4IC1-NEXT: [[TMP9:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
		; CHECK-VF4IC1-NEXT: [[TMP10]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP9]])
		; CHECK-VF4IC1-NEXT: [[CMP1:%.*]] = icmp slt i64 [[MAX_09]], [[TMP9]]
		; CHECK-VF4IC1-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
		; CHECK-VF4IC1-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
		; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
		; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
		; CHECK-VF4IC1: exit:
		; CHECK-VF4IC1-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP10]], [[FOR_BODY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
		; CHECK-VF4IC1-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
		;
		; CHECK-VF4IC4-LABEL: @smax_idx_inverted_phi(
		; CHECK-VF4IC4-NEXT: entry:
		; CHECK-VF4IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 16
		; CHECK-VF4IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC4: vector.ph:
		; CHECK-VF4IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC4-NEXT: [[MINMAX_IDENT_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[MM:%.*]], i64 0
		; CHECK-VF4IC4-NEXT: [[MINMAX_IDENT_SPLAT:%.*]] = shufflevector <4 x i64> [[MINMAX_IDENT_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC4: vector.body:
		; CHECK-VF4IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP21:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP22:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP23:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI7:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI8:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI9:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI10:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD1:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD1]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
		; CHECK-VF4IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
		; CHECK-VF4IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
		; CHECK-VF4IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
		; CHECK-VF4IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF4IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF4IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF4IC4-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 0
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP8]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 4
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD11:%.*]] = load <4 x i64>, ptr [[TMP9]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 8
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD12:%.*]] = load <4 x i64>, ptr [[TMP10]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 12
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD13:%.*]] = load <4 x i64>, ptr [[TMP11]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP12]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI7]], <4 x i64> [[WIDE_LOAD]])
		; CHECK-VF4IC4-NEXT: [[TMP13]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI8]], <4 x i64> [[WIDE_LOAD11]])
		; CHECK-VF4IC4-NEXT: [[TMP14]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI9]], <4 x i64> [[WIDE_LOAD12]])
		; CHECK-VF4IC4-NEXT: [[TMP15]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI10]], <4 x i64> [[WIDE_LOAD13]])
		; CHECK-VF4IC4-NEXT: [[TMP16:%.*]] = icmp slt <4 x i64> [[VEC_PHI7]], [[WIDE_LOAD]]
		; CHECK-VF4IC4-NEXT: [[TMP17:%.*]] = icmp slt <4 x i64> [[VEC_PHI8]], [[WIDE_LOAD11]]
		; CHECK-VF4IC4-NEXT: [[TMP18:%.*]] = icmp slt <4 x i64> [[VEC_PHI9]], [[WIDE_LOAD12]]
		; CHECK-VF4IC4-NEXT: [[TMP19:%.*]] = icmp slt <4 x i64> [[VEC_PHI10]], [[WIDE_LOAD13]]
		; CHECK-VF4IC4-NEXT: [[TMP20]] = select <4 x i1> [[TMP16]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI]]
		; CHECK-VF4IC4-NEXT: [[TMP21]] = select <4 x i1> [[TMP17]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI4]]
		; CHECK-VF4IC4-NEXT: [[TMP22]] = select <4 x i1> [[TMP18]], <4 x i64> [[STEP_ADD1]], <4 x i64> [[VEC_PHI5]]
		; CHECK-VF4IC4-NEXT: [[TMP23]] = select <4 x i1> [[TMP19]], <4 x i64> [[STEP_ADD2]], <4 x i64> [[VEC_PHI6]]
		; CHECK-VF4IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
		; CHECK-VF4IC4-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
		; CHECK-VF4IC4: middle.block:
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i64> [[TMP12]], [[TMP13]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i64> [[TMP12]], <4 x i64> [[TMP13]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP14:%.*]] = icmp sgt <4 x i64> [[RDX_MINMAX_SELECT]], [[TMP14]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT15:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP14]], <4 x i64> [[RDX_MINMAX_SELECT]], <4 x i64> [[TMP14]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP16:%.*]] = icmp sgt <4 x i64> [[RDX_MINMAX_SELECT15]], [[TMP15]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT17:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP16]], <4 x i64> [[RDX_MINMAX_SELECT15]], <4 x i64> [[TMP15]]
		; CHECK-VF4IC4-NEXT: [[TMP25:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX_SELECT17]])
		; CHECK-VF4IC4-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP25]], i64 0
		; CHECK-VF4IC4-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC4-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[RDX_MINMAX_SELECT17]]
		; CHECK-VF4IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp sge <4 x i64> [[TMP12]], [[TMP13]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT:%.*]] = select <4 x i1> [[RDX_SELECT_CMP]], <4 x i64> [[TMP20]], <4 x i64> [[TMP21]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP18:%.*]] = icmp sge <4 x i64> [[RDX_MINMAX_SELECT]], [[TMP14]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT19:%.*]] = select <4 x i1> [[RDX_SELECT_CMP18]], <4 x i64> [[RDX_SELECT]], <4 x i64> [[TMP22]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP20:%.*]] = icmp sge <4 x i64> [[RDX_MINMAX_SELECT15]], [[TMP15]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT21:%.*]] = select <4 x i1> [[RDX_SELECT_CMP20]], <4 x i64> [[RDX_SELECT19]], <4 x i64> [[TMP23]]
		; CHECK-VF4IC4-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[RDX_SELECT21]], <4 x i64> <i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807, i64 9223372036854775807>
		; CHECK-VF4IC4-NEXT: [[TMP26:%.*]] = call i64 @llvm.vector.reduce.smin.v4i64(<4 x i64> [[MASK_SELECT]])
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP22:%.*]] = icmp ne i64 [[TMP26]], -9223372036854775808
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT23:%.*]] = select i1 [[RDX_SELECT_CMP22]], i64 [[TMP26]], i64 [[II:%.*]]
		; CHECK-VF4IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC4: scalar.ph:
		; CHECK-VF4IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[TMP25]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX24:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT23]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC4: for.body:
		; CHECK-VF4IC4-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX24]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP28:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
		; CHECK-VF4IC4-NEXT: [[TMP27:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP28]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP27]])
		; CHECK-VF4IC4-NEXT: [[CMP1:%.*]] = icmp slt i64 [[MAX_09]], [[TMP27]]
		; CHECK-VF4IC4-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
		; CHECK-VF4IC4-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
		; CHECK-VF4IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
		; CHECK-VF4IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
		; CHECK-VF4IC4: exit:
		; CHECK-VF4IC4-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP28]], [[FOR_BODY]] ], [ [[TMP25]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT23]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
		; CHECK-VF4IC4-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
		;
		; CHECK-VF1IC4-LABEL: @smax_idx_inverted_phi(
		; CHECK-VF1IC4-NEXT: entry:
		; CHECK-VF1IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
		; CHECK-VF1IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF1IC4: vector.ph:
		; CHECK-VF1IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF1IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF1IC4: vector.body:
		; CHECK-VF1IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP21:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP22:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP23:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI4:%.*]] = phi i64 [ [[MM:%.*]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI5:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI6:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI7:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF1IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
		; CHECK-VF1IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
		; CHECK-VF1IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
		; CHECK-VF1IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
		; CHECK-VF1IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF1IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF1IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF1IC4-NEXT: [[TMP8:%.*]] = load i64, ptr [[TMP4]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP9:%.*]] = load i64, ptr [[TMP5]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP10:%.*]] = load i64, ptr [[TMP6]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP11:%.*]] = load i64, ptr [[TMP7]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP12]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI4]], i64 [[TMP8]])
		; CHECK-VF1IC4-NEXT: [[TMP13]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI5]], i64 [[TMP9]])
		; CHECK-VF1IC4-NEXT: [[TMP14]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI6]], i64 [[TMP10]])
		; CHECK-VF1IC4-NEXT: [[TMP15]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI7]], i64 [[TMP11]])
		; CHECK-VF1IC4-NEXT: [[TMP16:%.*]] = icmp slt i64 [[VEC_PHI4]], [[TMP8]]
		; CHECK-VF1IC4-NEXT: [[TMP17:%.*]] = icmp slt i64 [[VEC_PHI5]], [[TMP9]]
		; CHECK-VF1IC4-NEXT: [[TMP18:%.*]] = icmp slt i64 [[VEC_PHI6]], [[TMP10]]
		; CHECK-VF1IC4-NEXT: [[TMP19:%.*]] = icmp slt i64 [[VEC_PHI7]], [[TMP11]]
		; CHECK-VF1IC4-NEXT: [[TMP20]] = select i1 [[TMP16]], i64 [[TMP0]], i64 [[VEC_PHI]]
		; CHECK-VF1IC4-NEXT: [[TMP21]] = select i1 [[TMP17]], i64 [[TMP1]], i64 [[VEC_PHI1]]
		; CHECK-VF1IC4-NEXT: [[TMP22]] = select i1 [[TMP18]], i64 [[TMP2]], i64 [[VEC_PHI2]]
		; CHECK-VF1IC4-NEXT: [[TMP23]] = select i1 [[TMP19]], i64 [[TMP3]], i64 [[VEC_PHI3]]
		; CHECK-VF1IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF1IC4-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
		; CHECK-VF1IC4: middle.block:
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt i64 [[TMP12]], [[TMP13]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select i1 [[RDX_MINMAX_CMP]], i64 [[TMP12]], i64 [[TMP13]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP8:%.*]] = icmp sgt i64 [[RDX_MINMAX_SELECT]], [[TMP14]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select i1 [[RDX_MINMAX_CMP8]], i64 [[RDX_MINMAX_SELECT]], i64 [[TMP14]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP10:%.*]] = icmp sgt i64 [[RDX_MINMAX_SELECT9]], [[TMP15]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT11:%.*]] = select i1 [[RDX_MINMAX_CMP10]], i64 [[RDX_MINMAX_SELECT9]], i64 [[TMP15]]
		; CHECK-VF1IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp sge i64 [[TMP12]], [[TMP13]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP20]], i64 [[TMP21]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP12:%.*]] = icmp sge i64 [[RDX_MINMAX_SELECT]], [[TMP14]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT13:%.*]] = select i1 [[RDX_SELECT_CMP12]], i64 [[RDX_SELECT]], i64 [[TMP22]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP14:%.*]] = icmp sge i64 [[RDX_MINMAX_SELECT9]], [[TMP15]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT15:%.*]] = select i1 [[RDX_SELECT_CMP14]], i64 [[RDX_SELECT13]], i64 [[TMP23]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP16:%.*]] = icmp ne i64 [[RDX_SELECT15]], -9223372036854775808
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT17:%.*]] = select i1 [[RDX_SELECT_CMP16]], i64 [[RDX_SELECT15]], i64 [[II:%.*]]
		; CHECK-VF1IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF1IC4: scalar.ph:
		; CHECK-VF1IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[RDX_MINMAX_SELECT11]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX18:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT17]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF1IC4: for.body:
		; CHECK-VF1IC4-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX18]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP26:%.*]], [[FOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
		; CHECK-VF1IC4-NEXT: [[TMP25:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP26]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP25]])
		; CHECK-VF1IC4-NEXT: [[CMP1:%.*]] = icmp slt i64 [[MAX_09]], [[TMP25]]
		; CHECK-VF1IC4-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
		; CHECK-VF1IC4-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
		; CHECK-VF1IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
		; CHECK-VF1IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
		; CHECK-VF1IC4: exit:
		; CHECK-VF1IC4-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP26]], [[FOR_BODY]] ], [ [[RDX_MINMAX_SELECT11]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT17]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
		; CHECK-VF1IC4-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]		%idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]
%max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]		%max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]
[...]
exit:		exit:
store i64 %spec.select, ptr %res_max		store i64 %spec.select, ptr %res_max
ret i64 %spec.select7		ret i64 %spec.select7
}		}

;		;
; Check sge case.		; Check sge case.
;		;
define i64 @smax_idx_inverted_pred(ptr nocapture readonly %a, i64 %mm, i64 %ii, ptr nocapture writeonly %res_max, i64 %n) {		define i64 @smax_idx_inverted_pred(ptr nocapture readonly %a, i64 %mm, i64 %ii, ptr nocapture writeonly %res_max, i64 %n) {
; CHECK-LABEL: @smax_idx_inverted_pred(		; CHECK-VF4IC1-LABEL: @smax_idx_inverted_pred(
; CHECK-NOT: vector.body:		; CHECK-VF4IC1-NEXT: entry:
		; CHECK-VF4IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
		; CHECK-VF4IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC1: vector.ph:
		; CHECK-VF4IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC1-NEXT: [[MINMAX_IDENT_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[MM:%.*]], i64 0
		; CHECK-VF4IC1-NEXT: [[MINMAX_IDENT_SPLAT:%.*]] = shufflevector <4 x i64> [[MINMAX_IDENT_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC1: vector.body:
		; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 4
		; CHECK-VF4IC1-NEXT: [[TMP3]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI]], <4 x i64> [[WIDE_LOAD]])
		; CHECK-VF4IC1-NEXT: [[TMP4:%.*]] = icmp sge <4 x i64> [[WIDE_LOAD]], [[VEC_PHI]]
		; CHECK-VF4IC1-NEXT: [[TMP5]] = select <4 x i1> [[TMP4]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI1]]
		; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC1-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
		; CHECK-VF4IC1: middle.block:
		; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP3]])
		; CHECK-VF4IC1-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP7]], i64 0
		; CHECK-VF4IC1-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC1-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[TMP3]]
		; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[TMP5]], <4 x i64> <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>
		; CHECK-VF4IC1-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[MASK_SELECT]])
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP8]], -9223372036854775808
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP8]], i64 [[II:%.*]]
		; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC1: scalar.ph:
		; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX2:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC1: for.body:
		; CHECK-VF4IC1-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP10:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX2]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
		; CHECK-VF4IC1-NEXT: [[TMP9:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
		; CHECK-VF4IC1-NEXT: [[TMP10]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP9]])
		; CHECK-VF4IC1-NEXT: [[CMP1:%.*]] = icmp sge i64 [[TMP9]], [[MAX_09]]
		; CHECK-VF4IC1-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
		; CHECK-VF4IC1-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
		; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
		; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
		; CHECK-VF4IC1: exit:
		; CHECK-VF4IC1-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP10]], [[FOR_BODY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
		; CHECK-VF4IC1-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
		;
		; CHECK-VF4IC4-LABEL: @smax_idx_inverted_pred(
		; CHECK-VF4IC4-NEXT: entry:
		; CHECK-VF4IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 16
		; CHECK-VF4IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC4: vector.ph:
		; CHECK-VF4IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC4-NEXT: [[MINMAX_IDENT_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[MM:%.*]], i64 0
		; CHECK-VF4IC4-NEXT: [[MINMAX_IDENT_SPLAT:%.*]] = shufflevector <4 x i64> [[MINMAX_IDENT_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC4: vector.body:
		; CHECK-VF4IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI7:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI8:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP21:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI9:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP22:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI10:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP23:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD1:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD1]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
		; CHECK-VF4IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
		; CHECK-VF4IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
		; CHECK-VF4IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
		; CHECK-VF4IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF4IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF4IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF4IC4-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 0
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP8]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 4
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD11:%.*]] = load <4 x i64>, ptr [[TMP9]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 8
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD12:%.*]] = load <4 x i64>, ptr [[TMP10]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 12
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD13:%.*]] = load <4 x i64>, ptr [[TMP11]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP12]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI]], <4 x i64> [[WIDE_LOAD]])
		; CHECK-VF4IC4-NEXT: [[TMP13]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI4]], <4 x i64> [[WIDE_LOAD11]])
		; CHECK-VF4IC4-NEXT: [[TMP14]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI5]], <4 x i64> [[WIDE_LOAD12]])
		; CHECK-VF4IC4-NEXT: [[TMP15]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI6]], <4 x i64> [[WIDE_LOAD13]])
		; CHECK-VF4IC4-NEXT: [[TMP16:%.*]] = icmp sge <4 x i64> [[WIDE_LOAD]], [[VEC_PHI]]
		; CHECK-VF4IC4-NEXT: [[TMP17:%.*]] = icmp sge <4 x i64> [[WIDE_LOAD11]], [[VEC_PHI4]]
		; CHECK-VF4IC4-NEXT: [[TMP18:%.*]] = icmp sge <4 x i64> [[WIDE_LOAD12]], [[VEC_PHI5]]
		; CHECK-VF4IC4-NEXT: [[TMP19:%.*]] = icmp sge <4 x i64> [[WIDE_LOAD13]], [[VEC_PHI6]]
		; CHECK-VF4IC4-NEXT: [[TMP20]] = select <4 x i1> [[TMP16]], <4 x i64> [[VEC_IND]], <4 x i64> [[VEC_PHI7]]
		; CHECK-VF4IC4-NEXT: [[TMP21]] = select <4 x i1> [[TMP17]], <4 x i64> [[STEP_ADD]], <4 x i64> [[VEC_PHI8]]
		; CHECK-VF4IC4-NEXT: [[TMP22]] = select <4 x i1> [[TMP18]], <4 x i64> [[STEP_ADD1]], <4 x i64> [[VEC_PHI9]]
		; CHECK-VF4IC4-NEXT: [[TMP23]] = select <4 x i1> [[TMP19]], <4 x i64> [[STEP_ADD2]], <4 x i64> [[VEC_PHI10]]
		; CHECK-VF4IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
		; CHECK-VF4IC4-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
		; CHECK-VF4IC4: middle.block:
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i64> [[TMP12]], [[TMP13]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i64> [[TMP12]], <4 x i64> [[TMP13]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP14:%.*]] = icmp sgt <4 x i64> [[RDX_MINMAX_SELECT]], [[TMP14]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT15:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP14]], <4 x i64> [[RDX_MINMAX_SELECT]], <4 x i64> [[TMP14]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP16:%.*]] = icmp sgt <4 x i64> [[RDX_MINMAX_SELECT15]], [[TMP15]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT17:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP16]], <4 x i64> [[RDX_MINMAX_SELECT15]], <4 x i64> [[TMP15]]
		; CHECK-VF4IC4-NEXT: [[TMP25:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX_SELECT17]])
		; CHECK-VF4IC4-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP25]], i64 0
		; CHECK-VF4IC4-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC4-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[RDX_MINMAX_SELECT17]]
		; CHECK-VF4IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i64> [[TMP20]], <4 x i64> [[TMP21]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT18:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP14]], <4 x i64> [[RDX_SELECT]], <4 x i64> [[TMP22]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT19:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP16]], <4 x i64> [[RDX_SELECT18]], <4 x i64> [[TMP23]]
		; CHECK-VF4IC4-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[RDX_SELECT19]], <4 x i64> <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>
		; CHECK-VF4IC4-NEXT: [[TMP26:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[MASK_SELECT]])
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP26]], -9223372036854775808
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT20:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP26]], i64 [[II:%.*]]
		; CHECK-VF4IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC4: scalar.ph:
		; CHECK-VF4IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[TMP25]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX21:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT20]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC4: for.body:
		; CHECK-VF4IC4-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP28:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX21]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
		; CHECK-VF4IC4-NEXT: [[TMP27:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP28]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP27]])
		; CHECK-VF4IC4-NEXT: [[CMP1:%.*]] = icmp sge i64 [[TMP27]], [[MAX_09]]
		; CHECK-VF4IC4-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
		; CHECK-VF4IC4-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
		; CHECK-VF4IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
		; CHECK-VF4IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
		; CHECK-VF4IC4: exit:
		; CHECK-VF4IC4-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP28]], [[FOR_BODY]] ], [ [[TMP25]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT20]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
		; CHECK-VF4IC4-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
		;
		; CHECK-VF1IC4-LABEL: @smax_idx_inverted_pred(
		; CHECK-VF1IC4-NEXT: entry:
		; CHECK-VF1IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
		; CHECK-VF1IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF1IC4: vector.ph:
		; CHECK-VF1IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF1IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF1IC4: vector.body:
		; CHECK-VF1IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI:%.*]] = phi i64 [ [[MM:%.*]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI4:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI5:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP21:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI6:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP22:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI7:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP23:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF1IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
		; CHECK-VF1IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
		; CHECK-VF1IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
		; CHECK-VF1IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
		; CHECK-VF1IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF1IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF1IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF1IC4-NEXT: [[TMP8:%.*]] = load i64, ptr [[TMP4]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP9:%.*]] = load i64, ptr [[TMP5]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP10:%.*]] = load i64, ptr [[TMP6]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP11:%.*]] = load i64, ptr [[TMP7]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP12]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI]], i64 [[TMP8]])
		; CHECK-VF1IC4-NEXT: [[TMP13]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI1]], i64 [[TMP9]])
		; CHECK-VF1IC4-NEXT: [[TMP14]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI2]], i64 [[TMP10]])
		; CHECK-VF1IC4-NEXT: [[TMP15]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI3]], i64 [[TMP11]])
		; CHECK-VF1IC4-NEXT: [[TMP16:%.*]] = icmp sge i64 [[TMP8]], [[VEC_PHI]]
		; CHECK-VF1IC4-NEXT: [[TMP17:%.*]] = icmp sge i64 [[TMP9]], [[VEC_PHI1]]
		; CHECK-VF1IC4-NEXT: [[TMP18:%.*]] = icmp sge i64 [[TMP10]], [[VEC_PHI2]]
		; CHECK-VF1IC4-NEXT: [[TMP19:%.*]] = icmp sge i64 [[TMP11]], [[VEC_PHI3]]
		; CHECK-VF1IC4-NEXT: [[TMP20]] = select i1 [[TMP16]], i64 [[TMP0]], i64 [[VEC_PHI4]]
		; CHECK-VF1IC4-NEXT: [[TMP21]] = select i1 [[TMP17]], i64 [[TMP1]], i64 [[VEC_PHI5]]
		; CHECK-VF1IC4-NEXT: [[TMP22]] = select i1 [[TMP18]], i64 [[TMP2]], i64 [[VEC_PHI6]]
		; CHECK-VF1IC4-NEXT: [[TMP23]] = select i1 [[TMP19]], i64 [[TMP3]], i64 [[VEC_PHI7]]
		; CHECK-VF1IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF1IC4-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
		; CHECK-VF1IC4: middle.block:
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt i64 [[TMP12]], [[TMP13]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select i1 [[RDX_MINMAX_CMP]], i64 [[TMP12]], i64 [[TMP13]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP8:%.*]] = icmp sgt i64 [[RDX_MINMAX_SELECT]], [[TMP14]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select i1 [[RDX_MINMAX_CMP8]], i64 [[RDX_MINMAX_SELECT]], i64 [[TMP14]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP10:%.*]] = icmp sgt i64 [[RDX_MINMAX_SELECT9]], [[TMP15]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT11:%.*]] = select i1 [[RDX_MINMAX_CMP10]], i64 [[RDX_MINMAX_SELECT9]], i64 [[TMP15]]
		; CHECK-VF1IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_MINMAX_CMP]], i64 [[TMP20]], i64 [[TMP21]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT12:%.*]] = select i1 [[RDX_MINMAX_CMP8]], i64 [[RDX_SELECT]], i64 [[TMP22]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT13:%.*]] = select i1 [[RDX_MINMAX_CMP10]], i64 [[RDX_SELECT12]], i64 [[TMP23]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[RDX_SELECT13]], -9223372036854775808
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT14:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[RDX_SELECT13]], i64 [[II:%.*]]
		; CHECK-VF1IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF1IC4: scalar.ph:
		; CHECK-VF1IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[RDX_MINMAX_SELECT11]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX15:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT14]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF1IC4: for.body:
		; CHECK-VF1IC4-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP26:%.*]], [[FOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX15]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
		; CHECK-VF1IC4-NEXT: [[TMP25:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP26]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP25]])
		; CHECK-VF1IC4-NEXT: [[CMP1:%.*]] = icmp sge i64 [[TMP25]], [[MAX_09]]
		; CHECK-VF1IC4-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1]], i64 [[INDVARS_IV]], i64 [[IDX_011]]
		; CHECK-VF1IC4-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
		; CHECK-VF1IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
		; CHECK-VF1IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
		; CHECK-VF1IC4: exit:
		; CHECK-VF1IC4-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP26]], [[FOR_BODY]] ], [ [[RDX_MINMAX_SELECT11]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT14]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
		; CHECK-VF1IC4-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]		%max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]
%idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]		%idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]
[...]
exit:		exit:
store i64 %1, ptr %res_max		store i64 %1, ptr %res_max
ret i64 %spec.select7		ret i64 %spec.select7
}		}

;		;
; In such cases, the last index should be extracted.		; In such cases, the last index should be extracted.
;		;
define i64 @smax_idx_extract_last(ptr nocapture readonly %a, i64 %mm, i64 %ii, ptr nocapture writeonly %res_max, i64 %n) {		define i64 @smax_idx_extract_last(ptr nocapture readonly %a, i64 %mm, i64 %ii, ptr nocapture writeonly %res_max, i64 %n) {
; CHECK-LABEL: @smax_idx_extract_last(		; CHECK-VF4IC1-LABEL: @smax_idx_extract_last(
; CHECK-NOT: vector.body:		; CHECK-VF4IC1-NEXT: entry:
		; CHECK-VF4IC1-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
		; CHECK-VF4IC1-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC1: vector.ph:
		; CHECK-VF4IC1-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF4IC1-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC1-NEXT: [[MINMAX_IDENT_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[MM:%.*]], i64 0
		; CHECK-VF4IC1-NEXT: [[MINMAX_IDENT_SPLAT:%.*]] = shufflevector <4 x i64> [[MINMAX_IDENT_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC1-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC1: vector.body:
		; CHECK-VF4IC1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC1-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
		; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
		; CHECK-VF4IC1-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP2]], align 4
		; CHECK-VF4IC1-NEXT: [[TMP3]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI]], <4 x i64> [[WIDE_LOAD]])
		; CHECK-VF4IC1-NEXT: [[TMP4:%.*]] = icmp sgt <4 x i64> [[VEC_PHI]], [[WIDE_LOAD]]
		; CHECK-VF4IC1-NEXT: [[TMP5]] = select <4 x i1> [[TMP4]], <4 x i64> [[VEC_PHI1]], <4 x i64> [[VEC_IND]]
		; CHECK-VF4IC1-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF4IC1-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC1-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
		; CHECK-VF4IC1: middle.block:
		; CHECK-VF4IC1-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[TMP3]])
		; CHECK-VF4IC1-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP7]], i64 0
		; CHECK-VF4IC1-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC1-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[TMP3]]
		; CHECK-VF4IC1-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC1-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[TMP5]], <4 x i64> <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>
		; CHECK-VF4IC1-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[MASK_SELECT]])
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP8]], -9223372036854775808
		; CHECK-VF4IC1-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP8]], i64 [[II:%.*]]
		; CHECK-VF4IC1-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC1: scalar.ph:
		; CHECK-VF4IC1-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: [[BC_MERGE_RDX2:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC1: for.body:
		; CHECK-VF4IC1-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP10:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX2]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
		; CHECK-VF4IC1-NEXT: [[TMP9:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
		; CHECK-VF4IC1-NEXT: [[TMP10]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP9]])
		; CHECK-VF4IC1-NEXT: [[CMP1_NOT:%.*]] = icmp sgt i64 [[MAX_09]], [[TMP9]]
		; CHECK-VF4IC1-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1_NOT]], i64 [[IDX_011]], i64 [[INDVARS_IV]]
		; CHECK-VF4IC1-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
		; CHECK-VF4IC1-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
		; CHECK-VF4IC1-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
		; CHECK-VF4IC1: exit:
		; CHECK-VF4IC1-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP10]], [[FOR_BODY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC1-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
		; CHECK-VF4IC1-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
		;
		; CHECK-VF4IC4-LABEL: @smax_idx_extract_last(
		; CHECK-VF4IC4-NEXT: entry:
		; CHECK-VF4IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 16
		; CHECK-VF4IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF4IC4: vector.ph:
		; CHECK-VF4IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 16
		; CHECK-VF4IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF4IC4-NEXT: [[MINMAX_IDENT_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[MM:%.*]], i64 0
		; CHECK-VF4IC4-NEXT: [[MINMAX_IDENT_SPLAT:%.*]] = shufflevector <4 x i64> [[MINMAX_IDENT_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF4IC4: vector.body:
		; CHECK-VF4IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i64> [ [[MINMAX_IDENT_SPLAT]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI7:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI8:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP21:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI9:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP22:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[VEC_PHI10:%.*]] = phi <4 x i64> [ <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>, [[VECTOR_PH]] ], [ [[TMP23:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD1:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD1]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF4IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
		; CHECK-VF4IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
		; CHECK-VF4IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
		; CHECK-VF4IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
		; CHECK-VF4IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF4IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF4IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF4IC4-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 0
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP8]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 4
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD11:%.*]] = load <4 x i64>, ptr [[TMP9]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 8
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD12:%.*]] = load <4 x i64>, ptr [[TMP10]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 12
		; CHECK-VF4IC4-NEXT: [[WIDE_LOAD13:%.*]] = load <4 x i64>, ptr [[TMP11]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP12]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI]], <4 x i64> [[WIDE_LOAD]])
		; CHECK-VF4IC4-NEXT: [[TMP13]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI4]], <4 x i64> [[WIDE_LOAD11]])
		; CHECK-VF4IC4-NEXT: [[TMP14]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI5]], <4 x i64> [[WIDE_LOAD12]])
		; CHECK-VF4IC4-NEXT: [[TMP15]] = call <4 x i64> @llvm.smax.v4i64(<4 x i64> [[VEC_PHI6]], <4 x i64> [[WIDE_LOAD13]])
		; CHECK-VF4IC4-NEXT: [[TMP16:%.*]] = icmp sgt <4 x i64> [[VEC_PHI]], [[WIDE_LOAD]]
		; CHECK-VF4IC4-NEXT: [[TMP17:%.*]] = icmp sgt <4 x i64> [[VEC_PHI4]], [[WIDE_LOAD11]]
		; CHECK-VF4IC4-NEXT: [[TMP18:%.*]] = icmp sgt <4 x i64> [[VEC_PHI5]], [[WIDE_LOAD12]]
		; CHECK-VF4IC4-NEXT: [[TMP19:%.*]] = icmp sgt <4 x i64> [[VEC_PHI6]], [[WIDE_LOAD13]]
		; CHECK-VF4IC4-NEXT: [[TMP20]] = select <4 x i1> [[TMP16]], <4 x i64> [[VEC_PHI7]], <4 x i64> [[VEC_IND]]
		; CHECK-VF4IC4-NEXT: [[TMP21]] = select <4 x i1> [[TMP17]], <4 x i64> [[VEC_PHI8]], <4 x i64> [[STEP_ADD]]
		; CHECK-VF4IC4-NEXT: [[TMP22]] = select <4 x i1> [[TMP18]], <4 x i64> [[VEC_PHI9]], <4 x i64> [[STEP_ADD1]]
		; CHECK-VF4IC4-NEXT: [[TMP23]] = select <4 x i1> [[TMP19]], <4 x i64> [[VEC_PHI10]], <4 x i64> [[STEP_ADD2]]
		; CHECK-VF4IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
		; CHECK-VF4IC4-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-VF4IC4-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
		; CHECK-VF4IC4: middle.block:
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i64> [[TMP12]], [[TMP13]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i64> [[TMP12]], <4 x i64> [[TMP13]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP14:%.*]] = icmp sgt <4 x i64> [[RDX_MINMAX_SELECT]], [[TMP14]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT15:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP14]], <4 x i64> [[RDX_MINMAX_SELECT]], <4 x i64> [[TMP14]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_CMP16:%.*]] = icmp sgt <4 x i64> [[RDX_MINMAX_SELECT15]], [[TMP15]]
		; CHECK-VF4IC4-NEXT: [[RDX_MINMAX_SELECT17:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP16]], <4 x i64> [[RDX_MINMAX_SELECT15]], <4 x i64> [[TMP15]]
		; CHECK-VF4IC4-NEXT: [[TMP25:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[RDX_MINMAX_SELECT17]])
		; CHECK-VF4IC4-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP25]], i64 0
		; CHECK-VF4IC4-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i64> [[DOTSPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
		; CHECK-VF4IC4-NEXT: [[MASK_CMP:%.*]] = icmp eq <4 x i64> [[DOTSPLAT]], [[RDX_MINMAX_SELECT17]]
		; CHECK-VF4IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i64> [[TMP20]], <4 x i64> [[TMP21]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT18:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP14]], <4 x i64> [[RDX_SELECT]], <4 x i64> [[TMP22]]
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT19:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP16]], <4 x i64> [[RDX_SELECT18]], <4 x i64> [[TMP23]]
		; CHECK-VF4IC4-NEXT: [[MASK_SELECT:%.*]] = select <4 x i1> [[MASK_CMP]], <4 x i64> [[RDX_SELECT19]], <4 x i64> <i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808, i64 -9223372036854775808>
		; CHECK-VF4IC4-NEXT: [[TMP26:%.*]] = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> [[MASK_SELECT]])
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[TMP26]], -9223372036854775808
		; CHECK-VF4IC4-NEXT: [[RDX_SELECT20:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[TMP26]], i64 [[II:%.*]]
		; CHECK-VF4IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF4IC4: scalar.ph:
		; CHECK-VF4IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[TMP25]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: [[BC_MERGE_RDX21:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT20]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF4IC4: for.body:
		; CHECK-VF4IC4-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP28:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX21]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
		; CHECK-VF4IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
		; CHECK-VF4IC4-NEXT: [[TMP27:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
		; CHECK-VF4IC4-NEXT: [[TMP28]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP27]])
		; CHECK-VF4IC4-NEXT: [[CMP1_NOT:%.*]] = icmp sgt i64 [[MAX_09]], [[TMP27]]
		; CHECK-VF4IC4-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1_NOT]], i64 [[IDX_011]], i64 [[INDVARS_IV]]
		; CHECK-VF4IC4-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
		; CHECK-VF4IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
		; CHECK-VF4IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
		; CHECK-VF4IC4: exit:
		; CHECK-VF4IC4-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP28]], [[FOR_BODY]] ], [ [[TMP25]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT20]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF4IC4-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
		; CHECK-VF4IC4-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
		;
		; CHECK-VF1IC4-LABEL: @smax_idx_extract_last(
		; CHECK-VF1IC4-NEXT: entry:
		; CHECK-VF1IC4-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
		; CHECK-VF1IC4-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK-VF1IC4: vector.ph:
		; CHECK-VF1IC4-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 4
		; CHECK-VF1IC4-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
		; CHECK-VF1IC4-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK-VF1IC4: vector.body:
		; CHECK-VF1IC4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI:%.*]] = phi i64 [ [[MM:%.*]], [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI1:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP13:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI2:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI3:%.*]] = phi i64 [ [[MM]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI4:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI5:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP21:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI6:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP22:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[VEC_PHI7:%.*]] = phi i64 [ -9223372036854775808, [[VECTOR_PH]] ], [ [[TMP23:%.*]], [[VECTOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-VF1IC4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
		; CHECK-VF1IC4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
		; CHECK-VF1IC4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
		; CHECK-VF1IC4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
		; CHECK-VF1IC4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
		; CHECK-VF1IC4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
		; CHECK-VF1IC4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
		; CHECK-VF1IC4-NEXT: [[TMP8:%.*]] = load i64, ptr [[TMP4]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP9:%.*]] = load i64, ptr [[TMP5]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP10:%.*]] = load i64, ptr [[TMP6]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP11:%.*]] = load i64, ptr [[TMP7]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP12]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI]], i64 [[TMP8]])
		; CHECK-VF1IC4-NEXT: [[TMP13]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI1]], i64 [[TMP9]])
		; CHECK-VF1IC4-NEXT: [[TMP14]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI2]], i64 [[TMP10]])
		; CHECK-VF1IC4-NEXT: [[TMP15]] = tail call i64 @llvm.smax.i64(i64 [[VEC_PHI3]], i64 [[TMP11]])
		; CHECK-VF1IC4-NEXT: [[TMP16:%.*]] = icmp sgt i64 [[VEC_PHI]], [[TMP8]]
		; CHECK-VF1IC4-NEXT: [[TMP17:%.*]] = icmp sgt i64 [[VEC_PHI1]], [[TMP9]]
		; CHECK-VF1IC4-NEXT: [[TMP18:%.*]] = icmp sgt i64 [[VEC_PHI2]], [[TMP10]]
		; CHECK-VF1IC4-NEXT: [[TMP19:%.*]] = icmp sgt i64 [[VEC_PHI3]], [[TMP11]]
		; CHECK-VF1IC4-NEXT: [[TMP20]] = select i1 [[TMP16]], i64 [[VEC_PHI4]], i64 [[TMP0]]
		; CHECK-VF1IC4-NEXT: [[TMP21]] = select i1 [[TMP17]], i64 [[VEC_PHI5]], i64 [[TMP1]]
		; CHECK-VF1IC4-NEXT: [[TMP22]] = select i1 [[TMP18]], i64 [[VEC_PHI6]], i64 [[TMP2]]
		; CHECK-VF1IC4-NEXT: [[TMP23]] = select i1 [[TMP19]], i64 [[VEC_PHI7]], i64 [[TMP3]]
		; CHECK-VF1IC4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-VF1IC4-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
		; CHECK-VF1IC4: middle.block:
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt i64 [[TMP12]], [[TMP13]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select i1 [[RDX_MINMAX_CMP]], i64 [[TMP12]], i64 [[TMP13]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP8:%.*]] = icmp sgt i64 [[RDX_MINMAX_SELECT]], [[TMP14]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select i1 [[RDX_MINMAX_CMP8]], i64 [[RDX_MINMAX_SELECT]], i64 [[TMP14]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_CMP10:%.*]] = icmp sgt i64 [[RDX_MINMAX_SELECT9]], [[TMP15]]
		; CHECK-VF1IC4-NEXT: [[RDX_MINMAX_SELECT11:%.*]] = select i1 [[RDX_MINMAX_CMP10]], i64 [[RDX_MINMAX_SELECT9]], i64 [[TMP15]]
		; CHECK-VF1IC4-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_MINMAX_CMP]], i64 [[TMP20]], i64 [[TMP21]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT12:%.*]] = select i1 [[RDX_MINMAX_CMP8]], i64 [[RDX_SELECT]], i64 [[TMP22]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT13:%.*]] = select i1 [[RDX_MINMAX_CMP10]], i64 [[RDX_SELECT12]], i64 [[TMP23]]
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i64 [[RDX_SELECT13]], -9223372036854775808
		; CHECK-VF1IC4-NEXT: [[RDX_SELECT14:%.*]] = select i1 [[RDX_SELECT_CMP]], i64 [[RDX_SELECT13]], i64 [[II:%.*]]
		; CHECK-VF1IC4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
		; CHECK-VF1IC4: scalar.ph:
		; CHECK-VF1IC4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX:%.*]] = phi i64 [ [[MM]], [[ENTRY]] ], [ [[RDX_MINMAX_SELECT11]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: [[BC_MERGE_RDX15:%.*]] = phi i64 [ [[II]], [[ENTRY]] ], [ [[RDX_SELECT14]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK-VF1IC4: for.body:
		; CHECK-VF1IC4-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[MAX_09:%.*]] = phi i64 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP26:%.*]], [[FOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[IDX_011:%.*]] = phi i64 [ [[BC_MERGE_RDX15]], [[SCALAR_PH]] ], [ [[SPEC_SELECT7:%.*]], [[FOR_BODY]] ]
		; CHECK-VF1IC4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
		; CHECK-VF1IC4-NEXT: [[TMP25:%.*]] = load i64, ptr [[ARRAYIDX]], align 4
		; CHECK-VF1IC4-NEXT: [[TMP26]] = tail call i64 @llvm.smax.i64(i64 [[MAX_09]], i64 [[TMP25]])
		; CHECK-VF1IC4-NEXT: [[CMP1_NOT:%.*]] = icmp sgt i64 [[MAX_09]], [[TMP25]]
		; CHECK-VF1IC4-NEXT: [[SPEC_SELECT7]] = select i1 [[CMP1_NOT]], i64 [[IDX_011]], i64 [[INDVARS_IV]]
		; CHECK-VF1IC4-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
		; CHECK-VF1IC4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
		; CHECK-VF1IC4-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
		; CHECK-VF1IC4: exit:
		; CHECK-VF1IC4-NEXT: [[DOTLCSSA:%.*]] = phi i64 [ [[TMP26]], [[FOR_BODY]] ], [ [[RDX_MINMAX_SELECT11]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: [[SPEC_SELECT7_LCSSA:%.*]] = phi i64 [ [[SPEC_SELECT7]], [[FOR_BODY]] ], [ [[RDX_SELECT14]], [[MIDDLE_BLOCK]] ]
		; CHECK-VF1IC4-NEXT: store i64 [[DOTLCSSA]], ptr [[RES_MAX:%.*]], align 4
		; CHECK-VF1IC4-NEXT: ret i64 [[SPEC_SELECT7_LCSSA]]
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]		%max.09 = phi i64 [ %mm, %entry ], [ %1, %for.body ]
%idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]		%idx.011 = phi i64 [ %ii, %entry ], [ %spec.select7, %for.body ]
