This is an archive of the discontinued LLVM Phabricator instance.

Differential D112548

[LoopVectorize] Propagate fast-math flags for inloop reductions
Closed, Public

Authored by RosieSumpter on Oct 26 2021, 8:15 AM.

Details

Reviewers

paulwalker-arm
david-arm
kmclaughlin
sdesmalen

Commits

rGdcb8222d8777: [LoopVectorize] Propagate fast-math flags for inloop reductions

Summary

This patch updates VPReductionRecipe::execute so that the fast-math
flags associated with the underlying instruction of the VPReductionRecipe are
propagated through to the reductions which are created.

Depends on D112547

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

RosieSumpter created this revision. Oct 26 2021, 8:15 AM

Herald added a subscriber: hiraditya. Oct 26 2021, 8:15 AM

RosieSumpter requested review of this revision. Oct 26 2021, 8:15 AM

Herald added a project: Restricted Project. Oct 26 2021, 8:15 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B130715: Diff 382333. Oct 26 2021, 8:16 AM

Note: this patch builds on the NFC patch D112547

RosieSumpter mentioned this in D111555: [LoopVectorize] Add vector reduction support for fmuladd intrinsic. Oct 26 2021, 8:31 AM

Thanks for this patch @RosieSumpter - seems like a nice fix! I wonder if it's also worth changing createOrderedReduction to be similar to createTargetReduction and set the flags in there too? Also, you've fixed up the VF=1 cases too.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9779	I completely forgot we can actually just use a guard here, i.e. `IRBuilderBase::FastMathFlagGuard FMFGuard(State.Builder);` followed by `State.Builder.setFastMathFlags(RdxDesc->getFastMathFlags());`. Then you don't need the extra line at the bottom anymore to reset the flags.
9780	Maybe we should do this for all cases here, i.e. ordered and non-ordered?
9782	I just realised that you might be able to replace the whole if block with `State.Builder.setFastMathFlags(RdxDesc->getFastMathFlags());` here.
9801	I just realised you're actually also fixing the case where VF=1. Is it worth having a test to ensure flags are propagated for VF=1 as well?
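
The guard suggested in the inline comments above follows the usual RAII save-and-restore pattern. The sketch below is a minimal, self-contained model of that pattern and does not use LLVM itself: `ToyFlags`, `ToyBuilder`, `ToyFlagGuard`, `emitReduction` and `demoGuard` are hypothetical names made up for illustration, standing in for the fast-math flags, the IR builder and `IRBuilderBase::FastMathFlagGuard`.

```cpp
#include <cassert>

// Hypothetical stand-in for a set of fast-math flags (not llvm::FastMathFlags).
struct ToyFlags {
  bool NoNaNs = false;
  bool AllowReassoc = false;
};

// Hypothetical stand-in for an IR builder: every operation it creates is
// stamped with whatever flags are currently installed.
struct ToyBuilder {
  ToyFlags Current;
  ToyFlags createFAdd() const { return Current; }
};

// RAII guard: snapshot the builder's flags now, restore them on scope exit.
class ToyFlagGuard {
  ToyBuilder &Builder;
  ToyFlags Saved;

public:
  explicit ToyFlagGuard(ToyBuilder &B) : Builder(B), Saved(B.Current) {}
  ToyFlagGuard(const ToyFlagGuard &) = delete;
  ToyFlagGuard &operator=(const ToyFlagGuard &) = delete;
  ~ToyFlagGuard() { Builder.Current = Saved; }
};

// Same shape as the suggestion: create the guard first, then install the
// reduction's flags. No explicit reset is needed before returning.
ToyFlags emitReduction(ToyBuilder &B, ToyFlags RdxFlags) {
  ToyFlagGuard Guard(B);
  B.Current = RdxFlags;
  return B.createFAdd();
}

// Usage: the created operation carries the reduction's flags, while the
// builder is left exactly as it was found.
void demoGuard() {
  ToyBuilder B; // starts with no flags set
  ToyFlags Nnan;
  Nnan.NoNaNs = true;
  ToyFlags OpFlags = emitReduction(B, Nnan);
  assert(OpFlags.NoNaNs && !OpFlags.AllowReassoc);
  assert(!B.Current.NoNaNs); // restored by the guard's destructor
}
```

Because the restore happens in the destructor, it also runs on early returns, which is why the explicit reset line at the end of the function becomes unnecessary.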

RosieSumpter edited the summary of this revision. Oct 28 2021, 1:34 AM

RosieSumpter added a parent revision: D112547: [LoopVectorize] Clean up VPReductionRecipe::execute. NFC.

RosieSumpter added a child revision: D111555: [LoopVectorize] Add vector reduction support for fmuladd intrinsic.

Use IRBuilderBase::FastMathFlagGuard
Propagate fast-math flags for both ordered and non-ordered reductions
Updated tests

RosieSumpter marked 4 inline comments as done. Oct 28 2021, 7:43 AM

Harbormaster completed remote builds in B131204: Diff 383031. Oct 28 2021, 8:28 AM

Thank you @RosieSumpter, this patch looks good to me and I think all of the comments have been addressed.

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
854	Should `fast` here be replaced with `nnan`?

This revision is now accepted and ready to land. Oct 28 2021, 9:01 AM

RosieSumpter added inline comments. Oct 28 2021, 9:06 AM

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
854	Yes it should! I will change that before committing.

david-arm added inline comments. Nov 1 2021, 2:32 AM

llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll
1020	Nice! Instcombine has folded the call and the fadd together into one!
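
The fold mentioned in this comment can be modelled with plain scalar code. The sketch below is an assumption-laden illustration, not InstCombine's actual rule: `reduceFAdd`, `beforeFold`, `afterFold` and `demoFold` are hypothetical helpers, and the lanes are chosen to be exactly representable so both shapes give the same result. In general the two shapes only agree when reassociation is allowed, which is presumably why the fold appears once the scalar `fadd` carries the `fast` flag.

```cpp
#include <array>
#include <cassert>

// Scalar model of llvm.vector.reduce.fadd: add every lane onto a start value.
float reduceFAdd(float Start, const std::array<float, 4> &V) {
  float Acc = Start;
  for (float X : V)
    Acc += X;
  return Acc;
}

// Shape in the old CHECK lines: reduce from the neutral value -0.0, then add
// the running accumulator with a separate fadd.
float beforeFold(float Phi, const std::array<float, 4> &V) {
  return reduceFAdd(-0.0f, V) + Phi;
}

// Shape in the new CHECK lines: the accumulator is the start value of the
// reduction call, so the separate fadd disappears.
float afterFold(float Phi, const std::array<float, 4> &V) {
  return reduceFAdd(Phi, V);
}

// Usage with small integers, which float represents exactly.
void demoFold() {
  const std::array<float, 4> Lanes = {1.0f, 2.0f, 3.0f, 4.0f};
  assert(beforeFold(10.0f, Lanes) == 20.0f);
  assert(afterFold(10.0f, Lanes) == 20.0f);
}
```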

Just adding an extra datapoint to do with as you wish.

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
854	Generally speaking it is better for `CHECK-NOT` lines to be as small/simple as possible so that you do not end up creating a passing test just because some minor detail has changed. In this instance the `CHECK-NOT` does not care about the instruction's flags but rather wants to ensure there is never a call to `llvm.vector.reduce.fadd`. In this regard I think just having `CHECK-UNORDERED-NOT: @llvm.vector.reduce.fadd` would yield a more resilient test. That said, given all the expected output is explicitly checked for I'm not sure what value these `CHECK-NOT` lines provide.
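
The resilience argument above can be illustrated with a simplified model of a `-NOT` check. This is only a sketch: `checkNotPasses`, `SampleOutput` and `demoCheckNot` are hypothetical, and the model uses plain substring search, whereas the real FileCheck also supports regular expressions and limits the search to the region between neighbouring matches.

```cpp
#include <cassert>
#include <string>

// Simplified model of a "-NOT" directive: the check passes when the pattern
// does not occur anywhere in the scanned output.
bool checkNotPasses(const std::string &Output, const std::string &Pattern) {
  return Output.find(Pattern) == std::string::npos;
}

// Hypothetical output that still contains the unwanted call, but with a
// different flag from the one a narrow pattern spells out.
const std::string SampleOutput =
    "%r = call nnan float @llvm.vector.reduce.fadd.v8f32(float %acc, "
    "<8 x float> %v)";

void demoCheckNot() {
  // Over-specific pattern: passes although the call is present, because only
  // the flag differs.
  assert(checkNotPasses(SampleOutput,
                        "call fast float @llvm.vector.reduce.fadd"));
  // Minimal pattern: fails, which is the intended outcome here.
  assert(!checkNotPasses(SampleOutput, "@llvm.vector.reduce.fadd"));
}
```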

RosieSumpter added inline comments. Nov 1 2021, 9:43 AM

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
854	Thanks @paulwalker-arm, I will change the `CHECK-NOT` lines. Although I see what you mean about them not having value here, maybe it's best to remove them.

Closed by commit rGdcb8222d8777: [LoopVectorize] Propagate fast-math flags for inloop reductions (authored by RosieSumpter). Nov 2 2021, 2:04 AM

This revision was automatically updated to reflect the committed changes.

RosieSumpter added a commit: rGdcb8222d8777: [LoopVectorize] Propagate fast-math flags for inloop reductions.

Revision Contents

Path                                                          Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp               3 lines
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll     119 lines
llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll   24 lines
llvm/test/Transforms/LoopVectorize/reduction-inloop.ll        16 lines

Diff 384003

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[9,770 lines not shown; hunk context: State.ILV->vectorizeInterleaveGroup(IG, definedValues(), State, getAddr(),]
                                       getStoredValues(), getMask());
 }

 void VPReductionRecipe::execute(VPTransformState &State) {
   assert(!State.Instance && "Reduction being replicated.");
   Value *PrevInChain = State.get(getChainOp(), 0);
   RecurKind Kind = RdxDesc->getRecurrenceKind();
   bool IsOrdered = State.ILV->useOrderedReductions(*RdxDesc);
+  // Propagate the fast-math flags carried by the underlying instruction.
+  IRBuilderBase::FastMathFlagGuard FMFGuard(State.Builder);
+  State.Builder.setFastMathFlags(RdxDesc->getFastMathFlags());
   for (unsigned Part = 0; Part < State.UF; ++Part) {
     Value *NewVecOp = State.get(getVecOp(), Part);
     if (VPValue *Cond = getCondOp()) {
       Value *NewCond = State.get(Cond, Part);
       VectorType *VecTy = cast<VectorType>(NewVecOp->getType());
       Value *Iden = RdxDesc->getRecurrenceIdentity(
           Kind, VecTy->getElementType(), RdxDesc->getFastMathFlags());
       Value *IdenVec =
           State.Builder.CreateVectorSplat(VecTy->getElementCount(), Iden);
       Value *Select = State.Builder.CreateSelect(NewCond, NewVecOp, IdenVec);
       NewVecOp = Select;
     }
     Value *NewRed;
     Value *NextInChain;
     if (IsOrdered) {
       if (State.VF.isVector())
         NewRed = createOrderedReduction(State.Builder, *RdxDesc, NewVecOp,
                                         PrevInChain);
       else
         NewRed = State.Builder.CreateBinOp(
             (Instruction::BinaryOps)RdxDesc->getOpcode(Kind), PrevInChain,
             NewVecOp);
       PrevInChain = NewRed;
     } else {
       PrevInChain = State.get(getChainOp(), Part);
       NewRed = createTargetReduction(State.Builder, TTI, *RdxDesc, NewVecOp);
     }
     if (RecurrenceDescriptor::isMinMaxRecurrenceKind(Kind)) {
[830 lines not shown]

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll

[43 lines not shown; hunk context: for.body:]
   %iv.next = add nuw nsw i64 %iv, 1
   %exitcond.not = icmp eq i64 %iv.next, %n
   br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

 for.end:
   ret float %add
 }

+; Same as above but where fadd has a fast-math flag.
+define float @fadd_strict_fmf(float* noalias nocapture readonly %a, i64 %n) {
+; CHECK-ORDERED-LABEL: @fadd_strict_fmf
+; CHECK-ORDERED: vector.body:
+; CHECK-ORDERED: [[VEC_PHI:%.*]] = phi float [ 0.000000e+00, %vector.ph ], [ [[RDX:%.*]], %vector.body ]
+; CHECK-ORDERED: [[LOAD_VEC:%.*]] = load <8 x float>, <8 x float>*
+; CHECK-ORDERED: [[RDX]] = call nnan float @llvm.vector.reduce.fadd.v8f32(float [[VEC_PHI]], <8 x float> [[LOAD_VEC]])
+; CHECK-ORDERED: for.end:
+; CHECK-ORDERED: [[RES:%.*]] = phi float [ [[SCALAR:%.*]], %for.body ], [ [[RDX]], %middle.block ]
+; CHECK-ORDERED: ret float [[RES]]
+
+; CHECK-UNORDERED-LABEL: @fadd_strict_fmf
+; CHECK-UNORDERED: vector.body:
+; CHECK-UNORDERED: [[VEC_PHI:%.*]] = phi <8 x float> [ <float 0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %vector.ph ], [ [[FADD_VEC:%.*]], %vector.body ]
+; CHECK-UNORDERED: [[LOAD_VEC:%.*]] = load <8 x float>, <8 x float>*
+; CHECK-UNORDERED: [[FADD_VEC]] = fadd nnan <8 x float> [[LOAD_VEC]], [[VEC_PHI]]
+; CHECK-UNORDERED-NOT: @llvm.vector.reduce.fadd
+; CHECK-UNORDERED: middle.block:
+; CHECK-UNORDERED: [[RDX:%.*]] = call nnan float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[FADD_VEC]])
+; CHECK-UNORDERED: for.body:
+; CHECK-UNORDERED: [[LOAD:%.*]] = load float, float*
+; CHECK-UNORDERED: [[FADD:%.*]] = fadd nnan float [[LOAD]], {{.*}}
+; CHECK-UNORDERED: for.end:
+; CHECK-UNORDERED: [[RES:%.*]] = phi float [ [[FADD]], %for.body ], [ [[RDX]], %middle.block ]
+; CHECK-UNORDERED: ret float [[RES]]
+
+; CHECK-NOT-VECTORIZED-LABEL: @fadd_strict_fmf
+; CHECK-NOT-VECTORIZED-NOT: vector.body
+
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
+  %arrayidx = getelementptr inbounds float, float* %a, i64 %iv
+  %0 = load float, float* %arrayidx, align 4
+  %add = fadd nnan float %0, %sum.07
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, %n
+  br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0
+
+for.end:
+  ret float %add
+}
+
 define float @fadd_strict_unroll(float* noalias nocapture readonly %a, i64 %n) {
 ; CHECK-ORDERED-LABEL: @fadd_strict_unroll
 ; CHECK-ORDERED: vector.body:
 ; CHECK-ORDERED: %[[VEC_PHI1:.*]] = phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX4:.*]], %vector.body ]
 ; CHECK-ORDERED-NOT: phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX4]], %vector.body ]
 ; CHECK-ORDERED: %[[LOAD1:.*]] = load <8 x float>, <8 x float>*
 ; CHECK-ORDERED: %[[LOAD2:.*]] = load <8 x float>, <8 x float>*
 ; CHECK-ORDERED: %[[LOAD3:.*]] = load <8 x float>, <8 x float>*
[702 lines not shown; hunk context: for.body:]
   %iv.next = add nuw nsw i64 %iv, 1
   %exitcond.not = icmp eq i64 %iv.next, %n
   br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !4

 for.end:
   ret float %add
 }

+; Same as above but where fadd has a fast-math flag.
+define float @fadd_scalar_vf_fmf(float* noalias nocapture readonly %a, i64 %n) {
+; CHECK-ORDERED-LABEL: @fadd_scalar_vf_fmf
+; CHECK-ORDERED: vector.body:
+; CHECK-ORDERED: [[VEC_PHI:%.*]] = phi float [ 0.000000e+00, %vector.ph ], [ [[FADD4:%.*]], %vector.body ]
+; CHECK-ORDERED: [[LOAD1:%.*]] = load float, float*
+; CHECK-ORDERED: [[LOAD2:%.*]] = load float, float*
+; CHECK-ORDERED: [[LOAD3:%.*]] = load float, float*
+; CHECK-ORDERED: [[LOAD4:%.*]] = load float, float*
+; CHECK-ORDERED: [[FADD1:%.*]] = fadd nnan float [[VEC_PHI]], [[LOAD1]]
+; CHECK-ORDERED: [[FADD2:%.*]] = fadd nnan float [[FADD1]], [[LOAD2]]
+; CHECK-ORDERED: [[FADD3:%.*]] = fadd nnan float [[FADD2]], [[LOAD3]]
+; CHECK-ORDERED: [[FADD4]] = fadd nnan float [[FADD3]], [[LOAD4]]
+; CHECK-ORDERED-NOT: @llvm.vector.reduce.fadd
+; CHECK-ORDERED: scalar.ph:
+; CHECK-ORDERED: [[MERGE_RDX:%.*]] = phi float [ 0.000000e+00, %entry ], [ [[FADD4]], %middle.block ]
+; CHECK-ORDERED: for.body:
+; CHECK-ORDERED: [[SUM_07:%.*]] = phi float [ [[MERGE_RDX]], %scalar.ph ], [ [[FADD5:%.*]], %for.body ]
+; CHECK-ORDERED: [[LOAD5:%.*]] = load float, float*
+; CHECK-ORDERED: [[FADD5]] = fadd nnan float [[LOAD5]], [[SUM_07]]
+; CHECK-ORDERED: for.end:
+; CHECK-ORDERED: [[RES:%.*]] = phi float [ [[FADD5]], %for.body ], [ [[FADD4]], %middle.block ]
+; CHECK-ORDERED: ret float [[RES]]
+
+; CHECK-UNORDERED-LABEL: @fadd_scalar_vf_fmf
+; CHECK-UNORDERED: vector.body:
+; CHECK-UNORDERED: [[VEC_PHI1:%.*]] = phi float [ 0.000000e+00, %vector.ph ], [ [[FADD1:%.*]], %vector.body ]
+; CHECK-UNORDERED: [[VEC_PHI2:%.*]] = phi float [ -0.000000e+00, %vector.ph ], [ [[FADD2:%.*]], %vector.body ]
+; CHECK-UNORDERED: [[VEC_PHI3:%.*]] = phi float [ -0.000000e+00, %vector.ph ], [ [[FADD3:%.*]], %vector.body ]
+; CHECK-UNORDERED: [[VEC_PHI4:%.*]] = phi float [ -0.000000e+00, %vector.ph ], [ [[FADD4:%.*]], %vector.body ]
+; CHECK-UNORDERED: [[LOAD1:%.*]] = load float, float*
+; CHECK-UNORDERED: [[LOAD2:%.*]] = load float, float*
+; CHECK-UNORDERED: [[LOAD3:%.*]] = load float, float*
+; CHECK-UNORDERED: [[LOAD4:%.*]] = load float, float*
+; CHECK-UNORDERED: [[FADD1]] = fadd nnan float [[LOAD1]], [[VEC_PHI1]]
+; CHECK-UNORDERED: [[FADD2]] = fadd nnan float [[LOAD2]], [[VEC_PHI2]]
+; CHECK-UNORDERED: [[FADD3]] = fadd nnan float [[LOAD3]], [[VEC_PHI3]]
+; CHECK-UNORDERED: [[FADD4]] = fadd nnan float [[LOAD4]], [[VEC_PHI4]]
+; CHECK-UNORDERED-NOT: @llvm.vector.reduce.fadd
+; CHECK-UNORDERED: middle.block:
+; CHECK-UNORDERED: [[BIN_RDX1:%.*]] = fadd nnan float [[FADD2]], [[FADD1]]
+; CHECK-UNORDERED: [[BIN_RDX2:%.*]] = fadd nnan float [[FADD3]], [[BIN_RDX1]]
+; CHECK-UNORDERED: [[BIN_RDX3:%.*]] = fadd nnan float [[FADD4]], [[BIN_RDX2]]
+; CHECK-UNORDERED: scalar.ph:
+; CHECK-UNORDERED: [[MERGE_RDX:%.*]] = phi float [ 0.000000e+00, %entry ], [ [[BIN_RDX3]], %middle.block ]
+; CHECK-UNORDERED: for.body:
+; CHECK-UNORDERED: [[SUM_07:%.*]] = phi float [ [[MERGE_RDX]], %scalar.ph ], [ [[FADD5:%.*]], %for.body ]
+; CHECK-UNORDERED: [[LOAD5:%.*]] = load float, float*
+; CHECK-UNORDERED: [[FADD5]] = fadd nnan float [[LOAD5]], [[SUM_07]]
+; CHECK-UNORDERED: for.end
+; CHECK-UNORDERED: [[RES:%.*]] = phi float [ [[FADD5]], %for.body ], [ [[BIN_RDX3]], %middle.block ]
+; CHECK-UNORDERED: ret float [[RES]]
+
+; CHECK-NOT-VECTORIZED-LABEL: @fadd_scalar_vf_fmf
+; CHECK-NOT-VECTORIZED-NOT: vector.body
+
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
+  %arrayidx = getelementptr inbounds float, float* %a, i64 %iv
+  %0 = load float, float* %arrayidx, align 4
+  %add = fadd nnan float %0, %sum.07
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, %n
+  br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !4
+
+for.end:
+  ret float %add
+}
+
 ; Test case where the reduction step is a first-order recurrence.
 define double @reduction_increment_by_first_order_recurrence() {
 ; CHECK-ORDERED-LABEL: @reduction_increment_by_first_order_recurrence(
 ; CHECK-ORDERED: vector.body:
 ; CHECK-ORDERED: [[RED:%.*]] = phi double [ 0.000000e+00, %vector.ph ], [ [[RED_NEXT:%.*]], %vector.body ]
 ; CHECK-ORDERED: [[VECTOR_RECUR:%.*]] = phi <4 x double> [ <double poison, double poison, double poison, double 0.000000e+00>, %vector.ph ], [ [[FOR_NEXT:%.*]], %vector.body ]
 ; CHECK-ORDERED: [[FOR_NEXT]] = sitofp <4 x i32> %vec.ind to <4 x double>
 ; CHECK-ORDERED: [[TMP1:%.*]] = shufflevector <4 x double> [[VECTOR_RECUR]], <4 x double> [[FOR_NEXT]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
[51 lines not shown]
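
The unordered CHECK lines in this test seed the extra accumulators and the final reduction with `-0.000000e+00` rather than `0.0`. A short sketch of why negative zero is the neutral start value for a floating-point add follows, assuming IEEE 754 arithmetic in the default rounding mode and a build without fast-math options; `additionPreserves` and `demoNeutralElement` are hypothetical helpers written for this illustration.

```cpp
#include <cassert>
#include <cmath>

// True when X + Start reproduces X exactly, including the sign of a zero.
bool additionPreserves(float X, float Start) {
  float R = X + Start;
  return R == X && std::signbit(R) == std::signbit(X);
}

void demoNeutralElement() {
  // -0.0 leaves every value untouched, negative zero included.
  assert(additionPreserves(1.5f, -0.0f));
  assert(additionPreserves(0.0f, -0.0f));
  assert(additionPreserves(-0.0f, -0.0f));
  // +0.0 does not: (-0.0) + (+0.0) yields +0.0, so the sign is lost.
  assert(!additionPreserves(-0.0f, 0.0f));
}
```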

llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll

[1,010 lines not shown]
 ; CHECK-NEXT: [[TMP34:%.*]] = insertelement <4 x float> [[TMP28]], float [[TMP33]], i32 3
 ; CHECK-NEXT: [[TMP35:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[TMP31]]
 ; CHECK-NEXT: [[TMP36:%.*]] = load float, float* [[TMP35]], align 4
 ; CHECK-NEXT: [[TMP37:%.*]] = insertelement <4 x float> [[TMP29]], float [[TMP36]], i32 3
 ; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE6]]
 ; CHECK: pred.load.continue6:
 ; CHECK-NEXT: [[TMP38:%.*]] = phi <4 x float> [ [[TMP28]], [[PRED_LOAD_CONTINUE4]] ], [ [[TMP34]], [[PRED_LOAD_IF5]] ]
 ; CHECK-NEXT: [[TMP39:%.*]] = phi <4 x float> [ [[TMP29]], [[PRED_LOAD_CONTINUE4]] ], [ [[TMP37]], [[PRED_LOAD_IF5]] ]
-; CHECK-NEXT: [[TMP40:%.*]] = select <4 x i1> [[TMP0]], <4 x float> [[TMP38]], <4 x float> zeroinitializer
-; CHECK-NEXT: [[TMP41:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[TMP40]])
-; CHECK-NEXT: [[TMP42:%.*]] = fadd float [[TMP41]], [[VEC_PHI]]
-; CHECK-NEXT: [[TMP43:%.*]] = select <4 x i1> [[TMP0]], <4 x float> [[TMP39]], <4 x float> zeroinitializer
-; CHECK-NEXT: [[TMP44:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[TMP43]])
-; CHECK-NEXT: [[TMP45]] = fadd float [[TMP44]], [[TMP42]]
+; CHECK-NEXT: [[TMP40:%.*]] = select fast <4 x i1> [[TMP0]], <4 x float> [[TMP38]], <4 x float> zeroinitializer
+; CHECK-NEXT: [[TMP41:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float [[VEC_PHI]], <4 x float> [[TMP40]])
+; CHECK-NEXT: [[TMP42:%.*]] = select fast <4 x i1> [[TMP0]], <4 x float> [[TMP39]], <4 x float> zeroinitializer
+; CHECK-NEXT: [[TMP43:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float [[TMP41]], <4 x float> [[TMP42]])
 ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
 ; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
-; CHECK-NEXT: [[TMP46:%.*]] = icmp eq i64 [[INDEX_NEXT]], 260
-; CHECK-NEXT: br i1 [[TMP46]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]
+; CHECK-NEXT: [[TMP44:%.*]] = icmp eq i64 [[INDEX_NEXT]], 260
+; CHECK-NEXT: br i1 [[TMP44]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: br i1 undef, label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP21:![0-9]+]]
 ; CHECK: for.end:
-; CHECK-NEXT: [[RESULT_0_LCSSA:%.*]] = phi float [ undef, [[FOR_BODY]] ], [ [[TMP45]], [[MIDDLE_BLOCK]] ]
+; CHECK-NEXT: [[RESULT_0_LCSSA:%.*]] = phi float [ undef, [[FOR_BODY]] ], [ [[TMP43]], [[MIDDLE_BLOCK]] ]
 ; CHECK-NEXT: ret float [[RESULT_0_LCSSA]]
 ;
 entry:
   br label %for.body

 for.body: ; preds = %entry, %for.body
   %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
   %result.08 = phi float [ %fadd, %for.body ], [ 0.0, %entry ]
[74 lines not shown]
 ; CHECK-NEXT: [[TMP34:%.*]] = insertelement <4 x float> [[TMP28]], float [[TMP33]], i32 3
 ; CHECK-NEXT: [[TMP35:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[TMP31]]
 ; CHECK-NEXT: [[TMP36:%.*]] = load float, float* [[TMP35]], align 4
 ; CHECK-NEXT: [[TMP37:%.*]] = insertelement <4 x float> [[TMP29]], float [[TMP36]], i32 3
 ; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE6]]
 ; CHECK: pred.load.continue6:
 ; CHECK-NEXT: [[TMP38:%.*]] = phi <4 x float> [ [[TMP28]], [[PRED_LOAD_CONTINUE4]] ], [ [[TMP34]], [[PRED_LOAD_IF5]] ]
 ; CHECK-NEXT: [[TMP39:%.*]] = phi <4 x float> [ [[TMP29]], [[PRED_LOAD_CONTINUE4]] ], [ [[TMP37]], [[PRED_LOAD_IF5]] ]
-; CHECK-NEXT: [[TMP40:%.*]] = select <4 x i1> [[TMP0]], <4 x float> [[TMP38]], <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
+; CHECK-NEXT: [[TMP40:%.*]] = select fast <4 x i1> [[TMP0]], <4 x float> [[TMP38]], <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
 ; CHECK-NEXT: [[TMP41:%.*]] = call fast float @llvm.vector.reduce.fmul.v4f32(float 1.000000e+00, <4 x float> [[TMP40]])
-; CHECK-NEXT: [[TMP42:%.*]] = fmul float [[TMP41]], [[VEC_PHI]]
-; CHECK-NEXT: [[TMP43:%.*]] = select <4 x i1> [[TMP0]], <4 x float> [[TMP39]], <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
+; CHECK-NEXT: [[TMP42:%.*]] = fmul fast float [[TMP41]], [[VEC_PHI]]
+; CHECK-NEXT: [[TMP43:%.*]] = select fast <4 x i1> [[TMP0]], <4 x float> [[TMP39]], <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
 ; CHECK-NEXT: [[TMP44:%.*]] = call fast float @llvm.vector.reduce.fmul.v4f32(float 1.000000e+00, <4 x float> [[TMP43]])
-; CHECK-NEXT: [[TMP45]] = fmul float [[TMP44]], [[TMP42]]
+; CHECK-NEXT: [[TMP45]] = fmul fast float [[TMP44]], [[TMP42]]
 ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
 ; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
 ; CHECK-NEXT: [[TMP46:%.*]] = icmp eq i64 [[INDEX_NEXT]], 260
 ; CHECK-NEXT: br i1 [[TMP46]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
[487 lines not shown]

llvm/test/Transforms/LoopVectorize/reduction-inloop.ll

[552 lines not shown]
 ; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT: [[VEC_PHI:%.*]] = phi float [ 0.000000e+00, [[VECTOR_PH]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 [[INDEX]]
 ; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[TMP0]] to <4 x float>*
 ; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
 ; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 [[INDEX]]
 ; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[TMP2]] to <4 x float>*
 ; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4
-; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[WIDE_LOAD]])
-; CHECK-NEXT: [[TMP5:%.*]] = fadd float [[TMP4]], [[VEC_PHI]]
-; CHECK-NEXT: [[TMP6:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[WIDE_LOAD1]])
-; CHECK-NEXT: [[TMP7]] = fadd float [[TMP6]], [[TMP5]]
+; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float [[VEC_PHI]], <4 x float> [[WIDE_LOAD]])
+; CHECK-NEXT: [[TMP5:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float [[TMP4]], <4 x float> [[WIDE_LOAD1]])
 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
-; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
-; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
+; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
+; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: br i1 undef, label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP23:![0-9]+]]
 ; CHECK: for.end:
-; CHECK-NEXT: [[RESULT_0_LCSSA:%.*]] = phi float [ undef, [[FOR_BODY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
+; CHECK-NEXT: [[RESULT_0_LCSSA:%.*]] = phi float [ undef, [[FOR_BODY]] ], [ [[TMP5]], [[MIDDLE_BLOCK]] ]
 ; CHECK-NEXT: ret float [[RESULT_0_LCSSA]]
 ;
 entry:
   br label %for.body

 for.body: ; preds = %entry, %for.body
   %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
   %result.08 = phi float [ %fadd, %for.body ], [ 0.0, %entry ]
[24 lines not shown]
 ; CHECK-NEXT: [[VEC_PHI:%.*]] = phi float [ 0.000000e+00, [[VECTOR_PH]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 [[INDEX]]
 ; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[TMP0]] to <4 x float>*
 ; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
 ; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 [[INDEX]]
 ; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[TMP2]] to <4 x float>*
 ; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4
 ; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fmul.v4f32(float 1.000000e+00, <4 x float> [[WIDE_LOAD]])
-; CHECK-NEXT: [[TMP5:%.*]] = fmul float [[TMP4]], [[VEC_PHI]]
+; CHECK-NEXT: [[TMP5:%.*]] = fmul fast float [[TMP4]], [[VEC_PHI]]
 ; CHECK-NEXT: [[TMP6:%.*]] = call fast float @llvm.vector.reduce.fmul.v4f32(float 1.000000e+00, <4 x float> [[WIDE_LOAD1]])
-; CHECK-NEXT: [[TMP7]] = fmul float [[TMP6]], [[TMP5]]
+; CHECK-NEXT: [[TMP7]] = fmul fast float [[TMP6]], [[TMP5]]
 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
 ; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
 ; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP24:![0-9]+]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
[472 lines not shown]