This is an archive of the discontinued LLVM Phabricator instance.

Differential D112548

[LoopVectorize] Propagate fast-math flags for inloop reductions
ClosedPublic

Authored by RosieSumpter on Oct 26 2021, 8:15 AM.

Details

Reviewers

paulwalker-arm
david-arm
kmclaughlin
sdesmalen

Commits

rGdcb8222d8777: [LoopVectorize] Propagate fast-math flags for inloop reductions

Summary

This patch updates VPReductionRecipe::execute so that the fast-math
flags associated with the underlying instruction of the VPReductionRecipe are
propagated through to the reductions which are created.

Depends on D112547

Event Timeline

RosieSumpter created this revision. Oct 26 2021, 8:15 AM

Herald added a subscriber: hiraditya. Oct 26 2021, 8:15 AM

RosieSumpter requested review of this revision. Oct 26 2021, 8:15 AM

Herald added a project: Restricted Project. Oct 26 2021, 8:15 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B130715: Diff 382333. Oct 26 2021, 8:16 AM

Note: this patch builds on the NFC patch D112547

RosieSumpter mentioned this in D111555: [LoopVectorize] Add vector reduction support for fmuladd intrinsic. Oct 26 2021, 8:31 AM

Thanks for this patch @RosieSumpter - seems like a nice fix! I wonder if it's also worth changing createOrderedReduction to be similar to createTargetReduction and set the flags in there too? Also, you've fixed up the VF=1 cases too.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9752	I completely forgot we can actually just use a guard here, i.e. `IRBuilderBase::FastMathFlagGuard FMFGuard(State.Builder); State.Builder.setFastMathFlags(RdxDesc->getFastMathFlags());` Then you don't need the extra line at the bottom anymore to reset the flags.
9753	Maybe we should do this for all cases here, i.e. ordered and non-ordered?
9755	I just realised that you might be able to replace the whole if block with `State.Builder.setFastMathFlags(RdxDesc->getFastMathFlags());` here.
9774	I just realised you're actually also fixing the case where VF=1. Is it worth having a test to ensure flags are propagated for VF=1 as well?
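
The guard suggested in the inline comments above is an RAII object: it saves the builder's flags when constructed and restores them when it goes out of scope. The sketch below illustrates that pattern only. `ToyFlags`, `ToyBuilder`, `ToyFlagGuard` and `emitReduction` are hypothetical stand-ins written for this example, not the LLVM classes, and the real `IRBuilderBase::FastMathFlagGuard` may differ in detail.

```cpp
#include <cassert>

// Toy stand-ins, NOT the LLVM classes: a flag set and a builder that stamps
// its current flags onto everything it creates.
struct ToyFlags {
  bool NoNaNs = false;
  bool AllowReassoc = false;
};

struct ToyBuilder {
  ToyFlags Current;
  void setFlags(ToyFlags F) { Current = F; }
  ToyFlags getFlags() const { return Current; }
  // "Creates" an instruction by returning the flags it would carry.
  ToyFlags createOp() const { return Current; }
};

// RAII guard: remembers the builder's flags on construction and restores
// them on destruction, so no manual reset is needed on any exit path.
class ToyFlagGuard {
  ToyBuilder &B;
  ToyFlags Saved;

public:
  explicit ToyFlagGuard(ToyBuilder &Builder)
      : B(Builder), Saved(Builder.getFlags()) {}
  ~ToyFlagGuard() { B.setFlags(Saved); }
  ToyFlagGuard(const ToyFlagGuard &) = delete;
  ToyFlagGuard &operator=(const ToyFlagGuard &) = delete;
};

// Mirrors the shape of the reviewed change: guard first, then set flags.
ToyFlags emitReduction(ToyBuilder &B, ToyFlags RdxFlags) {
  ToyFlagGuard Guard(B);
  B.setFlags(RdxFlags);
  return B.createOp();
} // Guard's destructor restores the previous flags here.
```

After `emitReduction` returns, the created operation carries the reduction's flags while the builder is back to whatever flags it had before the call, which is why the explicit reset line becomes unnecessary.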

RosieSumpter edited the summary of this revision. Oct 28 2021, 1:34 AM

RosieSumpter added a parent revision: D112547: [LoopVectorize] Clean up VPReductionRecipe::execute. NFC.

RosieSumpter added a child revision: D111555: [LoopVectorize] Add vector reduction support for fmuladd intrinsic.

Use IRBuilderBase::FastMathFlagGuard
Propagate fast-math flags for both ordered and non-ordered reductions
Updated tests
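
For readers unfamiliar with the distinction: judging by the test output in this diff, an ordered reduction accumulates into a scalar in source order, while an unordered one keeps a vector of per-lane partial sums and combines them after the loop. In floating point the two orders can give different results. The following is a small illustration under stated assumptions: the function names, the width of 2 and the sample input are all made up for this example and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strict, in-order sum: the result an ordered reduction must preserve.
float orderedSum(const std::vector<float> &V) {
  float Acc = 0.0f;
  for (float X : V)
    Acc += X;
  return Acc;
}

// Lane-wise sum with a hypothetical width of 2: each lane accumulates every
// second element and the partial sums are combined after the loop.
float twoLaneSum(const std::vector<float> &V) {
  float Lane[2] = {0.0f, 0.0f};
  for (std::size_t I = 0; I < V.size(); ++I)
    Lane[I % 2] += V[I];
  return Lane[0] + Lane[1];
}

// Sample input chosen so that rounding makes the two orders disagree.
std::vector<float> sampleInput() { return {1e8f, 1.0f, -1e8f, 1.0f}; }
```

With IEEE-754 single precision and no fast-math compiler options, `orderedSum(sampleInput())` yields 1 (the first `1.0f` is lost when added to `1e8f`), whereas `twoLaneSum(sampleInput())` yields 2.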

RosieSumpter marked 4 inline comments as done. Oct 28 2021, 7:43 AM

Harbormaster completed remote builds in B131204: Diff 383031. Oct 28 2021, 8:28 AM

Thank you @RosieSumpter, this patch looks good to me and I think all of the comments have been addressed.

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
854	Should `fast` here be replaced with `nnan`?

This revision is now accepted and ready to land. Oct 28 2021, 9:01 AM

RosieSumpter added inline comments. Oct 28 2021, 9:06 AM

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
854	Yes it should! I will change that before committing.

david-arm added inline comments. Nov 1 2021, 2:32 AM

llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll
1020	Nice! Instcombine has folded the call and the fadd together into one!
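
The fold mentioned in this comment is visible in the test diff: a reduction started from `-0.0` followed by a separate `fadd` with the accumulator becomes a single reduction that uses the accumulator as its start value. A rough scalar model of the two shapes is sketched below. The function names are hypothetical, and the sequential loop is only a simplified stand-in for `llvm.vector.reduce.fadd`. Presumably the fold became possible because the scalar `fadd` now carries the `fast` flags too; in general the two shapes agree only when reassociation is allowed.

```cpp
#include <cassert>
#include <vector>

// Sequential model of a reduction with a start value:
// ((Start + V[0]) + V[1]) + ...
float reduceFAdd(float Start, const std::vector<float> &V) {
  float Acc = Start;
  for (float X : V)
    Acc += X;
  return Acc;
}

// Shape before the fold: reduce from the neutral value -0.0, then add the
// running accumulator with a separate scalar fadd.
float beforeFold(float Acc, const std::vector<float> &V) {
  return reduceFAdd(-0.0f, V) + Acc;
}

// Shape after the fold: the accumulator becomes the start value.
float afterFold(float Acc, const std::vector<float> &V) {
  return reduceFAdd(Acc, V);
}
```

For small, exactly representable inputs such as an accumulator of 10 and the vector {1, 2, 3, 4}, both shapes return 20.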

Just adding an extra datapoint to do with as you wish.

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
854	Generally speaking it is better for `CHECK-NOT` lines to be as small/simple as possible so that you do not end up creating a passing test just because some minor detail has changed. In this instance the `CHECK-NOT` does not care about the instruction's flags but rather wants to ensure there is never a call to `llvm.vector.reduce.fadd`. In this regard I think just having `CHECK-UNORDERED-NOT: @llvm.vector.reduce.fadd` would yield a more resilient test. That said, given all the expected output is explicitly checked for I'm not sure what value these `CHECK-NOT` lines provide.

RosieSumpter added inline comments. Nov 1 2021, 9:43 AM

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
854	Thanks @paulwalker-arm, I will change the `CHECK-NOT` lines. Although I see what you mean about them not having value here, maybe it's best to remove them.
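
The concern raised in this thread can be shown with a simplified model of a `CHECK-NOT` line as a plain substring search. This is a sketch only: real FileCheck also supports regexes, variables and whitespace canonicalization, none of which is modeled here, and `checkNotPasses` and `unwantedLine` are hypothetical helpers invented for the example.

```cpp
#include <cassert>
#include <string>

// Simplified model of a CHECK-NOT line: it passes when the pattern does not
// occur in the text.
bool checkNotPasses(const std::string &Output, const std::string &Pattern) {
  return Output.find(Pattern) == std::string::npos;
}

// Hypothetical unwanted output line that the test is meant to reject.
std::string unwantedLine() {
  return "%r = call nnan float @llvm.vector.reduce.fadd.v8f32("
         "float -0.000000e+00, <8 x float> %v)";
}
```

In this model the over-specific pattern `call fast float @llvm.vector.reduce.fadd` passes even though the unwanted call is present, because the flags differ. The shorter pattern `@llvm.vector.reduce.fadd` correctly fails.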

Closed by commit rGdcb8222d8777: [LoopVectorize] Propagate fast-math flags for inloop reductions (authored by RosieSumpter). Nov 2 2021, 2:04 AM

This revision was automatically updated to reflect the committed changes.

RosieSumpter added a commit: rGdcb8222d8777: [LoopVectorize] Propagate fast-math flags for inloop reductions.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	3 lines
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll	119 lines
llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll	24 lines
llvm/test/Transforms/LoopVectorize/reduction-inloop.ll	16 lines

Diff 383031

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[...]
   State.ILV->vectorizeInterleaveGroup(IG, definedValues(), State, getAddr(),
                                       getStoredValues(), getMask());
 }

 void VPReductionRecipe::execute(VPTransformState &State) {
   assert(!State.Instance && "Reduction being replicated.");
   Value *PrevInChain = State.get(getChainOp(), 0);
   RecurKind Kind = RdxDesc->getRecurrenceKind();
   bool IsOrdered = State.ILV->useOrderedReductions(*RdxDesc);
+  // Propagate the fast-math flags carried by the underlying instruction.
+  IRBuilderBase::FastMathFlagGuard FMFGuard(State.Builder);
+  State.Builder.setFastMathFlags(RdxDesc->getFastMathFlags());
   for (unsigned Part = 0; Part < State.UF; ++Part) {
     Value *NewVecOp = State.get(getVecOp(), Part);
     if (VPValue *Cond = getCondOp()) {
       Value *NewCond = State.get(Cond, Part);
       VectorType *VecTy = cast<VectorType>(NewVecOp->getType());
       Value *Iden = RdxDesc->getRecurrenceIdentity(
           Kind, VecTy->getElementType(), RdxDesc->getFastMathFlags());
       Value *IdenVec =
           State.Builder.CreateVectorSplat(VecTy->getElementCount(), Iden);
       Value *Select = State.Builder.CreateSelect(NewCond, NewVecOp, IdenVec);
       NewVecOp = Select;
     }
     Value *NewRed;
     Value *NextInChain;
     if (IsOrdered) {
       if (State.VF.isVector())
         NewRed = createOrderedReduction(State.Builder, *RdxDesc, NewVecOp,
                                         PrevInChain);
       else
         NewRed = State.Builder.CreateBinOp(
             (Instruction::BinaryOps)RdxDesc->getOpcode(Kind), PrevInChain,
             NewVecOp);
       PrevInChain = NewRed;
     } else {
       PrevInChain = State.get(getChainOp(), Part);
       NewRed = createTargetReduction(State.Builder, TTI, *RdxDesc, NewVecOp);
     }
     if (RecurrenceDescriptor::isMinMaxRecurrenceKind(Kind)) {
[...]

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll

[...]
 for.body:
[...]
   %iv.next = add nuw nsw i64 %iv, 1
   %exitcond.not = icmp eq i64 %iv.next, %n
   br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

 for.end:
   ret float %add
 }

+; Same as above but where fadd has a fast-math flag.
+define float @fadd_strict_fmf(float* noalias nocapture readonly %a, i64 %n) {
+; CHECK-ORDERED-LABEL: @fadd_strict_fmf
+; CHECK-ORDERED: vector.body:
+; CHECK-ORDERED: [[VEC_PHI:%.*]] = phi float [ 0.000000e+00, %vector.ph ], [ [[RDX:%.*]], %vector.body ]
+; CHECK-ORDERED: [[LOAD_VEC:%.*]] = load <8 x float>, <8 x float>*
+; CHECK-ORDERED: [[RDX]] = call nnan float @llvm.vector.reduce.fadd.v8f32(float [[VEC_PHI]], <8 x float> [[LOAD_VEC]])
+; CHECK-ORDERED: for.end:
+; CHECK-ORDERED: [[RES:%.*]] = phi float [ [[SCALAR:%.*]], %for.body ], [ [[RDX]], %middle.block ]
+; CHECK-ORDERED: ret float [[RES]]
+
+; CHECK-UNORDERED-LABEL: @fadd_strict_fmf
+; CHECK-UNORDERED: vector.body:
+; CHECK-UNORDERED: [[VEC_PHI:%.*]] = phi <8 x float> [ <float 0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %vector.ph ], [ [[FADD_VEC:%.*]], %vector.body ]
+; CHECK-UNORDERED: [[LOAD_VEC:%.*]] = load <8 x float>, <8 x float>*
+; CHECK-UNORDERED: [[FADD_VEC]] = fadd nnan <8 x float> [[LOAD_VEC]], [[VEC_PHI]]
+; CHECK-UNORDERED-NOT: call nnan float @llvm.vector.reduce.fadd
+; CHECK-UNORDERED: middle.block:
+; CHECK-UNORDERED: [[RDX:%.*]] = call nnan float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[FADD_VEC]])
+; CHECK-UNORDERED: for.body:
+; CHECK-UNORDERED: [[LOAD:%.*]] = load float, float*
+; CHECK-UNORDERED: [[FADD:%.*]] = fadd nnan float [[LOAD]], {{.*}}
+; CHECK-UNORDERED: for.end:
+; CHECK-UNORDERED: [[RES:%.*]] = phi float [ [[FADD]], %for.body ], [ [[RDX]], %middle.block ]
+; CHECK-UNORDERED: ret float [[RES]]
+
+; CHECK-NOT-VECTORIZED-LABEL: @fadd_strict_fmf
+; CHECK-NOT-VECTORIZED-NOT: vector.body
+
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
+  %arrayidx = getelementptr inbounds float, float* %a, i64 %iv
+  %0 = load float, float* %arrayidx, align 4
+  %add = fadd nnan float %0, %sum.07
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, %n
+  br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0
+
+for.end:
+  ret float %add
+}
+
 define float @fadd_strict_unroll(float* noalias nocapture readonly %a, i64 %n) {
 ; CHECK-ORDERED-LABEL: @fadd_strict_unroll
 ; CHECK-ORDERED: vector.body:
 ; CHECK-ORDERED: %[[VEC_PHI1:.*]] = phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX4:.*]], %vector.body ]
 ; CHECK-ORDERED-NOT: phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX4]], %vector.body ]
 ; CHECK-ORDERED: %[[LOAD1:.*]] = load <8 x float>, <8 x float>*
 ; CHECK-ORDERED: %[[LOAD2:.*]] = load <8 x float>, <8 x float>*
 ; CHECK-ORDERED: %[[LOAD3:.*]] = load <8 x float>, <8 x float>*
[...]
 for.body:
[...]
   %iv.next = add nuw nsw i64 %iv, 1
   %exitcond.not = icmp eq i64 %iv.next, %n
   br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !4

 for.end:
   ret float %add
 }

+; Same as above but where fadd has a fast-math flag.
+define float @fadd_scalar_vf_fmf(float* noalias nocapture readonly %a, i64 %n) {
+; CHECK-ORDERED-LABEL: @fadd_scalar_vf_fmf
+; CHECK-ORDERED: vector.body:
+; CHECK-ORDERED: [[VEC_PHI:%.*]] = phi float [ 0.000000e+00, %vector.ph ], [ [[FADD4:%.*]], %vector.body ]
+; CHECK-ORDERED: [[LOAD1:%.*]] = load float, float*
+; CHECK-ORDERED: [[LOAD2:%.*]] = load float, float*
+; CHECK-ORDERED: [[LOAD3:%.*]] = load float, float*
+; CHECK-ORDERED: [[LOAD4:%.*]] = load float, float*
+; CHECK-ORDERED: [[FADD1:%.*]] = fadd nnan float [[VEC_PHI]], [[LOAD1]]
+; CHECK-ORDERED: [[FADD2:%.*]] = fadd nnan float [[FADD1]], [[LOAD2]]
+; CHECK-ORDERED: [[FADD3:%.*]] = fadd nnan float [[FADD2]], [[LOAD3]]
+; CHECK-ORDERED: [[FADD4]] = fadd nnan float [[FADD3]], [[LOAD4]]
+; CHECK-ORDERED-NOT: call nnan float @llvm.vector.reduce.fadd
+; CHECK-ORDERED: scalar.ph:
+; CHECK-ORDERED: [[MERGE_RDX:%.*]] = phi float [ 0.000000e+00, %entry ], [ [[FADD4]], %middle.block ]
+; CHECK-ORDERED: for.body:
+; CHECK-ORDERED: [[SUM_07:%.*]] = phi float [ [[MERGE_RDX]], %scalar.ph ], [ [[FADD5:%.*]], %for.body ]
+; CHECK-ORDERED: [[LOAD5:%.*]] = load float, float*
+; CHECK-ORDERED: [[FADD5]] = fadd nnan float [[LOAD5]], [[SUM_07]]
+; CHECK-ORDERED: for.end:
+; CHECK-ORDERED: [[RES:%.*]] = phi float [ [[FADD5]], %for.body ], [ [[FADD4]], %middle.block ]
+; CHECK-ORDERED: ret float [[RES]]
+
+; CHECK-UNORDERED-LABEL: @fadd_scalar_vf_fmf
+; CHECK-UNORDERED: vector.body:
+; CHECK-UNORDERED: [[VEC_PHI1:%.*]] = phi float [ 0.000000e+00, %vector.ph ], [ [[FADD1:%.*]], %vector.body ]
+; CHECK-UNORDERED: [[VEC_PHI2:%.*]] = phi float [ -0.000000e+00, %vector.ph ], [ [[FADD2:%.*]], %vector.body ]
+; CHECK-UNORDERED: [[VEC_PHI3:%.*]] = phi float [ -0.000000e+00, %vector.ph ], [ [[FADD3:%.*]], %vector.body ]
+; CHECK-UNORDERED: [[VEC_PHI4:%.*]] = phi float [ -0.000000e+00, %vector.ph ], [ [[FADD4:%.*]], %vector.body ]
+; CHECK-UNORDERED: [[LOAD1:%.*]] = load float, float*
+; CHECK-UNORDERED: [[LOAD2:%.*]] = load float, float*
+; CHECK-UNORDERED: [[LOAD3:%.*]] = load float, float*
+; CHECK-UNORDERED: [[LOAD4:%.*]] = load float, float*
+; CHECK-UNORDERED: [[FADD1]] = fadd nnan float [[LOAD1]], [[VEC_PHI1]]
+; CHECK-UNORDERED: [[FADD2]] = fadd nnan float [[LOAD2]], [[VEC_PHI2]]
+; CHECK-UNORDERED: [[FADD3]] = fadd nnan float [[LOAD3]], [[VEC_PHI3]]
+; CHECK-UNORDERED: [[FADD4]] = fadd nnan float [[LOAD4]], [[VEC_PHI4]]
+; CHECK-UNORDERED-NOT: call fast float @llvm.vector.reduce.fadd
+; CHECK-UNORDERED: middle.block:
+; CHECK-UNORDERED: [[BIN_RDX1:%.*]] = fadd nnan float [[FADD2]], [[FADD1]]
+; CHECK-UNORDERED: [[BIN_RDX2:%.*]] = fadd nnan float [[FADD3]], [[BIN_RDX1]]
+; CHECK-UNORDERED: [[BIN_RDX3:%.*]] = fadd nnan float [[FADD4]], [[BIN_RDX2]]
+; CHECK-UNORDERED: scalar.ph:
+; CHECK-UNORDERED: [[MERGE_RDX:%.*]] = phi float [ 0.000000e+00, %entry ], [ [[BIN_RDX3]], %middle.block ]
+; CHECK-UNORDERED: for.body:
+; CHECK-UNORDERED: [[SUM_07:%.*]] = phi float [ [[MERGE_RDX]], %scalar.ph ], [ [[FADD5:%.*]], %for.body ]
+; CHECK-UNORDERED: [[LOAD5:%.*]] = load float, float*
+; CHECK-UNORDERED: [[FADD5]] = fadd nnan float [[LOAD5]], [[SUM_07]]
+; CHECK-UORDERED: for.end
+; CHECK-UNORDERED: [[RES:%.*]] = phi float [ [[FADD5]], %for.body ], [ [[BIN_RDX3]], %middle.block ]
+; CHECK-UNORDERED: ret float [[RES]]
+
+; CHECK-NOT-VECTORIZED-LABEL: @fadd_scalar_vf_fmf
+; CHECK-NOT-VECTORIZED-NOT: vector.body
+
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
+  %arrayidx = getelementptr inbounds float, float* %a, i64 %iv
+  %0 = load float, float* %arrayidx, align 4
+  %add = fadd nnan float %0, %sum.07
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, %n
+  br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !4
+
+for.end:
+  ret float %add
+}
+
 ; Test case where the reduction step is a first-order recurrence.
 define double @reduction_increment_by_first_order_recurrence() {
 ; CHECK-ORDERED-LABEL: @reduction_increment_by_first_order_recurrence(
 ; CHECK-ORDERED: vector.body:
 ; CHECK-ORDERED: [[RED:%.*]] = phi double [ 0.000000e+00, %vector.ph ], [ [[RED_NEXT:%.*]], %vector.body ]
 ; CHECK-ORDERED: [[VECTOR_RECUR:%.*]] = phi <4 x double> [ <double poison, double poison, double poison, double 0.000000e+00>, %vector.ph ], [ [[FOR_NEXT:%.*]], %vector.body ]
 ; CHECK-ORDERED: [[FOR_NEXT]] = sitofp <4 x i32> %vec.ind to <4 x double>
 ; CHECK-ORDERED: [[TMP1:%.*]] = shufflevector <4 x double> [[VECTOR_RECUR]], <4 x double> [[FOR_NEXT]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
[...]

llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll

[...]
 ; CHECK-NEXT: [[TMP34:%.*]] = insertelement <4 x float> [[TMP28]], float [[TMP33]], i32 3
 ; CHECK-NEXT: [[TMP35:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[TMP31]]
 ; CHECK-NEXT: [[TMP36:%.*]] = load float, float* [[TMP35]], align 4
 ; CHECK-NEXT: [[TMP37:%.*]] = insertelement <4 x float> [[TMP29]], float [[TMP36]], i32 3
 ; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE6]]
 ; CHECK: pred.load.continue6:
 ; CHECK-NEXT: [[TMP38:%.*]] = phi <4 x float> [ [[TMP28]], [[PRED_LOAD_CONTINUE4]] ], [ [[TMP34]], [[PRED_LOAD_IF5]] ]
 ; CHECK-NEXT: [[TMP39:%.*]] = phi <4 x float> [ [[TMP29]], [[PRED_LOAD_CONTINUE4]] ], [ [[TMP37]], [[PRED_LOAD_IF5]] ]
-; CHECK-NEXT: [[TMP40:%.*]] = select <4 x i1> [[TMP0]], <4 x float> [[TMP38]], <4 x float> zeroinitializer
-; CHECK-NEXT: [[TMP41:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[TMP40]])
-; CHECK-NEXT: [[TMP42:%.*]] = fadd float [[TMP41]], [[VEC_PHI]]
-; CHECK-NEXT: [[TMP43:%.*]] = select <4 x i1> [[TMP0]], <4 x float> [[TMP39]], <4 x float> zeroinitializer
-; CHECK-NEXT: [[TMP44:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[TMP43]])
-; CHECK-NEXT: [[TMP45]] = fadd float [[TMP44]], [[TMP42]]
+; CHECK-NEXT: [[TMP40:%.*]] = select fast <4 x i1> [[TMP0]], <4 x float> [[TMP38]], <4 x float> zeroinitializer
+; CHECK-NEXT: [[TMP41:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float [[VEC_PHI]], <4 x float> [[TMP40]])
+; CHECK-NEXT: [[TMP42:%.*]] = select fast <4 x i1> [[TMP0]], <4 x float> [[TMP39]], <4 x float> zeroinitializer
+; CHECK-NEXT: [[TMP43:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float [[TMP41]], <4 x float> [[TMP42]])
 ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
 ; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
-; CHECK-NEXT: [[TMP46:%.*]] = icmp eq i64 [[INDEX_NEXT]], 260
-; CHECK-NEXT: br i1 [[TMP46]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]
+; CHECK-NEXT: [[TMP44:%.*]] = icmp eq i64 [[INDEX_NEXT]], 260
+; CHECK-NEXT: br i1 [[TMP44]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: br i1 undef, label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP21:![0-9]+]]
 ; CHECK: for.end:
-; CHECK-NEXT: [[RESULT_0_LCSSA:%.*]] = phi float [ undef, [[FOR_BODY]] ], [ [[TMP45]], [[MIDDLE_BLOCK]] ]
+; CHECK-NEXT: [[RESULT_0_LCSSA:%.*]] = phi float [ undef, [[FOR_BODY]] ], [ [[TMP43]], [[MIDDLE_BLOCK]] ]
 ; CHECK-NEXT: ret float [[RESULT_0_LCSSA]]
 ;
 entry:
   br label %for.body

 for.body: ; preds = %entry, %for.body
   %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
   %result.08 = phi float [ %fadd, %for.body ], [ 0.0, %entry ]
[...]
 ; CHECK-NEXT: [[TMP34:%.*]] = insertelement <4 x float> [[TMP28]], float [[TMP33]], i32 3
 ; CHECK-NEXT: [[TMP35:%.*]] = getelementptr inbounds float, float* [[B]], i64 [[TMP31]]
 ; CHECK-NEXT: [[TMP36:%.*]] = load float, float* [[TMP35]], align 4
 ; CHECK-NEXT: [[TMP37:%.*]] = insertelement <4 x float> [[TMP29]], float [[TMP36]], i32 3
 ; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE6]]
 ; CHECK: pred.load.continue6:
 ; CHECK-NEXT: [[TMP38:%.*]] = phi <4 x float> [ [[TMP28]], [[PRED_LOAD_CONTINUE4]] ], [ [[TMP34]], [[PRED_LOAD_IF5]] ]
 ; CHECK-NEXT: [[TMP39:%.*]] = phi <4 x float> [ [[TMP29]], [[PRED_LOAD_CONTINUE4]] ], [ [[TMP37]], [[PRED_LOAD_IF5]] ]
-; CHECK-NEXT: [[TMP40:%.*]] = select <4 x i1> [[TMP0]], <4 x float> [[TMP38]], <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
+; CHECK-NEXT: [[TMP40:%.*]] = select fast <4 x i1> [[TMP0]], <4 x float> [[TMP38]], <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
 ; CHECK-NEXT: [[TMP41:%.*]] = call fast float @llvm.vector.reduce.fmul.v4f32(float 1.000000e+00, <4 x float> [[TMP40]])
-; CHECK-NEXT: [[TMP42:%.*]] = fmul float [[TMP41]], [[VEC_PHI]]
-; CHECK-NEXT: [[TMP43:%.*]] = select <4 x i1> [[TMP0]], <4 x float> [[TMP39]], <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
+; CHECK-NEXT: [[TMP42:%.*]] = fmul fast float [[TMP41]], [[VEC_PHI]]
+; CHECK-NEXT: [[TMP43:%.*]] = select fast <4 x i1> [[TMP0]], <4 x float> [[TMP39]], <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
 ; CHECK-NEXT: [[TMP44:%.*]] = call fast float @llvm.vector.reduce.fmul.v4f32(float 1.000000e+00, <4 x float> [[TMP43]])
-; CHECK-NEXT: [[TMP45]] = fmul float [[TMP44]], [[TMP42]]
+; CHECK-NEXT: [[TMP45]] = fmul fast float [[TMP44]], [[TMP42]]
 ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
 ; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
 ; CHECK-NEXT: [[TMP46:%.*]] = icmp eq i64 [[INDEX_NEXT]], 260
 ; CHECK-NEXT: br i1 [[TMP46]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
[...]

llvm/test/Transforms/LoopVectorize/reduction-inloop.ll

[...]
 ; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT: [[VEC_PHI:%.*]] = phi float [ 0.000000e+00, [[VECTOR_PH]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 [[INDEX]]
 ; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[TMP0]] to <4 x float>*
 ; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
 ; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 [[INDEX]]
 ; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[TMP2]] to <4 x float>*
 ; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4
-; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[WIDE_LOAD]])
-; CHECK-NEXT: [[TMP5:%.*]] = fadd float [[TMP4]], [[VEC_PHI]]
-; CHECK-NEXT: [[TMP6:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[WIDE_LOAD1]])
-; CHECK-NEXT: [[TMP7]] = fadd float [[TMP6]], [[TMP5]]
+; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float [[VEC_PHI]], <4 x float> [[WIDE_LOAD]])
+; CHECK-NEXT: [[TMP5:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float [[TMP4]], <4 x float> [[WIDE_LOAD1]])
 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
-; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
-; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
+; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
+; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: br i1 undef, label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP23:![0-9]+]]
 ; CHECK: for.end:
-; CHECK-NEXT: [[RESULT_0_LCSSA:%.*]] = phi float [ undef, [[FOR_BODY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
+; CHECK-NEXT: [[RESULT_0_LCSSA:%.*]] = phi float [ undef, [[FOR_BODY]] ], [ [[TMP5]], [[MIDDLE_BLOCK]] ]
 ; CHECK-NEXT: ret float [[RESULT_0_LCSSA]]
 ;
 entry:
   br label %for.body

 for.body: ; preds = %entry, %for.body
   %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
   %result.08 = phi float [ %fadd, %for.body ], [ 0.0, %entry ]
[...]
 ; CHECK-NEXT: [[VEC_PHI:%.*]] = phi float [ 0.000000e+00, [[VECTOR_PH]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 [[INDEX]]
 ; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[TMP0]] to <4 x float>*
 ; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
 ; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 [[INDEX]]
 ; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[TMP2]] to <4 x float>*
 ; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4
 ; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fmul.v4f32(float 1.000000e+00, <4 x float> [[WIDE_LOAD]])
-; CHECK-NEXT: [[TMP5:%.*]] = fmul float [[TMP4]], [[VEC_PHI]]
+; CHECK-NEXT: [[TMP5:%.*]] = fmul fast float [[TMP4]], [[VEC_PHI]]
 ; CHECK-NEXT: [[TMP6:%.*]] = call fast float @llvm.vector.reduce.fmul.v4f32(float 1.000000e+00, <4 x float> [[WIDE_LOAD1]])
-; CHECK-NEXT: [[TMP7]] = fmul float [[TMP6]], [[TMP5]]
+; CHECK-NEXT: [[TMP7]] = fmul fast float [[TMP6]], [[TMP5]]
 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
 ; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
 ; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP24:![0-9]+]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
[...]