This is an archive of the discontinued LLVM Phabricator instance.

We could do that - I guess we might see quite a few test failures though. Is there potential for less optimal code because the FP vector can no longer be a zeroinitializer?

Changed getRecurrenceIdentity to always generate -0.0 for FAdd.
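
For reference, the reason -0.0 rather than +0.0 is the exact identity for a floating-point add can be checked directly in IEEE-754 arithmetic. The sketch below is standalone C++, not LLVM code. `preservesValueAndSign` is a hypothetical helper name, and the result assumes the file is compiled without fast-math options.

```cpp
#include <cassert>
#include <cmath>

// Returns true if x + identity compares equal to x and also keeps the sign
// bit of x. The sign-bit check matters only when x is a zero.
bool preservesValueAndSign(float x, float identity) {
  float r = x + identity;
  return r == x && std::signbit(r) == std::signbit(x);
}
```

With `identity = -0.0f` the check holds for both zeros, while with `identity = 0.0f` it fails for `x = -0.0f`, because (-0.0) + (+0.0) rounds to +0.0.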

@david-arm - I also wondered if we should try to use zeroinitializer where possible, though when I changed this to always return -0.0 for FAdd, the only tests affected were in test/Transforms/LoopVectorize.

Herald added subscribers: kerbowa, nhaehnle, jvesely, nemanjai. Mar 24 2021, 11:45 AM

kmclaughlin added a child revision: D98435: [LoopVectorize] Add strict in-order reduction support for fixed-width vectorization. Mar 24 2021, 11:45 AM

Harbormaster completed remote builds in B95549: Diff 333074. Mar 24 2021, 8:22 PM

I think the original version was probably better, only doing this without nsz. A vector like <0.0, -0.0, -0.0, -0.0> is going to be more difficult to materialize than one that is all zeros, without an obvious way of converting it to all zeros. And we should try to not pessimize the existing -Ofast cases.
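
The mixed vector mentioned above comes from splatting the identity and then inserting the scalar start value into lane 0. A minimal standalone sketch of that construction follows. The names `Vec4`, `makeStartVector` and `isAllZeroBits` are hypothetical and only model the idea. They are not LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using Vec4 = std::array<float, 4>;

// Models the start vector of a reduction: splat the identity, then place the
// scalar start value in lane 0.
Vec4 makeStartVector(float start, float identity) {
  Vec4 v;
  v.fill(identity);
  v[0] = start;
  return v;
}

// True if every lane has an all-zero bit pattern, i.e. the vector could be
// written as zeroinitializer. Note that -0.0 has its sign bit set.
bool isAllZeroBits(const Vec4 &v) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
  for (float lane : v) {
    std::uint32_t bits;
    std::memcpy(&bits, &lane, sizeof bits);
    if (bits != 0)
      return false;
  }
  return true;
}
```

Under these assumptions, a start value of 0.0 with identity 0.0 gives an all-zero vector, while the same start value with identity -0.0 gives the mixed <0.0, -0.0, -0.0, -0.0> form.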

Sorry, that was what I was trying to say above; it wasn't very clear.

If you change this back to the original, then it LGTM.

Do we know why we're seeing the mixed zeros in the tests? Is it an artifact of the test (because the test includes a +0.0 to start with) that we don't see in practice?
Note that phi instructions can have FMF (D67564), so if we assume that was the right direction, and we are losing the flags somewhere in the opt pipeline, we should try to fix that.

It would presumably need to be SROA that added flags to phi's it created? I'm not sure where it would get that info from though.

Do we know why we're seeing the mixed zeros in the tests? Is it an artifact of the test (because the test includes a +0.0 to start with) that we don't see in practice?

I think most reductions will start out as

float s = 0;
for(int i = 0; i < n; i++)
  s += x[i];

I feel it would be unusual for a user to deliberately use -0.0 as the start value! 0 is always going to be the most common choice.

In D98963#2650388, @dmgreen wrote:

It would presumably need to be SROA that added flags to phi's it created? I'm not sure where it would get that info from though.

I'm not sure either. We might need to apply FMF to load, stores, and function args to fill the gap. Another option might be to back-propagate the FMF from the fadd to its phi operand:

%s.0 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
%add = fadd fast float %s.0, %0

Not sure if there's some corner-case I'm overlooking, but I'm imagining something like what instcombine does to fill-in / restore no-wrap flags (and so the FMF setting could also happen in instcombine).

Do we know why we're seeing the mixed zeros in the tests? Is it an artifact of the test (because the test includes a +0.0 to start with) that we don't see in practice?

I think most reductions will start out as
float s = 0;
for(int i = 0; i < n; i++)
  s += x[i];
I feel it would be unusual for a user to deliberately use -0.0 as the start value! 0 is always going to be the most common choice.

Yes, I agree - hardly anyone considers the subtlety of -0.0 in FP math, so it's almost never specified in source.

In D98963#2650421, @spatel wrote:
In D98963#2650388, @dmgreen wrote:

It would presumably need to be SROA that added flags to phi's it created? I'm not sure where it would get that info from though.

I'm not sure either. We might need to apply FMF to load, stores, and function args to fill the gap. Another option might be to back-propagate the FMF from the fadd to its phi operand:
%s.0 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
%add = fadd fast float %s.0, %0
Not sure if there's some corner-case I'm overlooking, but I'm imagining something like what instcombine does to fill-in / restore no-wrap flags (and so the FMF setting could also happen in instcombine).

That sounds great. Sounds like it might be useful past just these reductions. So long as we can say that flags propagate like that, or come up with some rules for when they do.

What do you folks think for the context of this review? Is it best to have a go at adding that into instcombine, plus a fold to make phi + nsz + -0.0 use 0.0 instead. Or go with the initial version of this patch that just plumbs the nsz flag through and emits 0.0 for the nsz case?

Hi @dmgreen and @spatel, I've been trying to follow the discussion, but I'm not entirely sure what you're proposing @kmclaughlin should do here. Are you suggesting that Kerry could change this patch to:

1. Change InstCombine in two places so that a) prior to vectorisation the PHI inherits the FMF from the instruction it feeds into, and b) a second fold in InstCombine changes all 0.0 values to -0.0, including for any possible vector PHIs? Obviously we have to remember InstCombine is run after the vectoriser too, so it will also look at vector PHIs.
2. Always set the default identity to be <-0.0, -0.0> regardless of the nsz flag?

I guess this might take some time to implement and test - I imagine there are issues/corner cases that might come up, such as what if the chain of FP operations includes a fadd, etc. with FMF different from the rest? Does that mean we cannot propagate FMF to the PHI?

The other possibility is to revert the patch back to the original so we at least get some sensible behaviour for now, whilst working on a second patch to investigate these new proposals?

Ah sorry, I realise now you meant change PHIs (vector or scalar) with nsz and -0.0 to use +0.0 instead!
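
To make the two proposed steps concrete, here is a toy model in standalone C++. Everything in it is an assumption for illustration: `ToyFAdd`, `ToyPhi`, `propagateNszToPhi` and `foldNegZeroStart` are hypothetical names, not LLVM classes or InstCombine folds, and the propagation rule (the phi takes nsz only if every user is an fadd that carries nsz) is just one conservative choice. The thread leaves the actual rules undecided.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Toy stand-ins for IR instructions; these are NOT LLVM classes.
struct ToyFAdd {
  bool nsz; // "no signed zeros" fast-math flag on the fadd
};

struct ToyPhi {
  float start;                // incoming value from the preheader
  bool nsz;                   // flag on the phi itself
  std::vector<ToyFAdd> users; // fadds that consume the phi
};

// Step 1 (assumed rule): the phi may take nsz only if it has users and every
// one of them already carries nsz.
void propagateNszToPhi(ToyPhi &phi) {
  if (phi.users.empty())
    return;
  for (const ToyFAdd &user : phi.users)
    if (!user.nsz)
      return;
  phi.nsz = true;
}

// Step 2: once the phi has nsz, a -0.0 start value can be rewritten as +0.0.
void foldNegZeroStart(ToyPhi &phi) {
  if (phi.nsz && phi.start == 0.0f && std::signbit(phi.start))
    phi.start = 0.0f;
}
```

In this model, a chain that mixes fadds with and without nsz leaves the phi untouched, which is one possible answer to the corner case raised earlier about fadds whose FMF differ from the rest.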

In D98963#2651724, @dmgreen wrote:

What do you folks think for the context of this review? Is it best to have a go at adding that into instcombine, plus a fold to make phi + nsz + -0.0 use 0.0 instead. Or go with the initial version of this patch that just plumbs the nsz flag through and emits 0.0 for the nsz case?

Bending FMF propagation to match our goals is a multi-step process and will need more discussion to determine what is right/expected. So I am ok to go back to the initial version of the patch that checked nsz, but please put a FIXME comment there to state the ideal behavior and the mixed-zero problem that we're avoiding.

Reverted back to the original patch

@dmgreen, @spatel, @david-arm Thank you for all the comments and discussion on this patch; I've tried to summarise the changes required before we can always return -0.0 in the FIXME added to getRecurrenceIdentity() and reverted back to the original version.

Thanks, I'm happy with this. LGTM

This revision is now accepted and ready to land. Mar 26 2021, 7:28 AM

LGTM

Harbormaster completed remote builds in B95871: Diff 333547. Mar 26 2021, 8:02 AM

This revision was landed with ongoing or failed builds. Apr 6 2021, 4:14 AM

Closed by commit rG857b8a73da91: [LoopVectorize] Change the identity element for FAdd (authored by kmclaughlin).

This revision was automatically updated to reflect the committed changes.

kmclaughlin added a commit: rG857b8a73da91: [LoopVectorize] Change the identity element for FAdd.

Revision Contents

Path                                                            Size
llvm/include/llvm/Analysis/IVDescriptors.h                      3 lines
llvm/lib/Analysis/IVDescriptors.cpp                             12 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp                 4 lines
llvm/test/Transforms/LoopVectorize/X86/reduction-fastmath.ll    24 lines

Diff 335474

llvm/include/llvm/Analysis/IVDescriptors.h

[...] public:
   static InstDesc isMinMaxSelectCmpPattern(Instruction *I,
                                            const InstDesc &Prev);

   /// Returns a struct describing if the instruction is a
   /// Select(FCmp(X, Y), (Z = X op PHINode), PHINode) instruction pattern.
   static InstDesc isConditionalRdxPattern(RecurKind Kind, Instruction *I);

   /// Returns identity corresponding to the RecurrenceKind.
-  static Constant *getRecurrenceIdentity(RecurKind K, Type *Tp);
+  static Constant *getRecurrenceIdentity(RecurKind K, Type *Tp,
+                                         FastMathFlags FMF);

   /// Returns the opcode corresponding to the RecurrenceKind.
   static unsigned getOpcode(RecurKind Kind);

   /// Returns true if Phi is a reduction of type Kind and adds it to the
   /// RecurrenceDescriptor. If either \p DB is non-null or \p AC and \p DT are
   /// non-null, the minimal bit width needed to compute the reduction will be
   /// computed.
[...]

llvm/lib/Analysis/IVDescriptors.cpp

[...] if (Phi->hasOneUse()) {
     }
   }

   return allUsesDominatedBy(Phi, Previous);
 }

 /// This function returns the identity element (or neutral element) for
 /// the operation K.
-Constant *RecurrenceDescriptor::getRecurrenceIdentity(RecurKind K, Type *Tp) {
+Constant *RecurrenceDescriptor::getRecurrenceIdentity(RecurKind K, Type *Tp,
+                                                      FastMathFlags FMF) {
   switch (K) {
   case RecurKind::Xor:
   case RecurKind::Add:
   case RecurKind::Or:
     // Adding, Xoring, Oring zero to a number does not change it.
     return ConstantInt::get(Tp, 0);
   case RecurKind::Mul:
     // Multiplying a number by 1 does not change it.
     return ConstantInt::get(Tp, 1);
   case RecurKind::And:
     // AND-ing a number with an all-1 value does not change it.
     return ConstantInt::get(Tp, -1, true);
   case RecurKind::FMul:
     // Multiplying a number by 1 does not change it.
     return ConstantFP::get(Tp, 1.0L);
   case RecurKind::FAdd:
     // Adding zero to a number does not change it.
-    return ConstantFP::get(Tp, 0.0L);
+    // FIXME: Ideally we should not need to check FMF for FAdd and should always
+    // use -0.0. However, this will currently result in mixed vectors of 0.0/-0.0.
+    // Instead, we should ensure that 1) the FMF from FAdd are propagated to the PHI
+    // nodes where possible, and 2) PHIs with the nsz flag + -0.0 use 0.0. This would
+    // mean we can then remove the check for noSignedZeros() below (see D98963).
+    if (FMF.noSignedZeros())
+      return ConstantFP::get(Tp, 0.0L);
+    return ConstantFP::get(Tp, -0.0L);
   case RecurKind::UMin:
     return ConstantInt::get(Tp, -1);
   case RecurKind::UMax:
     return ConstantInt::get(Tp, 0);
   case RecurKind::SMin:
     return ConstantInt::get(Tp,
                             APInt::getSignedMaxValue(Tp->getIntegerBitWidth()));
   case RecurKind::SMax:
[...]
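
As a sanity check on the integer cases visible in the switch, the identity property `x op identity == x` can be exercised outside LLVM. This is a standalone sketch on 32-bit integers. `Kind`, `identityFor`, `apply` and `identityHolds` are hypothetical names, and only the integer kinds whose identities are shown in the diff are modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the integer recurrence kinds shown in the diff.
enum class Kind { Add, Mul, And, Or, Xor, UMin, UMax, SMin };

// Identity element for each kind on 32-bit integers.
std::uint32_t identityFor(Kind k) {
  switch (k) {
  case Kind::Add:
  case Kind::Or:
  case Kind::Xor:
  case Kind::UMax:
    return 0;
  case Kind::Mul:
    return 1;
  case Kind::And:
  case Kind::UMin:
    return std::numeric_limits<std::uint32_t>::max(); // all ones
  case Kind::SMin:
    return static_cast<std::uint32_t>(std::numeric_limits<std::int32_t>::max());
  }
  return 0; // not reached for the kinds above
}

// Applies the binary operation of kind k to a and b.
std::uint32_t apply(Kind k, std::uint32_t a, std::uint32_t b) {
  switch (k) {
  case Kind::Add:  return a + b;
  case Kind::Mul:  return a * b;
  case Kind::And:  return a & b;
  case Kind::Or:   return a | b;
  case Kind::Xor:  return a ^ b;
  case Kind::UMin: return std::min(a, b);
  case Kind::UMax: return std::max(a, b);
  case Kind::SMin:
    // Compare as signed values, return the original bit pattern.
    return static_cast<std::int32_t>(a) < static_cast<std::int32_t>(b) ? a : b;
  }
  return a; // not reached for the kinds above
}

// True if combining x with the identity of kind k leaves x unchanged.
bool identityHolds(Kind k, std::uint32_t x) {
  return apply(k, x, identityFor(k)) == x;
}
```

The FAdd case differs from these only in that the exact identity depends on whether signed zeros are observable, which is why the patch consults the fast-math flags there.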

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[...] if (RdxDesc) {
       } else {
         IRBuilderBase::InsertPointGuard IPBuilder(Builder);
         Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator());
         StartV = Iden =
             Builder.CreateVectorSplat(State.VF, StartV, "minmax.ident");
       }
     } else {
       Constant *IdenC = RecurrenceDescriptor::getRecurrenceIdentity(
-          RK, VecTy->getScalarType());
+          RK, VecTy->getScalarType(), RdxDesc->getFastMathFlags());
       Iden = IdenC;

       if (!ScalarPHI) {
         Iden = ConstantVector::getSplat(State.VF, IdenC);
         IRBuilderBase::InsertPointGuard IPBuilder(Builder);
         Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator());
         Constant *Zero = Builder.getInt32(0);
         StartV = Builder.CreateInsertElement(Iden, StartV, Zero);
[...] void VPReductionRecipe::execute(VPTransformState &State) {
   assert(!State.Instance && "Reduction being replicated.");
   for (unsigned Part = 0; Part < State.UF; ++Part) {
     RecurKind Kind = RdxDesc->getRecurrenceKind();
     Value *NewVecOp = State.get(getVecOp(), Part);
     if (VPValue *Cond = getCondOp()) {
       Value *NewCond = State.get(Cond, Part);
       VectorType *VecTy = cast<VectorType>(NewVecOp->getType());
       Constant *Iden = RecurrenceDescriptor::getRecurrenceIdentity(
-          Kind, VecTy->getElementType());
+          Kind, VecTy->getElementType(), RdxDesc->getFastMathFlags());
       Constant *IdenVec =
           ConstantVector::getSplat(VecTy->getElementCount(), Iden);
       Value *Select = State.Builder.CreateSelect(NewCond, NewVecOp, IdenVec);
       NewVecOp = Select;
     }
     Value *NewRed =
         createTargetReduction(State.Builder, TTI, *RdxDesc, NewVecOp);
     Value *PrevInChain = State.get(getChainOp(), Part);
[...]
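
The select in VPReductionRecipe::execute replaces masked-off lanes with the identity so that they do not affect the reduced value. A standalone sketch of that idea follows, using plain arrays instead of LLVM values. `selectOrIdentity` and `reduceFAdd` are hypothetical helpers, and the reduction is modelled as a simple in-order sum, which is a simplification.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mask4 = std::array<bool, 4>;

// Lane-wise select: keep the operand where the mask is set, otherwise use the
// identity. This models CreateSelect(NewCond, NewVecOp, IdenVec).
Vec4 selectOrIdentity(const Mask4 &cond, const Vec4 &op, float identity) {
  Vec4 out;
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = cond[i] ? op[i] : identity;
  return out;
}

// Simplified in-order fadd reduction that starts from `start`.
float reduceFAdd(float start, const Vec4 &v) {
  float acc = start;
  for (float lane : v)
    acc += lane;
  return acc;
}
```

With the mask {true, false, true, false} and operand {1, 2, 3, 4}, using -0.0 as the fill value sums only the active lanes, whereas any non-identity fill value would leak into the result.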

llvm/test/Transforms/LoopVectorize/X86/reduction-fastmath.ll

[...]
 ; CHECK-NEXT: [[ENTRY_COND:%.*]] = icmp ne i32 0, 4096
 ; CHECK-NEXT: br i1 [[ENTRY_COND]], label [[LOOP_PREHEADER:%.*]], label [[LOOP_EXIT:%.*]]
 ; CHECK: loop.preheader:
 ; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK: vector.ph:
 ; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
 ; CHECK: vector.body:
 ; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x float> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP8:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <4 x float> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP9:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x float> [ <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, [[VECTOR_PH]] ], [ [[TMP8:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <4 x float> [ <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, [[VECTOR_PH]] ], [ [[TMP9:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
 ; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 4
 ; CHECK-NEXT: [[TMP2:%.*]] = getelementptr float, float* [[ARRAY:%.*]], i32 [[TMP0]]
 ; CHECK-NEXT: [[TMP3:%.*]] = getelementptr float, float* [[ARRAY]], i32 [[TMP1]]
 ; CHECK-NEXT: [[TMP4:%.*]] = getelementptr float, float* [[TMP2]], i32 0
 ; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[TMP4]] to <4 x float>*
 ; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, <4 x float>* [[TMP5]], align 4
 ; CHECK-NEXT: [[TMP6:%.*]] = getelementptr float, float* [[TMP2]], i32 4
 ; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[TMP6]] to <4 x float>*
 ; CHECK-NEXT: [[WIDE_LOAD2:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4
 ; CHECK-NEXT: [[TMP8]] = fadd reassoc <4 x float> [[VEC_PHI]], [[WIDE_LOAD]]
 ; CHECK-NEXT: [[TMP9]] = fadd reassoc <4 x float> [[VEC_PHI1]], [[WIDE_LOAD2]]
 ; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8
 ; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i32 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP4:!llvm.loop !.*]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd reassoc <4 x float> [[TMP9]], [[TMP8]]
 ; CHECK-NEXT: [[TMP11:%.*]] = call reassoc float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[BIN_RDX]])
 ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 4096, 4096
 ; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT_LOOPEXIT:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[LOOP_PREHEADER]] ]
-; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ 0.000000e+00, [[LOOP_PREHEADER]] ], [ [[TMP11]], [[MIDDLE_BLOCK]] ]
+; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ -0.000000e+00, [[LOOP_PREHEADER]] ], [ [[TMP11]], [[MIDDLE_BLOCK]] ]
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
 ; CHECK-NEXT: [[IDX:%.*]] = phi i32 [ [[IDX_INC:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
 ; CHECK-NEXT: [[SUM:%.*]] = phi float [ [[SUM_INC:%.*]], [[LOOP]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
 ; CHECK-NEXT: [[ADDRESS:%.*]] = getelementptr float, float* [[ARRAY]], i32 [[IDX]]
 ; CHECK-NEXT: [[VALUE:%.*]] = load float, float* [[ADDRESS]], align 4
 ; CHECK-NEXT: [[SUM_INC]] = fadd reassoc float [[SUM]], [[VALUE]]
 ; CHECK-NEXT: [[IDX_INC]] = add i32 [[IDX]], 1
 ; CHECK-NEXT: [[BE_COND:%.*]] = icmp ne i32 [[IDX_INC]], 4096
 ; CHECK-NEXT: br i1 [[BE_COND]], label [[LOOP]], label [[LOOP_EXIT_LOOPEXIT]], [[LOOP5:!llvm.loop !.*]]
 ; CHECK: loop.exit.loopexit:
 ; CHECK-NEXT: [[SUM_INC_LCSSA:%.*]] = phi float [ [[SUM_INC]], [[LOOP]] ], [ [[TMP11]], [[MIDDLE_BLOCK]] ]
 ; CHECK-NEXT: br label [[LOOP_EXIT]]
 ; CHECK: loop.exit:
-; CHECK-NEXT: [[SUM_LCSSA:%.*]] = phi float [ 0.000000e+00, [[ENTRY:%.*]] ], [ [[SUM_INC_LCSSA]], [[LOOP_EXIT_LOOPEXIT]] ]
+; CHECK-NEXT: [[SUM_LCSSA:%.*]] = phi float [ -0.000000e+00, [[ENTRY:%.*]] ], [ [[SUM_INC_LCSSA]], [[LOOP_EXIT_LOOPEXIT]] ]
 ; CHECK-NEXT: ret float [[SUM_LCSSA]]
 ;
 entry:
   %entry.cond = icmp ne i32 0, 4096
   br i1 %entry.cond, label %loop, label %loop.exit

 loop:
   %idx = phi i32 [ 0, %entry ], [ %idx.inc, %loop ]
-  %sum = phi float [ 0.000000e+00, %entry ], [ %sum.inc, %loop ]
+  %sum = phi float [ -0.000000e+00, %entry ], [ %sum.inc, %loop ]
   %address = getelementptr float, float* %array, i32 %idx
   %value = load float, float* %address
   %sum.inc = fadd reassoc float %sum, %value
   %idx.inc = add i32 %idx, 1
   %be.cond = icmp ne i32 %idx.inc, 4096
   br i1 %be.cond, label %loop, label %loop.exit

 loop.exit:
-  %sum.lcssa = phi float [ %sum.inc, %loop ], [ 0.000000e+00, %entry ]
+  %sum.lcssa = phi float [ %sum.inc, %loop ], [ -0.000000e+00, %entry ]
   ret float %sum.lcssa
 }

 define float @reduction_sum_float_only_reassoc_and_contract(i32 %n, float* %array) {
 ; CHECK-LABEL: @reduction_sum_float_only_reassoc_and_contract(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[ENTRY_COND:%.*]] = icmp ne i32 0, 4096
 ; CHECK-NEXT: br i1 [[ENTRY_COND]], label [[LOOP_PREHEADER:%.*]], label [[LOOP_EXIT:%.*]]
 ; CHECK: loop.preheader:
 ; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK: vector.ph:
 ; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
 ; CHECK: vector.body:
 ; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x float> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP8:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <4 x float> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP9:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x float> [ <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, [[VECTOR_PH]] ], [ [[TMP8:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <4 x float> [ <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, [[VECTOR_PH]] ], [ [[TMP9:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
 ; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 4
 ; CHECK-NEXT: [[TMP2:%.*]] = getelementptr float, float* [[ARRAY:%.*]], i32 [[TMP0]]
 ; CHECK-NEXT: [[TMP3:%.*]] = getelementptr float, float* [[ARRAY]], i32 [[TMP1]]
 ; CHECK-NEXT: [[TMP4:%.*]] = getelementptr float, float* [[TMP2]], i32 0
 ; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[TMP4]] to <4 x float>*
 ; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, <4 x float>* [[TMP5]], align 4
 ; CHECK-NEXT: [[TMP6:%.*]] = getelementptr float, float* [[TMP2]], i32 4
 ; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[TMP6]] to <4 x float>*
 ; CHECK-NEXT: [[WIDE_LOAD2:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4
 ; CHECK-NEXT: [[TMP8]] = fadd reassoc contract <4 x float> [[VEC_PHI]], [[WIDE_LOAD]]
 ; CHECK-NEXT: [[TMP9]] = fadd reassoc contract <4 x float> [[VEC_PHI1]], [[WIDE_LOAD2]]
 ; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8
 ; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i32 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd reassoc contract <4 x float> [[TMP9]], [[TMP8]]
 ; CHECK-NEXT: [[TMP11:%.*]] = call reassoc contract float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[BIN_RDX]])
 ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 4096, 4096
 ; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT_LOOPEXIT:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[LOOP_PREHEADER]] ]
-; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ 0.000000e+00, [[LOOP_PREHEADER]] ], [ [[TMP11]], [[MIDDLE_BLOCK]] ]
+; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ -0.000000e+00, [[LOOP_PREHEADER]] ], [ [[TMP11]], [[MIDDLE_BLOCK]] ]
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
 ; CHECK-NEXT: [[IDX:%.*]] = phi i32 [ [[IDX_INC:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
 ; CHECK-NEXT: [[SUM:%.*]] = phi float [ [[SUM_INC:%.*]], [[LOOP]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
 ; CHECK-NEXT: [[ADDRESS:%.*]] = getelementptr float, float* [[ARRAY]], i32 [[IDX]]
 ; CHECK-NEXT: [[VALUE:%.*]] = load float, float* [[ADDRESS]], align 4
 ; CHECK-NEXT: [[SUM_INC]] = fadd reassoc contract float [[SUM]], [[VALUE]]
 ; CHECK-NEXT: [[IDX_INC]] = add i32 [[IDX]], 1
 ; CHECK-NEXT: [[BE_COND:%.*]] = icmp ne i32 [[IDX_INC]], 4096
 ; CHECK-NEXT: br i1 [[BE_COND]], label [[LOOP]], label [[LOOP_EXIT_LOOPEXIT]], [[LOOP7:!llvm.loop !.*]]
 ; CHECK: loop.exit.loopexit:
 ; CHECK-NEXT: [[SUM_INC_LCSSA:%.*]] = phi float [ [[SUM_INC]], [[LOOP]] ], [ [[TMP11]], [[MIDDLE_BLOCK]] ]
 ; CHECK-NEXT: br label [[LOOP_EXIT]]
 ; CHECK: loop.exit:
-; CHECK-NEXT: [[SUM_LCSSA:%.*]] = phi float [ 0.000000e+00, [[ENTRY:%.*]] ], [ [[SUM_INC_LCSSA]], [[LOOP_EXIT_LOOPEXIT]] ]
+; CHECK-NEXT: [[SUM_LCSSA:%.*]] = phi float [ -0.000000e+00, [[ENTRY:%.*]] ], [ [[SUM_INC_LCSSA]], [[LOOP_EXIT_LOOPEXIT]] ]
 ; CHECK-NEXT: ret float [[SUM_LCSSA]]
 ;
 entry:
   %entry.cond = icmp ne i32 0, 4096
   br i1 %entry.cond, label %loop, label %loop.exit

 loop:
   %idx = phi i32 [ 0, %entry ], [ %idx.inc, %loop ]
-  %sum = phi float [ 0.000000e+00, %entry ], [ %sum.inc, %loop ]
+  %sum = phi float [ -0.000000e+00, %entry ], [ %sum.inc, %loop ]
   %address = getelementptr float, float* %array, i32 %idx
   %value = load float, float* %address
   %sum.inc = fadd reassoc contract float %sum, %value
   %idx.inc = add i32 %idx, 1
   %be.cond = icmp ne i32 %idx.inc, 4096
   br i1 %be.cond, label %loop, label %loop.exit

 loop.exit:
-  %sum.lcssa = phi float [ %sum.inc, %loop ], [ 0.000000e+00, %entry ]
+  %sum.lcssa = phi float [ %sum.inc, %loop ], [ -0.000000e+00, %entry ]
   ret float %sum.lcssa
 }

 ; New instructions should have the same FMF as the original code.
 ; Note that the select inherits FMF from its fcmp condition.

 define float @PR35538(float* nocapture readonly %a, i32 %N) #0 {
 ; CHECK-LABEL: @PR35538(
[...]
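
The shape of the vectorized loop in these CHECK lines (two 4-lane accumulators, a lane-wise combine in middle.block, then a reduce that starts from -0.0) can be emulated in plain C++ to see why the identity matters when nsz is absent. This is a sketch under stated assumptions: `scalarSum` and `vectorSum` are hypothetical names, the input length must be a multiple of 8 because no scalar remainder loop is modelled, and the file must be compiled without fast-math options.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec4 = std::array<float, 4>;

// Reference: the original scalar reduction.
float scalarSum(const std::vector<float> &x, float start) {
  float s = start;
  for (float v : x)
    s += v;
  return s;
}

// Emulates the vectorized form: VF = 4, interleaved by 2.
float vectorSum(const std::vector<float> &x, float start, float identity) {
  assert(x.size() % 8 == 0 && "sketch has no scalar remainder loop");
  Vec4 acc0, acc1;
  acc0.fill(identity);
  acc1.fill(identity);
  acc0[0] = start; // start value goes into lane 0 of the first accumulator

  for (std::size_t i = 0; i < x.size(); i += 8)
    for (std::size_t lane = 0; lane < 4; ++lane) {
      acc0[lane] += x[i + lane];
      acc1[lane] += x[i + 4 + lane];
    }

  // middle.block: combine the accumulators, then reduce starting from -0.0.
  float result = -0.0f;
  for (std::size_t lane = 0; lane < 4; ++lane)
    result += acc1[lane] + acc0[lane];
  return result;
}
```

For an input of all -0.0 with a -0.0 start, the scalar loop and the -0.0 identity both produce -0.0, while a +0.0 identity produces +0.0. The reordering itself is only legal because the fadd carries reassoc.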

[LoopVectorize] Change the identity element for FAdd (Closed, Public)
