This is an archive of the discontinued LLVM Phabricator instance.

[LV] Add a new reduction pattern match
Closed, Public

Authored by rengolin on Jul 11 2018, 12:30 AM.

Details

Reviewers

mkuper
karthikthecool
TylerNowicki
mcrosier
t.p.northover
fhahn
RKSimon
dcaballe
hsaito
takahiro.miyoshi

Commits

rGcb19c8e3aafd: [LV] Add a new reduction pattern match
rL344172: [LV] Add a new reduction pattern match

Summary

Adding a new reduction pattern match for vectorizing code similar to TSVC s3111:

for (int i = 0; i < N; i++)
  if (a[i] > b)
    sum += a[i];

This patch adds support for fadd, fsub and fmul, as well as multiple
branches and different (but compatible) instructions (e.g. add+sub) in
different branches.

I have forwarded to trunk, added fsub and fmul functionality and
additional tests, but the credit goes to Takahiro, who did most of the
actual work.

Patch by Takahiro Miyoshi <takahiro.miyoshi@linaro.org>.
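
For readers unfamiliar with the pattern, here is a minimal C++ sketch (not part of the patch) of the s3111-style loop next to its if-converted form, in which the add is computed unconditionally and a select picks the result. The function names conditionalSum and conditionalSumSelect are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Branchy form, as in TSVC s3111: add a[i] only when it exceeds b.
float conditionalSum(const std::vector<float> &a, float b) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (a[i] > b)
      sum += a[i];
  return sum;
}

// If-converted form: the add always executes and the ternary plays the
// role of the IR select, choosing between the new value and the old sum.
float conditionalSumSelect(const std::vector<float> &a, float b) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    float add = a[i] + sum;
    sum = (a[i] > b) ? add : sum;
  }
  return sum;
}
```

Both functions compute the same value in scalar code; the second shape is the one the review thread refers to as the select form.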

Event Timeline

takahiro.miyoshi created this revision. Jul 11 2018, 12:30 AM

Herald added a subscriber: llvm-commits. Jul 11 2018, 12:30 AM

Hi Takahiro,

The patch looks good, but I'm adding more people to have a closer look, as it has been a while since I last touched this code.

cheers,
--renato

Takahiro,

I'm not familiar with the Recurrence Descriptor code, but I suppose the following would be classified as RK_FloatAdd. If that's the case, we should be beefing up RK_FloatAdd rather than adding a new Kind. We can't keep adding a new kind every time we encounter a different pattern of reduction sum/product. The downside is possibly exposing a downstream bug, but fixing that should only help generalize the reduction handling code.

From the reduction analysis perspective, select (IF-converted) and phi (IF) should be the same thing, so trying to handle this within FloatAdd/FloatMult should also help generalize the recurrence analysis code. That's how I look at the issue. Hope this helps.

Thanks,
Hideki

float foo(float *a, int n){
  float sum=0;
  for (int i=0;i<n;i++){
    if (a[i]>1.0){
      sum+=a[i];
    }
    else if (a[i]<3.0){
      sum+=2*a[i];
    }
  }
  return sum;
}

for.body: ; preds = %for.inc, %for.body.preheader
  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.inc ]
  %sum.026 = phi float [ 0.000000e+00, %for.body.preheader ], [ %sum.1, %for.inc ]
  %arrayidx = getelementptr inbounds float, float* %a, i64 %indvars.iv
  %0 = load float, float* %arrayidx, align 4, !tbaa !2
  %cmp1 = fcmp ogt float %0, 1.000000e+00
  br i1 %cmp1, label %if.then, label %if.else

if.then: ; preds = %for.body
  %add = fadd fast float %0, %sum.026
  br label %for.inc

if.else: ; preds = %for.body
  %cmp8 = fcmp olt float %0, 3.000000e+00
  br i1 %cmp8, label %if.then10, label %for.inc

if.then10: ; preds = %if.else
  %mul = fmul fast float %0, 2.000000e+00
  %add13 = fadd fast float %mul, %sum.026
  br label %for.inc

for.inc: ; preds = %if.then, %if.then10, %if.else
  %sum.1 = phi float [ %add, %if.then ], [ %add13, %if.then10 ], [ %sum.026, %if.else ]
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count
  br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body

Hi Hideki,

Thank you for your comments.

At first, as you said, I thought the target loop was a FloatAdd pattern, but it also looked somewhat similar to FloatMinMax.
The target loop combines aspects of both FloatAdd and FloatMinMax, so I added a new recurrence kind to express this.

I agree that adding a new kind for each different reduction pattern doesn't make sense.
So, I will try again to handle this within FloatAdd/FloatMult.

Best regards,
Takahiro

I modified my patch to use RK_FloatAdd instead of adding a new recurrence kind.
Also, the input IR for this target loop has already been converted to use a select instruction, so I did not extend the if-conversion functionality.

takahiro.miyoshi updated this revision to Diff 159445. Aug 6 2018, 7:26 PM

takahiro.miyoshi updated this revision to Diff 159447. Aug 6 2018, 7:32 PM

In D49168#1190457, @takahiro.miyoshi wrote:

I modified my patch to use RK_FloatAdd instead of adding a new recurrence kind.
Also, the input IR for this target loop has already been converted to use a select instruction, so I did not extend the if-conversion functionality.

Is this ready for another round of review? I took a quick look, and I think this is the right direction to follow. Is there any specific reason for restricting this to FloatAdd? It should be the same for integers, and for SUB and MUL as well, shouldn't it?
Please also add a negative test for "two use" cases that fall outside the pattern you are looking for, as well as a "three use" negative test.

By any chance, did you try creating a LIT test for the following conditional reduction, where both IFs are converted to selects? I'm just curious how much your code would need to be extended to capture that.
If this is already caught, great. If it is low-hanging fruit, it would be nice to extend a bit further (and see whether that covers the 3, 4, 5, ..., N cases).

if (cond1)
  sum+=...
if (cond2)
  sum+=...
[
if (cond3)
  sum+=...
if (cond4)
  sum+=...
...
if (condN)
  sum+=...
]

Thanks,
Hideki
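
As an illustration of the chained case asked about above, the following is a small C++ sketch (not taken from the patch or its tests) of two independent conditional updates to the same sum, first as branches and then as a chain of two selects. The names chainedSumBranches, chainedSumSelects, lo and hi are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Two independent conditional updates of one accumulator, written as branches.
float chainedSumBranches(const std::vector<float> &a, float lo, float hi) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] > hi)
      sum += a[i];
    if (a[i] < lo)
      sum += 2.0f * a[i];
  }
  return sum;
}

// The same loop after if-conversion: each update becomes an unconditional
// add feeding a select, and the second select consumes the first one.
float chainedSumSelects(const std::vector<float> &a, float lo, float hi) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    float add1 = sum + a[i];
    float sum1 = (a[i] > hi) ? add1 : sum;  // first select in the chain
    float add2 = sum1 + 2.0f * a[i];
    sum = (a[i] < lo) ? add2 : sum1;        // last select in the chain
  }
  return sum;
}
```

Extending this to N conditions would simply lengthen the select chain, which is the shape the question about 3, 4, 5, ..., N cases is concerned with.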

Ayal mentioned this in D50474: [LV] Vectorize header phis that feed from if-convertable latch phis. Aug 9 2018, 3:27 PM

Hi Hideki,

Takahiro is on leave, so I'm taking over this work to make sure it isn't delayed much further.

I have added fsub and fmul functionality and a few tests (with multiple branches), including some that should not vectorise.

All the credit still goes to Takahiro.

cheers,
--renato

Herald added a subscriber: rkruppe. Oct 8 2018, 10:15 AM

rengolin updated this revision to Diff 168678. Oct 8 2018, 10:16 AM

rengolin edited the summary of this revision.

Thanks a lot, Renato. Will take a quick look.

Code looks good. Just a minor suggestion on the comment. Looking at the LIT test.

lib/Analysis/IVDescriptors.cpp
503 ↗	(On Diff #168678)	where the Instruction argument I is the last select in the chain.

rengolin added inline comments. Oct 8 2018, 2:05 PM

lib/Analysis/IVDescriptors.cpp
503 ↗	(On Diff #168678)	Good point! Probably better to also rename the argument and simplify the cast at the beginning of the function. I didn't want to change much, but I guess that's more cosmetic than anything. :)

LGTM. Please wait for a few days to give others time to respond if they'd like to.

test/Transforms/LoopVectorize/if-reduction.ll
2	My preference is to have vectorization/non-vectorization checked by itself and then have another RUN line to check the expected optimization by InstCombine. That way, we'll quickly know which part changed when the test fails. I don't insist, though.
584	Is this the correct check here?

This revision is now accepted and ready to land. Oct 8 2018, 2:28 PM

In D49168#1258185, @hsaito wrote:

LGTM. Please wait for a few days to give others time to respond if they'd like to.

Thanks Hideki!

I'll update with the review comments and wait a few days.

test/Transforms/LoopVectorize/if-reduction.ll
2	I see what you mean, will try to separate them. I'm not even sure the instcombine is necessary for the results we check, though.
584	ouch, no, regex left-over. Will fix.

hsaito added inline comments. Oct 8 2018, 3:12 PM

test/Transforms/LoopVectorize/if-reduction.ll
37	One comment somehow went missing. I suggest adding one more negative test, for example, storing %add to y[i]. Single use of %add should be checked, I think. If we find a bug there, that's an easy thing to remedy.
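
To make the suggested negative test concrete, here is a hedged C++ sketch of the kind of loop it describes: the intermediate add is also stored to y[i], so it has a use outside the select chain. This is an illustration only, not the test that was added; the name conditionalSumWithStore is hypothetical, and y is assumed to be at least as long as x.

```cpp
#include <cstddef>
#include <vector>

// The add result is both a select operand and a stored value, so each
// iteration's intermediate sum must be available in source order.
// Assumes y.size() >= x.size().
float conditionalSumWithStore(const std::vector<float> &x,
                              std::vector<float> &y) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < x.size(); ++i) {
    float add = x[i] + sum;           // corresponds to %add
    y[i] = add;                       // extra use outside the reduction chain
    sum = (x[i] > 0.0f) ? add : sum;  // corresponds to the select
  }
  return sum;
}
```

A reduction vectorised with per-lane partial sums would not produce these in-order intermediate values, which is presumably why the single use of %add needs to be checked.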

Changes to comment:

Improved comments on isConditionalRdxPattern
Removed instcombine pass from test
Added write negative test
Fix typo in CHECK line

rengolin marked 7 inline comments as done. Oct 9 2018, 3:09 AM

LGTM.

Closed by commit rL344172: [LV] Add a new reduction pattern match (authored by rengolin). Oct 10 2018, 11:51 AM

This revision was automatically updated to reflect the committed changes.

This patch doesn't correctly handle isFast(). By default isRecurrenceInstr() should check I->isFast(), but for this pattern I is a Select, so isFast() doesn't apply to it; it should instead be checked against the FAdd/FMul inside isConditionalRdxPattern().

It caused several of our internal applications to fail. The following is a simple reproducer.

static double bar(double* v) {
  double t = 0.0;
  for (int i=0; i<10000; i++) {
    double s = v[i];
    if (s > 0) {
      t += s;
    }
  }
  return t;
}

double foo(double* v)
{
  return bar(v);
}

clang++ -msse4.2 -c -O2 t9.cc -save-temps

In the generated code, the loop is wrongly vectorized.
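
The reason the fast-math check matters is that a vectorised reduction reassociates the additions, and floating-point addition is not associative. The sketch below is an illustration only, not LLVM code: it emulates a 4-lane reduction in plain C++ with a hypothetical lane array and compares it with the strict in-order sum. The function names are made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Strict in-order sum of the positive elements, like the scalar loop.
double sumPositiveInOrder(const std::vector<double> &v) {
  double t = 0.0;
  for (double s : v)
    if (s > 0)
      t += s;
  return t;
}

// Emulates a 4-wide vector reduction: each lane accumulates every fourth
// element and the lanes are combined after the loop. This changes the
// order in which the additions are performed.
double sumPositiveFourLanes(const std::vector<double> &v) {
  double lane[4] = {0.0, 0.0, 0.0, 0.0};
  for (std::size_t i = 0; i < v.size(); ++i)
    if (v[i] > 0)
      lane[i % 4] += v[i];
  return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

With an input such as {1e16, 1.0, 1.0, 1.0, -5.0}, the two functions return different doubles under default (non-fast-math) compilation, so the transformation is only valid when the add carries the fast flag.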

In D49168#1277837, @Carrot wrote:

This patch doesn't correctly handle isFast(). By default isRecurrenceInstr() should check I->isFast(), but for this pattern I is a Select, so isFast() doesn't apply to it; it should instead be checked against the FAdd/FMul inside isConditionalRdxPattern().

Benjamin reported something similar, but I could not reproduce it. I'll revert the patch and fix the issue. Thanks for the reproducer!

--renato

Reverted in r345465. Will take a look and land again when fixed. Thanks!

Revision Contents

Path	Size
include/llvm/Transforms/Utils/LoopUtils.h	7 lines
lib/Transforms/Utils/LoopUtils.cpp	59 lines
test/Transforms/LoopVectorize/if-reduction.ll	429 lines

Diff 154758

include/llvm/Transforms/Utils/LoopUtils.h

@@ ... @@ enum RecurrenceKind {
     RK_IntegerAdd, ///< Sum of integers.
     RK_IntegerMult, ///< Product of integers.
     RK_IntegerOr, ///< Bitwise or logical OR of numbers.
     RK_IntegerAnd, ///< Bitwise or logical AND of numbers.
     RK_IntegerXor, ///< Bitwise or logical XOR of numbers.
     RK_IntegerMinMax, ///< Min/max implemented in terms of select(cmp()).
     RK_FloatAdd, ///< Sum of floats.
     RK_FloatMult, ///< Product of floats.
-    RK_FloatMinMax ///< Min/max implemented in terms of select(cmp()).
+    RK_FloatMinMax, ///< Min/max implemented in terms of select(cmp()).
+    RK_SelectFAdd ///< Sum of selected floats.
   };

   // This enum represents the kind of minmax recurrence.
   enum MinMaxRecurrenceKind {
     MRK_Invalid,
     MRK_UIntMin,
     MRK_UIntMax,
     MRK_SIntMin,
@@ ... @@ public:
   /// Returns true if all uses of the instruction I is within the Set.
   static bool areAllUsesIn(Instruction *I, SmallPtrSetImpl<Instruction *> &Set);

   /// Returns a struct describing if the instruction if the instruction is a
   /// Select(ICmp(X, Y), X, Y) instruction pattern corresponding to a min(X, Y)
   /// or max(X, Y).
   static InstDesc isMinMaxSelectCmpPattern(Instruction *I, InstDesc &Prev);

+  /// Returns a struct describing if the instruction is a
+  /// Select(FCmp(X, Y), (Z = X op PHINode), PHINode) instruction pattern.
+  static InstDesc isSelectPattern(Instruction *I, RecurrenceKind K);
+
   /// Returns identity corresponding to the RecurrenceKind.
   static Constant *getRecurrenceIdentity(RecurrenceKind K, Type *Tp);

   /// Returns the opcode of binary operation corresponding to the
   /// RecurrenceKind.
   static unsigned getRecurrenceBinOp(RecurrenceKind Kind);

   /// Returns a Min/Max operation corresponding to MinMaxRecurrenceKind.

lib/Transforms/Utils/LoopUtils.cpp

@@ ... @@ while (!Worklist.empty()) {
     if (Cur != Start) {
       ReduxDesc = isRecurrenceInstr(Cur, Kind, ReduxDesc, HasFunNoNaNAttr);
       if (!ReduxDesc.isRecurrence())
         return false;
     }

     // A reduction operation must only have one use of the reduction value.
     if (!IsAPhi && Kind != RK_IntegerMinMax && Kind != RK_FloatMinMax &&
-        hasMultipleUsesOf(Cur, VisitedInsts))
+        Kind != RK_SelectFAdd && hasMultipleUsesOf(Cur, VisitedInsts))
       return false;

     // All inputs to a PHI node must be a reduction value.
     if (IsAPhi && Cur != Phi && !areAllUsesIn(Cur, VisitedInsts))
       return false;

     if (Kind == RK_IntegerMinMax &&
         (isa<ICmpInst>(Cur) || isa<SelectInst>(Cur)))
@@ ... @@ for (User *U : Cur->users()) {
       if (VisitedInsts.insert(UI).second) {
         if (isa<PHINode>(UI))
           PHIs.push_back(UI);
         else
           NonPHIs.push_back(UI);
       } else if (!isa<PHINode>(UI) &&
                  ((!isa<FCmpInst>(UI) && !isa<ICmpInst>(UI) &&
                    !isa<SelectInst>(UI)) ||
-                  !isMinMaxSelectCmpPattern(UI, IgnoredVal).isRecurrence()))
+                  (!isSelectPattern(UI, Kind).isRecurrence() &&
+                   !isMinMaxSelectCmpPattern(UI, IgnoredVal).isRecurrence())))
         return false;

       // Remember that we completed the cycle.
       if (UI == Phi)
         FoundStartPHI = true;
     }
     Worklist.append(PHIs.begin(), PHIs.end());
     Worklist.append(NonPHIs.begin(), NonPHIs.end());
@@ ... @@ RecurrenceDescriptor::isMinMaxSelectCmpPattern(Instruction *I, InstDesc &Prev) {
   else if (m_UnordFMin(m_Value(CmpLeft), m_Value(CmpRight)).match(Select))
     return InstDesc(Select, MRK_FloatMin);
   else if (m_UnordFMax(m_Value(CmpLeft), m_Value(CmpRight)).match(Select))
     return InstDesc(Select, MRK_FloatMax);

   return InstDesc(false, I);
 }

+/// Returns true if the instruction has a following chain of instructions.
+/// %sum.1 = phi [...], [ %sum.2, ...]
+/// ...
+/// %cmp = fcmp pred %0, %1
+/// %add = fadd %0, %sum.1
+/// %sum.2 = select %cmp, %add, %sum.1
+RecurrenceDescriptor::InstDesc
+RecurrenceDescriptor::isSelectPattern(Instruction *I, RecurrenceKind Kind) {
+  SelectInst *SI = dyn_cast<SelectInst>(I);
+  // Handle only select instruction.
+  if (!SI)
+    return InstDesc(false, I);
+
+  CmpInst *CI = dyn_cast<CmpInst>(SI->getCondition());
+  // Handle only single use cases for now.
+  if (!CI || !CI->hasOneUse())
+    return InstDesc(false, I);
+
+  Value *TrueVal = SI->getTrueValue();
+  Value *FalseVal = SI->getFalseValue();
+  // Handle only when either of SelectInst operands is a PHI node for now.
+  if ((isa<PHINode>(TrueVal) && isa<PHINode>(FalseVal)) ||
+      (!isa<PHINode>(TrueVal) && !isa<PHINode>(FalseVal)))
+    return InstDesc(false, I);
+
+  Instruction *Inst = isa<PHINode>(TrueVal)
+      ? dyn_cast<Instruction>(FalseVal) : dyn_cast<Instruction>(TrueVal);
+  if (!Inst)
+    return InstDesc(false, I);
+
+  Value *Op1, *Op2;
+  // Handle only the case of fadd for now.
+  if (m_FAdd(m_Value(Op1), m_Value(Op2)).match(Inst))
+    return InstDesc(Kind == RK_SelectFAdd, SI);
+  return InstDesc(false, I);
+}
+
 RecurrenceDescriptor::InstDesc
 RecurrenceDescriptor::isRecurrenceInstr(Instruction *I, RecurrenceKind Kind,
                                         InstDesc &Prev, bool HasFunNoNaNAttr) {
   bool FP = I->getType()->isFloatingPointTy();
   Instruction *UAI = Prev.getUnsafeAlgebraInst();
   if (!UAI && FP && !I->isFast())
     UAI = I; // Found an unsafe (unvectorizable) algebra instruction.

@@ ... @@ case Instruction::And:
     return InstDesc(Kind == RK_IntegerAnd, I);
   case Instruction::Or:
     return InstDesc(Kind == RK_IntegerOr, I);
   case Instruction::Xor:
     return InstDesc(Kind == RK_IntegerXor, I);
   case Instruction::FMul:
     return InstDesc(Kind == RK_FloatMult, I, UAI);
   case Instruction::FSub:
-  case Instruction::FAdd:
     return InstDesc(Kind == RK_FloatAdd, I, UAI);
+  case Instruction::FAdd:
+    return InstDesc(Kind == RK_FloatAdd || Kind == RK_SelectFAdd, I, UAI);
+  case Instruction::Select:
+    if (Kind == RK_SelectFAdd)
+      return isSelectPattern(I, Kind);
+    LLVM_FALLTHROUGH;
   case Instruction::FCmp:
   case Instruction::ICmp:
-  case Instruction::Select:
     if (Kind != RK_IntegerMinMax &&
         (!HasFunNoNaNAttr || Kind != RK_FloatMinMax))
       return InstDesc(false, I);
     return isMinMaxSelectCmpPattern(I, Prev);
   }
 }

 bool RecurrenceDescriptor::hasMultipleUsesOf(
@@ ... @@ if (AddReductionVar(Phi, RK_FloatAdd, TheLoop, HasFunNoNaNAttr, RedDes, DB,
     return true;
   }
   if (AddReductionVar(Phi, RK_FloatMinMax, TheLoop, HasFunNoNaNAttr, RedDes, DB,
                       AC, DT)) {
     LLVM_DEBUG(dbgs() << "Found an float MINMAX reduction PHI." << *Phi
                       << "\n");
     return true;
   }
+  if (AddReductionVar(Phi, RK_SelectFAdd, TheLoop, HasFunNoNaNAttr, RedDes, DB,
+                      AC, DT)) {
+    LLVM_DEBUG(dbgs() << "Found an float SelectFAdd reduction PHI." << *Phi
+                      << "\n");
+    return true;
+  }
   // Not a reduction of known type.
   return false;
 }

 bool RecurrenceDescriptor::isFirstOrderRecurrence(
     PHINode *Phi, Loop *TheLoop,
     DenseMap<Instruction *, Instruction *> &SinkAfter, DominatorTree *DT) {

@@ ... @@ case RK_IntegerMult:
     return ConstantInt::get(Tp, 1);
   case RK_IntegerAnd:
     // AND-ing a number with an all-1 value does not change it.
     return ConstantInt::get(Tp, -1, true);
   case RK_FloatMult:
     // Multiplying a number by 1 does not change it.
     return ConstantFP::get(Tp, 1.0L);
   case RK_FloatAdd:
+  case RK_SelectFAdd:
     // Adding zero to a number does not change it.
     return ConstantFP::get(Tp, 0.0L);
   default:
     llvm_unreachable("Unknown recurrence kind");
   }
 }

 /// This function translates the recurrence kind to an LLVM binary operator.
 unsigned RecurrenceDescriptor::getRecurrenceBinOp(RecurrenceKind Kind) {
   switch (Kind) {
   case RK_IntegerAdd:
     return Instruction::Add;
   case RK_IntegerMult:
     return Instruction::Mul;
   case RK_IntegerOr:
     return Instruction::Or;
   case RK_IntegerAnd:
     return Instruction::And;
   case RK_IntegerXor:
     return Instruction::Xor;
   case RK_FloatMult:
     return Instruction::FMul;
   case RK_FloatAdd:
+  case RK_SelectFAdd:
     return Instruction::FAdd;
   case RK_IntegerMinMax:
     return Instruction::ICmp;
   case RK_FloatMinMax:
     return Instruction::FCmp;
   default:
     llvm_unreachable("Unknown recurrence operation");
   }
@@ ... @@ Value *llvm::createTargetReduction(IRBuilder<> &B,
                                     bool NoNaN) {
   // TODO: Support in-order reductions based on the recurrence descriptor.
   using RD = RecurrenceDescriptor;
   RD::RecurrenceKind RecKind = Desc.getRecurrenceKind();
   TargetTransformInfo::ReductionFlags Flags;
   Flags.NoNaN = NoNaN;
   switch (RecKind) {
   case RD::RK_FloatAdd:
+  case RD::RK_SelectFAdd:
     return createSimpleTargetReduction(B, TTI, Instruction::FAdd, Src, Flags);
   case RD::RK_FloatMult:
     return createSimpleTargetReduction(B, TTI, Instruction::FMul, Src, Flags);
   case RD::RK_IntegerAdd:
     return createSimpleTargetReduction(B, TTI, Instruction::Add, Src, Flags);
   case RD::RK_IntegerMult:
     return createSimpleTargetReduction(B, TTI, Instruction::Mul, Src, Flags);
   case RD::RK_IntegerAnd:

test/Transforms/LoopVectorize/if-reduction.ll

				; RUN: opt -S -loop-vectorize -instcombine -force-vector-width=4 -force-vector-interleave=1 < %s | FileCheck %s

				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

				; Float pattern:
				; Check vectorization of reduction code which has an fadd instruction after
				; an fcmp instruction which compares an array element and 0.
				;
				; float fcmp_0_fadd_select1(float * restrict x, const int N) {
				; float sum = 0.
				; for (int i = 0; i < N; ++i)
				; if (x[i] > (float)0.)
				; sum += x[i];
				; return sum;
				; }

				; CHECK-LABEL: @fcmp_0_fadd_select1(
				; CHECK: %[[V1:.*]] = fcmp fast ogt <4 x float> %[[V0:.*]], zeroinitializer
				; CHECK-NEXT: %[[V3:.*]] = fadd fast <4 x float> %[[V0]], %[[V2:.*]]
				; CHECK-NEXT: select <4 x i1> %[[V1]], <4 x float> %[[V3]], <4 x float> %[[V2]]
				define float @fcmp_0_fadd_select1(float* noalias %x, i32 %N) nounwind readonly {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %for.header, label %for.end

				for.header: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %header, %for.body
				%indvars.iv = phi i64 [ 0, %for.header ], [ %indvars.iv.next, %for.body ]
				%sum.1 = phi float [ 0.000000e+00, %for.header ], [ %sum.2, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %x, i64 %indvars.iv
				%0 = load float, float* %arrayidx, align 4
				%cmp.2 = fcmp fast ogt float %0, 0.000000e+00
				%add = fadd fast float %0, %sum.1
				%sum.2 = select i1 %cmp.2, float %add, float %sum.1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				%1 = phi float [ 0.000000e+00, %entry ], [ %sum.2, %for.body ]
				ret float %1
				}

				; Double pattern:
				; Check vectorization of reduction code which has an fadd instruction after
				; an fcmp instruction which compares an array element and 0.
				;
				; double fcmp_0_fadd_select2(double * restrict x, const int N) {
				; double sum = 0.
				; for (int i = 0; i < N; ++i)
				; if (x[i] > 0.)
				; sum += x[i];
				; return sum;
				; }

				; CHECK-LABEL: @fcmp_0_fadd_select2(
				; CHECK: %[[V1:.*]] = fcmp fast ogt <4 x double> %[[V0:.*]], zeroinitializer
				; CHECK-NEXT: %[[V3:.*]] = fadd fast <4 x double> %[[V0]], %[[V2:.*]]
				; CHECK-NEXT: select <4 x i1> %[[V1]], <4 x double> %[[V3]], <4 x double> %[[V2]]
				define double @fcmp_0_fadd_select2(double* noalias %x, i32 %N) nounwind readonly {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %for.header, label %for.end

				for.header: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %header, %for.body
				%indvars.iv = phi i64 [ 0, %for.header ], [ %indvars.iv.next, %for.body ]
				%sum.1 = phi double [ 0.000000e+00, %for.header ], [ %sum.2, %for.body ]
				%arrayidx = getelementptr inbounds double, double* %x, i64 %indvars.iv
				%0 = load double, double* %arrayidx, align 4
				%cmp.2 = fcmp fast ogt double %0, 0.000000e+00
				%add = fadd fast double %0, %sum.1
				%sum.2 = select i1 %cmp.2, double %add, double %sum.1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				%1 = phi double [ 0.000000e+00, %entry ], [ %sum.2, %for.body ]
				ret double %1
				}

				; Float pattern:
				; Check vectorization of reduction code which has an fadd instruction after
				; an fcmp instruction which compares an array element and a floating-point
				; value.
				;
				; float fcmp_val_fadd_select1(float * restrict x, float y, const int N) {
				; float sum = 0.
				; for (int i = 0; i < N; ++i)
				; if (x[i] > y)
				; sum += x[i];
				; return sum;
				; }

				; CHECK-LABEL: @fcmp_val_fadd_select1(
				; CHECK: %[[V1:.*]] = fcmp fast ogt <4 x float> %[[V0:.*]], %broadcast.splat2
				; CHECK-NEXT: %[[V3:.*]] = fadd fast <4 x float> %[[V0]], %[[V2:.*]]
				; CHECK-NEXT: select <4 x i1> %[[V1]], <4 x float> %[[V3]], <4 x float> %[[V2]]
				define float @fcmp_val_fadd_select1(float* noalias %x, float %y, i32 %N) nounwind readonly {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %for.header, label %for.end

				for.header: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %header, %for.body
				%indvars.iv = phi i64 [ 0, %for.header ], [ %indvars.iv.next, %for.body ]
				%sum.1 = phi float [ 0.000000e+00, %for.header ], [ %sum.2, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %x, i64 %indvars.iv
				%0 = load float, float* %arrayidx, align 4
				%cmp.2 = fcmp fast ogt float %0, %y
				%add = fadd fast float %0, %sum.1
				%sum.2 = select i1 %cmp.2, float %add, float %sum.1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				%1 = phi float [ 0.000000e+00, %entry ], [ %sum.2, %for.body ]
				ret float %1
				}

				; Double pattern:
				; Check vectorization of reduction code which has an fadd instruction after
				; an fcmp instruction which compares an array element and a floating-point
				; value.
				;
				; double fcmp_val_fadd_select2(double * restrict x, double y, const int N) {
				; double sum = 0.
				; for (int i = 0; i < N; ++i)
				; if (x[i] > y)
				; sum += x[i];
				; return sum;
				; }

				; CHECK-LABEL: @fcmp_val_fadd_select2(
				; CHECK: %[[V1:.*]] = fcmp fast ogt <4 x double> %[[V0:.*]], %broadcast.splat2
				; CHECK-NEXT: %[[V3:.*]] = fadd fast <4 x double> %[[V0]], %[[V2:.*]]
				; CHECK-NEXT: select <4 x i1> %[[V1]], <4 x double> %[[V3]], <4 x double> %[[V2]]
				define double @fcmp_val_fadd_select2(double* noalias %x, double %y, i32 %N) nounwind readonly {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %for.header, label %for.end

				for.header: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %header, %for.body
				%indvars.iv = phi i64 [ 0, %for.header ], [ %indvars.iv.next, %for.body ]
				%sum.1 = phi double [ 0.000000e+00, %for.header ], [ %sum.2, %for.body ]
				%arrayidx = getelementptr inbounds double, double* %x, i64 %indvars.iv
				%0 = load double, double* %arrayidx, align 4
				%cmp.2 = fcmp fast ogt double %0, %y
				%add = fadd fast double %0, %sum.1
				%sum.2 = select i1 %cmp.2, double %add, double %sum.1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				%1 = phi double [ 0.000000e+00, %entry ], [ %sum.2, %for.body ]
				ret double %1
				}

				; Float pattern:
				; Check vectorization of reduction code which has an fadd instruction after
				; an fcmp instruction which compares an array element and another array
				; element.
				;
				; float fcmp_array_elm_fadd_select1(float * restrict x, float * restrict y,
				; const int N) {
				; float sum = 0.;
				; for (int i = 0; i < N; ++i)
				; if (x[i] > y[i])
				; sum += x[i];
				; return sum;
				; }

				; CHECK-LABEL: @fcmp_array_elm_fadd_select1(
				; CHECK: %[[V2:.*]] = fcmp fast ogt <4 x float> %[[V0:.*]], %[[V1:.*]]
				; CHECK-NEXT: %[[V4:.*]] = fadd fast <4 x float> %[[V0]], %[[V3:.*]]
				; CHECK-NEXT: select <4 x i1> %[[V2]], <4 x float> %[[V4]], <4 x float> %[[V3]]
				define float @fcmp_array_elm_fadd_select1(float* noalias %x, float* noalias %y, i32 %N) nounwind readonly {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %for.header, label %for.end

				for.header: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %for.header
				%indvars.iv = phi i64 [ 0, %for.header ], [ %indvars.iv.next, %for.body ]
				%sum.1 = phi float [ 0.000000e+00, %for.header ], [ %sum.2, %for.body ]
				%arrayidx.1 = getelementptr inbounds float, float* %x, i64 %indvars.iv
				%0 = load float, float* %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds float, float* %y, i64 %indvars.iv
				%1 = load float, float* %arrayidx.2, align 4
				%cmp.2 = fcmp fast ogt float %0, %1
				%add = fadd fast float %0, %sum.1
				%sum.2 = select i1 %cmp.2, float %add, float %sum.1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				%2 = phi float [ 0.000000e+00, %entry ], [ %sum.2, %for.body ]
				ret float %2
				}

				; Double pattern:
				; Check vectorization of reduction code which has an fadd instruction after
				; an fcmp instruction which compares an array element and another array
				; element.
				;
				; double fcmp_array_elm_fadd_select2(double * restrict x, double * restrict y,
				; const int N) {
				; double sum = 0.;
				; for (int i = 0; i < N; ++i)
				; if (x[i] > y[i])
				; sum += x[i];
				; return sum;
				; }

				; CHECK-LABEL: @fcmp_array_elm_fadd_select2(
				; CHECK: %[[V2:.*]] = fcmp fast ogt <4 x double> %[[V0:.*]], %[[V1:.*]]
				; CHECK-NEXT: %[[V4:.*]] = fadd fast <4 x double> %[[V0]], %[[V3:.*]]
				; CHECK-NEXT: select <4 x i1> %[[V2]], <4 x double> %[[V4]], <4 x double> %[[V3]]
				define double @fcmp_array_elm_fadd_select2(double* noalias %x, double* noalias %y, i32 %N) nounwind readonly {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %for.header, label %for.end

				for.header: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %for.header
				%indvars.iv = phi i64 [ 0, %for.header ], [ %indvars.iv.next, %for.body ]
				%sum.1 = phi double [ 0.000000e+00, %for.header ], [ %sum.2, %for.body ]
				%arrayidx.1 = getelementptr inbounds double, double* %x, i64 %indvars.iv
				%0 = load double, double* %arrayidx.1, align 4
				%arrayidx.2 = getelementptr inbounds double, double* %y, i64 %indvars.iv
				%1 = load double, double* %arrayidx.2, align 4
				%cmp.2 = fcmp fast ogt double %0, %1
				%add = fadd fast double %0, %sum.1
				%sum.2 = select i1 %cmp.2, double %add, double %sum.1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				%2 = phi double [ 0.000000e+00, %entry ], [ %sum.2, %for.body ]
				ret double %2
				}

				; Float pattern:
				; Check fsub not vectorized.
				;
				; float fcmp_0_fsub_select1(float * restrict x, const int N) {
				; float sum = 0.;
				; for (int i = 0; i < N; ++i)
				; if (x[i] > (float)0.)
				; sum -= x[i];
				; return sum;
				; }

				; CHECK-LABEL: @fcmp_0_fsub_select1(
				; CHECK: %[[V1:.*]] = fcmp ogt float %[[V0:.*]], 0.000000e+00
				; CHECK-NEXT: %[[V3:.*]] = fsub float %[[V2:.*]], %[[V0]]
				; CHECK-NEXT: select i1 %[[V1]], float %[[V3]], float %[[V2:.*]]
				define float @fcmp_0_fsub_select1(float* noalias %x, i32 %N) nounwind readonly {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %for.header, label %for.end

				for.header: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %for.header
				%indvars.iv = phi i64 [ 0, %for.header ], [ %indvars.iv.next, %for.body ]
				%sum.1 = phi float [ 0.000000e+00, %for.header ], [ %sum.2, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %x, i64 %indvars.iv
				%0 = load float, float* %arrayidx, align 4
				%cmp.2 = fcmp ogt float %0, 0.000000e+00
				%sub = fsub float %sum.1, %0
				%sum.2 = select i1 %cmp.2, float %sub, float %sum.1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				%1 = phi float [ 0.000000e+00, %entry ], [ %sum.2, %for.body ]
				ret float %1
				}

				; Double pattern:
				; Check fsub not vectorized.
				;
				; double fcmp_0_fsub_select2(double * restrict x, const int N) {
				; double sum = 0.;
				; for (int i = 0; i < N; ++i)
				; if (x[i] > 0.)
				; sum -= x[i];
				; return sum;
				; }

				; CHECK-LABEL: @fcmp_0_fsub_select2(
				; CHECK: %[[V1:.*]] = fcmp ogt double %[[V0:.*]], 0.000000e+00
				; CHECK-NEXT: %[[V3:.*]] = fsub double %[[V2:.*]], %[[V0]]
				; CHECK-NEXT: select i1 %[[V1]], double %[[V3]], double %[[V2:.*]]
				define double @fcmp_0_fsub_select2(double* noalias %x, i32 %N) nounwind readonly {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %for.header, label %for.end

				for.header: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %for.header
				%indvars.iv = phi i64 [ 0, %for.header ], [ %indvars.iv.next, %for.body ]
				%sum.1 = phi double [ 0.000000e+00, %for.header ], [ %sum.2, %for.body ]
				%arrayidx = getelementptr inbounds double, double* %x, i64 %indvars.iv
				%0 = load double, double* %arrayidx, align 4
				%cmp.2 = fcmp ogt double %0, 0.000000e+00
				%sub = fsub double %sum.1, %0
				%sum.2 = select i1 %cmp.2, double %sub, double %sum.1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				%1 = phi double [ 0.000000e+00, %entry ], [ %sum.2, %for.body ]
				ret double %1
				}

				; Float pattern:
				; Check fmul not vectorized.
				;
				; float fcmp_0_fmult_select1(float * restrict x, const int N) {
				; float sum = 0.;
				; for (int i = 0; i < N; ++i)
				; if (x[i] > (float)0.)
				; sum *= x[i];
				; return sum;
				; }

				; CHECK-LABEL: @fcmp_0_fmult_select1(
				; CHECK: %[[V1:.*]] = fcmp ogt float %[[V0:.*]], 0.000000e+00
				; CHECK-NEXT: %[[V3:.*]] = fmul float %[[V2:.*]], %[[V0]]
				; CHECK-NEXT: select i1 %[[V1]], float %[[V3]], float %[[V2]]
				define float @fcmp_0_fmult_select1(float* noalias %x, i32 %N) nounwind readonly {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %for.header, label %for.end

				for.header: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %for.header
				%indvars.iv = phi i64 [ 0, %for.header ], [ %indvars.iv.next, %for.body ]
				%sum.1 = phi float [ 0.000000e+00, %for.header ], [ %sum.2, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %x, i64 %indvars.iv
				%0 = load float, float* %arrayidx, align 4
				%cmp.2 = fcmp ogt float %0, 0.000000e+00
				%mult = fmul float %sum.1, %0
				%sum.2 = select i1 %cmp.2, float %mult, float %sum.1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				%1 = phi float [ 0.000000e+00, %entry ], [ %sum.2, %for.body ]
				ret float %1
				}

				; Double pattern:
				; Check fmul not vectorized.
				;
				; double fcmp_0_fmult_select2(double * restrict x, const int N) {
				; double sum = 0.;
				; for (int i = 0; i < N; ++i)
				; if (x[i] > 0.)
				; sum *= x[i];
				; return sum;
				; }

				; CHECK-LABEL: @fcmp_0_fmult_select2(
				; CHECK: %[[V1:.*]] = fcmp ogt double %[[V0:.*]], 0.000000e+00
				; CHECK-NEXT: %[[V3:.*]] = fmul double %[[V2:.*]], %[[V0]]
				; CHECK-NEXT: select i1 %[[V1]], double %[[V3]], double %[[V2]]
				define double @fcmp_0_fmult_select2(double* noalias %x, i32 %N) nounwind readonly {
				entry:
				%cmp.1 = icmp sgt i32 %N, 0
				br i1 %cmp.1, label %for.header, label %for.end

				for.header: ; preds = %entry
				%zext = zext i32 %N to i64
				br label %for.body

				for.body: ; preds = %for.body, %for.header
				%indvars.iv = phi i64 [ 0, %for.header ], [ %indvars.iv.next, %for.body ]
				%sum.1 = phi double [ 0.000000e+00, %for.header ], [ %sum.2, %for.body ]
				%arrayidx = getelementptr inbounds double, double* %x, i64 %indvars.iv
				%0 = load double, double* %arrayidx, align 4
				%cmp.2 = fcmp ogt double %0, 0.000000e+00
				%mult = fmul double %sum.1, %0
				%sum.2 = select i1 %cmp.2, double %mult, double %sum.1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %zext
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body, %entry
				%1 = phi double [ 0.000000e+00, %entry ], [ %sum.2, %for.body ]
				ret double %1
				}
				hsaito: Is this the correct check here?
				rengolin: ouch, no, regex left-over. Will fix.
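
For readers following the tests in if-reduction.ll, here is a minimal C sketch of what the fadd cases exercise. It shows the branch form from the test comments, the if-converted form that mirrors the `%add` / `select` pair in the IR, and a hand-written 4-lane model of the `<4 x double>` shape the CHECK lines look for. The function names are hypothetical and the 4-lane version is only an illustration of the idea, not what the vectorizer actually emits. Note that the lane-wise version adds elements in a different order, which is presumably why the vectorized fadd tests carry the `fast` flag.

```c
/* Branch form, as written in the C comments of the tests. */
double cond_sum_branch(const double *x, double y, int n) {
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
    if (x[i] > y)
      sum += x[i];
  return sum;
}

/* If-converted form: the add is always computed and a select (?: here)
   keeps either the new or the old sum, like %add and %sum.2 in the IR. */
double cond_sum_select(const double *x, double y, int n) {
  double sum = 0.0;
  for (int i = 0; i < n; ++i) {
    double add = x[i] + sum;
    sum = (x[i] > y) ? add : sum;
  }
  return sum;
}

/* Hypothetical 4-lane model: four partial sums updated with a per-lane
   select, combined after the loop, plus a scalar remainder loop. */
double cond_sum_vec4(const double *x, double y, int n) {
  double lane[4] = {0.0, 0.0, 0.0, 0.0};
  int i = 0;
  for (; i + 4 <= n; i += 4) {
    for (int l = 0; l < 4; ++l) {
      double add = x[i + l] + lane[l];
      lane[l] = (x[i + l] > y) ? add : lane[l];
    }
  }
  /* Horizontal reduction of the partial sums. */
  double sum = (lane[0] + lane[1]) + (lane[2] + lane[3]);
  /* Scalar remainder for the last n % 4 elements. */
  for (; i < n; ++i) {
    double add = x[i] + sum;
    sum = (x[i] > y) ? add : sum;
  }
  return sum;
}
```

All three functions agree exactly on inputs whose sums are exactly representable. On general floating-point data the 4-lane version may differ in the last bits because of the reordered additions.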

Revision Contents

Diff 154758

include/llvm/Transforms/Utils/LoopUtils.h

lib/Transforms/Utils/LoopUtils.cpp

test/Transforms/LoopVectorize/if-reduction.ll
