This is an archive of the discontinued LLVM Phabricator instance.

[TTI] Add a hook to TTI for choosing scalarized shuffle-reduction sequence for reduction idiom
Abandoned · Public

Authored by FarhanaAleen on Apr 19 2018, 12:08 PM.

Details

Reviewers
RKSimon
Summary

This patch adds a hook to TTI for choosing a scalarized shuffle-reduction sequence, as opposed to a vectorized shuffle-reduction sequence, for the reduction idiom.

Allows generation of:

%0 = extractelement <4 x float> %bin.rdx, i32 0
%1 = extractelement <4 x float> %bin.rdx, i32 1
%res = fadd fast float %0, %1

Instead of

%rdx.shuf1 = shufflevector <4 x float> %bin.rdx, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
%bin.rdx2 = fadd fast <4 x float> %bin.rdx, %rdx.shuf1
%res = extractelement <4 x float> %bin.rdx2, i32 0
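The two sequences compute the same value; a small Python model (illustrative only: the function names are invented here, and Python floats stand in for the fast-math fadd) makes the correspondence explicit:

```python
# Hypothetical model (not LLVM code) of the two lowerings of the final
# reduction step over the first two lanes of %bin.rdx.

def scalarized_final_step(bin_rdx):
    # %0 = extractelement %bin.rdx, 0 ; %1 = extractelement %bin.rdx, 1
    # %res = fadd fast float %0, %1
    return bin_rdx[0] + bin_rdx[1]

def shuffle_final_step(bin_rdx):
    # %rdx.shuf1 = shufflevector %bin.rdx, undef, <1, undef, undef, undef>
    # Lane 0 of the shuffle holds element 1; the other lanes are undef (None).
    rdx_shuf1 = [bin_rdx[1], None, None, None]
    # %bin.rdx2 = fadd fast <4 x float> %bin.rdx, %rdx.shuf1
    # Only lane 0 matters for the result.
    bin_rdx2_lane0 = bin_rdx[0] + rdx_shuf1[0]
    # %res = extractelement %bin.rdx2, 0
    return bin_rdx2_lane0

v = [1.0, 2.0, 3.0, 4.0]
assert scalarized_final_step(v) == shuffle_final_step(v) == 3.0
```

Both forms sum lanes 0 and 1; the scalarized form simply avoids building a full-width vector whose upper lanes are undef.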

Hi Simon,

This patch reflects your suggestion on https://reviews.llvm.org/D45393

Diff Detail

Event Timeline

FarhanaAleen created this revision. Apr 19 2018, 12:08 PM
hsaito added a subscriber: hsaito. Apr 20 2018, 11:40 AM

Farhana,

It looks to me that you are trying to fine-tune the final step of the log-2 reduction. In my opinion, such an optimization should be done in the Target. There are probably downstream optimizers looking for the proper log-2 shuffle sequence to detect the reduction last-value compute, and this change will break that for the affected target. Is that what you'd like to do? Also, the newly introduced TTI interface name doesn't convey enough information about fine-tuning only the last step.
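For readers following along, the log-2 reduction Hideki refers to halves the active data with a shuffle at every step while keeping the vector width fixed. A rough Python model (illustrative only, not LLVM code; the helper name is invented) of the full idiom that downstream matchers would look for:

```python
# Model of the fixed-width log-2 shuffle-reduction idiom: at each step a
# shufflevector pulls elements `stride` lanes down (undef lanes modeled as
# None), the vectors are added, and the stride halves until lane 0 holds
# the full sum.

def log2_shuffle_reduce(vec):
    width = len(vec)
    acc = list(vec)
    stride = width // 2
    while stride >= 1:
        # shufflevector: lane i reads lane i + stride; out-of-range -> undef
        shuf = [acc[i + stride] if i + stride < width else None
                for i in range(width)]
        # fadd fast: undef lanes left untouched (they never reach the result)
        acc = [acc[i] + shuf[i] if shuf[i] is not None else acc[i]
               for i in range(width)]
        stride //= 2
    # extractelement lane 0
    return acc[0]

assert log2_shuffle_reduce([1.0, 2.0, 3.0, 4.0]) == 10.0
```

Note that every intermediate vector keeps the starting width; the patch under review changes only how the last halving step (stride 1) is emitted.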

If you really think this change should benefit all targets (you mentioned that in D45393), you shouldn't be doing this through TTI as a long-term solution. Have a proper discussion on llvm-dev and get the major Targets to agree with you first. Then TTI should help migrate Targets, and finally the TTI interface should be removed after the migration.

Thanks,
Hideki

Hi Hideki.

"It looks to me that you are trying to fine-tune the final step of log-2 reduction."

This patch actually tries to generate required-size vectors (precise vector length) for the shuffle reduction (where scalarization follows naturally), instead of keeping the starting vector length fixed all the way through, which leaves the vectors filled with unnecessary undefs. I agree that was not clear in the previous patch. To make the distinction clear, I've created a separate function called getVariableLengthShuffleReduction() with additional changes.

I am not opposed to doing this in the target/CodeGen; it's just that the implementation there is a little messy, since there is no good way to classify scalar operations as opposed to vector operations, whereas it can be implemented in a cleaner way in the first place with a TTI hook.

Does it look reasonable to add a TTI hook and allow SLP to generate the precise vector length for the shuffle reduction in the first place if the target wants that?
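The "precise vector length" scheme described above could be sketched as follows (an assumption based on the description, not the actual patch; Python stands in for the IR and the function name is invented): each step narrows the vector to half its width with no undef padding, and the final two lanes are reduced with two extracts and one scalar fadd.

```python
# Variable-length shuffle reduction: every shufflevector produces a
# genuinely narrower vector (no undef lanes), and the last 2-wide step
# is scalarized.

def variable_length_reduce(vec):
    acc = list(vec)
    while len(acc) > 2:
        half = len(acc) // 2
        # shuffle + fadd into a half-width vector: every lane is defined
        acc = [acc[i] + acc[i + half] for i in range(half)]
    # final step: extractelement lanes 0 and 1, then one scalar fadd
    return acc[0] + acc[1]

assert variable_length_reduce([1.0, 2.0, 3.0, 4.0]) == 10.0
```

The end result matches the fixed-width idiom; the difference is purely in the types of the intermediate vectors.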

I don't know about your original intention, but your previous patch (on the left pane as of now) wasn't doing what you described above, while the right pane does.

This time, I can see why you'd want to do this upfront, except for the last step in your code. You may call this last step more optimal, and I don't have any data to support or go against your opinion about that, but I can say it's not consistent. Why bother? Is that REALLY needed?

In any case, you seem to be introducing a new "canonical form" for the reduction last-value compute. You are better off having a good discussion on llvm-dev, explicitly asking those who "latch on" to the reduction last-value compute whether they are okay with adding another form (or, if you really think this is a better representation, arguing that the old form should be retired). I suggest sending an RFC to llvm-dev.

At this moment, you fail to convince me why the community should support both canonical forms (i.e., why you can't optimize from the current form, or why you'd want to keep the old form). If others think supporting both is a great idea, I won't insist.

BTW, have you looked at ARM's experimental last-value-compute intrinsics, e.g., int_experimental_vector_reduce_add? If you don't mind losing IR-level optimizations for the reduction last-value code, this is the easiest way to do custom lowering in the Target. I don't know whether this meets your needs, but I thought it was worth mentioning.

FarhanaAleen abandoned this revision. Apr 25 2018, 2:41 PM

Thanks Hideki, I will think about your suggestion.

"BTW, have you looked at ARM's experimental last-value-compute intrinsics, e.g., int_experimental_vector_reduce_add? If you don't mind losing IR-level optimizations for the reduction last-value code, this is the easiest way to do custom lowering in the Target. I don't know whether this meets your needs, but I thought it was worth mentioning."

Yes, I have. They are my fallback option :).