This is an archive of the discontinued LLVM Phabricator instance.

[LV] Use Demanded Bits and ValueTracking for reduction type-shrinking
ClosedPublic

Authored by mcrosier on Jan 19 2018, 11:20 AM.

Details

Summary

The type-shrinking logic in reduction detection, although narrow in scope, is also rather ad-hoc, which has led to bugs (e.g., PR35734). This patch modifies the approach to rely on the demanded bits and value tracking analyses, if available. We currently perform type-shrinking separately for reductions and other instructions in the loop. Long-term, we should probably think about computing minimal bit widths in a more complete way for the loops we want to vectorize.

Reference: https://bugs.llvm.org/show_bug.cgi?id=35734

Diff Detail

Event Timeline

mssimpso created this revision. Jan 19 2018, 11:20 AM
mssimpso edited the summary of this revision. (Show Details)

Hi Matthew,

thanks for taking care of this! I like the general idea of the fix but I have a concern regarding the TODO in line 408:

// TODO: We should not rely on InstCombine to rewrite the reduction in the
//       smaller type. We should just generate a correctly typed expression
//       to begin with.

The cost model is relying on an optimization that will hopefully be applied by InstCombine. I wonder if it would be too complicated to implement the actual type shrinking in LV code gen as part of the fix.
That would better align cost modeling with LV code generation, which is one of the major concerns with the current infrastructure.

In the future VPlan infrastructure, we will definitely need to address this optimization as a VPlan-to-VPlan transformation. Thanks for bringing up this issue.
Diego

lib/Transforms/Utils/LoopUtils.cpp
168–169

(I don't know DB/ValueTracking in depth so maybe I'm missing something.)
If I understand correctly, we would try with value tracking in the following scenario:

117    MaxBitWidth = 64
...
127    MaxBitWidth = 33
...
129    MaxBitWidth = 64
...
132    First condition is true
}

But we wouldn't in the following one:

117    MaxBitWidth = 64
...
127    MaxBitWidth = 31
...
129    MaxBitWidth = 32
...
132    First condition is false
}

Is this the expected behavior? In other words, if DB returns a width narrower than the original one and it's later rounded up to the original width (first scenario), could value tracking return an even narrower width?

If so, shouldn't we always try value tracking, even when MaxBitWidth != DL.getTypeSizeInBits?
Otherwise, shouldn't we skip value tracking in those cases (first scenario)? We could invoke isPowerOf2_64 only once, at the end of the function.

386

may been -> may have been?

Hi Matthew,

thanks for taking care of this! I like the general idea of the fix but I have a concern regarding the TODO in line 408:

// TODO: We should not rely on InstCombine to rewrite the reduction in the
//       smaller type. We should just generate a correctly typed expression
//       to begin with.

The cost model is relying on an optimization that will hopefully be applied by InstCombine. I wonder if it would be too complicated to implement the actual type shrinking in LV code gen as part of the fix.
That would better align cost modeling with LV code generation, which is one of the major concerns with the current infrastructure.

In the future VPlan infrastructure, we will definitely need to address this optimization as a VPlan-to-VPlan transformation. Thanks for bringing up this issue.
Diego

Yes, my comment was intended to draw attention to something that needs to be addressed in the future VPlan work. I think we will want to generate the actual type-shrunk code instead of inserting trunc/ext pairs and assuming InstCombine will do what we want. We also generate trunc/ext pairs in SLP for type-shrinking, which I think needs to be addressed as well. As you point out, this will better align the cost model with code generation.

lib/Transforms/Utils/LoopUtils.cpp
168–169

Demanded bits and value tracking will report different bit widths, in general. My thinking was that if demanded bits is able to limit the bit width at all, we use that width. Otherwise, we can try to do something with value tracking, which I think can be a more expensive analysis.

So I think what we should do there is move the isPowerOf2_64 round-ups to the very end. The MaxBitWidth == DL.getTypeSizeInBits(Exit->getType()) condition will then check if demanded bits was able to tell us anything before we do the rounding.

386

Nice catch!

Hi Chad,

I was waiting for the comments in lines 132 and 389 to be addressed.
Other than that, LGTM.

Thanks,
Diego

mcrosier commandeered this revision.Feb 2 2018, 8:08 AM
mcrosier updated this revision to Diff 132601.
mcrosier added a reviewer: mssimpso.

Address Diego's comments.

mcrosier marked 4 inline comments as done. Edited Feb 2 2018, 8:09 AM

Hi Diego,
Matt will be out of the office for a few weeks and he asked me to follow up on this patch. Hopefully, the new version addresses all of your concerns.

Regards,
Chad

dcaballe accepted this revision.Feb 2 2018, 2:12 PM

LGTM

This revision is now accepted and ready to land. Feb 2 2018, 2:12 PM
This revision was automatically updated to reflect the committed changes.