This is an archive of the discontinued LLVM Phabricator instance.

[LoopStrengthReduction] Treat SCEVUnknown pessimistically in LSR
AbandonedPublic

Authored by mkazantsev on Jun 6 2017, 12:50 AM.


Details

Reviewers

qcolombet
sanjoy
reames
evstupac
wmi

Summary

When evaluating formulae for LSR, we may sometimes have a SCEVUnknown included
in the expression. When we try to make a fixup for an instruction outside the loop
and this SCEVUnknown is loop-variant, we are unable to duplicate or sink it out of the loop.
We must reuse the instruction from the loop after the fixup. As a result, that instruction will
certainly not be trivially dead. We are also unable to evaluate how many other instructions will
also not become dead because they are still used by it. This makes our reasoning about
the profitability of applying the formula misleading.

On the other hand, while doing this transformation, we may create LSR IV Phis in the loop,
thus increasing the code size. These new Phis will also (likely) not be dead. As a result, this
misleading choice of a formula ends up increasing the code without any benefit. This only
leads to performance, compile time and code size degradations.

Examples where LSR only produces new instructions inside and outside the
loop are demonstrated in the attached test.

To avoid this situation, this patch changes the LSR cost model so that it rejects formulae with
registers that include loop-variant SCEVUnknown values if they require at least one fixup
outside the loop.
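
The rejection rule described above can be sketched on a toy expression tree. This is only an illustration under stated assumptions: `ToyExpr`, `makeExpr`, `dependsOnVariantUnknown` and `isLoserRegister` are hypothetical names invented here, not LLVM classes or functions, and the real patch walks actual SCEV nodes with `SCEVTraversal`.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a SCEV node (hypothetical; not LLVM's class hierarchy).
struct ToyExpr;
using ExprPtr = std::shared_ptr<const ToyExpr>;

struct ToyExpr {
  enum Kind { Constant, AddRec, Unknown, Add } kind;
  bool loopInvariant;             // only consulted for Unknown nodes
  std::vector<ExprPtr> operands;  // sub-expressions, empty for leaves
};

// Hypothetical helper that builds a node.
ExprPtr makeExpr(ToyExpr::Kind kind, bool loopInvariant,
                 std::vector<ExprPtr> operands = {}) {
  return std::make_shared<ToyExpr>(
      ToyExpr{kind, loopInvariant, std::move(operands)});
}

// True if the expression contains an Unknown leaf that varies in the loop.
bool dependsOnVariantUnknown(const ToyExpr &expr) {
  if (expr.kind == ToyExpr::Unknown && !expr.loopInvariant)
    return true;
  for (const ExprPtr &op : expr.operands)
    if (dependsOnVariantUnknown(*op))
      return true;
  return false;
}

// Mirrors the condition in the summary: reject the register when the use has
// at least one fixup outside the loop and the register depends on a
// loop-variant unknown value.
bool isLoserRegister(const ToyExpr &reg, bool allFixupsInsideLoop) {
  return !allFixupsInsideLoop && dependsOnVariantUnknown(reg);
}
```

In this model, a register built from an induction variable plus a value loaded inside the loop is rejected only when some fixup of the use lies outside the loop; the same register is accepted when all fixups are inside.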

Diff Detail

Event Timeline

mkazantsev created this revision. Jun 6 2017, 12:50 AM

Herald added subscribers: javed.absar, mzolotukhin. Jun 6 2017, 12:50 AM

Fixed a typo in comment.

Ping

mkazantsev added reviewers: evstupac, wmi. Jun 20 2017, 12:32 AM

Hi Max,

Treating a formula that contains a loop-variant unknown as a loser will definitely help in some cases. But I can also think of cases where an LSRUse with all-fixups-outside-loop has no available formula left after this filtering, and LSR ends up doing nothing for the whole loop. This may cause a regression. Think about a hypothetical loop with 10 different induction variables inside it. Ideally LSR can replace those 10 induction variables with a single one, which is very helpful for performance. If an LSRUse with all-fixups-outside-loop has no available formula after all the losers are filtered out, and LSR therefore finds no solution for the loop, it will be a big loss.

Thanks,
Wei.
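
The concern above can be illustrated with a deliberately simplified model. `ToyFormula`, `ToyUse` and `everyUseHasCandidate` are hypothetical names made up for this sketch; the real LSR solver is considerably more involved, and this only shows the "one use with no surviving formula blocks the whole loop" shape of the argument.

```cpp
#include <vector>

// Hypothetical, simplified model: a formula is either a loser or usable.
struct ToyFormula {
  bool isLoser;
};

// Each use carries its candidate formulae.
struct ToyUse {
  std::vector<ToyFormula> formulae;
};

// A solution for the loop needs one usable formula per use. If filtering
// turned every formula of some use into a loser, no solution exists and,
// in this model, nothing in the loop gets rewritten.
bool everyUseHasCandidate(const std::vector<ToyUse> &uses) {
  for (const ToyUse &use : uses) {
    bool anyUsable = false;
    for (const ToyFormula &f : use.formulae)
      if (!f.isLoser)
        anyUsable = true;
    if (!anyUsable)
      return false;
  }
  return true;
}
```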

This patch doesn't only help; in some cases it produces worse code. The modification of the test arm-and-tst-peephole.ll is actually a red flag. We need to look closer into the situations where this patch makes things worse.

There is no obvious reason to keep it, since it is not a purely profitable transform.

Revision Contents

Path	Size
include/llvm/Analysis/ScalarEvolution.h	4 lines
lib/Analysis/ScalarEvolution.cpp	25 lines
lib/Transforms/Scalar/LoopStrengthReduce.cpp	58 lines
test/CodeGen/ARM/arm-and-tst-peephole.ll	6 lines
test/Transforms/LoopStrengthReduce/X86/reject-scev-unknown.ll	58 lines

Diff 101520

include/llvm/Analysis/ScalarEvolution.h

Show First 20 Lines • Show All 1,532 Lines • ▼ Show 20 Lines	public:
/// specified loop.		/// specified loop.
bool isLoopInvariant(const SCEV *S, const Loop *L);		bool isLoopInvariant(const SCEV *S, const Loop *L);

/// Determine if the SCEV can be evaluated at loop's entry. It is true if it		/// Determine if the SCEV can be evaluated at loop's entry. It is true if it
/// doesn't depend on a SCEVUnknown of an instruction which is dominated by		/// doesn't depend on a SCEVUnknown of an instruction which is dominated by
/// the header of loop L.		/// the header of loop L.
bool isAvailableAtLoopEntry(const SCEV *S, const Loop *L);		bool isAvailableAtLoopEntry(const SCEV *S, const Loop *L);

		/// Determine if the SCEV depends on a SCEVUnknown which is not invariant
		/// w.r.t. the loop.
		bool dependsOnVariantSCEVUnknown(const SCEV *S, const Loop *L);

/// Return true if the given SCEV changes value in a known way in the		/// Return true if the given SCEV changes value in a known way in the
/// specified loop. This property being true implies that the value is		/// specified loop. This property being true implies that the value is
/// variant in the loop AND that we can emit an expression to compute the		/// variant in the loop AND that we can emit an expression to compute the
/// value of the expression at any particular loop iteration.		/// value of the expression at any particular loop iteration.
bool hasComputableLoopEvolution(const SCEV *S, const Loop *L);		bool hasComputableLoopEvolution(const SCEV *S, const Loop *L);

/// Return the "disposition" of the given SCEV with respect to the given		/// Return the "disposition" of the given SCEV with respect to the given
/// block.		/// block.
▲ Show 20 Lines • Show All 296 Lines • Show Last 20 Lines

lib/Analysis/ScalarEvolution.cpp


Show First 20 Lines • Show All 2,228 Lines • ▼ Show 20 Lines	bool ScalarEvolution::isAvailableAtLoopEntry(const SCEV *S, const Loop *L) {
};		};

FindDominatedSCEVUnknown FSU(L, DT, LI);		FindDominatedSCEVUnknown FSU(L, DT, LI);
SCEVTraversal<FindDominatedSCEVUnknown> ST(FSU);		SCEVTraversal<FindDominatedSCEVUnknown> ST(FSU);
ST.visitAll(S);		ST.visitAll(S);
return !FSU.Found;		return !FSU.Found;
}		}

		bool
		ScalarEvolution::dependsOnVariantSCEVUnknown(const SCEV *S, const Loop *L) {
		struct FindSCEVUnknown {
		const Loop *L;
		ScalarEvolution *SE;
		bool Found = false;

		FindSCEVUnknown(const Loop *L, ScalarEvolution *SE)
		: L(L), SE(SE) {}

		bool follow(const SCEV *S) {
		if (isa<SCEVUnknown>(S) && !SE->isLoopInvariant(S, L))
		Found = true;
		return true;
		}

		bool isDone() { return Found; }
		};

		FindSCEVUnknown FSU = FindSCEVUnknown(L, this);
		SCEVTraversal<FindSCEVUnknown> ST(FSU);
		ST.visitAll(S);
		return FSU.Found;
		}

/// Get a canonical add expression, or something simpler if possible.		/// Get a canonical add expression, or something simpler if possible.
const SCEV *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *> &Ops,		const SCEV *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *> &Ops,
SCEV::NoWrapFlags Flags,		SCEV::NoWrapFlags Flags,
unsigned Depth) {		unsigned Depth) {
assert(!(Flags & ~(SCEV::FlagNUW | SCEV::FlagNSW)) &&		assert(!(Flags & ~(SCEV::FlagNUW | SCEV::FlagNSW)) &&
"only nuw or nsw allowed");		"only nuw or nsw allowed");
assert(!Ops.empty() && "Cannot get empty add!");		assert(!Ops.empty() && "Cannot get empty add!");
if (Ops.size() == 1) return Ops[0];		if (Ops.size() == 1) return Ops[0];
▲ Show 20 Lines • Show All 8,777 Lines • Show Last 20 Lines

lib/Transforms/Scalar/LoopStrengthReduce.cpp

Show First 20 Lines • Show All 995 Lines • ▼ Show 20 Lines	#endif

void print(raw_ostream &OS) const;		void print(raw_ostream &OS) const;
void dump() const;		void dump() const;

private:		private:
void RateRegister(const SCEV *Reg,		void RateRegister(const SCEV *Reg,
SmallPtrSetImpl<const SCEV *> &Regs,		SmallPtrSetImpl<const SCEV *> &Regs,
const Loop *L,		const Loop *L,
ScalarEvolution &SE, DominatorTree &DT);		ScalarEvolution &SE, DominatorTree &DT,
		const LSRUse &LU);
void RatePrimaryRegister(const SCEV *Reg,		void RatePrimaryRegister(const SCEV *Reg,
SmallPtrSetImpl<const SCEV *> &Regs,		SmallPtrSetImpl<const SCEV *> &Regs,
const Loop *L,		const Loop *L,
ScalarEvolution &SE, DominatorTree &DT,		ScalarEvolution &SE, DominatorTree &DT,
SmallPtrSetImpl<const SCEV *> *LoserRegs);		SmallPtrSetImpl<const SCEV *> *LoserRegs,
		const LSRUse &LU);
};		};

/// An operand value in an instruction which is to be replaced with some		/// An operand value in an instruction which is to be replaced with some
/// equivalent, possibly strength-reduced, replacement.		/// equivalent, possibly strength-reduced, replacement.
struct LSRFixup {		struct LSRFixup {
/// The instruction which will be updated.		/// The instruction which will be updated.
Instruction *UserInst;		Instruction *UserInst;

/// The operand of the instruction which will be replaced. The operand may be		/// The operand of the instruction which will be replaced. The operand may be
/// used more than once; every instance will be replaced.		/// used more than once; every instance will be replaced.
▲ Show 20 Lines • Show All 72 Lines • ▼ Show 20 Lines	public:
/// Keep track of the min and max offsets of the fixups.		/// Keep track of the min and max offsets of the fixups.
int64_t MinOffset;		int64_t MinOffset;
int64_t MaxOffset;		int64_t MaxOffset;

/// This records whether all of the fixups using this LSRUse are outside of		/// This records whether all of the fixups using this LSRUse are outside of
/// the loop, in which case some special-case heuristics may be used.		/// the loop, in which case some special-case heuristics may be used.
bool AllFixupsOutsideLoop;		bool AllFixupsOutsideLoop;

		/// This records whether all of the fixups using this LSRUse are inside
		/// the loop, in which case some special-case heuristics may be used.
		bool AllFixupsInsideLoop;

/// RigidFormula is set to true to guarantee that this use will be associated		/// RigidFormula is set to true to guarantee that this use will be associated
/// with a single formula--the one that initially matched. Some SCEV		/// with a single formula--the one that initially matched. Some SCEV
/// expressions cannot be expanded. This allows LSR to consider the registers		/// expressions cannot be expanded. This allows LSR to consider the registers
/// used by those expressions without the need to expand them later after		/// used by those expressions without the need to expand them later after
/// changing the formula.		/// changing the formula.
bool RigidFormula;		bool RigidFormula;

/// This records the widest use type for any fixup using this		/// This records the widest use type for any fixup using this
/// LSRUse. FindUseWithSimilarFormula can't consider uses with different max		/// LSRUse. FindUseWithSimilarFormula can't consider uses with different max
/// fixup widths to be equivalent, because the narrower one may be relying on		/// fixup widths to be equivalent, because the narrower one may be relying on
/// the implicit truncation to truncate away bogus bits.		/// the implicit truncation to truncate away bogus bits.
Type *WidestFixupType;		Type *WidestFixupType;

/// A list of ways to build a value that can satisfy this user. After the		/// A list of ways to build a value that can satisfy this user. After the
/// list is populated, one of these is selected heuristically and used to		/// list is populated, one of these is selected heuristically and used to
/// formulate a replacement for OperandValToReplace in UserInst.		/// formulate a replacement for OperandValToReplace in UserInst.
SmallVector<Formula, 12> Formulae;		SmallVector<Formula, 12> Formulae;

/// The set of register candidates used by all formulae in this LSRUse.		/// The set of register candidates used by all formulae in this LSRUse.
SmallPtrSet<const SCEV *, 4> Regs;		SmallPtrSet<const SCEV *, 4> Regs;

LSRUse(KindType K, MemAccessTy AT)		LSRUse(KindType K, MemAccessTy AT)
: Kind(K), AccessTy(AT), MinOffset(INT64_MAX), MaxOffset(INT64_MIN),		: Kind(K), AccessTy(AT), MinOffset(INT64_MAX), MaxOffset(INT64_MIN),
AllFixupsOutsideLoop(true), RigidFormula(false),		AllFixupsOutsideLoop(true), AllFixupsInsideLoop(true),
WidestFixupType(nullptr) {}		RigidFormula(false), WidestFixupType(nullptr) {}

LSRFixup &getNewFixup() {		LSRFixup &getNewFixup() {
Fixups.push_back(LSRFixup());		Fixups.push_back(LSRFixup());
return Fixups.back();		return Fixups.back();
}		}

void pushFixup(LSRFixup &f) {		void pushFixup(LSRFixup &f) {
Fixups.push_back(f);		Fixups.push_back(f);
if (f.Offset > MaxOffset)		if (f.Offset > MaxOffset)
MaxOffset = f.Offset;		MaxOffset = f.Offset;
if (f.Offset < MinOffset)		if (f.Offset < MinOffset)
MinOffset = f.Offset;		MinOffset = f.Offset;
}		}

bool HasFormulaWithSameRegs(const Formula &F) const;		bool HasFormulaWithSameRegs(const Formula &F) const;
float getNotSelectedProbability(const SCEV *Reg) const;		float getNotSelectedProbability(const SCEV *Reg) const;
bool InsertFormula(const Formula &F, const Loop &L);		bool InsertFormula(const Formula &F, const Loop &L);
void DeleteFormula(Formula &F);		void DeleteFormula(Formula &F);
void RecomputeRegs(size_t LUIdx, RegUseTracker &Reguses);		void RecomputeRegs(size_t LUIdx, RegUseTracker &Reguses);

void print(raw_ostream &OS) const;		void print(raw_ostream &OS) const;
void dump() const;		void dump() const;
};		};

} // end anonymous namespace		} // end anonymous namespace

/// Tally up interesting quantities from the given register.		/// Tally up interesting quantities from the given register.
void Cost::RateRegister(const SCEV *Reg,		void Cost::RateRegister(const SCEV *Reg,
SmallPtrSetImpl<const SCEV *> &Regs,		SmallPtrSetImpl<const SCEV *> &Regs,
const Loop *L,		const Loop *L,
ScalarEvolution &SE, DominatorTree &DT) {		ScalarEvolution &SE, DominatorTree &DT,
		const LSRUse &LU) {
		// If we have a use of loop-variant SCEVUnknown outside the loop, it is likely
		// that the optimization won't give us any benefits here: we will have to
		// reuse these values from within the loop and cannot adequately evaluate how
		// many other instructions from the loop they use. So in fact it is possible
		// that we will not be able to optimize anything from the loop, but will
		// create LSR Phis and duplicate some instructions, producing a worse code.
		if (!LU.AllFixupsInsideLoop && SE.dependsOnVariantSCEVUnknown(Reg, L)) {
		Lose();
		return;
		}
if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Reg)) {		if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Reg)) {
// If this is an addrec for another loop, it should be an invariant		// If this is an addrec for another loop, it should be an invariant
// with respect to L since L is the innermost loop (at least		// with respect to L since L is the innermost loop (at least
// for now LSR only handles innermost loops).		// for now LSR only handles innermost loops).
if (AR->getLoop() != L) {		if (AR->getLoop() != L) {
// If the AddRec exists, consider it's register free and leave it alone.		// If the AddRec exists, consider it's register free and leave it alone.
if (isExistingPhi(AR, SE))		if (isExistingPhi(AR, SE))
return;		return;
Show All 10 Lines	if (AR->getLoop() != L) {
return;		return;
}		}
AddRecCost += 1; /// TODO: This should be a function of the stride.		AddRecCost += 1; /// TODO: This should be a function of the stride.

// Add the step value register, if it needs one.		// Add the step value register, if it needs one.
// TODO: The non-affine case isn't precisely modeled here.		// TODO: The non-affine case isn't precisely modeled here.
if (!AR->isAffine() || !isa<SCEVConstant>(AR->getOperand(1))) {		if (!AR->isAffine() || !isa<SCEVConstant>(AR->getOperand(1))) {
if (!Regs.count(AR->getOperand(1))) {		if (!Regs.count(AR->getOperand(1))) {
RateRegister(AR->getOperand(1), Regs, L, SE, DT);		RateRegister(AR->getOperand(1), Regs, L, SE, DT, LU);
if (isLoser())		if (isLoser())
return;		return;
}		}
}		}
}		}
++NumRegs;		++NumRegs;

// Rough heuristic; favor registers which don't require extra setup		// Rough heuristic; favor registers which don't require extra setup
Show All 11 Lines

/// Record this register in the set. If we haven't seen it before, rate		/// Record this register in the set. If we haven't seen it before, rate
/// it. Optional LoserRegs provides a way to declare any formula that refers to		/// it. Optional LoserRegs provides a way to declare any formula that refers to
/// one of those regs an instant loser.		/// one of those regs an instant loser.
void Cost::RatePrimaryRegister(const SCEV *Reg,		void Cost::RatePrimaryRegister(const SCEV *Reg,
SmallPtrSetImpl<const SCEV *> &Regs,		SmallPtrSetImpl<const SCEV *> &Regs,
const Loop *L,		const Loop *L,
ScalarEvolution &SE, DominatorTree &DT,		ScalarEvolution &SE, DominatorTree &DT,
SmallPtrSetImpl<const SCEV *> *LoserRegs) {		SmallPtrSetImpl<const SCEV *> *LoserRegs,
		const LSRUse &LU) {
if (LoserRegs && LoserRegs->count(Reg)) {		if (LoserRegs && LoserRegs->count(Reg)) {
Lose();		Lose();
return;		return;
}		}
if (Regs.insert(Reg).second) {		if (Regs.insert(Reg).second) {
RateRegister(Reg, Regs, L, SE, DT);		RateRegister(Reg, Regs, L, SE, DT, LU);
if (LoserRegs && isLoser())		if (LoserRegs && isLoser())
LoserRegs->insert(Reg);		LoserRegs->insert(Reg);
}		}
}		}

void Cost::RateFormula(const TargetTransformInfo &TTI,		void Cost::RateFormula(const TargetTransformInfo &TTI,
const Formula &F,		const Formula &F,
SmallPtrSetImpl<const SCEV *> &Regs,		SmallPtrSetImpl<const SCEV *> &Regs,
const DenseSet<const SCEV *> &VisitedRegs,		const DenseSet<const SCEV *> &VisitedRegs,
const Loop *L,		const Loop *L,
ScalarEvolution &SE, DominatorTree &DT,		ScalarEvolution &SE, DominatorTree &DT,
const LSRUse &LU,		const LSRUse &LU,
SmallPtrSetImpl<const SCEV *> *LoserRegs) {		SmallPtrSetImpl<const SCEV *> *LoserRegs) {
assert(F.isCanonical(*L) && "Cost is accurate only for canonical formula");		assert(F.isCanonical(*L) && "Cost is accurate only for canonical formula");
// Tally up the registers.		// Tally up the registers.
unsigned PrevAddRecCost = AddRecCost;		unsigned PrevAddRecCost = AddRecCost;
unsigned PrevNumRegs = NumRegs;		unsigned PrevNumRegs = NumRegs;
unsigned PrevNumBaseAdds = NumBaseAdds;		unsigned PrevNumBaseAdds = NumBaseAdds;
if (const SCEV *ScaledReg = F.ScaledReg) {		if (const SCEV *ScaledReg = F.ScaledReg) {
if (VisitedRegs.count(ScaledReg)) {		if (VisitedRegs.count(ScaledReg)) {
Lose();		Lose();
return;		return;
}		}
RatePrimaryRegister(ScaledReg, Regs, L, SE, DT, LoserRegs);		RatePrimaryRegister(ScaledReg, Regs, L, SE, DT, LoserRegs, LU);
if (isLoser())		if (isLoser())
return;		return;
}		}
for (const SCEV *BaseReg : F.BaseRegs) {		for (const SCEV *BaseReg : F.BaseRegs) {
if (VisitedRegs.count(BaseReg)) {		if (VisitedRegs.count(BaseReg)) {
Lose();		Lose();
return;		return;
}		}
RatePrimaryRegister(BaseReg, Regs, L, SE, DT, LoserRegs);		RatePrimaryRegister(BaseReg, Regs, L, SE, DT, LoserRegs, LU);
if (isLoser())		if (isLoser())
return;		return;
}		}

// Treat every new register that exceeds TTI.getNumberOfRegisters() - 1 as		// Treat every new register that exceeds TTI.getNumberOfRegisters() - 1 as
// additional instruction (at least fill).		// additional instruction (at least fill).
unsigned TTIRegNum = TTI.getNumberOfRegisters(false) - 1;		unsigned TTIRegNum = TTI.getNumberOfRegisters(false) - 1;
if (NumRegs > TTIRegNum) {		if (NumRegs > TTIRegNum) {
▲ Show 20 Lines • Show All 249 Lines • ▼ Show 20 Lines	for (const LSRFixup &Fixup : Fixups) {
OS << Fixup.Offset;		OS << Fixup.Offset;
NeedComma = true;		NeedComma = true;
}		}
OS << '}';		OS << '}';

if (AllFixupsOutsideLoop)		if (AllFixupsOutsideLoop)
OS << ", all-fixups-outside-loop";		OS << ", all-fixups-outside-loop";

		if (AllFixupsInsideLoop)
		OS << ", all-fixups-inside-loop";

if (WidestFixupType)		if (WidestFixupType)
OS << ", widest fixup type: " << *WidestFixupType;		OS << ", widest fixup type: " << *WidestFixupType;
}		}

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)		#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD void LSRUse::dump() const {		LLVM_DUMP_METHOD void LSRUse::dump() const {
print(errs()); errs() << '\n';		print(errs()); errs() << '\n';
}		}
▲ Show 20 Lines • Show All 1,604 Lines • ▼ Show 20 Lines	for (const IVStrideUse &U : IU) {
MemAccessTy AccessTy;		MemAccessTy AccessTy;
if (isAddressUse(UserInst, U.getOperandValToReplace())) {		if (isAddressUse(UserInst, U.getOperandValToReplace())) {
Kind = LSRUse::Address;		Kind = LSRUse::Address;
AccessTy = getAccessType(UserInst);		AccessTy = getAccessType(UserInst);
}		}

const SCEV *S = IU.getExpr(U);		const SCEV *S = IU.getExpr(U);
PostIncLoopSet TmpPostIncLoops = U.getPostIncLoops();		PostIncLoopSet TmpPostIncLoops = U.getPostIncLoops();

// Equality (== and !=) ICmps are special. We can rewrite (i == N) as		// Equality (== and !=) ICmps are special. We can rewrite (i == N) as
// (N - i == 0), and this allows (N - i) to be the expression that we work		// (N - i == 0), and this allows (N - i) to be the expression that we work
// with rather than just N or i, so we can consider the register		// with rather than just N or i, so we can consider the register
// requirements for both N and i at the same time. Limiting this code to		// requirements for both N and i at the same time. Limiting this code to
// equality icmps is not a problem because all interesting loops use		// equality icmps is not a problem because all interesting loops use
// equality icmps, thanks to IndVarSimplify.		// equality icmps, thanks to IndVarSimplify.
if (ICmpInst *CI = dyn_cast<ICmpInst>(UserInst))		if (ICmpInst *CI = dyn_cast<ICmpInst>(UserInst))
if (CI->isEquality()) {		if (CI->isEquality()) {
Show All 32 Lines	for (const IVStrideUse &U : IU) {
LSRUse &LU = Uses[LUIdx];		LSRUse &LU = Uses[LUIdx];

// Record the fixup.		// Record the fixup.
LSRFixup &LF = LU.getNewFixup();		LSRFixup &LF = LU.getNewFixup();
LF.UserInst = UserInst;		LF.UserInst = UserInst;
LF.OperandValToReplace = U.getOperandValToReplace();		LF.OperandValToReplace = U.getOperandValToReplace();
LF.PostIncLoops = TmpPostIncLoops;		LF.PostIncLoops = TmpPostIncLoops;
LF.Offset = Offset;		LF.Offset = Offset;
LU.AllFixupsOutsideLoop &= LF.isUseFullyOutsideLoop(L);		bool UseOutside = LF.isUseFullyOutsideLoop(L);
		LU.AllFixupsOutsideLoop &= UseOutside;
		LU.AllFixupsInsideLoop &= !UseOutside;

if (!LU.WidestFixupType ||		if (!LU.WidestFixupType ||
SE.getTypeSizeInBits(LU.WidestFixupType) <		SE.getTypeSizeInBits(LU.WidestFixupType) <
SE.getTypeSizeInBits(LF.OperandValToReplace->getType()))		SE.getTypeSizeInBits(LF.OperandValToReplace->getType()))
LU.WidestFixupType = LF.OperandValToReplace->getType();		LU.WidestFixupType = LF.OperandValToReplace->getType();

// If this is the first use of this LSRUse, give it a formula.		// If this is the first use of this LSRUse, give it a formula.
if (LU.Formulae.empty()) {		if (LU.Formulae.empty()) {
▲ Show 20 Lines • Show All 132 Lines • ▼ Show 20 Lines	else if (const SCEVUDivExpr *D = dyn_cast<SCEVUDivExpr>(S)) {
S, LSRUse::Basic, MemAccessTy());		S, LSRUse::Basic, MemAccessTy());
size_t LUIdx = P.first;		size_t LUIdx = P.first;
int64_t Offset = P.second;		int64_t Offset = P.second;
LSRUse &LU = Uses[LUIdx];		LSRUse &LU = Uses[LUIdx];
LSRFixup &LF = LU.getNewFixup();		LSRFixup &LF = LU.getNewFixup();
LF.UserInst = const_cast<Instruction *>(UserInst);		LF.UserInst = const_cast<Instruction *>(UserInst);
LF.OperandValToReplace = U;		LF.OperandValToReplace = U;
LF.Offset = Offset;		LF.Offset = Offset;
LU.AllFixupsOutsideLoop &= LF.isUseFullyOutsideLoop(L);		bool UseOutside = LF.isUseFullyOutsideLoop(L);
		LU.AllFixupsOutsideLoop &= UseOutside;
		LU.AllFixupsInsideLoop &= !UseOutside;
if (!LU.WidestFixupType ||		if (!LU.WidestFixupType ||
SE.getTypeSizeInBits(LU.WidestFixupType) <		SE.getTypeSizeInBits(LU.WidestFixupType) <
SE.getTypeSizeInBits(LF.OperandValToReplace->getType()))		SE.getTypeSizeInBits(LF.OperandValToReplace->getType()))
LU.WidestFixupType = LF.OperandValToReplace->getType();		LU.WidestFixupType = LF.OperandValToReplace->getType();
InsertSupplementalFormula(US, LU, LUIdx);		InsertSupplementalFormula(US, LU, LUIdx);
CountRegisters(LU.Formulae.back(), Uses.size() - 1);		CountRegisters(LU.Formulae.back(), Uses.size() - 1);
break;		break;
}		}
▲ Show 20 Lines • Show All 902 Lines • ▼ Show 20 Lines	for (const Formula &F : LU.Formulae) {

if (!reconcileNewOffset(LUThatHas, F.BaseOffset, /*HasBaseReg=*/ false,		if (!reconcileNewOffset(LUThatHas, F.BaseOffset, /*HasBaseReg=*/ false,
LU.Kind, LU.AccessTy))		LU.Kind, LU.AccessTy))
continue;		continue;

DEBUG(dbgs() << " Deleting use "; LU.print(dbgs()); dbgs() << '\n');		DEBUG(dbgs() << " Deleting use "; LU.print(dbgs()); dbgs() << '\n');

LUThatHas->AllFixupsOutsideLoop &= LU.AllFixupsOutsideLoop;		LUThatHas->AllFixupsOutsideLoop &= LU.AllFixupsOutsideLoop;
		LUThatHas->AllFixupsInsideLoop &= LU.AllFixupsInsideLoop;

// Transfer the fixups of LU to LUThatHas.		// Transfer the fixups of LU to LUThatHas.
for (LSRFixup &Fixup : LU.Fixups) {		for (LSRFixup &Fixup : LU.Fixups) {
Fixup.Offset += F.BaseOffset;		Fixup.Offset += F.BaseOffset;
LUThatHas->pushFixup(Fixup);		LUThatHas->pushFixup(Fixup);
DEBUG(dbgs() << "New fixup has offset " << Fixup.Offset << '\n');		DEBUG(dbgs() << "New fixup has offset " << Fixup.Offset << '\n');
}		}

// Delete formulae from the new use which are no longer legal.		// Delete formulae from the new use which are no longer legal.
bool Any = false;		bool Any = false;
for (size_t i = 0, e = LUThatHas->Formulae.size(); i != e; ++i) {		for (size_t i = 0, e = LUThatHas->Formulae.size(); i != e; ++i) {
Formula &F = LUThatHas->Formulae[i];		Formula &F = LUThatHas->Formulae[i];
if (!isLegalUse(TTI, LUThatHas->MinOffset, LUThatHas->MaxOffset,		if (!isLegalUse(TTI, LUThatHas->MinOffset, LUThatHas->MaxOffset,
LUThatHas->Kind, LUThatHas->AccessTy, F)) {		LUThatHas->Kind, LUThatHas->AccessTy, F)) {
DEBUG(dbgs() << " Deleting "; F.print(dbgs());		DEBUG(dbgs() << " Deleting "; F.print(dbgs());
dbgs() << '\n');		dbgs() << '\n');
▲ Show 20 Lines • Show All 1,084 Lines • Show Last 20 Lines

test/CodeGen/ARM/arm-and-tst-peephole.ll

Show All 22 Lines	tailrecurse: ; preds = %sw.bb, %entry
%scevgep5 = getelementptr i8, i8* %lsr.iv24, i32 -1		%scevgep5 = getelementptr i8, i8* %lsr.iv24, i32 -1
%tmp2 = load i8, i8* %scevgep5		%tmp2 = load i8, i8* %scevgep5
%0 = ptrtoint i8* %tmp2 to i32		%0 = ptrtoint i8* %tmp2 to i32

; ARM: ands {{r[0-9]+}}, {{r[0-9]+}}, #3		; ARM: ands {{r[0-9]+}}, {{r[0-9]+}}, #3
; ARM-NEXT: beq		; ARM-NEXT: beq

; THUMB: movs r[[R0:[0-9]+]], #3		; THUMB: movs r[[R0:[0-9]+]], #3
; THUMB-NEXT: ands r[[R0]], r		; THUMB-NEXT: mvns r[[R1:[0-9]+]], r[[R0]]
; THUMB-NEXT: cmp r[[R0]], #0		; THUMB-NEXT: ldr r[[R1]],
		; THUMB-NEXT: ands r[[R1]], r[[R0]]
		; THUMB-NEXT: cmp r[[R1]], #0
; THUMB-NEXT: beq		; THUMB-NEXT: beq

; T2: ands {{r[0-9]+}}, {{r[0-9]+}}, #3		; T2: ands {{r[0-9]+}}, {{r[0-9]+}}, #3
; T2-NEXT: beq		; T2-NEXT: beq

%and = and i32 %0, 3		%and = and i32 %0, 3
%tst = icmp eq i32 %and, 0		%tst = icmp eq i32 %and, 0
br i1 %tst, label %sw.bb, label %tailrecurse.switch		br i1 %tst, label %sw.bb, label %tailrecurse.switch
▲ Show 20 Lines • Show All 147 Lines • Show Last 20 Lines

test/Transforms/LoopStrengthReduce/X86/reject-scev-unknown.ll

				; RUN: opt -loop-reduce -S < %s | FileCheck %s

				target datalayout = "e-m:e-i32:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Make sure that LSR does not try to get something out of this loop,
				; because the sinking formula depends on SCEVUnknown Phi node %tmp0.
				define i32 @test_01() {
				; CHECK-LABEL: @test_01
				; CHECK: exit:
				; CHECK-NEXT: ret i32 %tmp7
				; CHECK-NOT: lsr-iv
				entry:
				br label %loop

				exit: ; preds = %loop
				ret i32 %tmp7

				loop: ; preds = %loop, %entry
				%tmp0 = phi i32 [ 0, %entry ], [ %tmp7, %loop ]
				%indvars.iv = phi i32 [ 4, %entry ], [ %indvars.iv.next, %loop ]
				%tmp3 = add i32 %tmp0, -1
				%tmp4 = mul i32 %tmp3, %tmp3
				%tmp5 = sub i32 %tmp0, %indvars.iv
				%tmp6 = mul i32 %tmp4, %indvars.iv
				%tmp7 = add i32 %tmp5, %tmp6
				%indvars.iv.next = add nuw nsw i32 %indvars.iv, 1
				%exitcond = icmp eq i32 %indvars.iv.next, 80
				br i1 %exitcond, label %exit, label %loop
				}

				; Make sure that LSR does not try to get something out of this loop,
				; because the sinking formula depends on SCEVUnknown Load %lv.
				define i32 @test_02(i32* %p) {
				; CHECK-LABEL: @test_02
				; CHECK: exit:
				; CHECK-NEXT: ret i32 %tmp7
				; CHECK-NOT: lsr-iv
				entry:
				br label %loop

				exit: ; preds = %loop
				ret i32 %tmp7

				loop: ; preds = %loop, %entry
				%tmp0 = phi i32 [ 0, %entry ], [ %tmp7, %loop ]
				%indvars.iv = phi i32 [ 4, %entry ], [ %indvars.iv.next, %loop ]
				%lp = getelementptr inbounds i32, i32* %p, i32 %tmp0
				%lv = load i32, i32* %lp
				%tmp3 = add i32 %lv, -1
				%tmp4 = mul i32 %tmp3, %tmp3
				%tmp5 = sub i32 %lv, %indvars.iv
				%tmp6 = mul i32 %tmp4, %indvars.iv
				%tmp7 = add i32 %tmp5, %tmp6
				%indvars.iv.next = add nuw nsw i32 %indvars.iv, 1
				%exitcond = icmp eq i32 %indvars.iv.next, 80
				br i1 %exitcond, label %exit, label %loop
				}
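
For readers less used to IR, the loop in @test_01 corresponds roughly to the C++ below. This is a hand translation offered as a reading aid, not part of the patch; the function name `test01Equivalent` is made up here, and unsigned arithmetic is used so that overflow wraps the way plain i32 `add`/`mul`/`sub` do.

```cpp
#include <cstdint>

// Hand translation of @test_01 (hypothetical helper, not from the patch).
std::uint32_t test01Equivalent() {
  std::uint32_t tmp0 = 0;  // %tmp0 phi: carries %tmp7 from the last iteration
  std::uint32_t iv = 4;    // %indvars.iv phi
  std::uint32_t tmp7 = 0;
  do {
    std::uint32_t tmp3 = tmp0 - 1u;    // add i32 %tmp0, -1
    std::uint32_t tmp4 = tmp3 * tmp3;  // mul i32 %tmp3, %tmp3
    std::uint32_t tmp5 = tmp0 - iv;    // sub i32 %tmp0, %indvars.iv
    std::uint32_t tmp6 = tmp4 * iv;    // mul i32 %tmp4, %indvars.iv
    tmp7 = tmp5 + tmp6;                // add i32 %tmp5, %tmp6
    tmp0 = tmp7;                       // back edge value of the %tmp0 phi
    ++iv;                              // %indvars.iv.next
  } while (iv != 80);                  // exit when %indvars.iv.next == 80
  return tmp7;                         // the value used by `ret` in %exit
}
```

The returned value is the use outside the loop, and it is computed from `tmp0`, which is not a simple induction variable; this is the situation the test comments describe as a sinking formula depending on a SCEVUnknown Phi.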