This is an archive of the discontinued LLVM Phabricator instance.

[LSR] Canonicalize reg1 + ... + regN into reg1 + ... + 1*regN
ClosedPublic

Authored by qcolombet on May 19 2014, 1:59 PM.

Details

Reviewers
atrick
Summary

Hi,

This patch introduces a canonical representation for the formulae used in loop strength reduce.

Thanks for your review/feedback.

** Context **

Loop strength reduce represents the formulae reg1 + reg2 + ... + regN and reg1 + reg2 + ... + 1*regN differently. The first form keeps the whole list of registers reg1...regN in the BaseRegs field, whereas the second keeps reg1...regN-1 in BaseRegs and regN in the ScaledReg field.

These two representations cannot coexist in an LSR instance (they are uniqued the same way and thus cannot both be inserted into the set of formulae), yet they yield different costs.
Moreover, at several places the cost model assumes that ScaledReg == nullptr implies that only one base register is set, which is currently wrong.
For instance, an addressing mode using reg1 + reg2 will not query the target hook getScalingFactorCost, whereas reg1 + 1*reg2 will!
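
For the archive, a minimal sketch of the two encodings. The field names follow the Formula struct in lib/Transforms/Scalar/LoopStrengthReduce.cpp; everything else is trimmed down for illustration:

#include "llvm/ADT/SmallVector.h"

namespace llvm { class SCEV; }
using llvm::SCEV;
using llvm::SmallVector;

// Simplified excerpt of LSR's Formula; only the fields relevant to this
// discussion are shown.
struct Formula {
  SmallVector<const SCEV *, 4> BaseRegs; // reg1 + ... + regN, unscaled
  const SCEV *ScaledReg = nullptr;       // optional Scale * ScaledReg term
  int64_t Scale = 0;                     // 0 means "no scaled register"
};

// The same address reg1 + reg2 can currently be encoded two ways:
//   (a) BaseRegs = {reg1, reg2}, ScaledReg = nullptr, Scale = 0
//   (b) BaseRegs = {reg1},       ScaledReg = reg2,    Scale = 1
// Only form (b) reaches the getScalingFactorCost target hook, so the two
// forms are costed differently even though they denote the same address.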

** Proposed Solution **

This patch introduces a canonical representation for the formulae.
Basically, as soon as a formula has more than one base register, the scaled register field is used for one of them; the register placed in the ScaledReg field is preferably loop-variant.
The patch refactors how the formulae are built so that they are produced directly in this representation.
This yields a more accurate, though still perfectible, cost model.
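
For the archive, a minimal sketch of that rule, reusing the simplified Formula above; the helper name and exact structure are illustrative, not the patch's actual code:

#include "llvm/Analysis/ScalarEvolution.h"
#include <utility>

using llvm::Loop;
using llvm::ScalarEvolution;

// Illustrative sketch: promote one base register into the scaled slot
// (Scale = 1) whenever a formula carries more than one base register and
// has no scaled register yet, preferring a loop-variant register.
static void canonicalize(Formula &F, const Loop &L, ScalarEvolution &SE) {
  if (F.ScaledReg || F.BaseRegs.size() <= 1)
    return; // Already canonical.
  // Default choice: take the last base register.
  F.ScaledReg = F.BaseRegs.pop_back_val();
  F.Scale = 1;
  // If the chosen register is loop-invariant, try to swap in a
  // loop-variant one instead.
  if (SE.isLoopInvariant(F.ScaledReg, &L))
    for (const SCEV *&Reg : F.BaseRegs)
      if (!SE.isLoopInvariant(Reg, &L)) {
        std::swap(Reg, F.ScaledReg);
        break;
      }
}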

Note: This patch will change the code generated for x86, since even a simple reg1 + reg2 addressing mode is more expensive than a plain reg1 access on this target. To some extent the patch may change the code for other targets too, as the cost model is supposedly more accurate.

Note2: We could do some more refactoring (see the FIXME, for instance), but that would change the behavior, whereas the goal here was to preserve the behavior where possible.

** Current Results **

I have benchmarked the patch on an Ivy Bridge machine fixed at 2900 MHz, for both O3 and Os. On average I do not see any difference, but there are a few speed-ups.
Here are the best-of-10-runs results for tests that run for at least 1 second. Reference and Test are runtimes in seconds, Expansion is the Test/Reference ratio, and smaller is better.

* O3 *

Benchmark_ID Reference Test Expansion Percent

CINT2000/186.crafty/186.crafty 4.2485 4.2097 0.99 -1%
CINT2000/253.perlbmk/253.perlbmk 5.7467 5.5281 0.96 -4%
CINT2000/300.twolf/300.twolf 3.2226 3.2407 1.01 +1%
Polybench/stencils/fdtd-apml/fdtd-apml 1.0065 1.0123 1.01 +1%
Shootout-C++/lists 10.8331 10.8998 1.01 +1%
Shootout-C++/sieve 1.9528 1.9906 1.02 +2%
Shootout/lists 5.0692 5.1081 1.01 +1%
Shootout/matrix 1.4662 1.4505 0.99 -1%
TSVC/ControlFlow-dbl/ControlFlow-dbl 3.629 3.6646 1.01 +1%
TSVC/ControlFlow-flt/ControlFlow-flt 3.1577 3.215 1.02 +2%
TSVC/CrossingThresholds-flt/CrossingThr 2.3061 2.2619 0.98 -2%
lemon/lemon 1.151 1.141 0.99 -1%
llubenchmark/llu 3.8547 3.8213 0.99 -1%

Min (13) - - 0.96 -
Max (13) - - 1.02 -
Sum (13) 48 48 1 +0%
A.Mean (13) - - 1 +0%
G.Mean 2 (13) - - 1 +0%

Regressions:

  • CINT2000/300.twolf/300.twolf: The changes should not affect performance. I have checked the few modified loops with IACA (Intel Architecture Code Analyzer), and both throughput and latency are identical.

I have also looked at the encoding; the new encoding is slightly more compact for this case (incq uses one byte less encoding space than addq). Some alignment issue? Anyhow, this seems to be a side effect.

  • Polybench/stencils/fdtd-apml: Slightly different spill placement. The average is better after the refactoring, but the minimum is slightly better with the old implementation, though within the noise (< 0.006s).
  • Shootout-C++/lists: Noise, same binary.
  • Shootout-C++/sieve: Noise, same binary.
  • Shootout/lists: Noise, same binary.
  • TSVC/ControlFlow-dbl/ControlFlow-dbl: The code looks slightly better: with the refactoring we got rid of one complex addressing mode in a loop. I do not know why we would observe a regression then! Looks like a side effect too.
  • TSVC/ControlFlow-flt/ControlFlow-flt: Same as ControlFlow-dbl.

* Os *

Benchmark_ID Reference Test Expansion Percent

7zip/7zip-benchmark 8.1713 8.1235 0.99 -1%
ASC_Sequoia/AMGmk/AMGmk 7.0611 6.9804 0.99 -1%
CFP2006/447.dealII/447.dealII 15.8169 15.294 0.97 -3%
CINT2000/175.vpr/175.vpr 2.8925 2.8674 0.99 -1%
CINT2000/197.parser/197.parser 2.2631 2.2849 1.01 +1%
CINT2000/300.twolf/300.twolf 3.2271 3.1912 0.99 -1%
CINT2006/401.bzip2/401.bzip2 2.0734 2.0838 1.01 +1%
CINT2006/456.hmmer/456.hmmer 2.5935 2.5781 0.99 -1%
CINT2006/464.h264ref/464.h264ref 12.3972 12.4701 1.01 +1%
CoyoteBench/lpbench 2.9263 2.9418 1.01 +1%
SIBsim4/SIBsim4 2.7361 2.6705 0.98 -2%
Shootout-C++/lists 10.8231 10.9064 1.01 +1%
TSVC/ControlFlow-flt/ControlFlow-flt 3.2565 3.3259 1.02 +2%
TSVC/CrossingThresholds-dbl/CrossingThr 3.4094 3.2462 0.95 -5%
TSVC/CrossingThresholds-flt/CrossingThr 2.6827 2.4896 0.93 -7%
lemon/lemon 1.1847 1.0853 0.92 -8%
mafft/pairlocalalign 24.6637 24.9703 1.01 +1%

Min (17) - - 0.92 -
Max (17) - - 1.02 -
Sum (17) 108 108 0.99 -1%
A.Mean (17) - - 0.99 -1%
G.Mean 2 (17) - - 0.99 -1%

Regressions:

  • CINT2000/197.parser/197.parser: Similar assembly; IACA reports the same throughput for the few changed loops I checked.
  • CINT2006/401.bzip2/401.bzip2: The solutions picked for the loops in blocksort.c are slightly different because of the new cost model, which results in different register allocation and different spill code placement. I do not see anything wrong here.
  • CINT2006/464.h264ref/464.h264ref: Some of the loops (in dct_luma and dct_chroma, for instance) avoid the cost of the 1* scaling factor but use a different comparison (i.e., an increase in ImmCost): the comparison is now against 2 instead of 0. Previously we could remove the comparison because the addq set the zero flag; now we have to keep it. According to IACA we end up with 16 uops instead of 15, but thanks to the new unscaled addressing mode the throughput is 3.75 cycles instead of 3.9.
  • CoyoteBench/lpbench: Same as O3 - CINT2000/300.twolf/300.twolf.
  • Shootout-C++/lists: Noise, same binary.
  • TSVC/ControlFlow-flt/ControlFlow-flt: Same as O3 TSVC/ControlFlow-flt/ControlFlow-flt.
  • mafft/pairlocalalign: Same as CoyoteBench/lpbench.

rdar://problem/16731508

Thanks,
-Quentin

Diff Detail

Event Timeline

qcolombet updated this revision to Diff 9583.May 19 2014, 1:59 PM
qcolombet retitled this revision from to [LSR] Canonicalize reg1 + ... + regN into reg1 + ... + 1*regN.
qcolombet updated this object.
qcolombet edited the test plan for this revision. (Show Details)
qcolombet added a reviewer: atrick.

PS: Tested version was trunk r208630.

test/CodeGen/X86/avoid_complex_am.ll
25

Note: Since reg1 + <{0,+,8}> has the same cost as reg1 + <{0,+,1}> * 8 on x86 (IIRC), LSR chooses the second in this example because its ImmCost is lower.
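
For readers of the archive: <{0,+,8}> denotes a SCEV add-recurrence that starts at 0 and steps by 8 each iteration. A hypothetical loop (not the test case itself) where both shapes describe the same address a + 8*i:

// The address of a[i] is a + 8*i for 8-byte elements. LSR can represent
// it either as reg1 + <{0,+,8}> (a pointer bumped by 8 per iteration) or
// as reg1 + <{0,+,1}> * 8 (an index bumped by 1, scaled by 8).
double sum(const double *a, long n) {
  double s = 0.0;
  for (long i = 0; i < n; ++i)
    s += a[i];
  return s;
}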

test/CodeGen/X86/masked-iv-safe.ll
8

Same here.

atrick accepted this revision.May 19 2014, 5:10 PM
atrick edited edge metadata.

Great work cleaning up this aspect of LSR!

lib/Transforms/Scalar/LoopStrengthReduce.cpp
839–851

I don't understand the name isAMCompletelyFolded(TTI, LU, F) in this case. It seems to be used to mean "can fold exactly 2 registers".

1461–1463

Word wrap.

This revision is now accepted and ready to land.May 19 2014, 5:10 PM
qcolombet added inline comments.May 19 2014, 5:26 PM
lib/Transforms/Scalar/LoopStrengthReduce.cpp
839–851

I killed "can fold exactly 2 registers" (no longer needed [1]) and renamed isLegalUse into isAMCompletelyFolded (isLegalUse has different semantics now).
So now isAMCompletelyFolded is used for every cost-related decision.

The fact that it ends up at this specific position in the diff is indeed confusing, but that is necessary because of its uses.

[1] Thanks to the canonicalization, we now have the "two registers" in BaseRegs and ScaledReg. The base register is built with the successive base adds, and we add the scaled reg on top of that, depending on whether or not it is folded. More specifically, this code:

// Determine how many (unfolded) adds we'll need inside the loop.
size_t NumBaseParts = F.getNumRegs();
if (NumBaseParts > 1)
  // Do not count the base and a possible second register if the target
  // allows to fold 2 registers.
  NumBaseAdds += NumBaseParts - (1 + isAMCompletelyFolded(TTI, LU, F));
NumBaseAdds += (F.UnfoldedOffset != 0);
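
As a worked example (assuming getNumRegs counts ScaledReg as well as BaseRegs): a canonical formula reg1 + reg2 + 1*reg3 has NumBaseParts == 3. If the target can completely fold the addressing mode, this accounts for 3 - (1 + 1) = 1 extra add inside the loop; otherwise it accounts for 3 - (1 + 0) = 2.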
1461–1463

Good catch!
Looks like clang-format does not reconcile the comments when it splits them.

qcolombet updated this revision to Diff 9630.May 20 2014, 10:02 AM
qcolombet edited edge metadata.

Refine the comments on isAMCompletelyFolded.
Fix a think-o in the base adds computation.

Hi Andy,

Thanks again for the feedback and the review!

> Right. I was looking at the use of this function in exactly this code. isAMCompletelyFolded seems to mean that we can use two registers in the addressing mode for free. If we have 2 base regs + a scaled reg, the code will assume we need an extra add.

Correct!

> I guess “completely” folded means 1 base + 1 scaled + offset can be folded, but never anything else. That’s what I would expect, but the name is a tad misleading without any comments.

Exactly. This matches the existing behavior. I have updated the comments to spell that out.
What do you think?

I derived the naming from the original comment of isLegalUse and, in retrospect, it is not great. If you have better ideas, I'll take them :).

Thanks,
-Quentin

The extra comments help. Thanks. LGTM.

qcolombet closed this revision.May 20 2014, 12:32 PM

Thanks!

Committed revision 209230.

This is a really nice improvement. Thanks, guys!

-Jim