This is an archive of the discontinued LLVM Phabricator instance.

LSR: an alternative way to resolve complex solution
ClosedPublic

Authored by evstupac on Feb 10 2017, 7:49 PM.

Details

Summary

The patch introduces an alternative method of resolving complex LSR solutions.
The method is based on choosing the solution with the lowest mathematical expectation of the number of registers.
It should be generally faster. In my benchmarks, x86 32-bit performance is better:
164.gzip +2%
viterbi algorithm benchmark ~10%

I'd like to commit this under an option for testing (now true by default for testing).
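The idea of scoring candidate formulas by their expected register count can be sketched roughly as follows. This is an illustrative model only, not the patch's actual code: the names, data structures, and the weighting rule (a register shared by candidate formulas of several uses contributes the reciprocal of the number of uses referencing it) are my assumptions drawn from the example discussed later in this thread.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical illustration of the heuristic (names and structure are mine,
// not the patch's): each formula is scored by the expected number of
// registers it adds to the solution.  A register that candidate formulas of
// several uses could share contributes only fractionally, weighted by the
// reciprocal of the number of uses referencing it.
using Formula = std::vector<std::string>; // registers referenced by a formula

double expectedRegCount(const Formula &F,
                        const std::map<std::string, unsigned> &UsesReferencing) {
  double E = 0.0;
  for (const std::string &Reg : F)
    E += 1.0 / UsesReferencing.at(Reg); // shared registers count fractionally
  return E;
}
```

Under this model, a formula {a} + {0,+,1} where {0,+,1} is shared between two uses scores 1 + 1/2, while the single-register form {a,+,1} scores 1, matching the worked example below in the thread.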

Diff Detail

Repository
rL LLVM

Event Timeline

evstupac created this revision.Feb 10 2017, 7:49 PM
qcolombet edited edge metadata.Feb 16 2017, 5:55 PM

Hi,

That looks interesting, thanks for working on this.

Regarding the performance numbers you mentioned, I'm a bit confused, are those compile time numbers?

Cheers,
-Quentin

lib/Transforms/Scalar/LoopStrengthReduce.cpp
4337

I am not a fan of using floats in heuristics, as they can introduce subtle differences from one target to another.
Could we keep the numerator and denominator as two separate variables and do the comparison accordingly?

We may already have a helper class for that (what is used in BlockFrequency, maybe?).

Hi Quentin,

Regarding the performance numbers you mentioned, I'm a bit confused, are those compile time numbers?

These are performance numbers. I didn't run compile-time tests. The case where we exceed 65536 variants is not that frequent, so I don't expect much difference.
However, we could lower the limit (if performance is unchanged), which would give some compile-time advantage.

The initial motivating example comes from the zlib deflate algorithm, which becomes ~10-30% faster (depending on the x86 32-bit CPU).

Thanks,
Evgeny

lib/Transforms/Scalar/LoopStrengthReduce.cpp
4337

I am not a fan of using floats in heuristics, as they can introduce subtle differences from one target to another.
Could we keep the numerator and denominator as two separate variables and do the comparison accordingly?

Yes, that is doable; however, it will most likely be slower.
I'll take a look at what we can do here (maybe using BlockFrequency).

Hi,

Thanks for the clarification.

It should be generally faster.

I was confused by this statement thinking you were trying to improve compile time :).

LGTM with two caveats:

  1. Should we set the option to true by default? I believe it would be best to keep it to false and send an email to llvm-dev to ask for benchmarking.
  2. The floating point thingy that I mentioned.

I am fine with moving forward with the current patch as long as you commit to look at #2.

Cheers,
Q.

lib/Transforms/Scalar/LoopStrengthReduce.cpp
4337

BlockFrequency is probably not the best abstraction, but maybe it uses something we can reuse here.

qcolombet accepted this revision.Feb 17 2017, 2:44 PM
This revision is now accepted and ready to land.Feb 17 2017, 2:44 PM
  1. Should we set the option to true by default? I believe it would be best to keep it to false and send an email to llvm-dev to ask for benchmarking.
  2. The floating point thingy that I mentioned.

I am fine with moving forward with the current patch as long as you commit to look at #2.

Thanks, Quentin.
I'm OOO next week. Committing the patch as is is the best option for me to get some feedback during my OOO.
I'll commit it with the option set to "false" by default.
It was set to true in the review for two reasons: testing (if someone downloads the patch as is) and to show the influence on the LIT tests.
And I will not set the option to true until a decision on #2 is made. I'll look into it once I'm back.

Thanks,
Evgeny

This revision was automatically updated to reflect the committed changes.

Hello,

Looks like you committed with the option set to true. This seems to introduce some regressions we are seeing in internal benchmarks. While there are some good improvements too, some benchmarks are regressing 15-20%. For example, I believe these Shootout-C++ matrix regressions are due to this commit:
http://llvm.org/perf/db_default/v4/nts/daily_report/2017/2/21?day_start=16

Unless you have any objections, I think we should set the option default to false for the time being, like the review says.

Thanks,
Dave

For example I believe these Shootout-C++ matrix regressions are due to this commit

Yes. However, the regression was triggered by r295538, which I missed in my testing.
Anyway, this is a regression and I need to address it somehow.
And yes, I have no objections.

Thanks,
Evgeny

bmakam added a subscriber: bmakam.Feb 28 2017, 9:51 AM

FWIW, we are seeing a 9% regression in spec2006/hmmer on our AArch64 Kryo target with this flag turned on by default.

Thanks for reporting the regressions. It looks like I have a simple fix for both.
Here is the case:
Use1 (Address):
  {a} + {0,+,1}    (register-count expectation: 1 + 1/2)
  {a,+,1}          (register-count expectation: 1)
Use2 (Address):
  {b} + {0,+,1}
  {b,+,1}
Use3 (ICmpZero):
  -1024 + {0,+,1}

That way the new method will select {a,+,1}, {b,+,1} and -1024 + {0,+,1}, which is not optimal in terms of AddRecExprs (though still optimal in terms of register count).

In the Matrix test there are 32 Address uses like the above. The expectation of {a} + {0,+,1} becomes very close to 1, but still greater than 1.
The solution is simple: delete formulas in ICmpZero uses before Address uses. That way the optimal solution is selected (as {0,+,1} becomes unique).
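The ordering fix described here can be sketched as below. This is a hedged illustration only (the enum and function are hypothetical; the real logic lives in LoopStrengthReduce.cpp's search-space narrowing): by pruning formulas for ICmpZero uses before Address uses, a register like {0,+,1} that survives only in the ICmpZero use becomes unique early, freeing the Address uses to pick their single-register forms.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch of the fix described above: order the use worklist so
// that ICmpZero uses have their formulas pruned first.
enum class UseKind { ICmpZero, Address, Basic };

void orderForPruning(std::vector<UseKind> &Uses) {
  // Stable sort: ICmpZero uses move to the front; all other uses keep
  // their original relative order.
  std::stable_sort(Uses.begin(), Uses.end(), [](UseKind A, UseKind B) {
    return A == UseKind::ICmpZero && B != UseKind::ICmpZero;
  });
}
```

The stable sort matters in this sketch: it reorders only across the ICmpZero/non-ICmpZero boundary, so pruning behavior for the remaining uses is otherwise unchanged.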

And I'm testing a new patch without floats (with an APInt numerator and denominator).

I have a fix to address both regressions and am going to publish it after review: https://reviews.llvm.org/D30527.
Before that, should I revert the option to false?

Thanks,
Evgeny

The following patch should fix both matrix and hmmer regressions:
https://reviews.llvm.org/D30552

Thanks,
Evgeny

Thanks for looking into the regression. I tested D30552 on our AArch64 Kryo target, and for spec2006/hmmer it recovered some of the lost performance; however, it is still 2% regressed, compared to the 9% regression previously with the lsr-exp-narrow flag on by default.

however it is still 2% regressed

I don't have a system at hand; I've just looked into the changes.
Are there any gains in other benchmarks that can cover this 2% regression?
I mean, is it OK to leave the regression, or should we look deeper?

LGTM with two caveats:

  1. Should we set the option to true by default? I believe it would be best to keep it to false and send an email to llvm-dev to ask for benchmarking.

I strongly agree with Quentin that this should have been committed with the feature disabled by default. I appreciate that the regressions are being addressed, but that should happen prior to the feature being enabled. I say this because this is a rather pervasive change that hasn't been fully vetted on targets other than x86.

Here are some results from our LTO config on Kryo:
• Spec2006/h264ref -2.8%
• Spec2006/hmmer -8.9%
• Spec2006/perlbench +3.5%
• Spec2006/sjeng +2.3%
• Spec2006/dealII +2.3%
• Spec2006/sphinx3 -3.2%
• Spec2000/crafty +2%
• Spec2000/eon -2.4%
• Spec2000/mcf -2.3%
• Spec2000/perlbmk -2.8%
• Spec2000/mesa -3.2%

As you can see, this is a net loss. Admittedly, this is a comparison against the prior week's performance and includes many changes, but we have at least bisected the spec2006/hmmer regression to this commit (yes, I understand you have a fix pending, but that's beside the point). I've asked Balaram @bmakam to perform more exhaustive testing on our platform. In the meantime, can you please disable this feature (as was originally suggested) until we've done some amount of due diligence?

Thanks for the data.
Sure, I'll set the option to false today.

Thanks,
Evgeny

Thanks for the data.
Sure, I'll set the option to false today.

Thank you!

Regarding the other regressions in spec2006:
The new method does not guarantee a perfect solution, so I think it would be fair to apply it if it generally produces better code.
By generally, I mean that the sum of all LSR solution registers (say, across a benchmark) becomes lower.
I can collect such statistics for your arch (please provide me with the exact options).
If the new method generally selects more registers for LSR solutions, I'll need to fix that.

Hi Evgeny,
Has the fix for hmmer been committed? If not, we need that committed as a first step. Once that's complete, we can reevaluate the performance on AArch64. I'd like to proceed by determining the specific cause of each regression and addressing each accordingly.

bmakam added a comment.EditedMar 7 2017, 8:59 AM

Regarding the other regressions in spec2006:
The new method does not guarantee a perfect solution, so I think it would be fair to apply it if it generally produces better code.
By generally, I mean that the sum of all LSR solution registers (say, across a benchmark) becomes lower.
I can collect such statistics for your arch (please provide me with the exact options).
If the new method generally selects more registers for LSR solutions, I'll need to fix that.

The options we use to test are:

  1. -O3 -fno-strict-aliasing -mcpu=kryo -fomit-frame-pointer
  2. -Ofast -flto -fuse-ld=gold -mcpu=kryo -fomit-frame-pointer

In both configs we see regressions with this patch. Even after applying D30552, which almost fixes spec2006/hmmer, we see a net loss of 0.4% geomean in the LTO config.
spec2000/gzip and spec2006/hmmer (viterbi algorithm) seem to be the motivating benchmarks for this change, but instead of an improvement, the performance is either neutral or negative.