This is an archive of the discontinued LLVM Phabricator instance.

Differential D28307

Add Instruction number to LSR cost model (PR23384) part 1 of 3
ClosedPublic

Authored by evstupac on Jan 4 2017, 11:46 AM.

Details

Reviewers

qcolombet
hfinkel

Commits

rGfe6f548d2d69: Fix PR23384 (under "-lsr-insns-cost" option)
rL294821: Fix PR23384 (under "-lsr-insns-cost" option)

Summary

Fix PR23384.
The patch adds the number of instructions generated by a solution to the LSR cost model, under the "-lsr-insns-cost" option.
Performance improvement on x86 (SPEC2000):

177.mesa at -O2: +3%
256.bzip2 at -Ofast -flto: +1.5%

Diff Detail

Repository: rL LLVM

Event Timeline

evstupac updated this revision to Diff 83095. Jan 4 2017, 11:46 AM

evstupac retitled this revision to Add Instruction number to LSR cost model (PR23384) part 1 of 3.

evstupac updated this object.

evstupac added a reviewer: qcolombet.

evstupac set the repository for this revision to rL LLVM.

evstupac added subscribers: llvm-commits, wmi.

Herald added subscribers: mehdi_amini, mzolotukhin. Jan 4 2017, 11:46 AM

Fix "copy/paste" misprint.

PING.

PING 2

hfinkel added a subscriber: hfinkel. Jan 24 2017, 3:38 PM

hfinkel added inline comments.

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1193 ↗	(On Diff #83111)	This represents the spill of the old value, right? We should say so.
1231 ↗	(On Diff #83111)	Please add a comment that explains this (a subtraction for the constant term in the formula?).
1233 ↗	(On Diff #83111)	This is the instruction that represents the increment?
1256 ↗	(On Diff #83111)	Do you really want to make this the most-important factor?

evstupac added inline comments. Jan 24 2017, 4:19 PM

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1193 ↗	(On Diff #83111)	To be exact, at least one fill. Ideally RA should start with fills.
1233 ↗	(On Diff #83111)	Yes.
1256 ↗	(On Diff #83111)	Yes. What is more important than instruction count at this stage?

Comments fixed according to review.

hfinkel added inline comments. Jan 24 2017, 5:29 PM

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1256 ↗	(On Diff #83111)	Fair enough.
test/Transforms/LoopStrengthReduce/X86/lsr-insns-1.ll
1 ↗	(On Diff #85652)	Please use FileCheck, not grep (and match the name of the statistic, not just the number and the pass name).

Tests fixed according to the comments.

hfinkel added inline comments. Jan 25 2017, 6:32 AM

test/Transforms/LoopStrengthReduce/X86/lsr-insns-1.ll
14 ↗	(On Diff #85683)	Ah, okay. You're just counting the total number of instructions. Please actually match the desired output instruction pattern. Otherwise these tests will be fragile to unrelated changes.

evstupac added inline comments. Jan 25 2017, 10:22 AM

test/Transforms/LoopStrengthReduce/X86/lsr-insns-1.ll
14 ↗	(On Diff #85683)	Well, the tests are really small and simple. I expect more fragility from instruction pattern. Anyway, if any change increases the instruction count here, that is a bad sign and the author of the change should pay attention to it.

hfinkel added inline comments. Jan 25 2017, 11:55 AM

test/Transforms/LoopStrengthReduce/X86/lsr-insns-1.ll
14 ↗	(On Diff #85683)	"Well, the tests are really small and simple." This is exactly why matching the output you expect is reasonable. "I expect more fragility from instruction pattern." Perhaps, but make proper use of CHECK-DAG and named regular expressions for allocated registers and the test should not be fragile.

Updated the tests according to the latest comments.
For lsr-insns-1.ll the most important thing is to check that there is no "cmp" in the loop.
For lsr-insns-2.ll it is important that the calculations are folded into the addressing mode. However, several variants are OK:

movl    (%rsi,%rcx,4), %eax
...
incq %rax
cmpq %rcx, %r8

movl    (%rsi,%rcx), %eax
...
addq $4, %rax
cmpq %rcx, %r8

"cmpq %rcx, %r8" could also be replaced with "decq %r8".
So I simplified the checks to avoid failures when any of the variants above is generated.

LGTM. I assume that you're going to send a message to llvm-dev asking people to test this?

This revision is now accepted and ready to land. Jan 25 2017, 4:23 PM

In D28307#657004, @hfinkel wrote:

LGTM. I assume that you're going to send a message to llvm-dev asking people to test this?

Thanks!
Yes. The change is very sensitive. For x86 the results are good on the benchmark set I have, but there could be corner cases that I've missed.
The ultimate goal is to make the LSR cost target-dependent (there will be a separate patch for this) and move InsnsCost to first priority for x86.

Hi,

Sorry for the delay.
I'll have a look tomorrow.

Cheers,
-Quentin

Hi,

I'd like to see opt tests as well, and one piece of the code seems dead to me unless I am missing something. See my inline comment.

Also a related question: what is your plan going forward with that heuristic?

Generally speaking, I think the number of instructions is a good heuristic only for code size, and even that is not a given. Indeed, for performance we care about critical path, ILP, this kind of thing. What I am saying is that having more instructions is not necessarily a bad thing. The other thing is that I believe we can end up with cases with fewer instructions but more registers and potentially more spills. This can happen because the heuristic that accounts for spills only triggers when NumRegs gets high enough. However, NumRegs may be low and spilling may already happen. See my comment below as well.

Cheers,
-Quentin

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1195 ↗	(On Diff #85832)	With this kind of check we are (sort of) sure we are going to spill to materialize this formula, but that does not mean the others don't. Bear in mind that NumRegs is only what's used in the formulae, not what is live. Your comment is right, I just wanted to make sure we are on the same page.
1198 ↗	(On Diff #85832)	I'm confused, this was tested in the condition just before, i.e., it's always going to be true and the else statement is always going to be false. Am I missing something?
test/Transforms/LoopStrengthReduce/X86/lsr-insns-1.ll
1 ↗	(On Diff #85832)	I would like to see opt -loop-reduce checks as well.
test/Transforms/LoopStrengthReduce/X86/lsr-insns-2.ll
1 ↗	(On Diff #85832)	Ditto.
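
The register accounting questioned in the comments on lines 1195 and 1198 can be tried out in isolation. The following is a minimal standalone sketch of the rule as it reads in the committed diff (with PrevNumRegs in the inner check). The function name extraFillInsns and its plain-integer parameters are hypothetical stand-ins for the Cost fields and the TTI query; this is not LLVM API.

```cpp
#include <cassert>

// Hypothetical helper, not part of LLVM: models how many extra instructions
// (at least one fill each) a formula adds once the running register count
// passes the target limit.
//   PrevNumRegs - register count before rating this formula
//   NumRegs     - register count after rating this formula
//   RegLimit    - stand-in for TTI.getNumberOfRegisters(false) - 1
unsigned extraFillInsns(unsigned PrevNumRegs, unsigned NumRegs,
                        unsigned RegLimit) {
  assert(NumRegs >= PrevNumRegs && "rating a formula never removes registers");
  if (NumRegs <= RegLimit)
    return 0; // Still within the limit: no fills are charged.
  if (PrevNumRegs > RegLimit)
    return NumRegs - PrevNumRegs; // Already over: charge only the new registers.
  return NumRegs - RegLimit; // Just crossed: charge only the excess.
}
```

With a limit of 15, going from 14 to 17 registers charges 2 instructions, and going from 16 to 18 also charges 2. As noted in the comment on line 1195, a result of 0 only means the formulae themselves stay under the limit; it says nothing about what else is live.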

This revision now requires changes to proceed. Jan 26 2017, 5:58 PM

Hi,

Thanks for taking a look.

Also a related question: what is your plan going forward with that heuristic?

As discussed in D27695, the next part will move the solution cost comparison into the target-specific code.
For x86 I want to move "Insns" to first priority. I've got good performance and code size results on my benchmarks (only linpack got a 3% regression, which turns into a gain on some CPUs).
Anyway, the results differ across x86 CPUs, so maybe we'll end up with different cost models.

I understand that the change is very sensitive, so putting it under an option gives an opportunity to test it.

Generally speaking, I think the number of instructions is a good heuristic only for code size, and even that is not a given. Indeed, for performance we care about critical path, ILP, this kind of thing. What I am saying is that having more instructions is not necessarily a bad thing. The other thing is that I believe we can end up with cases with fewer instructions but more registers and potentially more spills. This can happen because the heuristic that accounts for spills only triggers when NumRegs gets high enough. However, NumRegs may be low and spilling may already happen.

Currently LSR takes into account only the number of registers. That is very important for 32-bit mode, but less important for 64-bit mode and float/vector loops.
While testing I have not seen cases where the Insns count is lower but NumRegs is much bigger. Insns correlates with NumRegs (because Insns depends on AddRec and NumBaseAdds).
Usually the case is that we have one more register (or the same number) but fewer instructions. Putting Insns as first priority does not add many spills/fills. However, it would be good to get such statistics. I'll gather them for x86.

Thanks,
Evgeny

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1195 ↗	(On Diff #85832)	Yes, I understand this. Instruction count and register count are correlated. When we move the cost calculation to the target part, we can give Insns and RegNum weights instead of priorities.
1198 ↗	(On Diff #85832)	This is a misprint. It should be PrevNumRegs here, as it was in the initial patch D27695. Thanks for catching this.
test/Transforms/LoopStrengthReduce/X86/lsr-insns-1.ll
1 ↗	(On Diff #85832)	It is harder to see the profit in IR, as we get even more instructions (assuming the target will combine them into a complicated address or remove the test against 0). It will be harder to understand the goal of the test, but it is not a big deal to add one.
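
The difference between ranking by priority and ranking by weights, raised in the reply to the comment on line 1195, can be illustrated with a small standalone sketch. The MiniCost struct, both comparison functions, and the weight values are hypothetical and chosen only for illustration. The committed patch uses the priority form, and nothing here describes how a later target-specific cost would actually be computed.

```cpp
#include <tuple>

// Hypothetical two-field cost; the real Cost class tracks more metrics.
struct MiniCost {
  unsigned Insns;
  unsigned NumRegs;
};

// Priority (lexicographic) ordering: Insns decides, NumRegs only breaks ties.
bool lessByPriority(const MiniCost &A, const MiniCost &B) {
  return std::tie(A.Insns, A.NumRegs) < std::tie(B.Insns, B.NumRegs);
}

// Weighted ordering: both metrics contribute to a single score, so a large
// register saving can outweigh a small instruction saving.
bool lessByWeight(const MiniCost &A, const MiniCost &B, unsigned InsnWeight,
                  unsigned RegWeight) {
  unsigned ScoreA = A.Insns * InsnWeight + A.NumRegs * RegWeight;
  unsigned ScoreB = B.Insns * InsnWeight + B.NumRegs * RegWeight;
  return ScoreA < ScoreB;
}
```

For a solution with 3 instructions and 6 registers against one with 4 instructions and 2 registers, the priority ordering picks the first, while weights of 2 per instruction and 1 per register pick the second (scores 12 and 10).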

Fixed according to the comments.
Added 2 opt tests.

In D28307#658330, @qcolombet wrote:

Hi,

...

Generally speaking, I think the number of instructions is a good heuristic only for code size, and even that is not a given. Indeed, for performance we care about critical path, ILP, this kind of thing. What I am saying is that having more instructions is not necessarily a bad thing.

I agree, for performance we should look at critical-path length, ILP, etc. However, here, we're only ranking one formula at a time, and these generally correspond to dependent chains of computations. Making each formula shorter (i.e. using fewer instructions), as a result, will tend to make each path-length shorter. This is obviously a heuristic, because we don't have an interface here to query expected latencies, but most of these things are simple integer operations, so I suspect it is not a bad approximation. Nevertheless, if any of these formulae are on the critical path, then hopefully it too will get shorter.

For ILP, we may want a longer sequence, but I think it would be hard to reason about that here. It seems better to do that in the MachineCombiner, or something at that level, and for that purpose, using fewer instructions is also probably better: they'll be easier to pattern-match at the MI level.

qcolombet added inline comments. Jan 27 2017, 9:00 AM

test/Transforms/LoopStrengthReduce/X86/lsr-insns-1.ll
1 ↗	(On Diff #85832)	I'm not really interested in checking the profit, more in checking that this does what we want with the fewest possible interactions with something else. Like you said, CodeGen could be fragile, so it is best if we have better-isolated testing, which is what opt is going to achieve.

evstupac added inline comments. Jan 27 2017, 9:31 AM

test/Transforms/LoopStrengthReduce/X86/lsr-insns-1.ll
1 ↗	(On Diff #85832)	"I'm not really interested in checking the profit, more in checking that this does what we want with the fewest possible interactions with something else." OK. Do the current opt tests meet the needs?

Please merge the opt and llc test cases into one file.
Note: It would be nice to have two opt run lines per test case:

one with lsr insns cost
one without

That may help catch the difference between those two modes. E.g., have a common prefix BOTH, plus one WITH and one WITHOUT. If the difference between WITH and WITHOUT is too big (i.e., not much is left under the BOTH prefix), don't bother.

test/Transforms/LoopStrengthReduce/X86/lsr-insns-2a.ll
5 ↗	(On Diff #86014)	You can set several RUN lines with different check prefixes in the same file, i.e., do not duplicate the test input. Could you check that we have the desired addressing pattern instead of checking that there is no LSR-generated variable?

Tests updated according to the latest comments.

LGTM, nitpicks on the check pattern for the llc test case.

test/Transforms/LoopStrengthReduce/X86/lsr-insns-1.ll
29 ↗	(On Diff #86371)	I usually like to check that the operands are correct as well.

This revision is now accepted and ready to land. Jan 31 2017, 2:27 PM

Closed by commit rL294821: Fix PR23384 (under "-lsr-insns-cost" option) (authored by evstupac). Feb 10 2017, 7:09 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/Scalar/LoopStrengthReduce.cpp	61 lines
llvm/trunk/test/Transforms/LoopStrengthReduce/X86/lsr-insns-1.ll	52 lines
llvm/trunk/test/Transforms/LoopStrengthReduce/X86/lsr-insns-2.ll	58 lines

Diff 88076

llvm/trunk/lib/Transforms/Scalar/LoopStrengthReduce.cpp

@@ 123 lines not shown @@
 // Temporary flag to cleanup congruent phis after LSR phi expansion.
 // It's currently disabled until we can determine whether it's truly useful or
 // not. The flag should be removed after the v3.0 release.
 // This is now needed for ivchains.
 static cl::opt<bool> EnablePhiElim(
   "enable-lsr-phielim", cl::Hidden, cl::init(true),
   cl::desc("Enable LSR phi elimination"));

+// The flag adds instruction count to solutions cost comparision.
+static cl::opt<bool> InsnsCost(
+  "lsr-insns-cost", cl::Hidden, cl::init(false),
+  cl::desc("Add instruction count to a LSR cost model"));
+
 #ifndef NDEBUG
 // Stress test IV chain generation.
 static cl::opt<bool> StressIVChain(
   "stress-ivchain", cl::Hidden, cl::init(false),
   cl::desc("Stress test LSR IV chains"));
 #else
 static bool StressIVChain = false;
 #endif
@@ 180 lines not shown @@ struct Formula {
   void initialMatch(const SCEV *S, Loop *L, ScalarEvolution &SE);

   bool isCanonical() const;

   void canonicalize();

   bool unscale();

+  bool hasZeroEnd() const;
+
   size_t getNumRegs() const;
   Type *getType() const;

   void deleteBaseReg(const SCEV *&S);

   bool referencesReg(const SCEV *S) const;
   bool hasRegsUsedByUsesOtherThan(size_t LUIdx,
                                   const RegUseTracker &RegUses) const;
@@ 118 lines not shown @@ bool Formula::unscale() {
   if (Scale != 1)
     return false;
   Scale = 0;
   BaseRegs.push_back(ScaledReg);
   ScaledReg = nullptr;
   return true;
 }

+bool Formula::hasZeroEnd() const {
+  if (UnfoldedOffset || BaseOffset)
+    return false;
+  if (BaseRegs.size() != 1 || ScaledReg)
+    return false;
+  return true;
+}
+
 /// Return the total number of register operands used by this formula. This does
 /// not include register uses implied by non-constant addrec strides.
 size_t Formula::getNumRegs() const {
   return !!ScaledReg + BaseRegs.size();
 }

 /// Return the type of this formula, if it has one, or null otherwise. This type
 /// is meaningless except for the bit size.
@@ 420 lines not shown @@ static unsigned getScalingFactorCost(const TargetTransformInfo &TTI,
                                      const LSRUse &LU, const Formula &F);

 namespace {

 /// This class is used to measure and compare candidate formulae.
 class Cost {
   /// TODO: Some of these could be merged. Also, a lexical ordering
   /// isn't always optimal.
+  unsigned Insns;
   unsigned NumRegs;
   unsigned AddRecCost;
   unsigned NumIVMuls;
   unsigned NumBaseAdds;
   unsigned ImmCost;
   unsigned SetupCost;
   unsigned ScaleCost;

 public:
   Cost()
-    : NumRegs(0), AddRecCost(0), NumIVMuls(0), NumBaseAdds(0), ImmCost(0),
-      SetupCost(0), ScaleCost(0) {}
+    : Insns(0), NumRegs(0), AddRecCost(0), NumIVMuls(0), NumBaseAdds(0),
+      ImmCost(0), SetupCost(0), ScaleCost(0) {}

   bool operator<(const Cost &Other) const;

   void Lose();

 #ifndef NDEBUG
   // Once any of the metrics loses, they must all remain losers.
   bool isValid() {
-    return ((NumRegs | AddRecCost | NumIVMuls | NumBaseAdds
+    return ((Insns | NumRegs | AddRecCost | NumIVMuls | NumBaseAdds
              | ImmCost | SetupCost | ScaleCost) != ~0u)
-      || ((NumRegs & AddRecCost & NumIVMuls & NumBaseAdds
+      || ((Insns & NumRegs & AddRecCost & NumIVMuls & NumBaseAdds
            & ImmCost & SetupCost & ScaleCost) == ~0u);
   }
 #endif

   bool isLoser() {
     assert(isValid() && "invalid cost");
     return NumRegs == ~0u;
   }
@@ 229 lines not shown @@ void Cost::RateFormula(const TargetTransformInfo &TTI,
                        SmallPtrSetImpl<const SCEV *> &Regs,
                        const DenseSet<const SCEV *> &VisitedRegs,
                        const Loop *L,
                        ScalarEvolution &SE, DominatorTree &DT,
                        const LSRUse &LU,
                        SmallPtrSetImpl<const SCEV *> *LoserRegs) {
   assert(F.isCanonical() && "Cost is accurate only for canonical formula");
   // Tally up the registers.
+  unsigned PrevAddRecCost = AddRecCost;
+  unsigned PrevNumRegs = NumRegs;
+  unsigned PrevNumBaseAdds = NumBaseAdds;
   if (const SCEV *ScaledReg = F.ScaledReg) {
     if (VisitedRegs.count(ScaledReg)) {
       Lose();
       return;
     }
     RatePrimaryRegister(ScaledReg, Regs, L, SE, DT, LoserRegs);
     if (isLoser())
       return;
   }
   for (const SCEV *BaseReg : F.BaseRegs) {
     if (VisitedRegs.count(BaseReg)) {
       Lose();
       return;
     }
     RatePrimaryRegister(BaseReg, Regs, L, SE, DT, LoserRegs);
     if (isLoser())
       return;
   }

+  // Treat every new register that exceeds TTI.getNumberOfRegisters() - 1 as
+  // additional instruction (at least fill).
+  unsigned TTIRegNum = TTI.getNumberOfRegisters(false) - 1;
+  if (NumRegs > TTIRegNum) {
+    // Cost already exceeded TTIRegNum, then only newly added register can add
+    // new instructions.
+    if (PrevNumRegs > TTIRegNum)
+      Insns += (NumRegs - PrevNumRegs);
+    else
+      Insns += (NumRegs - TTIRegNum);
+  }
+
   // Determine how many (unfolded) adds we'll need inside the loop.
   size_t NumBaseParts = F.getNumRegs();
   if (NumBaseParts > 1)
     // Do not count the base and a possible second register if the target
     // allows to fold 2 registers.
     NumBaseAdds +=
       NumBaseParts - (1 + (F.Scale && isAMCompletelyFolded(TTI, LU, F)));
   NumBaseAdds += (F.UnfoldedOffset != 0);
@@ 12 lines not shown @@ else if (Offset != 0)
       ImmCost += APInt(64, Offset, true).getMinSignedBits();

     // Check with target if this offset with this instruction is
     // specifically not supported.
     if ((isa<LoadInst>(Fixup.UserInst) || isa<StoreInst>(Fixup.UserInst)) &&
         !TTI.isFoldableMemAccessOffset(Fixup.UserInst, Offset))
       NumBaseAdds++;
   }
+
+  // If ICmpZero formula ends with not 0, it could not be replaced by
+  // just add or sub. We'll need to compare final result of AddRec.
+  // That means we'll need an additional instruction.
+  // For -10 + {0, +, 1}:
+  // i = i + 1;
+  // cmp i, 10
+  //
+  // For {-10, +, 1}:
+  // i = i + 1;
+  if (LU.Kind == LSRUse::ICmpZero && !F.hasZeroEnd())
+    Insns++;
+  // Each new AddRec adds 1 instruction to calculation.
+  Insns += (AddRecCost - PrevAddRecCost);
+
+  // BaseAdds adds instructions for unfolded registers.
+  if (LU.Kind != LSRUse::ICmpZero)
+    Insns += NumBaseAdds - PrevNumBaseAdds;
   assert(isValid() && "invalid cost");
 }

 /// Set this cost to a losing value.
 void Cost::Lose() {
+  Insns = ~0u;
   NumRegs = ~0u;
   AddRecCost = ~0u;
   NumIVMuls = ~0u;
   NumBaseAdds = ~0u;
   ImmCost = ~0u;
   SetupCost = ~0u;
   ScaleCost = ~0u;
 }

 /// Choose the lower cost.
 bool Cost::operator<(const Cost &Other) const {
+  if (InsnsCost && Insns != Other.Insns)
+    return Insns < Other.Insns;
   return std::tie(NumRegs, AddRecCost, NumIVMuls, NumBaseAdds, ScaleCost,
                   ImmCost, SetupCost) <
          std::tie(Other.NumRegs, Other.AddRecCost, Other.NumIVMuls,
                   Other.NumBaseAdds, Other.ScaleCost, Other.ImmCost,
                   Other.SetupCost);
 }

 void Cost::print(raw_ostream &OS) const {
+  OS << Insns << " instruction" << (Insns == 1 ? " " : "s ");
   OS << NumRegs << " reg" << (NumRegs == 1 ? "" : "s");
   if (AddRecCost != 0)
     OS << ", with addrec cost " << AddRecCost;
   if (NumIVMuls != 0)
     OS << ", plus " << NumIVMuls << " IV mul" << (NumIVMuls == 1 ? "" : "s");
   if (NumBaseAdds != 0)
     OS << ", plus " << NumBaseAdds << " base add"
        << (NumBaseAdds == 1 ? "" : "s");
@@ 3,858 lines not shown @@
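
The ICmpZero rule added to Cost::RateFormula in this file can be exercised on its own. The sketch below is a simplified model, not the LLVM data structure: MiniFormula keeps only what hasZeroEnd() inspects, with the registers reduced to a count and a flag, and icmpZeroExtraInsns is a hypothetical name for the single Insns++ step.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for LSR's Formula; registers are modeled as a count
// and a flag instead of SCEV pointers.
struct MiniFormula {
  std::int64_t BaseOffset = 0;
  std::int64_t UnfoldedOffset = 0;
  std::size_t NumBaseRegs = 0;
  bool HasScaledReg = false;

  // Mirrors Formula::hasZeroEnd() from the diff: true only when the formula
  // is exactly one base register with no offsets and no scaled register.
  bool hasZeroEnd() const {
    if (UnfoldedOffset || BaseOffset)
      return false;
    if (NumBaseRegs != 1 || HasScaledReg)
      return false;
    return true;
  }
};

// Hypothetical helper: one extra instruction (the compare) is charged when
// an ICmpZero use cannot be satisfied by the add/sub that updates the IV.
unsigned icmpZeroExtraInsns(bool IsICmpZeroUse, const MiniFormula &F) {
  return (IsICmpZeroUse && !F.hasZeroEnd()) ? 1u : 0u;
}
```

In this model, {-10, +, 1} is a single base register with no offset and costs no extra instruction, while -10 + {0, +, 1} carries a BaseOffset of -10 and is charged one, matching the two cases in the source comment.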

llvm/trunk/test/Transforms/LoopStrengthReduce/X86/lsr-insns-1.ll

+; RUN: opt < %s -loop-reduce -mtriple=x86_64 -lsr-insns-cost -S | FileCheck %s -check-prefix=BOTH -check-prefix=INSN
+; RUN: opt < %s -loop-reduce -mtriple=x86_64 -S | FileCheck %s -check-prefix=BOTH -check-prefix=REGS
+; RUN: llc < %s -O2 -march=x86-64 -lsr-insns-cost -asm-verbose=0 | FileCheck %s
+
+; OPT test checks that LSR optimize compare for static counter to compare with 0.
+
+; BOTH: for.body:
+; INSN: icmp eq i64 %lsr.iv.next, 0
+; REGS: icmp eq i64 %indvars.iv.next, 1024
+
+; LLC test checks that LSR optimize compare for static counter.
+; That means that instead of creating the following:
+; movl %ecx, (%rdx,%rax,4)
+; incq %rax
+; cmpq $1024, %rax
+; LSR should optimize out cmp:
+; movl %ecx, 4096(%rdx,%rax)
+; addq $4, %rax
+; or
+; movl %ecx, 4096(%rdx,%rax,4)
+; incq %rax
+
+; CHECK: LBB0_1:
+; CHECK-NEXT: movl 4096(%{{...}},[[REG:%...]]
+; CHECK-NEXT: addl 4096(%{{...}},[[REG]]
+; CHECK-NEXT: movl %{{...}}, 4096(%{{...}},[[REG]]
+; CHECK-NOT: cmp
+; CHECK: jne
+
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+
+; Function Attrs: norecurse nounwind uwtable
+define void @foo(i32* nocapture readonly %x, i32* nocapture readonly %y, i32* nocapture %q) {
+entry:
+  br label %for.body
+
+for.cond.cleanup: ; preds = %for.body
+  ret void
+
+for.body: ; preds = %for.body, %entry
+  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds i32, i32* %x, i64 %indvars.iv
+  %tmp = load i32, i32* %arrayidx, align 4
+  %arrayidx2 = getelementptr inbounds i32, i32* %y, i64 %indvars.iv
+  %tmp1 = load i32, i32* %arrayidx2, align 4
+  %add = add nsw i32 %tmp1, %tmp
+  %arrayidx4 = getelementptr inbounds i32, i32* %q, i64 %indvars.iv
+  store i32 %add, i32* %arrayidx4, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond = icmp eq i64 %indvars.iv.next, 1024
+  br i1 %exitcond, label %for.cond.cleanup, label %for.body
+}

llvm/trunk/test/Transforms/LoopStrengthReduce/X86/lsr-insns-2.ll

+; RUN: opt < %s -loop-reduce -mtriple=x86_64 -lsr-insns-cost -S | FileCheck %s -check-prefix=BOTH -check-prefix=INSN
+; RUN: opt < %s -loop-reduce -mtriple=x86_64 -S | FileCheck %s -check-prefix=BOTH -check-prefix=REGS
+; RUN: llc < %s -O2 -march=x86-64 -lsr-insns-cost -asm-verbose=0 | FileCheck %s
+
+; OPT checks that LSR prefers less instructions to less registers.
+; For x86 LSR should prefer complicated address to new lsr induction
+; variables.
+
+; BOTH: for.body:
+; INSN: getelementptr i32, i32* %x, i64 %indvars.iv
+; INSN: getelementptr i32, i32* %y, i64 %indvars.iv
+; INSN: getelementptr i32, i32* %q, i64 %indvars.iv
+; REGS %lsr.iv4 = phi
+; REGS %lsr.iv2 = phi
+; REGS %lsr.iv1 = phi
+; REGS: getelementptr i32, i32* %lsr.iv1, i64 1
+; REGS: getelementptr i32, i32* %lsr.iv2, i64 1
+; REGS: getelementptr i32, i32* %lsr.iv4, i64 1
+
+; LLC checks that LSR prefers less instructions to less registers.
+; LSR should prefer complicated address to additonal add instructions.
+
+; CHECK: LBB0_2:
+; CHECK-NEXT: movl (%r{{[a-z][a-z]}},
+; CHECK-NEXT: addl (%r{{[a-z][a-z]}},
+; CHECK-NEXT: movl %e{{[a-z][a-z]}}, (%r{{[a-z][a-z]}},
+
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+
+; Function Attrs: norecurse nounwind uwtable
+define void @foo(i32* nocapture readonly %x, i32* nocapture readonly %y, i32* nocapture %q, i32 %n) {
+entry:
+  %cmp10 = icmp sgt i32 %n, 0
+  br i1 %cmp10, label %for.body.preheader, label %for.cond.cleanup
+
+for.body.preheader: ; preds = %entry
+  %wide.trip.count = zext i32 %n to i64
+  br label %for.body
+
+for.cond.cleanup.loopexit: ; preds = %for.body
+  br label %for.cond.cleanup
+
+for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
+  ret void
+
+for.body: ; preds = %for.body, %for.body.preheader
+  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ]
+  %arrayidx = getelementptr inbounds i32, i32* %x, i64 %indvars.iv
+  %tmp = load i32, i32* %arrayidx, align 4
+  %arrayidx2 = getelementptr inbounds i32, i32* %y, i64 %indvars.iv
+  %tmp1 = load i32, i32* %arrayidx2, align 4
+  %add = add nsw i32 %tmp1, %tmp
+  %arrayidx4 = getelementptr inbounds i32, i32* %q, i64 %indvars.iv
+  store i32 %add, i32* %arrayidx4, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count
+  br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
+}