This is an archive of the discontinued LLVM Phabricator instance.

[LSR] Narrow search space by filtering non-optimal formulae with the same ScaledReg and Scale.
ClosedPublic

Authored by wmi on Jun 23 2017, 5:02 PM.

Details

Summary

When the formulae search space is huge, LSR uses a series of heuristics to keep pruning the search space until the number of possible solutions is within a certain limit.

The big hammer among these heuristics is NarrowSearchSpaceByPickingWinnerRegs, which picks the register used by the most LSRUses and deletes the other formulae that don't use it. This is an effective way to prune the search space, but quite often not a good way to keep the best solution. We have seen cases where this heuristic pruned the best formula candidate out of the search space.

To relieve the problem, we introduce a new heuristic called NarrowSearchSpaceByFilterFormulaWithSameScaledReg. The basic idea is that, in order to reduce the search space while keeping the best formula, we want to keep as many formulae with different Scale and ScaledReg pairs as possible. This is because the central idea of LSR is to choose a group of loop induction variables and use those induction variables to represent the LSRUses, and an induction variable candidate is typically represented by the Scale and ScaledReg of a formula. The more formulae with different ScaledReg and Scale we have to choose from, the better our opportunity to find the best solution. That is why we believe pruning the search space by keeping only the best formula for each ScaledReg and Scale pair should be more effective than PickingWinnerRegs. We use two criteria to choose the best formula among those with the same ScaledReg and Scale: the first is to select the formula using fewer non-shared registers, and the second is to select the formula with the lower cost computed by RateFormula. The patch runs the new heuristic before NarrowSearchSpaceByPickingWinnerRegs, which remains the last resort.
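The filtering idea described above can be sketched in simplified form. This is not the actual patch code: the real pass keys on the SCEV* ScaledReg and the int64_t Scale of each formula inside an LSRUse, and the struct fields and function name below are hypothetical stand-ins.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for LSR's Formula. In LLVM the key
// would be the (const SCEV *ScaledReg, int64_t Scale) pair of the formula.
struct Formula {
  std::string ScaledReg; // stand-in for const SCEV *
  int Scale;
  int NumNonSharedRegs;  // first criterion: fewer is better
  int RateFormulaCost;   // second criterion: lower is better
};

// Sketch of the heuristic: within one LSRUse, keep only the best formula
// for each (ScaledReg, Scale) pair, so the solver still sees every
// induction-variable candidate but far fewer formulae.
std::vector<Formula> filterByScaledReg(const std::vector<Formula> &Formulae) {
  std::map<std::pair<std::string, int>, Formula> Best;
  for (const Formula &F : Formulae) {
    auto Key = std::make_pair(F.ScaledReg, F.Scale);
    auto It = Best.find(Key);
    bool IsBetter =
        It == Best.end() ||
        F.NumNonSharedRegs < It->second.NumNonSharedRegs ||
        (F.NumNonSharedRegs == It->second.NumNonSharedRegs &&
         F.RateFormulaCost < It->second.RateFormulaCost);
    if (IsBetter)
      Best[Key] = F;
  }
  std::vector<Formula> Kept;
  for (const auto &KV : Best)
    Kept.push_back(KV.second);
  return Kept;
}
```

With three formulae, two of which share the (ScaledReg, Scale) pair, the sketch keeps one formula per pair and prefers the one with fewer non-shared registers.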

Testing shows we get 1.8% and 2% improvements on two internal benchmarks on x86. llvm nightly testsuite performance is neutral. We also tried lsr-exp-narrow and it didn't help on the two improved internal cases we saw.

Diff Detail

Event Timeline

wmi created this revision.Jun 23 2017, 5:02 PM
wmi added a reviewer: sanjoy.Jun 26 2017, 3:09 PM
sanjoy requested changes to this revision.Jun 29 2017, 3:38 PM
sanjoy added inline comments.
lib/Transforms/Scalar/LoopStrengthReduce.cpp
146 ↗(On Diff #103799)

I'd s/scaledreg/scaled-reg/

4318 ↗(On Diff #103799)

Just to be clear, the benefit here is that you can just look at the base regs and (cheaply) decide which one is better? If so, add a line in the description about that.

4338 ↗(On Diff #103799)

I think the convention would be IsBetterThan.

I'd also add a one-liner on the direction of the relation you're computing ("return true if A is better than B").

4339 ↗(On Diff #103799)

I'm being hyper pedantic here, but I suspect this will be "fewer" new registers.

4345 ↗(On Diff #103799)

It's not clear to me why you need to use a floating point reciprocal here. Why not just count the total number of uses (as an integer) and compare that? Or is there some more subtle metric you're tracking that requires this reciprocal logic (if so, please comment on that)?

4355 ↗(On Diff #103799)

Maybe the DenseSet can be declared at function scope and re-used across different invocations of isBetterThan? This is a worthwhile micro-optimization since DenseSets are somewhat heavyweight.

4360 ↗(On Diff #103799)

As far as I can tell, in this situation we're making an arbitrary choice when the register use count is the same and both formulas have the same cost (we'll just pick the second one). Can we instead keep both formulas when Cost does not give us an unambiguous signal?

4364 ↗(On Diff #103799)

Let's call Any something more descriptive, like FormulaPruned.

[edit: I just saw that other parts of LSR also call this variable Any. I think those too should change, but given the "prior art" I'd be okay if you want to keep this named Any.]

This revision now requires changes to proceed.Jun 29 2017, 3:38 PM
wmi marked 4 inline comments as done.Jun 29 2017, 5:04 PM
wmi added inline comments.
lib/Transforms/Scalar/LoopStrengthReduce.cpp
4318 ↗(On Diff #103799)

The narrowing heuristic is to keep as many formulae with different Scale and ScaledReg pairs as possible while narrowing the search space to within the limit. The benefit is that it is more likely to find a better solution from a formulae set with more Scale and ScaledReg variations than with the picking-winner-reg heuristic. Comments updated.

4345 ↗(On Diff #103799)

Yeah, I don't really need to use floating point. I changed it to use integers.

4360 ↗(On Diff #103799)

We use a similar strategy in other places, like in FilterOutUndesirableDedicatedRegisters at the place: CostF.isLess(CostBest, TTI). I think the key point here is to keep the formula set small while providing more induction variable choices for the LSR solver, so making an arbitrary choice to reduce the formula set may not be too bad.

wmi updated this revision to Diff 104786.Jun 29 2017, 5:22 PM
wmi edited edge metadata.

Address Sanjoy's comments.

sanjoy added inline comments.Jun 29 2017, 5:53 PM
lib/Transforms/Scalar/LoopStrengthReduce.cpp
4360 ↗(On Diff #103799)

Sounds good.

4352 ↗(On Diff #104786)

Can we instead take the total number of uses of the registers in FA.BaseRegs and FB.BaseRegs and compare them? That is:

int TotalUsesOfA = 0, TotalUsesOfB = 0;
for (const SCEV *Reg : FA.BaseRegs)
  TotalUsesOfA += RegUses.getUsedByIndices(Reg).count();
for (const SCEV *Reg : FB.BaseRegs)
  TotalUsesOfB += RegUses.getUsedByIndices(Reg).count();

if (TotalUsesOfA != TotalUsesOfB)
  return TotalUsesOfA > TotalUsesOfB;

Or does it have to be exactly the expression you're using?

test/Transforms/LoopStrengthReduce/X86/lsr-filtering-scaledreg.ll
7 ↗(On Diff #104786)

Can you please clean up the names a bit here? Perhaps using metarenamer?

Also, are both the loops necessary to show the difference in behavior?

wmi added inline comments.Jun 30 2017, 10:10 AM
lib/Transforms/Scalar/LoopStrengthReduce.cpp
4352 ↗(On Diff #104786)

We don't want to choose the formula with the most sharing. What we expect is to have fewer registers overall. If a register is shared, we count it proportionally; that is why a shared register is better than a non-shared register.
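The proportional counting idea could be illustrated as follows. This is a hypothetical sketch, not the patch code: a register used by N LSRUses contributes 1/N to each formula that uses it, so a formula built from widely shared registers looks cheaper, and (as the updated patch does) the comparison can stay in integer arithmetic rather than use floating-point reciprocals.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: UsesA/UsesB hold, for each base register of a
// formula, the number of LSRUses that share that register. A register
// shared by N uses contributes 1/N, so sharing is rewarded. A fixed-point
// scale keeps the comparison in integer arithmetic.
// Returns true if formula A is better than (has fewer effective regs than) B.
bool fewerEffectiveRegs(const std::vector<int> &UsesA,
                        const std::vector<int> &UsesB) {
  const long long Scale = 1LL << 20; // fixed-point scale, an assumption
  long long ScaledA = 0, ScaledB = 0;
  for (int U : UsesA)
    ScaledA += Scale / U; // 1/U in fixed point
  for (int U : UsesB)
    ScaledB += Scale / U;
  return ScaledA < ScaledB;
}
```

For example, a formula using two registers each shared by four uses (effective count 0.5) beats a formula using one completely unshared register (effective count 1.0), even though it uses more registers in absolute terms.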

test/Transforms/LoopStrengthReduce/X86/lsr-filtering-scaledreg.ll
7 ↗(On Diff #104786)

metarenamer is a great tool. Thanks for the suggestion.

I reduced the test to some extent, but the second loop is still necessary to create a complex enough search space.

wmi updated this revision to Diff 104888.Jun 30 2017, 10:13 AM

Cleanup and reduce the testcase.

sanjoy accepted this revision.Jun 30 2017, 4:55 PM

lgtm

test/Transforms/LoopStrengthReduce/X86/lsr-filtering-scaledreg.ll
7 ↗(On Diff #104786)

I reduced the test to some extent, but the second loop is still necessary to create a complex enough search space.

Sounds good!

This revision is now accepted and ready to land.Jun 30 2017, 4:55 PM
This revision was automatically updated to reflect the committed changes.
eastig reopened this revision.Aug 3 2017, 4:36 AM
eastig added a subscriber: eastig.

This patch caused regressions from 5% to 23% in two of our internal benchmarks on Cortex-M23 and Cortex-M0+. I attached test.ll, which is reduced from the benchmarks. I used LLVM revision 309830. 'test.good.ll' is the result when filtering is disabled; 'test.bad.ll' is the result when filtering is enabled.
Comparing them, I can see that this optimization changes how an induction variable is updated. Originally it is incremented from 0 to 256; the optimization changes this into decrementing from 0 to -256. This induction variable is also used as an offset into memory, so to preserve the semantics a conversion of the induction variable from a negative value to a positive one is inserted. This is lowered to additional instructions, which causes the performance regressions.

Could you please have a look at this issue?

Thanks,
Evgeny Astigeevich
The ARM Compiler Optimization team leader

This revision is now accepted and ready to land.Aug 3 2017, 4:36 AM
wmi added a comment.Aug 3 2017, 11:12 AM

This patch caused regressions from 5% to 23% in two of our internal benchmarks on Cortex-M23 and Cortex-M0+. I attached test.ll, which is reduced from the benchmarks. I used LLVM revision 309830. 'test.good.ll' is the result when filtering is disabled; 'test.bad.ll' is the result when filtering is enabled.
Comparing them, I can see that this optimization changes how an induction variable is updated. Originally it is incremented from 0 to 256; the optimization changes this into decrementing from 0 to -256. This induction variable is also used as an offset into memory, so to preserve the semantics a conversion of the induction variable from a negative value to a positive one is inserted. This is lowered to additional instructions, which causes the performance regressions.

Could you please have a look at this issue?

Thanks,
Evgeny Astigeevich
The ARM Compiler Optimization team leader

Hi Evgeny,

Thanks for providing the testcase.

It looks like an existing issue in LSR cost evaluation exposed by the patch. Comparing the traces obtained by adding -debug-only=loop-reduce, all the candidates chosen by LSR without filtering are still kept in the candidate set after adding the filtering patch. However, the filtering patch provides some more candidates that are interesting to the LSR cost model, and LSR chooses a different set of candidates in the final result, which it thinks is better (one less base add) but is actually not. We can see that in the trace:

LSR without the filtering patch:
The chosen solution requires 5 regs, with addrec cost 2, plus 2 base adds, plus 4 imm cost, plus 1 setup cost:

LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0}, widest fixup type: i8*
  reg({%ptr1,+,256}<%for.cond6.preheader.us.i.i>) + 1*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>)
LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
  reg(256) + -1*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>)
LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
  reg(%c0.0103.us.i.i) + 4*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>)
LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0,4}, widest fixup type: i32*
  reg({(-4 + %c1.0104.us.i.i)<nsw>,+,4}<nsw><%for.body8.us.i.i>)
LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest fixup type: i32*
  reg(%c0.0103.us.i.i)
LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
  reg(%c1.0104.us.i.i) + 4*reg({0,+,1}<nuw><nsw><%for.body8.us.i.i>) + imm(4)
LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest fixup type: i32*
  reg(%c1.0104.us.i.i)

LSR with the filtering patch:
The chosen solution requires 5 regs, with addrec cost 2, plus 1 base add, plus 4 imm cost, plus 1 setup cost:

LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0}, widest fixup type: i8*
  reg({%ptr1,+,256}<%for.cond6.preheader.us.i.i>) + -1*reg({0,+,-1}<nw><%for.body8.us.i.i>)
LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
  reg(-256) + -1*reg({0,+,-1}<nw><%for.body8.us.i.i>)
LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
  reg(%c0.0103.us.i.i) + -4*reg({0,+,-1}<nw><%for.body8.us.i.i>)
LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0,4}, widest fixup type: i32*
  reg({(4 + %c1.0104.us.i.i)<nsw>,+,4}<nsw><%for.body8.us.i.i>) + imm(-8)
LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest fixup type: i32*
  reg(%c0.0103.us.i.i)
LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
  reg({(4 + %c1.0104.us.i.i)<nsw>,+,4}<nsw><%for.body8.us.i.i>)
LSR Use: Kind=Special, Offsets={0}, all-fixups-outside-loop, widest fixup type: i32*
  reg(%c1.0104.us.i.i)

The real problem is that LSR has no idea about the cost of materializing a negative value. It thinks 4*reg({0,+,-1}) and -4*reg({0,+,-1}) have the same cost.

LSR Use: Kind=Address of i8 in addrspace(0), Offsets={0}, widest fixup type: i8*
  reg({%ptr1,+,256}<%for.cond6.preheader.us.i.i>) + -1*reg({0,+,-1}<nw><%for.body8.us.i.i>)
LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
  reg(%c0.0103.us.i.i) + -4*reg({0,+,-1}<nw><%for.body8.us.i.i>)
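The blind spot could be illustrated with a toy cost function. This is not LSR's actual cost model; it is a hypothetical sketch showing how a cost keyed only on the magnitude of the scale cannot distinguish a formula that forces the backend to negate the index (costly on targets such as Cortex-M0+ without negative scaled addressing) from one that does not.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical: a cost model that looks only at |Scale| rates
// 4*reg({0,+,1}) and -4*reg({0,+,-1}) identically.
int naiveScaleCost(int Scale) {
  return std::abs(Scale) > 1 ? 1 : 0; // ignores the sign entirely
}

// A sign-aware variant would charge for the extra instruction the backend
// must emit to materialize the negated index on such targets.
int signAwareScaleCost(int Scale, bool TargetHasNegScaledAddressing) {
  int Cost = std::abs(Scale) > 1 ? 1 : 0;
  if (Scale < 0 && !TargetHasNegScaledAddressing)
    Cost += 1; // conversion from negative to positive offset
  return Cost;
}
```

Under the naive cost the two candidate sets tie, so the solver is free to pick the decrementing form; the sign-aware variant would break the tie in favor of the incrementing form.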

I will think about how to fix it.

Wei.

eastig added a comment.Aug 4 2017, 2:52 AM

Thank you, Wei. Just let me know when you need help to test a fix.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project.Oct 7 2019, 5:50 AM
Herald added a subscriber: hiraditya.