- Motivation driven by testcases:
Existing LSR evaluates the cost of an LSRUse Formula and an LSR Solution by comparing a set of indices lexicographically, weighted from high to low: NumRegs > AddRecCost > NumIVMuls > ... This doesn't work well for the testcase in https://llvm.org/bugs/show_bug.cgi?id=23384. For that testcase, on x86, AddRecCost should be weighted more heavily than NumRegs, because each extra AddRec translates directly into extra instructions, especially when register pressure in the loop is low.
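To make that ordering concrete, here is a minimal sketch of a lexicographic cost comparison of this kind, written in the style of LSR's Cost class (the field list is abbreviated and the exact trunk implementation may differ):

  #include <tuple>

  struct Cost {
    unsigned NumRegs = 0;     // registers used by the solution
    unsigned AddRecCost = 0;  // cost of addrec increments
    unsigned NumIVMuls = 0;   // multiplications by the IV stride
    unsigned SetupCost = 0;   // preheader setup instructions

    // Lexicographic comparison: NumRegs dominates AddRecCost, which
    // dominates NumIVMuls, and so on. A solution that saves one
    // register always wins, even if it needs many more increments.
    bool operator<(const Cost &Other) const {
      return std::tie(NumRegs, AddRecCost, NumIVMuls, SetupCost) <
             std::tie(Other.NumRegs, Other.AddRecCost, Other.NumIVMuls,
                      Other.SetupCost);
    }
  };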
This problem may not be apparent on other architectures. X86 supports complex addressing modes, so several load/store insns can share the same AddRec by folding base, index, scale, and displacement into their addresses. Architectures without complex addressing modes need extra insns for loads/stores to share the same AddRec, so saving AddRecCost may not translate into saving insns.
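For example, x86-64 code like the following (hand-written for illustration, not actual compiler output) lets two stores share the single induction variable in %rax, because each address folds base, index, scale, and displacement:

  movl $15, 48(%rdi,%rax,4)   # a[i+12] = 15
  movl $15, -8(%rsi,%rax,4)   # b[i-2]  = 15
  incq %rax                   # one shared AddRec increment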
However, for architectures with pre/post-increment addressing modes, we may even want to increase the number of AddRecs. Here is another testcase:
int a[1000], b[1000], N, j;
void foo() {
  long i;
  for (i = 0; i < N; i += 1) {
    a[i+12] = 15;
    b[i-2] = 15;
  }
}
For AArch64, llvm trunk generates this code:

.LBB0_2:                        // %for.body
  lsl  x13, x9, #2
  add  x9, x9, #1               // =1
  add  x14, x10, x13
  add  x13, x12, x13
  str  w11, [x14, #48]
  stur w11, [x13, #-8]
  cmp  x9, x8
  b.lt .LBB0_2
With the attached patch, LSR chooses more AddRecs but utilizes the post-increment addressing mode, reducing the number of insns needed:
.LBB0_2:                        // %for.body
  str  w12, [x10], #4
  str  w12, [x11], #4
  add  x9, x9, #1               // =1
  cmp  x9, x8
  b.lt .LBB0_2
From the two testcases above, I think the existing cost evaluation, which uses NumRegs as the major index, may not be optimal, because instruction count is the most direct measure of cost. Understanding all the cases in which extra instructions will be introduced, and reflecting those cases in the LSR cost model, is the key to getting the best performance.
- Major changes in the patch:
1. The weights of the new indices, from high to low: InstNumCost + SpillRegsCost > InstCmplxCost > SetupCost. InstNumCost is the number of extra instructions needed. SpillRegsCost is computed from the registers used, by Cost::UpdateSpillRegsCost (a simple version is used for now and will be improved in the future). InstCmplxCost reflects the complexity of the instructions in the loop, which affects instruction decoding latency and the availability of execution ports. SetupCost is the cost of the instructions inserted in the loop preheader. (A sketch of the new comparison follows this list.)
2. InstNumCost is updated for some cases according to benchmark analysis on the X86 architecture. It is also updated to handle the second testcase above, so that the pre/post-increment addressing modes on AArch64 and ARM are utilized:
- For AddRec, InstNumCost will be increased by 1 unless the stride can be folded into load/store insns by using pre/post-increment addressing mode.
- For ICmpZero, the loop can become a countdown loop only when the formula of ICmpZero contains exactly one register; in that case, the conditional compare or test instruction can be omitted (see the example after this list). In other cases, InstNumCost will be increased by 1.
- The original NumBaseAdds/NumIVMuls/ImmCost are folded into InstNumCost.
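Here is a minimal sketch of the new comparison described in item 1, assuming the field names from this post (the actual patch may structure this differently):

  #include <tuple>

  struct NewCost {
    unsigned InstNumCost = 0;    // extra instructions in the loop body
    unsigned SpillRegsCost = 0;  // estimated spill cost from regs used
    unsigned InstCmplxCost = 0;  // decode/port pressure of complex insns
    unsigned SetupCost = 0;      // instructions in the loop preheader

    // InstNumCost + SpillRegsCost dominates InstCmplxCost, which
    // dominates SetupCost. std::make_tuple is used because the first
    // component is a computed sum, not a reference to a member.
    bool operator<(const NewCost &Other) const {
      return std::make_tuple(InstNumCost + SpillRegsCost, InstCmplxCost,
                             SetupCost) <
             std::make_tuple(Other.InstNumCost + Other.SpillRegsCost,
                             Other.InstCmplxCost, Other.SetupCost);
    }
  };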
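To illustrate the ICmpZero rule: when the exit test reduces to a single register compared against zero, the flag-setting form of the decrement makes the compare redundant. A hand-written AArch64 example (not actual compiler output):

  // Counting up: a separate compare is needed.
  add  x9, x9, #1
  cmp  x9, x8
  b.lt .LBB0_2

  // Counting down to zero: subs sets the flags itself,
  // so the cmp can be omitted.
  subs x8, x8, #1
  b.ne .LBB0_2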
- Request for suggestions. I did performance testing and tuning for the patch using Google benchmarks on x86, and saw some perf improvement (0.2% geomean on Sandy Bridge and 0.4% on Westmere). But I didn't do any testing or tuning for other architectures, so the patch could be worse than the original LSR on them. If the basic idea here is OK, any suggestions for improving it on other architectures are welcome.
Thanks,
Wei.