This is an archive of the discontinued LLVM Phabricator instance.

Add Instruction number to LSR cost model (PR23384)
Abandoned · Public

Authored by evstupac on Dec 12 2016, 6:07 PM.


Details

Reviewers

qcolombet
hfinkel

Summary

Fix PR23384.
The patch does the following:

1. Add instructions number generated by a solution to LSR cost
2. Move LSR cost comparison to target part
3. Add new cross use generation for ICmpZero that ends with zero
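
To illustrate the effect of moving the comparison into the target, here is a minimal, self-contained sketch of how the key order in a lexicographic comparison decides which solution wins. The struct copies the LSRCost fields from this patch; genericLess and insnsFirstLess are hypothetical stand-ins for the default and X86 isLSRCostLower implementations, not LLVM API.

```cpp
#include <tuple>

// Mirrors the fields of TargetTransformInfo::LSRCost as added by this patch.
struct LSRCost {
  unsigned Insns;
  unsigned NumRegs;
  unsigned AddRecCost;
  unsigned NumIVMuls;
  unsigned NumBaseAdds;
  unsigned ImmCost;
  unsigned SetupCost;
  unsigned ScaleCost;
};

// Hypothetical stand-in for the default ordering: the register count is the
// most significant key and Insns is not consulted at all.
bool genericLess(const LSRCost &A, const LSRCost &B) {
  return std::tie(A.NumRegs, A.AddRecCost, A.NumIVMuls, A.NumBaseAdds,
                  A.ScaleCost, A.ImmCost, A.SetupCost) <
         std::tie(B.NumRegs, B.AddRecCost, B.NumIVMuls, B.NumBaseAdds,
                  B.ScaleCost, B.ImmCost, B.SetupCost);
}

// Hypothetical stand-in for the X86 ordering: the instruction count becomes
// the most significant key, followed by the same keys as above.
bool insnsFirstLess(const LSRCost &A, const LSRCost &B) {
  return std::tie(A.Insns, A.NumRegs, A.AddRecCost, A.NumIVMuls,
                  A.NumBaseAdds, A.ScaleCost, A.ImmCost, A.SetupCost) <
         std::tie(B.Insns, B.NumRegs, B.AddRecCost, B.NumIVMuls,
                  B.NumBaseAdds, B.ScaleCost, B.ImmCost, B.SetupCost);
}
```

Given one candidate with Insns = 3 and NumRegs = 2 and another with Insns = 2 and NumRegs = 3, the generic order prefers the first candidate, while the instruction-first order prefers the second.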

One LIT test fails. However it should be fixed when D26367 is committed.

Performance improvement on x86 (spec2000):

177.mesa at -O2: +3%
256.bzip2 at -Ofast -flto: +1.5%

Diff Detail

Repository: rL LLVM

Event Timeline

evstupac updated this revision to Diff 81165. · Dec 12 2016, 6:07 PM

evstupac retitled this revision from to Add Instruction number to LSR cost model (PR23384).

evstupac updated this object.

evstupac added reviewers: qcolombet, hfinkel.

evstupac set the repository for this revision to rL LLVM.

evstupac added subscribers: llvm-commits, Farhana, wmi, mzolotukhin.

Herald added a subscriber: mehdi_amini. · Dec 12 2016, 6:07 PM

Thanks for working on it. We also noticed the same problem and wanted it to be fixed.

> Add instructions number generated by a solution to LSR cost
> Move LSR cost comparison to target part

This seems like a good idea to me, because on architectures with post-increment loads/stores the cost of add/inc instructions is not as significant as on architectures like x86 that lack such support.

Could it be separated out as an NFC patch?

> Add new cross use generation for ICmpZero that ends with zero

Can it be split into a separate patch?

lib/Transforms/Scalar/LoopStrengthReduce.cpp
917	Better to use member initializer list? Cost() : C({0, 0, ... ,0}) {}
1208	C.NumRegs only counts the registers used in induction variable expressions, so it is not a good estimate of the number of registers used in the loop. It can be much lower than the real number of registers used. We could have a utility like the one in vectorization to get a better estimate of the registers used in the loop, but that can be a later enhancement. Until such a utility is in place, I would rather conservatively assume that every loop has high register pressure and always add C.NumRegs into C.Insns. I think that avoids the case where LSR significantly increases register pressure just to remove one add/inc.
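
The member-initializer suggestion for line 917 can be sketched as follows. This is a minimal illustration, assuming LSRCost stays a plain aggregate of unsigned fields as in this patch; the cost() accessor is hypothetical and exists only so the example can be inspected.

```cpp
// Mirrors the fields of TargetTransformInfo::LSRCost as added by this patch.
struct LSRCost {
  unsigned Insns;
  unsigned NumRegs;
  unsigned AddRecCost;
  unsigned NumIVMuls;
  unsigned NumBaseAdds;
  unsigned ImmCost;
  unsigned SetupCost;
  unsigned ScaleCost;
};

class Cost {
  LSRCost C;

public:
  // Empty braces initialize the aggregate with no explicit values, so every
  // field is value-initialized to zero without listing the fields one by one.
  Cost() : C{} {}

  // Hypothetical accessor, not part of the patch.
  const LSRCost &cost() const { return C; }
};
```

Compared with eight assignments in the constructor body, this form cannot miss a field when a new one is added to the struct.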

Hi Wei,

Thanks for taking a look and for your comments.
This change affects almost every test. On the benchmarks I have, it looks good. However, more data would be better.
Would you mind testing it on your benchmarks/machines, please?

> Could it be separated out as an NFC patch?

Yes, but that would be a patch with a postponed effect. Before Insns is introduced into the solution cost, I don't see a reason for x86 to get its own cost function.
If we introduce Insns first, that will affect other architectures as well (which I'm not a specialist in). So I'd like to leave tuning of the target cost functions to the target experts.

> Add new cross use generation for ICmpZero that ends with zero
> Can it be split into a separate patch?

I'd like to leave this here as well. Once LSR starts to count Insns, it is able to remove some unnecessary compares, but not all of them (because for some solutions we don't generate the appropriate cross use). As a result, in some tests I would need to remove the cmp from one function and leave it in another. I'm trying to avoid the case where, say, we check for a cmp on one arch/mode and for just an add on another.

lib/Transforms/Scalar/LoopStrengthReduce.cpp
917	Yes. Will fix.
1208	> C.NumRegs only counts the registers used in induction variable expressions, so it is not a good estimate of the number of registers used in the loop. It can be much lower than the real number of registers used.

That is right. But we can say for sure that when a solution exceeds TTIRegNum, we will get at least one fill. Why not account for this if we already have the NumRegs used by the solution? I like the idea of getting a better estimate. However, let's leave this for a separate patch.

Hi Evgeny,

Like Wei, I'd like #3 to be a separate patch.
Indeed, #1 and #2 should be NFC as long as the targets do not change the cost model.

Cheers,
-Quentin

lib/Target/X86/X86TargetTransformInfo.cpp
1874	Do you have data to support that heuristic? Like Wei said, I suspect this may lead to pretty bad side effects where we increase the register pressure by a lot to save a few instructions. So before we switch the default, I want supporting evidence that this is generally a win. For the record, Wei and I discussed this cost model issue, and I still have a better register pressure estimate for the loop on my to-do list. E.g., we can ignore NumRegs as long as it is below the regpressure.
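
One possible reading of "ignore NumRegs as long as it is below the regpressure" is sketched below. This is purely illustrative and is not part of the patch or of LLVM: SimpleCost, pressureAwareLess and the RegLimit parameter are hypothetical names, and the cost is reduced to two fields to keep the idea visible.

```cpp
#include <algorithm>
#include <tuple>

// Hypothetical two-field cost, reduced from LSRCost for illustration.
struct SimpleCost {
  unsigned Insns;
  unsigned NumRegs;
};

// Register counts at or below RegLimit are clamped to RegLimit, so they all
// compare equal and the instruction count decides. Above RegLimit, the
// register count is still the most significant key.
bool pressureAwareLess(const SimpleCost &A, const SimpleCost &B,
                       unsigned RegLimit) {
  unsigned ARegs = std::max(A.NumRegs, RegLimit);
  unsigned BRegs = std::max(B.NumRegs, RegLimit);
  return std::tie(ARegs, A.Insns) < std::tie(BRegs, B.Insns);
}
```

With RegLimit = 8, a solution using 5 registers and 2 instructions beats one using 4 registers and 3 instructions, while a solution using 10 registers loses to one using 8 regardless of instruction count.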

This revision now requires changes to proceed. · Dec 19 2016, 6:03 PM

Hi Quentin,

Thanks for taking a look.

> Like Wei, I'd like #3 to be a separate patch.
> Indeed, #1 and #2 should be NFC as long as the targets do not change the cost model.

Ok. I'll split the patch.
However, #3 without #2 has almost no effect. So the following order seems best: #1, #3, #2.

Thanks,
Evgeny

lib/Target/X86/X86TargetTransformInfo.cpp
1874	> Do you have data to support that heuristic?

Yes, on spec2000:
177.mesa at -O2: +3%
256.bzip2 at -Ofast -flto: +1.5%
There are gains on EEMBC tests. I don't see any significant (>2%) regressions, and I'm waiting for data from Wei. However, I agree that for better testing we can introduce an option like -lsr-count-insns.

> E.g., we can ignore NumRegs as long as it is below the regpressure.

That is roughly estimated in my patch. When TTI.getNumberOfRegisters(false) - 1 is exceeded, we start to increment the instruction counter (1 insn per new register). In most cases that is enough to pick a solution with low register pressure.
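
The accounting described in this reply can be restated as a small stand-alone function. This is a sketch that mirrors the arithmetic in Cost::RateFormula from this diff; the function name extraInsnsForRegs and its parameter names are hypothetical, and RegLimit stands for TTI.getNumberOfRegisters(false) - 1.

```cpp
#include <cassert>

// Returns how many instructions to charge after rating one formula.
// RegsBefore is the solution's register count before the formula was rated,
// RegsAfter is the count afterwards, and RegLimit is the target's register
// budget (hypothetical parameter standing for TTIRegNum in the patch).
unsigned extraInsnsForRegs(unsigned RegsBefore, unsigned RegsAfter,
                           unsigned RegLimit) {
  assert(RegsAfter >= RegsBefore && "rating a formula never removes registers");
  if (RegsAfter <= RegLimit)
    return 0; // Still within budget: nothing is charged.
  if (RegsBefore > RegLimit)
    return RegsAfter - RegsBefore; // Already over: charge only the new ones.
  return RegsAfter - RegLimit; // Crossing the limit: charge the excess.
}
```

For example, with RegLimit = 15, going from 14 to 17 registers charges 2 instructions, and going from 16 to 18 also charges 2, so each register beyond the limit is counted exactly once.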

Quentin,

I've put the first part in a separate review: https://reviews.llvm.org/D28307

Wei,

Did you have a chance to test the patch performance on your benchmarks?

evstupac mentioned this in D28307: Add Instruction number to LSR cost model (PR23384) part 1 of 3. · Jan 26 2017, 7:14 PM

evstupac abandoned this revision. · May 2 2018, 12:01 PM

Revision Contents

Path                                                     Size
include/llvm/Analysis/TargetTransformInfo.h              21 lines
include/llvm/Analysis/TargetTransformInfoImpl.h          7 lines
include/llvm/CodeGen/BasicTTIImpl.h                      4 lines
lib/Analysis/TargetTransformInfo.cpp                     4 lines
lib/Target/X86/X86TargetTransformInfo.h                  2 lines
lib/Target/X86/X86TargetTransformInfo.cpp                11 lines
lib/Transforms/Scalar/LoopStrengthReduce.cpp             240 lines
test/CodeGen/X86/2006-05-11-InstrSched.ll                2 lines
test/CodeGen/X86/…                                       11 lines
test/CodeGen/X86/…                                       6 lines
test/CodeGen/X86/…                                       4 lines
test/CodeGen/X86/…                                       10 lines
test/CodeGen/X86/loop-strength-reduce4.ll                15 lines
test/CodeGen/X86/masked-iv-safe.ll                       16 lines
test/CodeGen/X86/misched-matrix.ll                       24 lines
test/Transforms/LoopStrengthReduce/X86/ivchain-X86.ll    6 lines

Diff 81165

include/llvm/Analysis/TargetTransformInfo.h

@@ 227 lines not shown @@ public:
   /// FIXME: It's not clear that this is a good or useful query API. Client's
   /// should probably move to simpler cost metrics using the above.
   /// Alternatively, we could split the cost interface into distinct code-size
   /// and execution-speed costs. This would allow modelling the core of this
   /// query more accurately as a call is a single small instruction, but
   /// incurs significant execution cost.
   bool isLoweredToCall(const Function *F) const;

+  struct LSRCost {
+    unsigned Insns;
+    unsigned NumRegs;
+    unsigned AddRecCost;
+    unsigned NumIVMuls;
+    unsigned NumBaseAdds;
+    unsigned ImmCost;
+    unsigned SetupCost;
+    unsigned ScaleCost;
+  };
+
   /// Parameters that control the generic loop unrolling transformation.
   struct UnrollingPreferences {
     /// The cost threshold for the unrolled loop. Should be relative to the
     /// getUserCost values returned by this API, and the expectation is that
     /// the unrolled loop's instructions when run through that interface should
     /// not exceed this cost. However, this is only an estimate. Also, specific
     /// loops may be unrolled even with a cost above this threshold if deemed
     /// profitable. Set this to UINT_MAX to disable the loop body cost
@@ 98 lines not shown @@ public:
   /// this target, for a load/store of the specified type.
   /// The type may be VoidTy, in which case only return true if the addressing
   /// mode is legal for a load/store of any legal type.
   /// TODO: Handle pre/postinc as well.
   bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,
                              bool HasBaseReg, int64_t Scale,
                              unsigned AddrSpace = 0) const;

+  /// \brief Return true if LSR cost of C1 is lower than C2.
+  bool isLSRCostLower(TargetTransformInfo::LSRCost &C1,
+                      TargetTransformInfo::LSRCost &C2) const;
+
   /// \brief Return true if the target supports masked load/store
   /// AVX2 and AVX-512 targets allow masks for consecutive load and store
   bool isLegalMaskedStore(Type *DataType) const;
   bool isLegalMaskedLoad(Type *DataType) const;

   /// \brief Return true if the target supports masked gather/scatter
   /// AVX-512 fully supports gather and scatter for vectors with 32 and 64
   /// bits scalar type.
@@ 345 lines not shown @@ public:
   virtual bool isLoweredToCall(const Function *F) = 0;
   virtual void getUnrollingPreferences(Loop *L, UnrollingPreferences &UP) = 0;
   virtual bool isLegalAddImmediate(int64_t Imm) = 0;
   virtual bool isLegalICmpImmediate(int64_t Imm) = 0;
   virtual bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV,
                                      int64_t BaseOffset, bool HasBaseReg,
                                      int64_t Scale,
                                      unsigned AddrSpace) = 0;
+  virtual bool isLSRCostLower(TargetTransformInfo::LSRCost &C1,
+                              TargetTransformInfo::LSRCost &C2) = 0;
   virtual bool isLegalMaskedStore(Type *DataType) = 0;
   virtual bool isLegalMaskedLoad(Type *DataType) = 0;
   virtual bool isLegalMaskedScatter(Type *DataType) = 0;
   virtual bool isLegalMaskedGather(Type *DataType) = 0;
   virtual int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
                                    int64_t BaseOffset, bool HasBaseReg,
                                    int64_t Scale, unsigned AddrSpace) = 0;
   virtual bool isFoldableMemAccessOffset(Instruction *I, int64_t Offset) = 0;
@@ 151 lines not shown @@ bool isLegalICmpImmediate(int64_t Imm) override {
     return Impl.isLegalICmpImmediate(Imm);
   }
   bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,
                              bool HasBaseReg, int64_t Scale,
                              unsigned AddrSpace) override {
     return Impl.isLegalAddressingMode(Ty, BaseGV, BaseOffset, HasBaseReg,
                                       Scale, AddrSpace);
   }
+  bool isLSRCostLower(TargetTransformInfo::LSRCost &C1,
+                      TargetTransformInfo::LSRCost &C2) override {
+    return Impl.isLSRCostLower(C1, C2);
+  }
   bool isLegalMaskedStore(Type *DataType) override {
     return Impl.isLegalMaskedStore(DataType);
   }
   bool isLegalMaskedLoad(Type *DataType) override {
     return Impl.isLegalMaskedLoad(DataType);
   }
   bool isLegalMaskedScatter(Type *DataType) override {
     return Impl.isLegalMaskedScatter(DataType);
@@ 304 lines not shown @@

include/llvm/Analysis/TargetTransformInfoImpl.h

@@ 212 lines not shown @@ public:
   bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,
                              bool HasBaseReg, int64_t Scale,
                              unsigned AddrSpace) {
     // Guess that only reg and reg+reg addressing is allowed. This heuristic is
     // taken from the implementation of LSR.
     return !BaseGV && BaseOffset == 0 && (Scale == 0 || Scale == 1);
   }

+  bool isLSRCostLower(TTI::LSRCost &C1, TTI::LSRCost &C2) {
+    return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds,
+                    C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
+           std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds,
+                    C2.ScaleCost, C2.ImmCost, C2.SetupCost);
+  }
+
   bool isLegalMaskedStore(Type *DataType) { return false; }

   bool isLegalMaskedLoad(Type *DataType) { return false; }

   bool isLegalMaskedScatter(Type *DataType) { return false; }

   bool isLegalMaskedGather(Type *DataType) { return false; }

@@ 348 lines not shown @@

include/llvm/CodeGen/BasicTTIImpl.h

@@ 123 lines not shown @@ bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,
     TargetLoweringBase::AddrMode AM;
     AM.BaseGV = BaseGV;
     AM.BaseOffs = BaseOffset;
     AM.HasBaseReg = HasBaseReg;
     AM.Scale = Scale;
     return getTLI()->isLegalAddressingMode(DL, AM, Ty, AddrSpace);
   }

+  bool isLSRCostLower(TTI::LSRCost C1, TTI::LSRCost C2) {
+    return TargetTransformInfoImplBase::isLSRCostLower(C1, C2);
+  }
+
   int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,
                            bool HasBaseReg, int64_t Scale, unsigned AddrSpace) {
     TargetLoweringBase::AddrMode AM;
     AM.BaseGV = BaseGV;
     AM.BaseOffs = BaseOffset;
     AM.HasBaseReg = HasBaseReg;
     AM.Scale = Scale;
     return getTLI()->getScalingFactorCost(DL, AM, Ty, AddrSpace);
@@ 882 lines not shown @@

lib/Analysis/TargetTransformInfo.cpp

@@ 117 lines not shown @@ bool TargetTransformInfo::isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV,
                                                 int64_t BaseOffset,
                                                 bool HasBaseReg,
                                                 int64_t Scale,
                                                 unsigned AddrSpace) const {
   return TTIImpl->isLegalAddressingMode(Ty, BaseGV, BaseOffset, HasBaseReg,
                                         Scale, AddrSpace);
 }

+bool TargetTransformInfo::isLSRCostLower(LSRCost &C1, LSRCost &C2) const {
+  return TTIImpl->isLSRCostLower(C1, C2);
+}
+
 bool TargetTransformInfo::isLegalMaskedStore(Type *DataType) const {
   return TTIImpl->isLegalMaskedStore(DataType);
 }

 bool TargetTransformInfo::isLegalMaskedLoad(Type *DataType) const {
   return TTIImpl->isLegalMaskedLoad(DataType);
 }

@@ 378 lines not shown @@

lib/Target/X86/X86TargetTransformInfo.h

@@ 81 lines not shown @@ public:

   int getIntImmCost(int64_t);

   int getIntImmCost(const APInt &Imm, Type *Ty);

   int getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, Type *Ty);
   int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
                     Type *Ty);
+  bool isLSRCostLower(TargetTransformInfo::LSRCost &C1,
+                      TargetTransformInfo::LSRCost &C2);
   bool isLegalMaskedLoad(Type *DataType);
   bool isLegalMaskedStore(Type *DataType);
   bool isLegalMaskedGather(Type *DataType);
   bool isLegalMaskedScatter(Type *DataType);
   bool areInlineCompatible(const Function *Caller,
                            const Function *Callee) const;

   bool enableInterleavedAccessVectorization();
@@ 12 lines not shown @@

lib/Target/X86/X86TargetTransformInfo.cpp

@@ 1,857 lines not shown @@ if (VF == 2 || (VF == 4 && !ST->hasVLX()))
     Scalarize = true;

   if (Scalarize)
     return getGSScalarCost(Opcode, SrcVTy, VariableMask, Alignment, AddressSpace);

   return getGSVectorCost(Opcode, SrcVTy, Ptr, Alignment, AddressSpace);
 }

+bool X86TTIImpl::isLSRCostLower(TargetTransformInfo::LSRCost &C1,
+                                TargetTransformInfo::LSRCost &C2) {
+  // X86 specific here are "instruction number 1st priority".
+  return std::tie(C1.Insns, C1.NumRegs, C1.AddRecCost,
+                  C1.NumIVMuls, C1.NumBaseAdds,
+                  C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
+         std::tie(C2.Insns, C2.NumRegs, C2.AddRecCost,
+                  C2.NumIVMuls, C2.NumBaseAdds,
+                  C2.ScaleCost, C2.ImmCost, C2.SetupCost);
+}
+
 bool X86TTIImpl::isLegalMaskedLoad(Type *DataTy) {
   Type *ScalarTy = DataTy->getScalarType();
   int DataWidth = isa<PointerType>(ScalarTy) ?
     DL.getPointerSizeInBits() : ScalarTy->getPrimitiveSizeInBits();

   return ((DataWidth == 32 || DataWidth == 64) && ST->hasAVX()) ||
          ((DataWidth == 8 || DataWidth == 16) && ST->hasBWI());
 }
@@ 52 lines not shown @@

lib/Transforms/Scalar/LoopStrengthReduce.cpp

@@ 317 lines not shown @@ struct Formula {
   void initialMatch(const SCEV *S, Loop *L, ScalarEvolution &SE);

   bool isCanonical() const;

   void canonicalize();

   bool unscale();

+  bool hasZeroEnd() const;
+
   size_t getNumRegs() const;
   Type *getType() const;

   void deleteBaseReg(const SCEV *&S);

   bool referencesReg(const SCEV *S) const;
   bool hasRegsUsedByUsesOtherThan(size_t LUIdx,
                                   const RegUseTracker &RegUses) const;
@@ 118 lines not shown @@ bool Formula::unscale() {
   if (Scale != 1)
     return false;
   Scale = 0;
   BaseRegs.push_back(ScaledReg);
   ScaledReg = nullptr;
   return true;
 }

+bool Formula::hasZeroEnd() const {
+  if (UnfoldedOffset || BaseOffset)
+    return false;
+  if (BaseRegs.size() != 1 || ScaledReg)
+    return false;
+  return true;
+}
+
 /// Return the total number of register operands used by this formula. This does
 /// not include register uses implied by non-constant addrec strides.
 size_t Formula::getNumRegs() const {
   return !!ScaledReg + BaseRegs.size();
 }

 /// Return the type of this formula, if it has one, or null otherwise. This type
 /// is meaningless except for the bit size.
@@ 382 lines not shown @@ while (!DeadInsts.empty()) {

     I->eraseFromParent();
     Changed = true;
   }

   return Changed;
 }

+/// Returns true if A and B have the same constant value.
+///
+static bool hasSameConstValue(const SCEV *A, const SCEV *B) {
+  if (const SCEVConstant *AC = dyn_cast<SCEVConstant>(A))
+    if (const SCEVConstant *BC = dyn_cast<SCEVConstant>(B))
+      return APInt::isSameValue(AC->getAPInt(), BC->getAPInt());
+  return false;
+}
+
 namespace {

 class LSRUse;

 } // end anonymous namespace

 /// \brief Check if the addressing mode defined by \p F is completely
 /// folded in \p LU at isel time.
@@ 11 lines not shown @@ static unsigned getScalingFactorCost(const TargetTransformInfo &TTI,
                                      const LSRUse &LU, const Formula &F);

 namespace {

 /// This class is used to measure and compare candidate formulae.
 class Cost {
   /// TODO: Some of these could be merged. Also, a lexical ordering
   /// isn't always optimal.
-  unsigned NumRegs;
-  unsigned AddRecCost;
-  unsigned NumIVMuls;
-  unsigned NumBaseAdds;
-  unsigned ImmCost;
-  unsigned SetupCost;
-  unsigned ScaleCost;
+  TargetTransformInfo::LSRCost C;

 public:
-  Cost()
-    : NumRegs(0), AddRecCost(0), NumIVMuls(0), NumBaseAdds(0), ImmCost(0),
-      SetupCost(0), ScaleCost(0) {}
+  Cost() {
+    C.Insns = 0;
+    C.NumRegs = 0;
+    C.AddRecCost = 0;
+    C.NumIVMuls = 0;
+    C.NumBaseAdds = 0;
+    C.ImmCost = 0;
+    C.SetupCost = 0;
+    C.ScaleCost = 0;
+  }

-  bool operator<(const Cost &Other) const;
+  bool isLower(Cost &Other, const TargetTransformInfo &TTI);

   void Lose();

 #ifndef NDEBUG
   // Once any of the metrics loses, they must all remain losers.
   bool isValid() {
-    return ((NumRegs | AddRecCost | NumIVMuls | NumBaseAdds
-             | ImmCost | SetupCost | ScaleCost) != ~0u)
-      || ((NumRegs & AddRecCost & NumIVMuls & NumBaseAdds
-           & ImmCost & SetupCost & ScaleCost) == ~0u);
+    return ((C.Insns | C.NumRegs | C.AddRecCost | C.NumIVMuls
+             | C.NumBaseAdds | C.ImmCost | C.SetupCost
+             | C.ScaleCost) != ~0u)
+      || ((C.Insns & C.NumRegs & C.AddRecCost & C.NumIVMuls
+           & C.NumBaseAdds & C.ImmCost & C.SetupCost
+           & C.ScaleCost) == ~0u);
   }
 #endif

   bool isLoser() {
     assert(isValid() && "invalid cost");
-    return NumRegs == ~0u;
+    return C.NumRegs == ~0u;
   }

   void RateFormula(const TargetTransformInfo &TTI,
                    const Formula &F,
                    SmallPtrSetImpl<const SCEV *> &Regs,
                    const DenseSet<const SCEV *> &VisitedRegs,
                    const Loop *L,
                    ScalarEvolution &SE, DominatorTree &DT,
@@ 167 lines not shown @@ if (AR->getLoop() != L) {
       // If the AddRec exists, consider it's register free and leave it alone.
       if (isExistingPhi(AR, SE))
         return;

       // Otherwise, do not consider this formula at all.
       Lose();
       return;
     }
-    AddRecCost += 1; /// TODO: This should be a function of the stride.
+    C.AddRecCost += 1; /// TODO: This should be a function of the stride.

     // Add the step value register, if it needs one.
     // TODO: The non-affine case isn't precisely modeled here.
     if (!AR->isAffine() || !isa<SCEVConstant>(AR->getOperand(1))) {
       if (!Regs.count(AR->getOperand(1))) {
         RateRegister(AR->getOperand(1), Regs, L, SE, DT);
         if (isLoser())
           return;
       }
     }
   }
-  ++NumRegs;
+  ++C.NumRegs;

   // Rough heuristic; favor registers which don't require extra setup
   // instructions in the preheader.
   if (!isa<SCEVUnknown>(Reg) &&
       !isa<SCEVConstant>(Reg) &&
       !(isa<SCEVAddRecExpr>(Reg) &&
         (isa<SCEVUnknown>(cast<SCEVAddRecExpr>(Reg)->getStart()) ||
          isa<SCEVConstant>(cast<SCEVAddRecExpr>(Reg)->getStart()))))
-    ++SetupCost;
+    ++C.SetupCost;

-  NumIVMuls += isa<SCEVMulExpr>(Reg) &&
-               SE.hasComputableLoopEvolution(Reg, L);
+  C.NumIVMuls += isa<SCEVMulExpr>(Reg) &&
+                 SE.hasComputableLoopEvolution(Reg, L);
 }

 /// Record this register in the set. If we haven't seen it before, rate
 /// it. Optional LoserRegs provides a way to declare any formula that refers to
 /// one of those regs an instant loser.
 void Cost::RatePrimaryRegister(const SCEV *Reg,
                                SmallPtrSetImpl<const SCEV *> &Regs,
Show All 16 Lines	void Cost::RateFormula(const TargetTransformInfo &TTI,
SmallPtrSetImpl<const SCEV *> &Regs,		SmallPtrSetImpl<const SCEV *> &Regs,
const DenseSet<const SCEV *> &VisitedRegs,		const DenseSet<const SCEV *> &VisitedRegs,
const Loop *L,		const Loop *L,
ScalarEvolution &SE, DominatorTree &DT,		ScalarEvolution &SE, DominatorTree &DT,
const LSRUse &LU,		const LSRUse &LU,
SmallPtrSetImpl<const SCEV > LoserRegs) {		SmallPtrSetImpl<const SCEV > LoserRegs) {
assert(F.isCanonical() && "Cost is accurate only for canonical formula");		assert(F.isCanonical() && "Cost is accurate only for canonical formula");
// Tally up the registers.		// Tally up the registers.
		unsigned AddRecCost = C.AddRecCost;
		unsigned NumRegs = C.NumRegs;
		unsigned NumBaseAdds = C.NumBaseAdds;
if (const SCEV *ScaledReg = F.ScaledReg) {		if (const SCEV *ScaledReg = F.ScaledReg) {
if (VisitedRegs.count(ScaledReg)) {		if (VisitedRegs.count(ScaledReg)) {
Lose();		Lose();
return;		return;
}		}
RatePrimaryRegister(ScaledReg, Regs, L, SE, DT, LoserRegs);		RatePrimaryRegister(ScaledReg, Regs, L, SE, DT, LoserRegs);
if (isLoser())		if (isLoser())
return;		return;
}		}
for (const SCEV *BaseReg : F.BaseRegs) {		for (const SCEV *BaseReg : F.BaseRegs) {
if (VisitedRegs.count(BaseReg)) {		if (VisitedRegs.count(BaseReg)) {
Lose();		Lose();
return;		return;
}		}
RatePrimaryRegister(BaseReg, Regs, L, SE, DT, LoserRegs);		RatePrimaryRegister(BaseReg, Regs, L, SE, DT, LoserRegs);
if (isLoser())		if (isLoser())
return;		return;
}		}

		// Treat every new register that exceeds TTI.getNumberOfRegisters() - 1 as
		// an additional instruction.
		unsigned TTIRegNum = TTI.getNumberOfRegisters(false) - 1;
		if (C.NumRegs > TTIRegNum) {
		// If the cost already exceeded TTIRegNum, only newly added registers can
		// add new instructions.
		if (NumRegs > TTIRegNum)
		C.Insns += (C.NumRegs - NumRegs);
		else
		C.Insns += (C.NumRegs - TTIRegNum);
		}

		wmi: C.NumRegs only counts the registers used in induction variable expressions, so it is not a good estimate of the number of registers used in the loop; it can be much lower than the real number. We could have a utility like the one in vectorization to get a better estimate of the registers used in the loop, but that can be a further enhancement. Until such a utility is in place, I would rather conservatively assume that every loop has high register pressure and always add C.NumRegs into C.Insns. I think that avoids the case where LSR significantly increases register pressure just to remove one addinc.
		evstupac (author): > C.NumRegs only calculate the regs used in induction var expr so it is not a good estimation for register number used in the loop. It can be much less than the real register number used. That is right. But we can say for sure that once a solution exceeds TTIRegNum, we'll get at least one fill. Why not account for that, given that we already have the NumRegs used by the solution? I like the idea of getting a better estimate. However, let's leave that for a separate patch.
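The two positions in this thread can be compared with a small standalone sketch. It is not the patch's code: `spillInsnsThresholded`, `spillInsnsConservative`, and their parameters are hypothetical names, with `RegLimit` standing in for `TTI.getNumberOfRegisters(false) - 1`. The first function mirrors the thresholded accounting in the lines above; the second is one reading of wmi's suggestion, in which every newly added register counts as an instruction.

```cpp
#include <cassert>

// Hypothetical helper: instructions charged for register pressure when the
// running register count grows from PrevRegs to NewRegs, charging only the
// registers that exceed RegLimit (the thresholded accounting in the diff).
unsigned spillInsnsThresholded(unsigned PrevRegs, unsigned NewRegs,
                               unsigned RegLimit) {
  assert(NewRegs >= PrevRegs && "register count only grows while rating");
  if (NewRegs <= RegLimit)
    return 0; // still fits: nothing is charged
  if (PrevRegs > RegLimit)
    return NewRegs - PrevRegs; // already over the limit: charge the new ones
  return NewRegs - RegLimit;   // crossed the limit: charge only the excess
}

// Hypothetical helper: the conservative alternative, assuming every loop is
// already under high register pressure, so each new register is charged.
unsigned spillInsnsConservative(unsigned PrevRegs, unsigned NewRegs) {
  assert(NewRegs >= PrevRegs && "register count only grows while rating");
  return NewRegs - PrevRegs;
}
```

With a limit of 7, growing from 6 to 9 registers is charged 2 instructions by the thresholded variant and 3 by the conservative one, which is the gap the two commenters are discussing.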
// Determine how many (unfolded) adds we'll need inside the loop.		// Determine how many (unfolded) adds we'll need inside the loop.
size_t NumBaseParts = F.getNumRegs();		size_t NumBaseParts = F.getNumRegs();
if (NumBaseParts > 1)		if (NumBaseParts > 1)
// Do not count the base and a possible second register if the target		// Do not count the base and a possible second register if the target
// allows to fold 2 registers.		// allows to fold 2 registers.
NumBaseAdds +=		C.NumBaseAdds +=
NumBaseParts - (1 + (F.Scale && isAMCompletelyFolded(TTI, LU, F)));		NumBaseParts - (1 + (F.Scale && isAMCompletelyFolded(TTI, LU, F)));
NumBaseAdds += (F.UnfoldedOffset != 0);		C.NumBaseAdds += (F.UnfoldedOffset != 0);

// Accumulate non-free scaling amounts.		// Accumulate non-free scaling amounts.
ScaleCost += getScalingFactorCost(TTI, LU, F);		C.ScaleCost += getScalingFactorCost(TTI, LU, F);

// Tally up the non-zero immediates.		// Tally up the non-zero immediates.
for (const LSRFixup &Fixup : LU.Fixups) {		for (const LSRFixup &Fixup : LU.Fixups) {
int64_t O = Fixup.Offset;		int64_t O = Fixup.Offset;
int64_t Offset = (uint64_t)O + F.BaseOffset;		int64_t Offset = (uint64_t)O + F.BaseOffset;
if (F.BaseGV)		if (F.BaseGV)
ImmCost += 64; // Handle symbolic values conservatively.		C.ImmCost += 64; // Handle symbolic values conservatively.
// TODO: This should probably be the pointer size.		// TODO: This should probably be the pointer size.
else if (Offset != 0)		else if (Offset != 0)
ImmCost += APInt(64, Offset, true).getMinSignedBits();		C.ImmCost += APInt(64, Offset, true).getMinSignedBits();

// Check with target if this offset with this instruction is		// Check with target if this offset with this instruction is
// specifically not supported.		// specifically not supported.
if ((isa<LoadInst>(Fixup.UserInst) || isa<StoreInst>(Fixup.UserInst)) &&		if ((isa<LoadInst>(Fixup.UserInst) || isa<StoreInst>(Fixup.UserInst)) &&
!TTI.isFoldableMemAccessOffset(Fixup.UserInst, Offset))		!TTI.isFoldableMemAccessOffset(Fixup.UserInst, Offset))
NumBaseAdds++;		C.NumBaseAdds++;
}		}

		// Each new AddRec adds 1 instruction to calculation.
		C.Insns += (C.AddRecCost - AddRecCost);
		// ICmpZero adds no Insns if it ends with zero.
		if (LU.Kind == LSRUse::ICmpZero && !F.hasZeroEnd())
		C.Insns++;
		// BaseAdds adds instructions for unfolded registers.
		if (LU.Kind != LSRUse::ICmpZero)
		C.Insns += C.NumBaseAdds - NumBaseAdds;
assert(isValid() && "invalid cost");		assert(isValid() && "invalid cost");
}		}
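The instruction accounting added at the end of RateFormula can be read as a small pure function of the deltas produced while rating one formula. The sketch below is an illustration under that reading, not code from the patch: `UseKindSketch`, `FormulaDeltaSketch`, and `insnsAddedByFormula` are hypothetical names, and the register-pressure term is left out.

```cpp
#include <cassert>

// Hypothetical stand-in for the LSRUse kinds that matter here.
enum class UseKindSketch { Basic, Address, ICmpZero };

// Hypothetical summary of what rating one formula changed.
struct FormulaDeltaSketch {
  unsigned NewAddRecs;  // C.AddRecCost - AddRecCost in the diff
  unsigned NewBaseAdds; // C.NumBaseAdds - NumBaseAdds in the diff
  bool EndsWithZero;    // F.hasZeroEnd() in the diff
};

// Instructions one formula adds, following the three rules in the diff.
unsigned insnsAddedByFormula(UseKindSketch Kind, const FormulaDeltaSketch &D) {
  unsigned Insns = D.NewAddRecs; // each new AddRec costs one instruction
  if (Kind == UseKindSketch::ICmpZero && !D.EndsWithZero)
    ++Insns; // a compare is needed unless the IV ends at zero
  if (Kind != UseKindSketch::ICmpZero)
    Insns += D.NewBaseAdds; // unfolded base registers need explicit adds
  return Insns;
}
```

Under these rules an ICmpZero use whose induction variable ends at zero costs only its AddRec, which is what makes the zero-ending formulae attractive to the solver.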

/// Set this cost to a losing value.		/// Set this cost to a losing value.
void Cost::Lose() {		void Cost::Lose() {
NumRegs = ~0u;		C.Insns = ~0u;
AddRecCost = ~0u;		C.NumRegs = ~0u;
NumIVMuls = ~0u;		C.AddRecCost = ~0u;
NumBaseAdds = ~0u;		C.NumIVMuls = ~0u;
ImmCost = ~0u;		C.NumBaseAdds = ~0u;
SetupCost = ~0u;		C.ImmCost = ~0u;
ScaleCost = ~0u;		C.SetupCost = ~0u;
		C.ScaleCost = ~0u;
}		}

/// Choose the lower cost.		/// Choose the lower cost.
bool Cost::operator<(const Cost &Other) const {		bool Cost::isLower(Cost &Other, const TargetTransformInfo &TTI) {
return std::tie(NumRegs, AddRecCost, NumIVMuls, NumBaseAdds, ScaleCost,		return TTI.isLSRCostLower(C, Other.C);
ImmCost, SetupCost) <
std::tie(Other.NumRegs, Other.AddRecCost, Other.NumIVMuls,
Other.NumBaseAdds, Other.ScaleCost, Other.ImmCost,
Other.SetupCost);
}		}
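The target-side hook itself is not visible in this part of the diff, so the following is only a sketch of what such a comparison could look like. `LSRCostSketch` and both function names are hypothetical. The "regs first" order is taken from the removed `operator<` above; placing `Insns` first is an assumption about how a target such as x86 might prioritize the new field.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical mirror of the cost fields referenced through `C.` in the diff.
struct LSRCostSketch {
  unsigned Insns, NumRegs, AddRecCost, NumIVMuls, NumBaseAdds, ImmCost,
      SetupCost, ScaleCost;
};

// Old ordering, as in the removed operator<: register count decides first.
bool isCostLowerRegsFirst(const LSRCostSketch &A, const LSRCostSketch &B) {
  return std::tie(A.NumRegs, A.AddRecCost, A.NumIVMuls, A.NumBaseAdds,
                  A.ScaleCost, A.ImmCost, A.SetupCost) <
         std::tie(B.NumRegs, B.AddRecCost, B.NumIVMuls, B.NumBaseAdds,
                  B.ScaleCost, B.ImmCost, B.SetupCost);
}

// Assumed target ordering: instruction count decides first, then the rest.
bool isCostLowerInsnsFirst(const LSRCostSketch &A, const LSRCostSketch &B) {
  return std::tie(A.Insns, A.NumRegs, A.AddRecCost, A.NumIVMuls,
                  A.NumBaseAdds, A.ScaleCost, A.ImmCost, A.SetupCost) <
         std::tie(B.Insns, B.NumRegs, B.AddRecCost, B.NumIVMuls,
                  B.NumBaseAdds, B.ScaleCost, B.ImmCost, B.SetupCost);
}
```

A solution with 3 instructions and 4 registers beats one with 4 instructions and 3 registers under the instruction-first order, while the old order prefers the second; keeping the comparison behind a TTI hook lets each target choose.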

void Cost::print(raw_ostream &OS) const {		void Cost::print(raw_ostream &OS) const {
OS << NumRegs << " reg" << (NumRegs == 1 ? "" : "s");		OS << C.Insns << " instruction" << (C.Insns == 1 ? " " : "s ");
if (AddRecCost != 0)		OS << C.NumRegs << " reg" << (C.NumRegs == 1 ? "" : "s");
OS << ", with addrec cost " << AddRecCost;		if (C.AddRecCost != 0)
if (NumIVMuls != 0)		OS << ", with addrec cost " << C.AddRecCost;
OS << ", plus " << NumIVMuls << " IV mul" << (NumIVMuls == 1 ? "" : "s");		if (C.NumIVMuls != 0)
if (NumBaseAdds != 0)		OS << ", plus " << C.NumIVMuls << " IV mul"
OS << ", plus " << NumBaseAdds << " base add"		<< (C.NumIVMuls == 1 ? "" : "s");
<< (NumBaseAdds == 1 ? "" : "s");		if (C.NumBaseAdds != 0)
if (ScaleCost != 0)		OS << ", plus " << C.NumBaseAdds << " base add"
OS << ", plus " << ScaleCost << " scale cost";		<< (C.NumBaseAdds == 1 ? "" : "s");
if (ImmCost != 0)		if (C.ScaleCost != 0)
OS << ", plus " << ImmCost << " imm cost";		OS << ", plus " << C.ScaleCost << " scale cost";
if (SetupCost != 0)		if (C.ImmCost != 0)
OS << ", plus " << SetupCost << " setup cost";		OS << ", plus " << C.ImmCost << " imm cost";
		if (C.SetupCost != 0)
		OS << ", plus " << C.SetupCost << " setup cost";
}		}

LLVM_DUMP_METHOD		LLVM_DUMP_METHOD
void Cost::dump() const {		void Cost::dump() const {
print(errs()); errs() << '\n';		print(errs()); errs() << '\n';
}		}

LSRFixup::LSRFixup()		LSRFixup::LSRFixup()
▲ Show 20 Lines • Show All 511 Lines • ▼ Show 20 Lines	void GenerateConstantOffsetsImpl(LSRUse &LU, unsigned LUIdx,
const Formula &Base,		const Formula &Base,
const SmallVectorImpl<int64_t> &Worklist,		const SmallVectorImpl<int64_t> &Worklist,
size_t Idx, bool IsScaledReg = false);		size_t Idx, bool IsScaledReg = false);
void GenerateConstantOffsets(LSRUse &LU, unsigned LUIdx, Formula Base);		void GenerateConstantOffsets(LSRUse &LU, unsigned LUIdx, Formula Base);
void GenerateICmpZeroScales(LSRUse &LU, unsigned LUIdx, Formula Base);		void GenerateICmpZeroScales(LSRUse &LU, unsigned LUIdx, Formula Base);
void GenerateScales(LSRUse &LU, unsigned LUIdx, Formula Base);		void GenerateScales(LSRUse &LU, unsigned LUIdx, Formula Base);
void GenerateTruncates(LSRUse &LU, unsigned LUIdx, Formula Base);		void GenerateTruncates(LSRUse &LU, unsigned LUIdx, Formula Base);
void GenerateCrossUseConstantOffsets();		void GenerateCrossUseConstantOffsets();
		void GenerateCrossUseICmpZero();
void GenerateAllReuseFormulae();		void GenerateAllReuseFormulae();

void FilterOutUndesirableDedicatedRegisters();		void FilterOutUndesirableDedicatedRegisters();

size_t EstimateSearchSpaceComplexity() const;		size_t EstimateSearchSpaceComplexity() const;
void NarrowSearchSpaceByDetectingSupersets();		void NarrowSearchSpaceByDetectingSupersets();
void NarrowSearchSpaceByCollapsingUnrolledCode();		void NarrowSearchSpaceByCollapsingUnrolledCode();
void NarrowSearchSpaceByRefilteringUndesirableDedicatedRegisters();		void NarrowSearchSpaceByRefilteringUndesirableDedicatedRegisters();
▲ Show 20 Lines • Show All 1,917 Lines • ▼ Show 20 Lines	OS << "in formulae referencing " << *OrigReg << " in use " << LUIdx
<< " , add offset " << Imm;		<< " , add offset " << Imm;
}		}

LLVM_DUMP_METHOD		LLVM_DUMP_METHOD
void WorkItem::dump() const {		void WorkItem::dump() const {
print(errs()); errs() << '\n';		print(errs()); errs() << '\n';
}		}

		/// Look for ICmp AddRecExpr that ends with zero and try to reuse them in
		/// other formulas.
		/// For the following:
		/// ICmpZero {-40,+,4}
		/// Address {%a,+,4}
		/// Algorithm will add 1 Address Formula:
		/// ICmpZero {-40,+,4}
		/// Address {%a} + {0,+,4}
		/// 40 + {%a} + {-40,+,4}
		///
		void LSRInstance::GenerateCrossUseICmpZero() {
		SmallVector<const SCEV *, 8> Sequence;
		// Get all ICmpZero registers that end with zero.
		for (LSRUse &LU : Uses) {
		if (LU.Kind != LSRUse::ICmpZero)
		continue;
		for (const Formula &F : LU.Formulae) {
		if (!F.hasZeroEnd())
		continue;
		const SCEVAddRecExpr *Reg = dyn_cast<SCEVAddRecExpr>(F.BaseRegs[0]);
		if (!Reg || !isa<SCEVConstant>(Reg->getStart()))
		continue;
		Sequence.push_back(F.BaseRegs[0]);
		}
		}
		if (Sequence.empty())
		return;
		for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx) {
		LSRUse &LU = Uses[LUIdx];
		if (LU.Kind == LSRUse::ICmpZero)
		continue;
		// If we found AddRecExpr register in LSR use that has same step,
		// try to make it the same by shifting constant start.
		for (const SCEV *CmpReg : Sequence) {
		const SCEVAddRecExpr *RegAR = cast<SCEVAddRecExpr>(CmpReg);
		const SCEVConstant *RegStart = cast<SCEVConstant>(RegAR->getStart());
		for (size_t L = 0, LE = LU.Formulae.size(); L != LE; ++L) {
		Formula F = LU.Formulae[L];
		F.unscale();
		Formula NewF = F;
		bool Changed = false;
		for (size_t N = 0, NE = F.BaseRegs.size(); N != NE; ++N) {
		const SCEVAddRecExpr *BaseRegAR =
		dyn_cast<SCEVAddRecExpr>(F.BaseRegs[N]);
		if (!BaseRegAR)
		continue;
		if (!hasSameConstValue(BaseRegAR->getStepRecurrence(SE),
		RegAR->getStepRecurrence(SE)))
		continue;
		const SCEVConstant *BaseRegStart =
		dyn_cast<SCEVConstant>(BaseRegAR->getStart());
		if (!BaseRegStart)
		continue;
		int64_t RegDiff = BaseRegStart->getAPInt().getSExtValue() -
		RegStart->getAPInt().getSExtValue();
		Type *IntTy = SE.getEffectiveSCEVType(F.BaseRegs[N]->getType());
		const SCEV *NegRegDiff =
		SE.getSCEV(ConstantInt::get(IntTy, -RegDiff));
		NewF.BaseOffset += RegDiff;
		if (!isLegalUse(TTI, LU.MinOffset, LU.MaxOffset,
		LU.Kind, LU.AccessTy, NewF)) {
		if (!TTI.isLegalAddImmediate((uint64_t)NewF.UnfoldedOffset +
		RegDiff))
		continue;
		NewF = F;
		NewF.UnfoldedOffset = (uint64_t)NewF.UnfoldedOffset + RegDiff;
		}
		NewF.BaseRegs[N] = SE.getAddExpr(NegRegDiff, F.BaseRegs[N]);
		Changed = true;
		}
		if (!Changed)
		continue;
		NewF.canonicalize();
		(void)InsertFormula(LU, LUIdx, NewF);
		}
		}
		}
		}
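The rebasing arithmetic in GenerateCrossUseICmpZero can be checked on the example from its doc comment. The sketch below models an AddRec `{Start,+,Step}` with plain integers; `AddRecSketch`, `AddressSketch`, and `rebaseOntoCmp` are hypothetical names, and legality checks such as `isLegalUse` are deliberately omitted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical integer model of a SCEV AddRec {Start,+,Step}.
struct AddRecSketch {
  int64_t Start;
  int64_t Step;
};

// Hypothetical address formula: Base + BaseOffset + IV.
struct AddressSketch {
  int64_t BaseOffset;
  AddRecSketch IV;
};

int64_t valueAt(const AddRecSketch &AR, int64_t Iter) {
  return AR.Start + AR.Step * Iter;
}

int64_t addressAt(const AddressSketch &A, int64_t Base, int64_t Iter) {
  return Base + A.BaseOffset + valueAt(A.IV, Iter);
}

// Shift the address IV so it shares the compare's register. The difference
// of the starts (RegDiff in the diff) moves into the constant offset.
bool rebaseOntoCmp(AddressSketch &A, const AddRecSketch &Cmp) {
  if (A.IV.Step != Cmp.Step)
    return false; // only AddRecs with the same step can share a register
  int64_t RegDiff = A.IV.Start - Cmp.Start;
  A.BaseOffset += RegDiff;
  A.IV.Start -= RegDiff; // now equal to Cmp.Start
  return true;
}
```

For ICmpZero `{-40,+,4}` and an address IV `{0,+,4}`, the rewrite yields offset 40 with IV `{-40,+,4}`: every iteration computes the same address as before, and the shared register reaches zero on the exit iteration.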

/// Look for registers which are a constant distance apart and try to form reuse		/// Look for registers which are a constant distance apart and try to form reuse
/// opportunities between them.		/// opportunities between them.
void LSRInstance::GenerateCrossUseConstantOffsets() {		void LSRInstance::GenerateCrossUseConstantOffsets() {
// Group the registers by their value without any added constant offset.		// Group the registers by their value without any added constant offset.
typedef std::map<int64_t, const SCEV *> ImmMapTy;		typedef std::map<int64_t, const SCEV *> ImmMapTy;
DenseMap<const SCEV *, ImmMapTy> Map;		DenseMap<const SCEV *, ImmMapTy> Map;
DenseMap<const SCEV *, SmallBitVector> UsedByIndicesMap;		DenseMap<const SCEV *, SmallBitVector> UsedByIndicesMap;
SmallVector<const SCEV *, 8> Sequence;		SmallVector<const SCEV *, 8> Sequence;
▲ Show 20 Lines • Show All 173 Lines • ▼ Show 20 Lines	for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx) {
for (size_t i = 0, f = LU.Formulae.size(); i != f; ++i)		for (size_t i = 0, f = LU.Formulae.size(); i != f; ++i)
GenerateScales(LU, LUIdx, LU.Formulae[i]);		GenerateScales(LU, LUIdx, LU.Formulae[i]);
}		}
for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx) {		for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx) {
LSRUse &LU = Uses[LUIdx];		LSRUse &LU = Uses[LUIdx];
for (size_t i = 0, f = LU.Formulae.size(); i != f; ++i)		for (size_t i = 0, f = LU.Formulae.size(); i != f; ++i)
GenerateTruncates(LU, LUIdx, LU.Formulae[i]);		GenerateTruncates(LU, LUIdx, LU.Formulae[i]);
}		}
		GenerateCrossUseICmpZero();
GenerateCrossUseConstantOffsets();		GenerateCrossUseConstantOffsets();

DEBUG(dbgs() << "\n"		DEBUG(dbgs() << "\n"
"After generating reuse formulae:\n";		"After generating reuse formulae:\n";
print_uses(dbgs()));		print_uses(dbgs()));
}		}

/// If there are multiple formulae with the same set of registers used		/// If there are multiple formulae with the same set of registers used
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	for (size_t FIdx = 0, NumForms = LU.Formulae.size();
if (P.second)		if (P.second)
continue;		continue;

Formula &Best = LU.Formulae[P.first->second];		Formula &Best = LU.Formulae[P.first->second];

Cost CostBest;		Cost CostBest;
Regs.clear();		Regs.clear();
CostBest.RateFormula(TTI, Best, Regs, VisitedRegs, L, SE, DT, LU);		CostBest.RateFormula(TTI, Best, Regs, VisitedRegs, L, SE, DT, LU);
if (CostF < CostBest)		if (CostF.isLower(CostBest, TTI))
std::swap(F, Best);		std::swap(F, Best);
DEBUG(dbgs() << " Filtering out formula "; F.print(dbgs());		DEBUG(dbgs() << " Filtering out formula "; F.print(dbgs());
dbgs() << "\n"		dbgs() << "\n"
" in favor of formula "; Best.print(dbgs());		" in favor of formula "; Best.print(dbgs());
dbgs() << '\n');		dbgs() << '\n');
}		}
#ifndef NDEBUG		#ifndef NDEBUG
ChangedFormulae = true;		ChangedFormulae = true;
▲ Show 20 Lines • Show All 310 Lines • ▼ Show 20 Lines	if (NumReqRegsToFind != 0) {
continue;		continue;
}		}

// Evaluate the cost of the current formula. If it's already worse than		// Evaluate the cost of the current formula. If it's already worse than
// the current best, prune the search at that point.		// the current best, prune the search at that point.
NewCost = CurCost;		NewCost = CurCost;
NewRegs = CurRegs;		NewRegs = CurRegs;
NewCost.RateFormula(TTI, F, NewRegs, VisitedRegs, L, SE, DT, LU);		NewCost.RateFormula(TTI, F, NewRegs, VisitedRegs, L, SE, DT, LU);
if (NewCost < SolutionCost) {		if (NewCost.isLower(SolutionCost, TTI)) {
Workspace.push_back(&F);		Workspace.push_back(&F);
if (Workspace.size() != Uses.size()) {		if (Workspace.size() != Uses.size()) {
SolveRecurse(Solution, SolutionCost, Workspace, NewCost,		SolveRecurse(Solution, SolutionCost, Workspace, NewCost,
NewRegs, VisitedRegs);		NewRegs, VisitedRegs);
if (F.getNumRegs() == 1 && Workspace.size() == 1)		if (F.getNumRegs() == 1 && Workspace.size() == 1)
VisitedRegs.insert(F.ScaledReg ? F.ScaledReg : F.BaseRegs[0]);		VisitedRegs.insert(F.ScaledReg ? F.ScaledReg : F.BaseRegs[0]);
} else {		} else {
DEBUG(dbgs() << "New best at "; NewCost.print(dbgs());		DEBUG(dbgs() << "New best at "; NewCost.print(dbgs());

test/CodeGen/X86/2006-05-11-InstrSched.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: llc < %s -march=x86 -mtriple=i386-linux-gnu -mcpu=penryn -mattr=+sse2 -stats 2>&1 | \			; RUN: llc < %s -march=x86 -mtriple=i386-linux-gnu -mcpu=penryn -mattr=+sse2 -stats 2>&1 | \
	; RUN: grep "asm-printer" | grep 35			; RUN: grep "asm-printer" | grep 33

	target datalayout = "e-p:32:32"			target datalayout = "e-p:32:32"
	define void @foo(i32* %mc, i32* %bp, i32* %ms, i32* %xmb, i32* %mpp, i32* %tpmm, i32* %ip, i32* %tpim, i32* %dpp, i32* %tpdm, i32* %bpi, i32 %M) nounwind {			define void @foo(i32* %mc, i32* %bp, i32* %ms, i32* %xmb, i32* %mpp, i32* %tpmm, i32* %ip, i32* %tpim, i32* %dpp, i32* %tpdm, i32* %bpi, i32 %M) nounwind {
	entry:			entry:
	%tmp9 = icmp slt i32 %M, 5 ; <i1> [#uses=1]			%tmp9 = icmp slt i32 %M, 5 ; <i1> [#uses=1]
	br i1 %tmp9, label %return, label %cond_true			br i1 %tmp9, label %return, label %cond_true

	cond_true: ; preds = %cond_true, %entry			cond_true: ; preds = %cond_true, %entry

test/CodeGen/X86/atom-fixup-lea3.ll

	; RUN: llc < %s -mcpu=atom -mtriple=i686-linux | FileCheck %s			; RUN: llc < %s -mcpu=atom -mtriple=i686-linux | FileCheck %s
	; CHECK: addl ([[reg:%[a-z]+]])			; CHECK: addl ({{%[a-z]+}},[[reg:%[a-z]+]],4)
	; CHECK-NEXT: addl $4, [[reg]]			; CHECK-NEXT: movl
				; CHECK-NEXT: addl 4({{%[a-z]+}},[[reg:%[a-z]+]],4)
				; CHECK-NEXT: incl

	; Test for the FixupLEAs pre-emit pass.			; Test for the FixupLEAs pre-emit pass.
	; An LEA should NOT be substituted for the ADD instruction			; An LEA should NOT be substituted for the ADD instruction
	; that increments the array pointer if it is greater than 5 instructions			; that increments the array pointer if it is greater than 5 instructions
	; away from the memory reference that uses it.			; away from the memory reference that uses it.

	; Original C code: clang -m32 -S -O2			; Original C code: clang -m32 -S -O2
	;int test(int n, int * array, int * m, int * array2)			;int test(int n, int * array, int * m, int * array2)
	;{			;{
	; int i, j = 0;			; int i, j = 0;
	; int sum = 0;			; int sum = 0;
	; for (i = 0, j = 0; i < n;) {			; for (i = 0, j = 0; i < n;) {
	; ++i;			; ++i;
	; *m += array2[j++];			; *m += array2[j++];
	; sum += array[i];			; sum += array[i];
	; }			; }
	; return sum;			; return sum;
	;}			;}

	define i32 @test(i32 %n, i32* nocapture %array, i32* nocapture %m, i32* nocapture %array2) #0 {			define i32 @test(i32 %n, i32* nocapture %array, i32* nocapture %k, i32* nocapture %l, i32* nocapture %m, i32* nocapture %array2) #0 {
	entry:			entry:
	%cmp7 = icmp sgt i32 %n, 0			%cmp7 = icmp sgt i32 %n, 0
	br i1 %cmp7, label %for.body.lr.ph, label %for.end			br i1 %cmp7, label %for.body.lr.ph, label %for.end

	for.body.lr.ph: ; preds = %entry			for.body.lr.ph: ; preds = %entry
	%.pre = load i32, i32* %m, align 4			%.pre = load i32, i32* %m, align 4
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %for.body.lr.ph			for.body: ; preds = %for.body, %for.body.lr.ph
	%0 = phi i32 [ %.pre, %for.body.lr.ph ], [ %add, %for.body ]			%0 = phi i32 [ %.pre, %for.body.lr.ph ], [ %add, %for.body ]
	%sum.010 = phi i32 [ 0, %for.body.lr.ph ], [ %add3, %for.body ]			%sum.010 = phi i32 [ 0, %for.body.lr.ph ], [ %add3, %for.body ]
	%j.09 = phi i32 [ 0, %for.body.lr.ph ], [ %inc1, %for.body ]			%j.09 = phi i32 [ 0, %for.body.lr.ph ], [ %inc1, %for.body ]
	%inc1 = add nsw i32 %j.09, 1			%inc1 = add nsw i32 %j.09, 1
	%arrayidx = getelementptr inbounds i32, i32* %array2, i32 %j.09			%arrayidx = getelementptr inbounds i32, i32* %array2, i32 %j.09
				store i32 %0, i32* %m, align 4
				store i32 %sum.010, i32* %m, align 4
				store i32 %0, i32* %m, align 4
	%1 = load i32, i32* %arrayidx, align 4			%1 = load i32, i32* %arrayidx, align 4
	%add = add nsw i32 %0, %1			%add = add nsw i32 %0, %1
	store i32 %add, i32* %m, align 4			store i32 %add, i32* %m, align 4
	%arrayidx2 = getelementptr inbounds i32, i32* %array, i32 %inc1			%arrayidx2 = getelementptr inbounds i32, i32* %array, i32 %inc1
	%2 = load i32, i32* %arrayidx2, align 4			%2 = load i32, i32* %arrayidx2, align 4
	%add3 = add nsw i32 %2, %sum.010			%add3 = add nsw i32 %2, %sum.010
	%exitcond = icmp eq i32 %inc1, %n			%exitcond = icmp eq i32 %inc1, %n
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	%sum.0.lcssa = phi i32 [ 0, %entry ], [ %add3, %for.body ]			%sum.0.lcssa = phi i32 [ 0, %entry ], [ %add3, %for.body ]
	ret i32 %sum.0.lcssa			ret i32 %sum.0.lcssa
	}			}

test/CodeGen/X86/avoid_complex_am.ll

	; RUN: opt -S -loop-reduce < %s | FileCheck %s			; RUN: opt -S -loop-reduce < %s | FileCheck %s
	; Complex addressing mode are costly.			; Complex addressing mode are costly.
	; Make loop-reduce prefer unscaled accesses.			; Make loop-reduce prefer unscaled accesses.
	; On X86, reg1 + 1*reg2 has the same cost as reg1 + 8*reg2.			; On X86, reg1 + 1*reg2 has the same cost as reg1 + 8*reg2.
	; Therefore, LSR currently prefers to fold as much computation as possible			; Therefore, LSR currently prefers to fold as much computation as possible
	; in the addressing mode.			; in the addressing mode.
	; <rdar://problem/16730541>			; <rdar://problem/16730541>
	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx"			target triple = "x86_64-apple-macosx"

	define void @mulDouble(double* nocapture %a, double* nocapture %b, double* nocapture %c) {			define void @mulDouble(double* nocapture %a, double* nocapture %b, double* nocapture %c, i32 %n) {
	; CHECK: @mulDouble			; CHECK: @mulDouble
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	; CHECK: [[IV:%[^ ]+]] = phi i64 [ [[IVNEXT:%[^,]+]], %for.body ], [ 0, %entry ]			; CHECK: [[IV:%[^ ]+]] = phi i64 [ [[IVNEXT:%[^,]+]], %for.body ], [ 0, %entry ]
	; Only one induction variable should have been generated.			; Only one induction variable should have been generated.
	; CHECK-NOT: phi			; CHECK-NOT: phi
	%indvars.iv = phi i64 [ 1, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 1, %entry ], [ %indvars.iv.next, %for.body ]
	%tmp = add nsw i64 %indvars.iv, -1			%tmp = add nsw i64 %indvars.iv, -1
	%arrayidx = getelementptr inbounds double, double* %b, i64 %tmp			%arrayidx = getelementptr inbounds double, double* %b, i64 %tmp
	%tmp1 = load double, double* %arrayidx, align 8			%tmp1 = load double, double* %arrayidx, align 8
	; The induction variable should carry the scaling factor: 1.			; The induction variable should carry the scaling factor: 1.
	; CHECK: [[IVNEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK: [[IVNEXT]] = add nuw nsw i64 [[IV]], 1
	%indvars.iv.next = add i64 %indvars.iv, 1			%indvars.iv.next = add i64 %indvars.iv, 1
	%arrayidx2 = getelementptr inbounds double, double* %c, i64 %indvars.iv.next			%arrayidx2 = getelementptr inbounds double, double* %c, i64 %indvars.iv.next
	%tmp2 = load double, double* %arrayidx2, align 8			%tmp2 = load double, double* %arrayidx2, align 8
	%mul = fmul double %tmp1, %tmp2			%mul = fmul double %tmp1, %tmp2
	%arrayidx4 = getelementptr inbounds double, double* %a, i64 %indvars.iv			%arrayidx4 = getelementptr inbounds double, double* %a, i64 %indvars.iv
	store double %mul, double* %arrayidx4, align 8			store double %mul, double* %arrayidx4, align 8
	%lftr.wideiv = trunc i64 %indvars.iv.next to i32			%lftr.wideiv = trunc i64 %indvars.iv.next to i32
	; Comparison should be 19 * 1 = 19.			%exitcond = icmp eq i32 %lftr.wideiv, %n
	; CHECK: icmp eq i32 {{%[^,]+}}, 19
	%exitcond = icmp eq i32 %lftr.wideiv, 20
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

test/CodeGen/X86/compact-unwind.ll


	declare void @OSMemoryBarrier() optsize			declare void @OSMemoryBarrier() optsize

	; Test the code below uses UNWIND_X86_64_MODE_STACK_IMMD compact unwind			; Test the code below uses UNWIND_X86_64_MODE_STACK_IMMD compact unwind
	; encoding.			; encoding.

	; NOFP-CU: Entry at offset 0x20:			; NOFP-CU: Entry at offset 0x20:
	; NOFP-CU-NEXT: start: 0x1d _test1			; NOFP-CU-NEXT: start: 0x1d _test1
	; NOFP-CU-NEXT: length: 0x42			; NOFP-CU-NEXT: length: 0x4b
	; NOFP-CU-NEXT: compact encoding: 0x02040c0a			; NOFP-CU-NEXT: compact encoding: 0x02040c0a

	; NOFP-FROM-ASM: Entry at offset 0x20:			; NOFP-FROM-ASM: Entry at offset 0x20:
	; NOFP-FROM-ASM-NEXT: start: 0x1d _test1			; NOFP-FROM-ASM-NEXT: start: 0x1d _test1
	; NOFP-FROM-ASM-NEXT: length: 0x42			; NOFP-FROM-ASM-NEXT: length: 0x4b
	; NOFP-FROM-ASM-NEXT: compact encoding: 0x02040c0a			; NOFP-FROM-ASM-NEXT: compact encoding: 0x02040c0a

	define void @test1(%class.ImageLoader* %image) optsize ssp uwtable {			define void @test1(%class.ImageLoader* %image) optsize ssp uwtable {
	entry:			entry:
	br label %for.cond1.preheader			br label %for.cond1.preheader

	for.cond1.preheader: ; preds = %for.inc10, %entry			for.cond1.preheader: ; preds = %for.inc10, %entry
	%p.019 = phi %"struct.dyld::MappedRanges"* [ @G1, %entry ], [ %1, %for.inc10 ]			%p.019 = phi %"struct.dyld::MappedRanges"* [ @G1, %entry ], [ %1, %for.inc10 ]

test/CodeGen/X86/full-lsr.ll

	; RUN: llc < %s -march=x86 -mcpu=generic | FileCheck %s			; RUN: llc < %s -march=x86 -mcpu=generic | FileCheck %s
	; RUN: llc < %s -march=x86 -mcpu=atom | FileCheck -check-prefix=ATOM %s			; RUN: llc < %s -march=x86 -mcpu=atom | FileCheck %s

	define void @foo(float* nocapture %A, float* nocapture %B, float* nocapture %C, i32 %N) nounwind {			define void @foo(float* nocapture %A, float* nocapture %B, float* nocapture %C, i32 %N) nounwind {
	; ATOM: foo
	; ATOM: addl
	; ATOM: addl
	; ATOM: leal

	; CHECK: foo			; CHECK: foo
	; CHECK: addl			; CHECK: incl
	; CHECK: addl
	; CHECK: addl

	entry:			entry:
	%0 = icmp sgt i32 %N, 0 ; <i1> [#uses=1]			%0 = icmp sgt i32 %N, 0 ; <i1> [#uses=1]
	br i1 %0, label %bb, label %return			br i1 %0, label %bb, label %return

	bb: ; preds = %bb, %entry			bb: ; preds = %bb, %entry
	%i.03 = phi i32 [ 0, %entry ], [ %indvar.next, %bb ] ; <i32> [#uses=5]			%i.03 = phi i32 [ 0, %entry ], [ %indvar.next, %bb ] ; <i32> [#uses=5]
	%1 = getelementptr float, float* %A, i32 %i.03 ; <float*> [#uses=1]			%1 = getelementptr float, float* %A, i32 %i.03 ; <float*> [#uses=1]

test/CodeGen/X86/loop-strength-reduce4.ll

	; RUN: llc < %s -mtriple=i686-apple-darwin -relocation-model=static | FileCheck %s -check-prefix=STATIC			; RUN: llc < %s -mtriple=i686-apple-darwin -relocation-model=static | FileCheck %s -check-prefix=STATIC
	; RUN: llc < %s -mtriple=i686-apple-darwin -relocation-model=pic | FileCheck %s -check-prefix=PIC			; RUN: llc < %s -mtriple=i686-apple-darwin -relocation-model=pic | FileCheck %s -check-prefix=PIC

	; By starting the IV at -64 instead of 0, a cmp is eliminated,			; By starting the IV at -64 instead of 0, a cmp is eliminated,
	; as the flags from the add can be used directly.			; as the flags from the add can be used directly.

	; STATIC: movl $-64, [[ECX:%e..]]			; STATIC: movl $-64, [[EAX:%e..]]

	; STATIC: movl [[EAX:%e..]], _state+76([[ECX]])			; STATIC: movl %{{.+}}, _state+76([[EAX]])
	; STATIC: addl $16, [[ECX]]			; STATIC: addl $16, [[EAX]]
	; STATIC: jne			; STATIC: jne

	; In PIC mode the symbol can't be folded, so the change-compare-stride			; The same for PIC mode.
	; trick applies.

	; PIC: cmpl $64			; PIC: movl $-64, [[EAX:%e..]]

				; PIC: movl %{{.+}}, 76(%{{.+}},[[EAX]])
				; PIC: addl $16, [[EAX]]
				; PIC: jne

	@state = external global [0 x i32] ; <[0 x i32]*> [#uses=4]			@state = external global [0 x i32] ; <[0 x i32]*> [#uses=4]
	@S = external global [0 x i32] ; <[0 x i32]*> [#uses=4]			@S = external global [0 x i32] ; <[0 x i32]*> [#uses=4]

	define i32 @foo() nounwind {			define i32 @foo() nounwind {
	entry:			entry:
	br label %bb			br label %bb


test/CodeGen/X86/masked-iv-safe.ll

; RUN: llc < %s -mcpu=generic -march=x86-64 | FileCheck %s		; RUN: llc < %s -mcpu=generic -march=x86-64 | FileCheck %s

; Optimize away zext-inreg and sext-inreg on the loop induction		; Optimize away zext-inreg and sext-inreg on the loop induction
; variable using trip-count information.		; variable using trip-count information.

; CHECK-LABEL: count_up		; CHECK-LABEL: count_up
; CHECK-NOT: {{and|movz|sar|shl}}		; CHECK-NOT: {{and|movz|sar|shl}}
; CHECK: incq		; CHECK: addq $8
; CHECK-NOT: {{and|movz|sar|shl}}		; CHECK-NOT: {{and|movz|sar|shl}}
; CHECK: jne		; CHECK: jne
define void @count_up(double* %d, i64 %n) nounwind {		define void @count_up(double* %d, i64 %n) nounwind {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%indvar = phi i64 [ 0, %entry ], [ %indvar.next, %loop ]		%indvar = phi i64 [ 0, %entry ], [ %indvar.next, %loop ]
Show All 16 Lines	loop:
br i1 %exitcond, label %return, label %loop		br i1 %exitcond, label %return, label %loop

return:		return:
ret void		ret void
}		}

; CHECK-LABEL: count_down		; CHECK-LABEL: count_down
; CHECK-NOT: {{and|movz|sar|shl}}		; CHECK-NOT: {{and|movz|sar|shl}}
; CHECK: addq		; CHECK: addq $-8
; CHECK-NOT: {{and|movz|sar|shl}}		; CHECK-NOT: {{and|movz|sar|shl}}
; CHECK: jne		; CHECK: jne
define void @count_down(double* %d, i64 %n) nounwind {		define void @count_down(double* %d, i64 %n) nounwind {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%indvar = phi i64 [ 10, %entry ], [ %indvar.next, %loop ]		%indvar = phi i64 [ 10, %entry ], [ %indvar.next, %loop ]
Show All 16 Lines	loop:
br i1 %exitcond, label %return, label %loop		br i1 %exitcond, label %return, label %loop

return:		return:
ret void		ret void
}		}

; CHECK-LABEL: count_up_signed		; CHECK-LABEL: count_up_signed
; CHECK-NOT: {{and|movz|sar|shl}}		; CHECK-NOT: {{and|movz|sar|shl}}
; CHECK: incq		; CHECK: addq $8
; CHECK-NOT: {{and|movz|sar|shl}}		; CHECK-NOT: {{and|movz|sar|shl}}
; CHECK: jne		; CHECK: jne
define void @count_up_signed(double* %d, i64 %n) nounwind {		define void @count_up_signed(double* %d, i64 %n) nounwind {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%indvar = phi i64 [ 0, %entry ], [ %indvar.next, %loop ]		%indvar = phi i64 [ 0, %entry ], [ %indvar.next, %loop ]
Show All 18 Lines	loop:
br i1 %exitcond, label %return, label %loop		br i1 %exitcond, label %return, label %loop

return:		return:
ret void		ret void
}		}

; CHECK-LABEL: count_down_signed		; CHECK-LABEL: count_down_signed
; CHECK-NOT: {{and|movz|sar|shl}}		; CHECK-NOT: {{and|movz|sar|shl}}
; CHECK: addq		; CHECK: addq $-8
; CHECK-NOT: {{and|movz|sar|shl}}		; CHECK-NOT: {{and|movz|sar|shl}}
; CHECK: jne		; CHECK: jne
define void @count_down_signed(double* %d, i64 %n) nounwind {		define void @count_down_signed(double* %d, i64 %n) nounwind {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%indvar = phi i64 [ 10, %entry ], [ %indvar.next, %loop ]		%indvar = phi i64 [ 10, %entry ], [ %indvar.next, %loop ]
Show All 18 Lines	loop:
br i1 %exitcond, label %return, label %loop		br i1 %exitcond, label %return, label %loop

return:		return:
ret void		ret void
}		}

; CHECK-LABEL: another_count_up		; CHECK-LABEL: another_count_up
; CHECK-NOT: {{and|movz|sar|shl}}		; CHECK-NOT: {{and|movz|sar|shl}}
; CHECK: addq		; CHECK: addq $8
; CHECK-NOT: {{and|movz|sar|shl}}		; CHECK-NOT: {{and|movz|sar|shl}}
; CHECK: jne		; CHECK: jne
define void @another_count_up(double* %d, i64 %n) nounwind {		define void @another_count_up(double* %d, i64 %n) nounwind {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%indvar = phi i64 [ 18446744073709551615, %entry ], [ %indvar.next, %loop ]		%indvar = phi i64 [ 18446744073709551615, %entry ], [ %indvar.next, %loop ]
Show All 16 Lines	loop:
br i1 %exitcond, label %return, label %loop		br i1 %exitcond, label %return, label %loop

return:		return:
ret void		ret void
}		}

; CHECK-LABEL: another_count_down		; CHECK-LABEL: another_count_down
; CHECK-NOT: {{and|movz|sar|shl}}		; CHECK-NOT: {{and|movz|sar|shl}}
; CHECK: addq $-8,		; CHECK: addq $-8
; CHECK-NOT: {{and|movz|sar|shl}}		; CHECK-NOT: {{and|movz|sar|shl}}
; CHECK: jne		; CHECK: jne
define void @another_count_down(double* %d, i64 %n) nounwind {		define void @another_count_down(double* %d, i64 %n) nounwind {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%indvar = phi i64 [ 0, %entry ], [ %indvar.next, %loop ]		%indvar = phi i64 [ 0, %entry ], [ %indvar.next, %loop ]
[16 lines not shown]
br i1 %exitcond, label %return, label %loop		br i1 %exitcond, label %return, label %loop

return:		return:
ret void		ret void
}		}

; CHECK-LABEL: another_count_up_signed		; CHECK-LABEL: another_count_up_signed
; CHECK-NOT: {{and\|movz\|sar\|shl}}		; CHECK-NOT: {{and\|movz\|sar\|shl}}
; CHECK: addq		; CHECK: addq $8
; CHECK-NOT: {{and\|movz\|sar\|shl}}		; CHECK-NOT: {{and\|movz\|sar\|shl}}
; CHECK: jne		; CHECK: jne
define void @another_count_up_signed(double* %d, i64 %n) nounwind {		define void @another_count_up_signed(double* %d, i64 %n) nounwind {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%indvar = phi i64 [ 18446744073709551615, %entry ], [ %indvar.next, %loop ]		%indvar = phi i64 [ 18446744073709551615, %entry ], [ %indvar.next, %loop ]
[18 lines not shown]
br i1 %exitcond, label %return, label %loop		br i1 %exitcond, label %return, label %loop

return:		return:
ret void		ret void
}		}

; CHECK-LABEL: another_count_down_signed		; CHECK-LABEL: another_count_down_signed
; CHECK-NOT: {{and\|movz\|sar\|shl}}		; CHECK-NOT: {{and\|movz\|sar\|shl}}
; CHECK: decq		; CHECK: addq $-8
; CHECK-NOT: {{and\|movz\|sar\|shl}}		; CHECK-NOT: {{and\|movz\|sar\|shl}}
; CHECK: jne		; CHECK: jne
define void @another_count_down_signed(double* %d, i64 %n) nounwind {		define void @another_count_down_signed(double* %d, i64 %n) nounwind {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%indvar = phi i64 [ 0, %entry ], [ %indvar.next, %loop ]		%indvar = phi i64 [ 0, %entry ], [ %indvar.next, %loop ]
[23 lines not shown]

test/CodeGen/X86/misched-matrix.ll

	[10 lines not shown]
	; Verify that the MI scheduler minimizes register pressure for a			; Verify that the MI scheduler minimizes register pressure for a
	; uniform set of bottom-up subtrees (unrolled matrix multiply).			; uniform set of bottom-up subtrees (unrolled matrix multiply).
	;			;
	; For current top-down heuristics, ensure that some folded imulls have			; For current top-down heuristics, ensure that some folded imulls have
	; been reordered with the stores. This tests the scheduler's cheap			; been reordered with the stores. This tests the scheduler's cheap
	; alias analysis ability (that doesn't require any AliasAnalysis pass).			; alias analysis ability (that doesn't require any AliasAnalysis pass).
	;			;
	; TOPDOWN-LABEL: %for.body			; TOPDOWN-LABEL: %for.body
	; TOPDOWN: movl %{{.*}}, (			; TOPDOWN: movl %{{.*}}, 64(
	; TOPDOWN: imull {{[0-9]*}}(			; TOPDOWN: imull {{[0-9]*}}(
	; TOPDOWN: movl %{{.*}}, 4(			; TOPDOWN: movl %{{.*}}, 68(
	; TOPDOWN: imull {{[0-9]*}}(			; TOPDOWN: imull {{[0-9]*}}(
	; TOPDOWN: movl %{{.*}}, 8(			; TOPDOWN: movl %{{.*}}, 72(
	; TOPDOWN: movl %{{.*}}, 12(			; TOPDOWN: movl %{{.*}}, 76(
	; TOPDOWN-LABEL: %for.end			; TOPDOWN-LABEL: %for.end
	;			;
	; For -misched=ilpmin, verify that each expression subtree is			; For -misched=ilpmin, verify that each expression subtree is
	; scheduled independently, and that the imull/adds are interleaved.			; scheduled independently, and that the imull/adds are interleaved.
	;			;
	; ILPMIN-LABEL: %for.body			; ILPMIN-LABEL: %for.body
	; ILPMIN: movl %{{.*}}, (			; ILPMIN: movl %{{.*}}, 64(
	; ILPMIN: imull			; ILPMIN: imull
	; ILPMIN: imull			; ILPMIN: imull
	; ILPMIN: addl			; ILPMIN: addl
	; ILPMIN: imull			; ILPMIN: imull
	; ILPMIN: addl			; ILPMIN: addl
	; ILPMIN: imull			; ILPMIN: imull
	; ILPMIN: addl			; ILPMIN: addl
	; ILPMIN: movl %{{.*}}, 4(			; ILPMIN: movl %{{.*}}, 68(
	; ILPMIN: imull			; ILPMIN: imull
	; ILPMIN: imull			; ILPMIN: imull
	; ILPMIN: addl			; ILPMIN: addl
	; ILPMIN: imull			; ILPMIN: imull
	; ILPMIN: addl			; ILPMIN: addl
	; ILPMIN: imull			; ILPMIN: imull
	; ILPMIN: addl			; ILPMIN: addl
	; ILPMIN: movl %{{.*}}, 8(			; ILPMIN: movl %{{.*}}, 72(
	; ILPMIN: imull			; ILPMIN: imull
	; ILPMIN: imull			; ILPMIN: imull
	; ILPMIN: addl			; ILPMIN: addl
	; ILPMIN: imull			; ILPMIN: imull
	; ILPMIN: addl			; ILPMIN: addl
	; ILPMIN: imull			; ILPMIN: imull
	; ILPMIN: addl			; ILPMIN: addl
	; ILPMIN: movl %{{.*}}, 12(			; ILPMIN: movl %{{.*}}, 76(
	; ILPMIN-LABEL: %for.end			; ILPMIN-LABEL: %for.end
	;			;
	; For -misched=ilpmax, verify that each expression subtree is			; For -misched=ilpmax, verify that each expression subtree is
	; scheduled independently, and that the imull/adds are clustered.			; scheduled independently, and that the imull/adds are clustered.
	;			;
	; ILPMAX-LABEL: %for.body			; ILPMAX-LABEL: %for.body
	; ILPMAX: movl %{{.*}}, (			; ILPMAX: movl %{{.*}}, 64(
	; ILPMAX: imull			; ILPMAX: imull
	; ILPMAX: imull			; ILPMAX: imull
	; ILPMAX: imull			; ILPMAX: imull
	; ILPMAX: imull			; ILPMAX: imull
	; ILPMAX: addl			; ILPMAX: addl
	; ILPMAX: addl			; ILPMAX: addl
	; ILPMAX: addl			; ILPMAX: addl
	; ILPMAX: movl %{{.*}}, 4(			; ILPMAX: movl %{{.*}}, 68(
	; ILPMAX: imull			; ILPMAX: imull
	; ILPMAX: imull			; ILPMAX: imull
	; ILPMAX: imull			; ILPMAX: imull
	; ILPMAX: imull			; ILPMAX: imull
	; ILPMAX: addl			; ILPMAX: addl
	; ILPMAX: addl			; ILPMAX: addl
	; ILPMAX: addl			; ILPMAX: addl
	; ILPMAX: movl %{{.*}}, 8(			; ILPMAX: movl %{{.*}}, 72(
	; ILPMAX: imull			; ILPMAX: imull
	; ILPMAX: imull			; ILPMAX: imull
	; ILPMAX: imull			; ILPMAX: imull
	; ILPMAX: imull			; ILPMAX: imull
	; ILPMAX: addl			; ILPMAX: addl
	; ILPMAX: addl			; ILPMAX: addl
	; ILPMAX: addl			; ILPMAX: addl
	; ILPMAX: movl %{{.*}}, 12(			; ILPMAX: movl %{{.*}}, 76(
	; ILPMAX-LABEL: %for.end			; ILPMAX-LABEL: %for.end

	define void @mmult([4 x i32]* noalias nocapture %m1, [4 x i32]* noalias nocapture %m2,			define void @mmult([4 x i32]* noalias nocapture %m1, [4 x i32]* noalias nocapture %m2,
	[4 x i32]* noalias nocapture %m3) nounwind uwtable ssp {			[4 x i32]* noalias nocapture %m3) nounwind uwtable ssp {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	[97 lines not shown]

test/Transforms/LoopStrengthReduce/X86/ivchain-X86.ll

	[156 lines not shown]
	; @foldedidx is an unrolled variant of this loop:			; @foldedidx is an unrolled variant of this loop:
	; for (unsigned long i = 0; i < len; i += s) {			; for (unsigned long i = 0; i < len; i += s) {
	; c[i] = a[i] + b[i];			; c[i] = a[i] + b[i];
	; }			; }
	; where 's' can be folded into the addressing mode.			; where 's' can be folded into the addressing mode.
	; Consequently, we should not form any chains.			; Consequently, we should not form any chains.
	;			;
	; X64: foldedidx:			; X64: foldedidx:
	; X64: movzbl -3(			; X64: movzbl 400(
	;			;
	; X32: foldedidx:			; X32: foldedidx:
	; X32: movzbl -3(			; X32: movzbl 400(
	define void @foldedidx(i8* nocapture %a, i8* nocapture %b, i8* nocapture %c) nounwind ssp {			define void @foldedidx(i8* nocapture %a, i8* nocapture %b, i8* nocapture %c) nounwind ssp {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%i.07 = phi i32 [ 0, %entry ], [ %inc.3, %for.body ]			%i.07 = phi i32 [ 0, %entry ], [ %inc.3, %for.body ]
	%arrayidx = getelementptr inbounds i8, i8* %a, i32 %i.07			%arrayidx = getelementptr inbounds i8, i8* %a, i32 %i.07
	%0 = load i8, i8* %arrayidx, align 1			%0 = load i8, i8* %arrayidx, align 1
	[95 lines not shown]
	}			}

	; @testCmpZero has a ICmpZero LSR use that should not be hidden from			; @testCmpZero has a ICmpZero LSR use that should not be hidden from
	; LSR. Profitable chains should have more than one nonzero increment			; LSR. Profitable chains should have more than one nonzero increment
	; anyway.			; anyway.
	;			;
	; X32: @testCmpZero			; X32: @testCmpZero
	; X32: %for.body82.us			; X32: %for.body82.us
	; X32: dec			; X32: cmp
	; X32: jne			; X32: jne
	define void @testCmpZero(i8* %src, i8* %dst, i32 %srcidx, i32 %dstidx, i32 %len) nounwind ssp {			define void @testCmpZero(i8* %src, i8* %dst, i32 %srcidx, i32 %dstidx, i32 %len) nounwind ssp {
	entry:			entry:
	%dest0 = getelementptr inbounds i8, i8* %src, i32 %srcidx			%dest0 = getelementptr inbounds i8, i8* %src, i32 %srcidx
	%source0 = getelementptr inbounds i8, i8* %dst, i32 %dstidx			%source0 = getelementptr inbounds i8, i8* %dst, i32 %dstidx
	%add.ptr79.us.sum = add i32 %srcidx, %len			%add.ptr79.us.sum = add i32 %srcidx, %len
	%lftr.limit = getelementptr i8, i8* %src, i32 %add.ptr79.us.sum			%lftr.limit = getelementptr i8, i8* %src, i32 %add.ptr79.us.sum
	br label %for.body82.us			br label %for.body82.us
	[16 lines not shown]
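
For reference, the source-level loop described in the @foldedidx comment above can be written out as follows. This is a minimal sketch only: the function name `add_strided` and the `std::uint8_t` element type are assumptions chosen for illustration (the test itself works on i8 loads and stores), and `s` is assumed to be nonzero so the loop terminates.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper mirroring the loop in the @foldedidx test comment:
//   for (unsigned long i = 0; i < len; i += s) c[i] = a[i] + b[i];
// Assumes s > 0 and that a, b and c each hold at least len elements.
void add_strided(const std::uint8_t *a, const std::uint8_t *b,
                 std::uint8_t *c, std::size_t len, std::size_t s) {
  for (std::size_t i = 0; i < len; i += s) {
    // The sum is truncated back to 8 bits, matching an i8 store.
    c[i] = static_cast<std::uint8_t>(a[i] + b[i]);
  }
}
```

When such a loop is unrolled with a small constant stride, each unrolled access differs from the first by a constant offset, which an x86 addressing mode can carry as a displacement. That is presumably what the test comment means by folding 's' into the addressing mode, and why it expects no IV chains to be formed.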

Revision Contents

Diff 81165

include/llvm/Analysis/TargetTransformInfo.h

include/llvm/Analysis/TargetTransformInfoImpl.h

include/llvm/CodeGen/BasicTTIImpl.h

lib/Analysis/TargetTransformInfo.cpp

lib/Target/X86/X86TargetTransformInfo.h

lib/Target/X86/X86TargetTransformInfo.cpp

lib/Transforms/Scalar/LoopStrengthReduce.cpp

test/CodeGen/X86/2006-05-11-InstrSched.ll

test/CodeGen/X86/atom-fixup-lea3.ll

test/CodeGen/X86/avoid_complex_am.ll

test/CodeGen/X86/compact-unwind.ll

test/CodeGen/X86/full-lsr.ll

test/CodeGen/X86/loop-strength-reduce4.ll

test/CodeGen/X86/masked-iv-safe.ll

test/CodeGen/X86/misched-matrix.ll

test/Transforms/LoopStrengthReduce/X86/ivchain-X86.ll