This is an archive of the discontinued LLVM Phabricator instance.


Differential D37051

Model cache size and associativity in TargetTransformInfo
Closed, Public

Authored by grosser on Aug 23 2017, 12:44 AM.


Details

Reviewers

Meinersbur
bollu
singam-sanjay
hfinkel
gareevroman
fhahn
sebpop
efriedma
asb

Commits

rGd7eb61929929: Model cache size and associativity in TargetTransformInfo
rL311647: Model cache size and associativity in TargetTransformInfo

Summary

We add the precise cache sizes and associativity for the following Intel
architectures:

Penryn
Nehalem
Westmere
Sandy Bridge
Ivy Bridge
Haswell
Broadwell
Skylake
Kaby Lake

For several months, Polly has used a performance model for BLAS computations that
derives optimal cache and register tile sizes from cache and latency
information (based on ideas from "Analytical Modeling Is Enough for High-Performance BLIS" by Tze Meng Low et al., published in ACM TOMS 2016).
While bootstrapping this model, we kept these target values in Polly.
However, as our implementation is now fairly mature, it seems time to teach
LLVM itself about cache sizes.
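To make the link between cache parameters and tile sizes concrete, here is a minimal sketch of the L1-level blocking parameter k_c from the BLIS analytical model cited above. It is not Polly's implementation; the names `MicroKernel` and `computeKc` and the register-tile sizes used in the usage note are hypothetical.

```cpp
// Hypothetical register-tile (micro-kernel) dimensions.
struct MicroKernel {
  unsigned Mr; // rows of the register tile (micro-panel of A is Mr x k_c)
  unsigned Nr; // columns of the register tile (micro-panel of B is k_c x Nr)
};

// Sketch of the k_c bound from Low et al. (TOMS 2016). The model budgets
// W_L1 - 1 ways, shared by the micro-panels of A and B in the ratio Mr : Nr,
// so A gets C_Ar = floor((W_L1 - 1) / (1 + Nr / Mr)) ways. A micro-panel of
// A (Mr x k_c elements) must fit into those ways:
//   k_c = C_Ar * (L1Size / W_L1) / (Mr * ElemSize)
// Returns 0 when the inputs leave no room for a panel.
unsigned computeKc(unsigned L1Size, unsigned L1Assoc, MicroKernel MK,
                   unsigned ElemSize) {
  if (L1Assoc < 2 || MK.Mr == 0 || MK.Nr == 0 || ElemSize == 0)
    return 0;
  // floor((W - 1) / (1 + Nr / Mr)) == floor((W - 1) * Mr / (Mr + Nr)) for
  // positive integers; the integer form avoids floating-point rounding.
  unsigned WaysForA = (L1Assoc - 1) * MK.Mr / (MK.Mr + MK.Nr);
  unsigned BytesPerWay = L1Size / L1Assoc; // N_L1 sets * C_L1 line size
  return WaysForA * BytesPerWay / (MK.Mr * ElemSize);
}
```

With the X86 values from this revision (32 KByte, 8-way L1D), a hypothetical 4x8 register tile, and 8-byte doubles, `WaysForA` is 2 and k_c comes out as 256.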

Interestingly, L1 and L2 cache sizes are fairly constant across
microarchitectures, so a set of architecture-specific default values
seems like a good start. These can be refined into more target-specific values
if newer architectures require different values. For now, values are provided
for a set of Intel architectures.
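The "architecture-wide defaults first, per-CPU refinements later" layering can be sketched with plain virtual dispatch. The class and method names below are hypothetical and the 1 MByte override is an invented illustrative value; in the revision itself, the generic fallback lives in TargetTransformInfoImplBase and the X86 values in X86TTIImpl.

```cpp
// Architecture-wide defaults (values taken from the X86 hunks of this diff).
class ArchDefaultCacheInfo {
public:
  virtual ~ArchDefaultCacheInfo() = default;
  virtual unsigned getL1DSize() const { return 32 * 1024; }  // 32 KByte
  virtual unsigned getL2DSize() const { return 256 * 1024; } // 256 KByte
};

// A hypothetical newer CPU that overrides only the value that differs.
// The 1 MByte figure is illustrative and not taken from any datasheet.
class HypotheticalNewCPUCacheInfo : public ArchDefaultCacheInfo {
public:
  unsigned getL2DSize() const override { return 1024 * 1024; }
};
```

Queries through an `ArchDefaultCacheInfo &` then pick up the refined L2 value while still inheriting the L1 default.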

As a small teaser: for a simple gemm kernel, this model allows us to
reduce the run time from 1.2s to 0.27s. For gemm kernels with less favorable
memory layouts, even larger speedups can be observed.

Diff Detail

Build Status

Buildable 9537
Build 9537: arc lint + arc unit

Event Timeline

grosser created this revision. Aug 23 2017, 12:44 AM

Hi Florian, hi Sebastian, Eli,

I added you in case you would also like to provide feedback.

Best,
Tobias

information (based on ideas from "Analytical Models for the BLIS Framework").

Could you add the reference to the paper to the summary?

I think throughput and latency of vector fma instructions are pretty constant across micro-architectures too. Can we also add them?

Sorry, it would probably require specifying them for each architecture.

asb added a subscriber: asb. Aug 23 2017, 1:07 AM

asb added inline comments.

include/llvm/Analysis/TargetTransformInfo.h
Line 607: It's conceivable that targets might want to differentiate between the I-cache and the D-cache, e.g., an embedded target may use information about the I-cache to guide custom code layout optimisations. Is it worth starting with CL_L1D and CL_L2D instead?
include/llvm/Analysis/TargetTransformInfoImpl.h
Line 348: Either the comment or the calculation is incorrect!
lib/Target/X86/X86TargetTransformInfo.cpp
Line 92: Either the comment or the calculation is incorrect!
lib/Target/X86/X86TargetTransformInfo.h
Line 49: Why not implement getCacheAssociativity too?

Incorporate asb's comments.

Use a 'D' suffix to clarify that we are talking about the data caches
Fix typo in cache size
Implement associativity for X86

Harbormaster completed remote builds in B9537: Diff 112304. Aug 23 2017, 1:28 AM

Hi Alex, hi Roman,

thanks for your feedback. I addressed your comments.

Best,
Tobias

include/llvm/Analysis/TargetTransformInfo.h
Line 607: Very good point. I changed this.
include/llvm/Analysis/TargetTransformInfoImpl.h
Line 348: The calculation was wrong. I fixed this.
lib/Target/X86/X86TargetTransformInfo.cpp
Line 92: The calculation was wrong. Fixed.
lib/Target/X86/X86TargetTransformInfo.h
Line 49: Added!

Add a newline between two functions

lsaba added a subscriber: lsaba. Aug 23 2017, 1:35 AM

fhahn added inline comments. Aug 23 2017, 3:38 AM

include/llvm/Analysis/TargetTransformInfoImpl.h
Line 343: Maybe it would be safer to be conservative and return 0 here, similar to what getCacheLineSize currently does? That would allow passes to check whether the target provides accurate information (!= 0). For example, for ARM Cortex cores the L1 cache size can vary between 0 KByte and 64 KByte [1], [2].

Also, we already have getCacheLineSize. Would it make sense to express the cache size in terms of cache lines (e.g., X * cache line size)? That would make it slightly easier to keep them in sync and would avoid situations where getCacheSize() is not a multiple of getCacheLineSize().

[1] https://en.wikipedia.org/wiki/Comparison_of_ARMv8-A_cores
[2] https://en.wikipedia.org/wiki/ARM_Cortex-M

asb added inline comments. Aug 23 2017, 3:55 AM

include/llvm/Analysis/TargetTransformInfoImpl.h
Line 343: I wondered about returning 0 too, but I also think we need a new sentinel value. This function should be able to indicate 1) the cache has a known size X, 2) the cache size is unknown, and 3) the cache is known to have size 0 (e.g., there is no L2).

mcrosier edited reviewers, added: efriedma; removed: eli.friedman. Aug 23 2017, 7:31 AM

Make return values optional to be able to distinguish between unknown cache
size and no cache.

We use Optional in the very same way to indicate clearly that no associativity
has been specified.
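The three cases asb lists (known size, unknown size, known to be absent) map naturally onto an optional return value. This sketch uses C++17 `std::optional` as a stand-in for `llvm::Optional`; the target it models, with an undocumented L1D and no L2, is hypothetical, as are the function names.

```cpp
#include <optional>
#include <string>

enum class CacheLevel { L1D, L2D };

// Hypothetical target: its L1D size is undocumented and it has no L2 at all.
std::optional<unsigned> getCacheSize(CacheLevel Level) {
  switch (Level) {
  case CacheLevel::L1D:
    return std::nullopt; // case 2: size unknown
  case CacheLevel::L2D:
    return 0u; // case 3: known to be absent
  }
  return std::nullopt;
}

// A client can tell all three cases apart, which a plain 0 sentinel cannot.
std::string describe(std::optional<unsigned> Size) {
  if (!Size)
    return "unknown";
  if (*Size == 0)
    return "absent";
  return std::to_string(*Size) + " bytes"; // case 1: known size
}
```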

In D37051#849798, @gareevroman wrote:

I think throughput and latency of vector fma instructions are pretty constant across micro-architectures too. Can we also add them?

Sorry, it would probably require specifying them for each architecture.

We could do this. Some of the backend experts might be able to help you figure out how best to do this. I think we should do this in a separate patch. Care to propose one?

include/llvm/Analysis/TargetTransformInfoImpl.h
Line 343: I changed this to llvm::Optional.

Thanks for the useful comments. I now use Optional. I am not convinced about expressing the cache size in multiples of cache lines. That would always require a conversion whenever we want the actual size in bytes, which, at least for us, is the more common value, I believe.

What do others think: size in bytes or in cache lines? In the end, it's the same information. (We could also return the size in bytes for now and add an assert that checks that the returned result, if any, is always a multiple of the cache line size. If somebody needs this information in cache lines, we can always add a helper method.)
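The compromise floated here (report bytes, assert consistency with the line size, and offer a conversion helper) could look roughly like this. `getCacheSizeInLines` is a hypothetical helper that is not part of the patch, and `std::optional` stands in for `llvm::Optional`.

```cpp
#include <cassert>
#include <optional>

// Converts a cache size in bytes to a number of cache lines.
// A LineSize of 0 means "unknown", matching the default getCacheLineSize()
// in this revision, which returns 0.
std::optional<unsigned> getCacheSizeInLines(std::optional<unsigned> SizeInBytes,
                                            unsigned LineSize) {
  if (!SizeInBytes || LineSize == 0)
    return std::nullopt; // nothing known, or no line size available
  assert(*SizeInBytes % LineSize == 0 &&
         "cache size must be a multiple of the cache line size");
  return *SizeInBytes / LineSize;
}
```

Keeping bytes as the primary unit means the common query needs no conversion, while the assert still catches a size that is inconsistent with the line size.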

Size in bytes makes most sense to me. The first thing any caller is going to do is convert away from cache lines to bytes or kilobytes anyway.

In D37051#850696, @asb wrote:

Size in bytes makes most sense to me. The first thing any caller is going to do is convert away from cache lines to bytes or kilobytes anyway.

Agreed, returning the number of cache lines as the result of getCacheSize wouldn't be a good idea.

I meant using the cache line size in X86TTIImpl::getCacheSize. This was just a minor potential nit, I think either way is fine.

In D37051#850795, @fhahn wrote:

Agreed, returning the number of cache lines as the result of getCacheSize wouldn't be a good idea.

I meant using the cache line size in X86TTIImpl::getCacheSize. This was just a minor potential nit, I think either way is fine.

Ah, sorry for the misunderstanding. I suspect that the cost of potential confusion outweighs the potential benefit of preventing a cache size from mistakenly being set to a value that isn't a multiple of the cache line size, but as you say, it's a minor thing and either way would work.

OK. Let me know when this is worth an official LGTM.

asb added inline comments. Aug 23 2017, 11:40 PM

include/llvm/Analysis/TargetTransformInfoImpl.h
Line 348: Shouldn't this have something like `default: llvm_unreachable("Unknown TargetTransformInfo::CacheLevel")`?

In D37051#850439, @grosser wrote:

In D37051#849798, @gareevroman wrote:

I think throughput and latency of vector fma instructions are pretty constant across micro-architectures too. Can we also add them?

Sorry, it would probably require specifying them for each architecture.

We could do this. Some of the backend experts might be able to help you figure out how best to do this. I think we should do this in a separate patch. Care to propose one?

Yes, I will propose it. Sometimes it can be difficult to find this information, since it's not always publicly available.

Use a C++ enum class

Add llvm_unreachables

Addressed Alex's comments.

include/llvm/Analysis/TargetTransformInfoImpl.h
Line 348: The switch above is fully covered, but I can add this as a form of defensive programming. Thinking about this, it might also be useful to use typed enums for the cache level. I will update the patch accordingly.

Harbormaster completed remote builds in B9589: Diff 112510. Aug 24 2017, 12:41 AM

fhahn added inline comments. Aug 24 2017, 12:47 AM

include/llvm/Analysis/TargetTransformInfoImpl.h
Line 348: I think using an enum class would be enough. AFAIK, compilers are pretty good at warning about missing cases in switch statements over enum classes. I think there is no need for a default case, as the compiler would catch missing cases.

grosser marked an inline comment as done. Aug 24 2017, 12:50 AM

grosser added inline comments.

include/llvm/Analysis/TargetTransformInfoImpl.h
Line 348: Right. Clang in particular has become very good at this, and I think GCC now supports it too. I did not add a default case on purpose, as it would suppress compiler warnings about uncovered cases. I added an unreachable _after_ the switch, just to document that this code is never reached. Most compilers likely know this already, but older compilers might warn if they cannot prove that control never reaches the end of the function. We help them out here as well.
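The pattern agreed on here (an enum class, a switch without a `default:` label so that -Wswitch can flag unhandled enumerators, and an unreachable marker after the switch) is sketched below. `std::abort()` stands in for `llvm_unreachable`, and the enumerator names are illustrative.

```cpp
#include <cstdlib>

enum class CacheLevel { L1D, L2D };

unsigned getCacheSize(CacheLevel Level) {
  switch (Level) {
  // No 'default:' label: adding an enumerator without handling it here
  // triggers -Wswitch (on by default in Clang, enabled by -Wall in GCC).
  case CacheLevel::L1D:
    return 32 * 1024; // 32 KByte
  case CacheLevel::L2D:
    return 256 * 1024; // 256 KByte
  }
  // Only reachable with an out-of-range value. Documents that fact and,
  // since std::abort is [[noreturn]], silences "control reaches end of
  // non-void function" warnings.
  std::abort(); // stand-in for llvm_unreachable("Unknown CacheLevel")
}
```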

LGTM, just a minor nit about the C-style fallthrough comment

include/llvm/Analysis/TargetTransformInfoImpl.h
Line 358: Are you using the C-style comment to tell GCC that this fallthrough case is intentional? `LLVM_FALLTHROUGH` provides a slightly more portable and direct way of doing that, I think.
lib/Target/X86/X86TargetTransformInfo.cpp
Line 112: Same as above.
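For reference, C++17 standardizes the intent behind `LLVM_FALLTHROUGH` as the `[[fallthrough]];` attribute. The sketch below assumes both modeled cache levels share one associativity; the function and enumerator names are illustrative. For consecutive case labels with no statements in between, compilers do not warn even without the annotation, so here it mainly documents intent.

```cpp
#include <cstdlib>

enum class CacheLevel { L1D, L2D };

// Both data-cache levels share the same associativity in this sketch, so
// the L1D case deliberately falls through to the L2D case.
unsigned getCacheAssociativity(CacheLevel Level) {
  switch (Level) {
  case CacheLevel::L1D:
    [[fallthrough]]; // C++17 spelling of the intent behind LLVM_FALLTHROUGH
  case CacheLevel::L2D:
    return 8;
  }
  std::abort(); // stand-in for llvm_unreachable
}
```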

This revision is now accepted and ready to land. Aug 24 2017, 12:57 AM

Use LLVM_FALLTHROUGH

@fhahn, I just addressed your comments.

@asb, I think this is now ready. Could you have a final look?

Looks good to me.

Closed by commit rL311647: Model cache size and associativity in TargetTransformInfo (authored by grosser). Aug 24 2017, 2:47 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                               Size
include/llvm/Analysis/TargetTransformInfo.h        24 lines
include/llvm/Analysis/TargetTransformInfoImpl.h    13 lines
lib/Analysis/TargetTransformInfo.cpp               8 lines
lib/Target/X86/X86TargetTransformInfo.h            6 lines
lib/Target/X86/X86TargetTransformInfo.cpp          40 lines

Diff 112304

include/llvm/Analysis/TargetTransformInfo.h

[597 unchanged lines not shown; context: public:]
   /// \p AllowPromotionWithoutCommonHeader Set true if promoting \p I is
   /// profitable without finding other extensions fed by the same input.
   bool shouldConsiderAddressTypePromotion(
       const Instruction &I, bool &AllowPromotionWithoutCommonHeader) const;
 
   /// \return The size of a cache line in bytes.
   unsigned getCacheLineSize() const;
 
+  /// The possible cache levels
+  enum CacheLevel {
+    CL_L1D, // The L1 data cache
+    CL_L2D, // The L2 data cache
+
+    // We currently do not model L3 caches, as their sizes differ widely between
+    // microarchitectures. Also, we currently do not have a use for L3 cache
+    // size modeling yet.
+  };
+
+  /// \return The size of the cache level in bytes.
+  unsigned getCacheSize(CacheLevel Level) const;
+
+  /// \return The associativity of the cache level.
+  unsigned getCacheAssociativity(CacheLevel Level) const;
+
   /// \return How much before a load we should place the prefetch instruction.
   /// This is currently measured in number of instructions.
   unsigned getPrefetchDistance() const;
 
   /// \return Some HW prefetchers can handle accesses up to a certain constant
   /// stride. This is the minimum stride in bytes where it makes sense to start
   /// adding SW prefetches. The default is 1, i.e. prefetch with any stride.
   unsigned getMinPrefetchStride() const;
[318 unchanged lines not shown; context: public:]
   virtual int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
                             Type *Ty) = 0;
   virtual unsigned getNumberOfRegisters(bool Vector) = 0;
   virtual unsigned getRegisterBitWidth(bool Vector) const = 0;
   virtual unsigned getMinVectorRegisterBitWidth() = 0;
   virtual bool shouldConsiderAddressTypePromotion(
       const Instruction &I, bool &AllowPromotionWithoutCommonHeader) = 0;
   virtual unsigned getCacheLineSize() = 0;
+  virtual unsigned getCacheSize(CacheLevel Level) = 0;
+  virtual unsigned getCacheAssociativity(CacheLevel Level) = 0;
   virtual unsigned getPrefetchDistance() = 0;
   virtual unsigned getMinPrefetchStride() = 0;
   virtual unsigned getMaxPrefetchIterationsAhead() = 0;
   virtual unsigned getMaxInterleaveFactor(unsigned VF) = 0;
   virtual unsigned
   getArithmeticInstrCost(unsigned Opcode, Type *Ty, OperandValueKind Opd1Info,
                          OperandValueKind Opd2Info,
                          OperandValueProperties Opd1PropInfo,
[256 unchanged lines not shown; context: public:]
   bool shouldConsiderAddressTypePromotion(
       const Instruction &I, bool &AllowPromotionWithoutCommonHeader) override {
     return Impl.shouldConsiderAddressTypePromotion(
         I, AllowPromotionWithoutCommonHeader);
   }
   unsigned getCacheLineSize() override {
     return Impl.getCacheLineSize();
   }
+  unsigned getCacheSize(CacheLevel Level) override {
+    return Impl.getCacheSize(Level);
+  }
+  unsigned getCacheAssociativity(CacheLevel Level) override {
+    return Impl.getCacheAssociativity(Level);
+  }
   unsigned getPrefetchDistance() override { return Impl.getPrefetchDistance(); }
   unsigned getMinPrefetchStride() override {
     return Impl.getMinPrefetchStride();
   }
   unsigned getMaxPrefetchIterationsAhead() override {
     return Impl.getMaxPrefetchIterationsAhead();
   }
   unsigned getMaxInterleaveFactor(unsigned VF) override {
[252 unchanged lines not shown]
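The three hunks in TargetTransformInfo.h follow a type-erasure pattern: a pure virtual `Concept` interface, a `Model<T>` template that forwards each call to a concrete implementation, and the public wrapper that owns the `Concept`. Below is a stripped-down sketch of that pattern with only the new cache query; `CacheInfo` and `X86LikeImpl` are made-up names, and LLVM's real classes carry many more members.

```cpp
#include <memory>
#include <utility>

enum class CacheLevel { L1D, L2D };

class CacheInfo {
  // Type-erased interface, analogous to TargetTransformInfo::Concept.
  struct Concept {
    virtual ~Concept() = default;
    virtual unsigned getCacheSize(CacheLevel Level) = 0;
  };
  // Forwards every call to a concrete implementation, like Model<T>.
  template <typename T> struct Model final : Concept {
    T Impl;
    explicit Model(T I) : Impl(std::move(I)) {}
    unsigned getCacheSize(CacheLevel Level) override {
      return Impl.getCacheSize(Level);
    }
  };
  std::unique_ptr<Concept> TTIImpl;

public:
  template <typename T>
  explicit CacheInfo(T Impl)
      : TTIImpl(std::make_unique<Model<T>>(std::move(Impl))) {}
  // Public entry point, analogous to TargetTransformInfo::getCacheSize.
  unsigned getCacheSize(CacheLevel Level) const {
    return TTIImpl->getCacheSize(Level);
  }
};

// A concrete implementation needs no common base class; it only has to
// provide a member function with a matching signature.
struct X86LikeImpl {
  unsigned getCacheSize(CacheLevel Level) {
    return Level == CacheLevel::L1D ? 32 * 1024 : 256 * 1024;
  }
};
```

This is why adding a query touches three places: the `Concept` declaration, the forwarding `Model` override, and the public wrapper.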

include/llvm/Analysis/TargetTransformInfoImpl.h

[334 unchanged lines not shown; context: public:]
   shouldConsiderAddressTypePromotion(const Instruction &I,
                                      bool &AllowPromotionWithoutCommonHeader) {
     AllowPromotionWithoutCommonHeader = false;
     return false;
   }
 
   unsigned getCacheLineSize() { return 0; }
 
+  unsigned getCacheSize(TargetTransformInfo::CacheLevel Level) {
+    switch (Level) {
+    case TargetTransformInfo::CL_L1D:
+      return 32 * 1024; // 32 KByte
+    case TargetTransformInfo::CL_L2D:
+      return 256 * 1024; // 256 KByte
+    }
+  }
+
+  unsigned getCacheAssociativity(TargetTransformInfo::CacheLevel Level) {
+    return 8;
+  }
+
   unsigned getPrefetchDistance() { return 0; }
 
   unsigned getMinPrefetchStride() { return 1; }
 
   unsigned getMaxPrefetchIterationsAhead() { return UINT_MAX; }
 
   unsigned getMaxInterleaveFactor(unsigned VF) { return 1; }
 
   unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty,
                                   TTI::OperandValueKind Opd1Info,
                                   TTI::OperandValueKind Opd2Info,
[393 unchanged lines not shown]

lib/Analysis/TargetTransformInfo.cpp

[315 unchanged lines not shown; context: bool TargetTransformInfo::shouldConsiderAddressTypePromotion(]
   return TTIImpl->shouldConsiderAddressTypePromotion(
       I, AllowPromotionWithoutCommonHeader);
 }
 
 unsigned TargetTransformInfo::getCacheLineSize() const {
   return TTIImpl->getCacheLineSize();
 }
 
+unsigned TargetTransformInfo::getCacheSize(CacheLevel Level) const {
+  return TTIImpl->getCacheSize(Level);
+}
+
+unsigned TargetTransformInfo::getCacheAssociativity(CacheLevel Level) const {
+  return TTIImpl->getCacheAssociativity(Level);
+}
+
 unsigned TargetTransformInfo::getPrefetchDistance() const {
   return TTIImpl->getPrefetchDistance();
 }
 
 unsigned TargetTransformInfo::getMinPrefetchStride() const {
   return TTIImpl->getMinPrefetchStride();
 }
 
[284 unchanged lines not shown]

lib/Target/X86/X86TargetTransformInfo.h

[40 unchanged lines not shown; context: explicit X86TTIImpl(const X86TargetMachine *TM, const Function &F)]
       : BaseT(TM, F.getParent()->getDataLayout()), ST(TM->getSubtargetImpl(F)),
         TLI(ST->getTargetLowering()) {}
 
   /// \name Scalar TTI Implementations
   /// @{
   TTI::PopcntSupportKind getPopcntSupport(unsigned TyWidth);
 
   /// @}
 
+  /// \name Cache TTI Implementation
+  /// @{
+  unsigned getCacheSize(TargetTransformInfo::CacheLevel Level) const;
+  unsigned getCacheAssociativity(TargetTransformInfo::CacheLevel Level) const;
+  /// @}
+
   /// \name Vector TTI Implementations
   /// @{
 
   unsigned getNumberOfRegisters(bool Vector);
   unsigned getRegisterBitWidth(bool Vector) const;
   unsigned getLoadStoreVecRegBitWidth(unsigned AS) const;
   unsigned getMaxInterleaveFactor(unsigned VF);
   int getArithmeticInstrCost(
[74 unchanged lines not shown]

lib/Target/X86/X86TargetTransformInfo.cpp

[60 unchanged lines not shown]
 X86TTIImpl::getPopcntSupport(unsigned TyWidth) {
   assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2");
   // TODO: Currently the __builtin_popcount() implementation using SSE3
   // instructions is inefficient. Once the problem is fixed, we should
   // call ST->hasSSE3() instead of ST->hasPOPCNT().
   return ST->hasPOPCNT() ? TTI::PSK_FastHardware : TTI::PSK_Software;
 }
 
+unsigned X86TTIImpl::getCacheSize(TargetTransformInfo::CacheLevel Level) const {
+  switch (Level) {
+  case TargetTransformInfo::CL_L1D:
+    // - Penry
+    // - Nehalem
+    // - Westmere
+    // - Sandy Bridge
+    // - Ivy Bridge
+    // - Haswell
+    // - Broadwell
+    // - Skylake
+    // - Kabylake
+    return 32 * 1024; // 32 KByte
+  case TargetTransformInfo::CL_L2D:
+    // - Penry
+    // - Nehalem
+    // - Westmere
+    // - Sandy Bridge
+    // - Ivy Bridge
+    // - Haswell
+    // - Broadwell
+    // - Skylake
+    // - Kabylake
+    return 256 * 1024; // 256 KByte
+  }
+}
+unsigned X86TTIImpl::getCacheAssociativity(
+    TargetTransformInfo::CacheLevel Level) const {
+  // - Penry
+  // - Nehalem
+  // - Westmere
+  // - Sandy Bridge
+  // - Ivy Bridge
+  // - Haswell
+  // - Broadwell
+  // - Skylake
+  // - Kabylake
+  return 8;
+}
+
 unsigned X86TTIImpl::getNumberOfRegisters(bool Vector) {
   if (Vector && !ST->hasSSE1())
     return 0;
 
   if (ST->is64Bit()) {
     if (Vector && ST->hasAVX512())
       return 32;
     return 16;
   }
   return 8;
 }
 
[2,488 unchanged lines not shown]
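Size and associativity, together with the line size, determine the number of sets, which is the quantity analytical tiling models such as the one in the summary reason about. This sketch assumes 64-byte cache lines (the line size on the listed Intel cores, though the X86 hunks shown here do not define it); `numSets` and `setIndex` are hypothetical helpers, and the set mapping is an idealized modulo scheme.

```cpp
#include <cstdint>

// Number of sets in a set-associative cache: Size / (Ways * LineSize).
unsigned numSets(unsigned Size, unsigned Ways, unsigned LineSize) {
  return Size / (Ways * LineSize);
}

// Set an address maps to under idealized modulo indexing by line address.
// Real indexing details (e.g. virtual vs. physical addresses) are ignored.
unsigned setIndex(std::uint64_t Addr, unsigned NumSets, unsigned LineSize) {
  return static_cast<unsigned>((Addr / LineSize) % NumSets);
}
```

With the values in this revision, the L1D has 32768 / (8 * 64) = 64 sets and the L2 has 512. Addresses Size / Ways = 4 KByte apart map to the same L1D set, so at most eight such lines can be cached in L1D at once.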