This is an archive of the discontinued LLVM Phabricator instance.

Differential D35348

Adding all X86 Processor families which can help initializing several uArch properties
Closed, Public

Authored by magabari on Jul 13 2017, 4:08 AM.

Details

Reviewers

delena
zvi
craig.topper
RKSimon
aaboud
dorit

Commits

rGe9aebf26af9c: [X86] Adding X86 Processor Families
rL313132: [X86] Adding X86 Processor Families

Summary

Add all x86 processor families so that several microarchitecture (uArch) properties can be initialized based on the family.
This patch shows how the gather cost can be initialized based on the processor family.

Diff Detail

Event Timeline

magabari created this revision.Jul 13 2017, 4:08 AM

magabari updated this revision to Diff 106400.Jul 13 2017, 4:21 AM

I really don't like this approach - we've been getting closer and closer to killing off X86ProcFamilyEnum for years now and moving to a mixture of feature bits and scheduler-cost-driven decisions that works in a more general manner.

Assuming you intend to use getGatherOverhead() inside X86TTIImpl::getGatherScatterOpCost, might not it be better to just provide a Fast/Slow Gather feature bit, or (even better) finally start trying to use the scheduler model data within X86TTIImpl ?

I agree with you. We also considered using the scheduling model (or any other target micro-architecture database) from TTI. It requires a detailed design and is part of our long-term plans.
We have already started feeding micro-architecture details to LLVM step by step, and it will take time to complete.
The cost that is kept in X86TargetTransformInfo.cpp should also be moved to a .td file, and that will require a special interface for communication between IR and MachineInstructions.
Using processorFamily as a property allows tuning the compiler for a processor in the meantime; otherwise we just tie all patches like "gather-enabling" into one chain, waiting until all .td files are entered.

At the very least: This is adding a ton of unused and unneeded details. If we end up needing to dispatch on processor family, then maybe we can add (some of) these.
Otherwise, I don't think it's reasonable to add all these enum values.

lib/Target/X86/X86Subtarget.cpp
273	`GatherOverhead` is `unsigned`. Why set it to `INT_MAX`?
274	This could just be encoded in a `bool ShouldUseGather`/`ShouldAvoidGather` or similar. What does having a non-boolean get us now? Why `2`? I'm ok with getting a `bool` now and changing it to `int`/`unsigned`/`whatever` in the future, but only if we need to.
lib/Target/X86/X86Subtarget.h
529	What's a "suitable property", btw?
532	Stray change.

delena added inline comments.Jul 13 2017, 9:27 AM

lib/Target/X86/X86Subtarget.cpp
274	Gather is available since Haswell (the AVX2 set), so technically we can generate gathers on all AVX2 processors. But the overhead on HSW is high; Mohamed put INT_MAX, and it will be replaced later with another number. The Skylake Client processor has faster gathers than HSW, and its performance is similar to Skylake Server (AVX-512). The specified overhead is relative to the load operation. "2" is the number provided by Intel architects, and we are already using it to calculate the GS cost: it is the "GSOverhead = 2" that we have today inside X86TTI. The problem is that TTI can't distinguish between HSW and SKL. They both have gather, but the cost is different. I assume that TTI should provide a cost for gather for all AVX2 and AVX-512 processors, but again, it is not one value for all. The vectorizer compares this cost with the scalar and interleaved variants and chooses the best solution.
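
To make the comparison in the comment above concrete, here is a minimal sketch of a per-family overhead feeding a gather-versus-scalar decision. Everything in it is an assumption for illustration: the `Family` enum, the cost formulas, and the helper names are hypothetical, and 1024 is only a placeholder for "too expensive on Haswell". Only the value 2 for Skylake comes from the discussion.

```cpp
#include <cassert>

// Hypothetical family tags; the real enum lives in X86Subtarget.h.
enum class Family { Haswell, Skylake };

// Overhead relative to a load. 2 is the Skylake figure quoted above;
// 1024 is only a placeholder meaning "too expensive" on Haswell.
constexpr unsigned gatherOverhead(Family F) {
  return F == Family::Skylake ? 2u : 1024u;
}

// Assumed cost shape: fixed overhead plus one load per vector lane.
constexpr unsigned gatherCost(Family F, unsigned Lanes, unsigned LoadCost) {
  return gatherOverhead(F) + Lanes * LoadCost;
}

// Assumed scalar alternative: one load and one vector insert per lane.
constexpr unsigned scalarCost(unsigned Lanes, unsigned LoadCost,
                              unsigned InsertCost) {
  return Lanes * (LoadCost + InsertCost);
}

// Vectorizer-style decision: pick the gather only when it is cheaper.
constexpr bool preferGather(Family F, unsigned Lanes) {
  return gatherCost(F, Lanes, /*LoadCost=*/1) <
         scalarCost(Lanes, /*LoadCost=*/1, /*InsertCost=*/1);
}

inline void exampleDecision() {
  assert(preferGather(Family::Skylake, 8));  // 2 + 8 = 10 < 16
  assert(!preferGather(Family::Haswell, 8)); // 1024 + 8 = 1032 >= 16
}
```

With a single overhead value shared by every AVX2 target, both calls would return the same answer, which is the limitation described above.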

If we were to go the route of adding a feature bit for every CPU, I wonder if we should just convert the CPU name into an enum in the target independent subtarget classes. Going through a feature bit to create an enum is just making the number of feature bits larger. Feature bits are stored in a std::bitset that, last I checked, was 160 bits and is stored in some of the tablegen generated data structures. Due to limitations in various places, this max size of 160 is determined by the target with the most feature bits. I believe it got as high as it is because ARM or AArch64 has added a feature bit per CPU similar to this patch.

Another option may be to allow an enum value to be passed as a separate input to ProcModel that would get stored in a separate field in the target independent subtarget. This would also avoid the feature bit pressure.
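
The two options discussed above can be contrasted with a small sketch. It is purely illustrative: `BitPerCpu`, `SeparateField`, `CpuKind`, and the slot names are hypothetical and do not mirror the tablegen-generated structures, and 160 is simply the width quoted in the comment.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// 160 is the bitset width quoted in the comment above, not a checked value.
constexpr std::size_t MaxFeatures = 160;

enum class CpuKind { Generic, Skylake, SkylakeServer };

// Option 1: every CPU consumes one slot of the shared feature bitset.
enum FeatureSlot : std::size_t { SlotAVX2, SlotProcSkylake, SlotProcSKX };

struct BitPerCpu {
  std::bitset<MaxFeatures> Features;
  CpuKind kind() const {
    if (Features.test(SlotProcSKX))
      return CpuKind::SkylakeServer;
    if (Features.test(SlotProcSkylake))
      return CpuKind::Skylake;
    return CpuKind::Generic;
  }
};

// Option 2: the CPU kind travels in its own field; no feature slots are used.
struct SeparateField {
  std::bitset<MaxFeatures> Features;
  CpuKind Kind = CpuKind::Generic;
  CpuKind kind() const { return Kind; }
};

inline void exampleFeaturePressure() {
  BitPerCpu A;
  A.Features.set(SlotProcSkylake);
  assert(A.kind() == CpuKind::Skylake && A.Features.count() == 1);

  SeparateField B;
  B.Kind = CpuKind::Skylake;
  assert(B.kind() == CpuKind::Skylake && B.Features.none());
}
```

Both variants answer the same query, but only the first one spends bitset capacity on each processor.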

I still don't get why this can't be achieved with a single 'FeatureFastGather' feature bit, just used by Skylake CPU target - then X86TTIImpl::getGatherScatterOpCost can just call ST->hasFastGather() to decide whether to use gathers or not.

Or if you want to keep the overhead as a cost multiplier you can do something like:

unsigned getGatherOverhead() const { return HasFastGather ? 2 : UINT_MAX; }

These would be much easier to remove when you've had time to sort out the scheduler models.
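
A minimal, self-contained sketch of the single-feature-bit idea follows. `SubtargetSketch` is a hypothetical stand-in, not the real X86Subtarget, and the way the bit would be wired up from X86.td is assumed rather than shown.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for X86Subtarget; only the gather bit is modeled.
class SubtargetSketch {
  bool HasFastGather; // would be set from a 'FeatureFastGather' bit

public:
  explicit SubtargetSketch(bool FastGather) : HasFastGather(FastGather) {}

  bool hasFastGather() const { return HasFastGather; }

  // Overhead relative to a plain load; UINT_MAX means "never pick a gather".
  unsigned getGatherOverhead() const { return HasFastGather ? 2 : UINT_MAX; }
};

// A Skylake-like target advertises the bit; an older AVX2 target does not.
inline void exampleFastGather() {
  SubtargetSketch Skylake(true), Haswell(false);
  assert(Skylake.getGatherOverhead() == 2u);
  assert(Haswell.getGatherOverhead() == UINT_MAX);
}
```

The appeal of this shape is that removing it later means deleting one bit and one accessor.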

In D35348#808219, @RKSimon wrote:
I still don't get why this can't be achieved with a single 'FeatureFastGather' feature bit, just used by Skylake CPU target - then X86TTIImpl::getGatherScatterOpCost can just call ST->hasFastGather() to decide whether to use gathers or not.

Or if you want to keep the overhead as a cost multiplier you can do something like:
unsigned getGatherOverhead() const { return HasFastGather ? 2 : UINT_MAX; }
These would be much easier to remove when you've had time to sort out the scheduler models.

The goal is to set different numbers for different machines. Each processor that supports Gather should be able to provide a number, not only NoGather/SlowGather/FastGather.

In D35348#808263, @delena wrote:

The goal is to set different numbers for different machines. Each processor that supports Gather should be able to provide a number, not only NoGather/SlowGather/FastGather.

So how many numbers are we talking about? Is it just Skylake, SKX and Cannonlake? (KNL?) I guess we don't want to handle weaker AVX2 CPUs so we don't need a feature bit for those (default getGatherOverhead() to UINT_MAX)

Whatever happens, almost all those new processor enums are not required.

igorb added a subscriber: igorb.Jul 13 2017, 12:51 PM

removing unnecessary processor
fixing uint issue

lib/Target/X86/X86Subtarget.h
529	It depends on how isAtom/isSLM are used. For example, in X86TargetTransformInfo.cpp::getMaxInterleaveFactor the property will be getMaxInterleaveFactor.

Before you continue with this, is there any chance that you can create a phab for the follow on patches that are dependent on this please? To get a better idea of what you are needing this for.

lib/Target/X86/X86Subtarget.cpp
269	You went to all that trouble of adding the other Intel CPUs and then just need IntelSkylake and IntelSKX?

delena added inline comments.Jul 19 2017, 6:34 AM

lib/Target/X86/X86Subtarget.cpp
270	Mohamed, you have to add ScatterOverhead for SKX.

magabari added inline comments.Jul 22 2017, 11:40 PM

lib/Target/X86/X86Subtarget.cpp
269	This is only one property. Even HSW has gather, and its overhead will be added later (currently we don't recommend generating gathers for HSW). There are other properties that will need to return different values based on the family, so my patch is just getting the infrastructure ready for this. I will upload my patch soon; it uses the "getGatherOverhead" property from the subtarget.

Adding ScatterOverhead and updating the max overhead.
The max overhead can't be MAX_INT or MAX_UINT because it would overflow in the CM calculation and cause a wrong decision.
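
The overflow concern can be reproduced with a short sketch. The cost formula (overhead plus one unit per lane) and the function name are assumptions, and 1024 is a hypothetical "large but safe" sentinel. Unsigned arithmetic is used here because its wraparound is well defined; with a signed `int` set to INT_MAX the same addition would be undefined behavior.

```cpp
#include <cassert>
#include <climits>

// Assumed cost shape: fixed overhead plus one unit per lane, in unsigned math.
constexpr unsigned costWithOverhead(unsigned Overhead, unsigned Lanes) {
  return Overhead + Lanes; // wraps modulo 2^N when Overhead is near UINT_MAX
}

inline void exampleWraparound() {
  // UINT_MAX + 8 wraps to 7, so the "impossible" gather looks very cheap.
  assert(costWithOverhead(UINT_MAX, 8) == 7u);
  // A large but finite sentinel keeps the cost large.
  assert(costWithOverhead(1024u, 8) == 1032u);
}
```

A wrapped cost of 7 would beat almost any scalar alternative, which is the wrong decision described above.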

delena added inline comments.Jul 23 2017, 4:44 AM

lib/Target/X86/X86Subtarget.h
62	You can use Others instead of Generic.

replacing Generic with Others

craig.topper added inline comments.Jul 24 2017, 10:34 PM

lib/Target/X86/X86Subtarget.cpp
267	Why can't this just be done by comparing the CPU string? What did we get by creating an enum that is named the same as the CPU string?

delena added inline comments.Jul 25 2017, 10:42 AM

lib/Target/X86/X86.td
360	You don't need Others here, I think.
412	Why do you need this change?

magabari marked 2 inline comments as done.Jul 27 2017, 12:54 AM

magabari added inline comments.

lib/Target/X86/X86Subtarget.cpp
267	The X86ProcFamily enum already existed and we have just added more values. In general I think it's better to parse the CPU strings in one place and after that use the enum values instead of comparing the strings all the time.
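
The "parse once, then compare enum values" argument can be sketched as follows. This is an illustration only: `parseProcFamily` and the `ProcFamily` enum are hypothetical, and in the patch itself the mapping is expressed through SubtargetFeature definitions in X86.td rather than through string comparisons in C++. The CPU names are the ones visible in the X86.td diff.

```cpp
#include <cassert>
#include <string>

enum class ProcFamily { Others, Haswell, Broadwell, Skylake, SKX, Cannonlake };

// Hypothetical helper: map a CPU name to a family once, at subtarget setup.
inline ProcFamily parseProcFamily(const std::string &CPU) {
  if (CPU == "haswell" || CPU == "core-avx2")
    return ProcFamily::Haswell;
  if (CPU == "broadwell")
    return ProcFamily::Broadwell;
  if (CPU == "skylake")
    return ProcFamily::Skylake;
  if (CPU == "skylake-avx512" || CPU == "skx")
    return ProcFamily::SKX;
  if (CPU == "cannonlake")
    return ProcFamily::Cannonlake;
  return ProcFamily::Others;
}

// Later queries compare enum values instead of re-parsing the string.
inline bool isSkylakeClient(ProcFamily F) { return F == ProcFamily::Skylake; }

inline void exampleParseOnce() {
  ProcFamily F = parseProcFamily("skylake");
  assert(isSkylakeClient(F));
  assert(parseProcFamily("core-avx2") == ProcFamily::Haswell);
  assert(parseProcFamily("pentium4") == ProcFamily::Others);
}
```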

magabari updated this revision to Diff 108429.Jul 27 2017, 12:55 AM

ping

ping ping

magabari added a reviewer: dorit.Aug 14 2017, 3:46 AM

RKSimon added inline comments.Aug 14 2017, 3:56 AM

lib/Target/X86/X86Subtarget.h
74	Do this in X86Subtarget::initializeEnvironment() ?
346	Do this in X86Subtarget::initializeEnvironment() ?

magabari updated this revision to Diff 111943.Aug 21 2017, 3:49 AM

magabari marked 2 inline comments as done.

grosser added a subscriber: grosser.Aug 26 2017, 12:09 AM

Simon, could you please take a look at the changes and see if it's okay now?

delena added inline comments.Aug 27 2017, 5:11 AM

lib/Target/X86/X86Subtarget.cpp
20	Gather is available since Haswell (AVX2 set). So technically, we can generate Gathers on all AVX2 processors. But the overhead on HSW is high. Skylake Client processor has faster Gathers than HSW and performance is similar to Skylake Server (AVX-512). The specified overhead is relative to the Load operation. "2" is the number provided by Intel architects, we are already using it to calculate GS cost.
	if (X86ProcFamily == IntelSkylake || hasAVX512)
	  GatherOverhead = 2;
	if (hasAVX512) // SKX and KNL fail here
	  ScatterOverhead = 2;
23	Please remove.
39	Please remove the "else", the values are already initialized.
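
Taken together, the three inline comments above suggest an initialization along the following lines. This is a sketch under stated assumptions, not the committed code: `OverheadSketch` and its `ProcFamily` enum are hypothetical, and the default of 1024 is an illustrative "large but finite" value. Only the conditions and the value 2 come from the review comment.

```cpp
#include <cassert>

enum class ProcFamily { Others, Haswell, Skylake };

struct OverheadSketch {
  // Defaults mean "avoid"; 1024 is a hypothetical large-but-safe value.
  int GatherOverhead = 1024;
  int ScatterOverhead = 1024;

  OverheadSketch(ProcFamily Family, bool HasAVX512) {
    // No else branches: the members already hold their defaults.
    if (Family == ProcFamily::Skylake || HasAVX512)
      GatherOverhead = 2;
    if (HasAVX512)
      ScatterOverhead = 2;
  }
};

inline void exampleOverheads() {
  OverheadSketch Client(ProcFamily::Skylake, /*HasAVX512=*/false);
  assert(Client.GatherOverhead == 2 && Client.ScatterOverhead == 1024);

  OverheadSketch Avx512(ProcFamily::Others, /*HasAVX512=*/true);
  assert(Avx512.GatherOverhead == 2 && Avx512.ScatterOverhead == 2);
}
```

Skylake Client gets a cheap gather but keeps the default scatter overhead, while any AVX-512 target gets both.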

aymanmus added a subscriber: aymanmus.Aug 31 2017, 12:55 AM

fixing Elena comments

magabari added a comment.Aug 31 2017, 1:05 AM

This comment was removed by magabari.

LGTM + a minor comment fix

lib/Target/X86/X86Subtarget.cpp
24	*we are already using it to calculate GS cost -> This parameter is used for cost estimation of Gather Op and comparison with other alternatives.

This revision is now accepted and ready to land.Aug 31 2017, 1:31 AM

fixed comment

In D35348#853420, @magabari wrote:

Simon, could you please take a look at the changes and see if it's okay now?

LGTM - I still think this patch only needs to add IntelSkylake but it's a lot better than it was.

Closed by commit rL313132: [X86] Adding X86 Processor Families (authored by magabari).Sep 13 2017, 2:02 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/X86/X86.td (revision 307753)	89 lines
lib/Target/X86/X86Subtarget.h (revision 307753)	20 lines
lib/Target/X86/X86Subtarget.cpp (revision 307753)	18 lines

Diff 107853

lib/Target/X86/X86.td

[290 lines not shown]	: SubtargetFeature<
"REP MOVS/STOS are fast">;		"REP MOVS/STOS are fast">;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// X86 processors supported.		// X86 processors supported.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

include "X86Schedule.td"		include "X86Schedule.td"

		def ProcIntelOthers : SubtargetFeature<"generic", "X86ProcFamily",
		"Others", "Intel Generic processors">;
def ProcIntelAtom : SubtargetFeature<"atom", "X86ProcFamily", "IntelAtom",		def ProcIntelAtom : SubtargetFeature<"atom", "X86ProcFamily", "IntelAtom",
"Intel Atom processors">;		"Intel Atom processors">;
def ProcIntelSLM : SubtargetFeature<"slm", "X86ProcFamily", "IntelSLM",		def ProcIntelSLM : SubtargetFeature<"slm", "X86ProcFamily", "IntelSLM",
"Intel Silvermont processors">;		"Intel Silvermont processors">;
def ProcIntelGLM : SubtargetFeature<"glm", "X86ProcFamily", "IntelGLM",		def ProcIntelGLM : SubtargetFeature<"glm", "X86ProcFamily", "IntelGLM",
"Intel Goldmont processors">;		"Intel Goldmont processors">;
		def ProcIntelHSW : SubtargetFeature<"haswell", "X86ProcFamily",
		"IntelHaswell", "Intel Haswell processors">;
		def ProcIntelBDW : SubtargetFeature<"broadwell", "X86ProcFamily",
		"IntelBroadwell", "Intel Broadwell processors">;
		def ProcIntelSKL : SubtargetFeature<"skylake", "X86ProcFamily",
		"IntelSkylake", "Intel Skylake processors">;
		def ProcIntelKNL : SubtargetFeature<"knl", "X86ProcFamily",
		"IntelKNL", "Intel Knights Landing processors">;
		def ProcIntelSKX : SubtargetFeature<"skx", "X86ProcFamily",
		"IntelSKX", "Intel Skylake Server processors">;
		def ProcIntelCNL : SubtargetFeature<"cannonlake", "X86ProcFamily",
		"IntelCannonlake", "Intel Cannonlake processors">;

class Proc<string Name, list<SubtargetFeature> Features>		class Proc<string Name, list<SubtargetFeature> Features>
: ProcessorModel<Name, GenericModel, Features>;		: ProcessorModel<Name, GenericModel, Features>;

def : Proc<"generic", [FeatureX87, FeatureSlowUAMem16]>;		def : Proc<"generic", [ProcIntelOthers, FeatureX87,
def : Proc<"i386", [FeatureX87, FeatureSlowUAMem16]>;		FeatureSlowUAMem16]>;
def : Proc<"i486", [FeatureX87, FeatureSlowUAMem16]>;		def : Proc<"i386", [ProcIntelOthers, FeatureX87,
def : Proc<"i586", [FeatureX87, FeatureSlowUAMem16]>;		FeatureSlowUAMem16]>;
def : Proc<"pentium", [FeatureX87, FeatureSlowUAMem16]>;		def : Proc<"i486", [ProcIntelOthers, FeatureX87,
def : Proc<"pentium-mmx", [FeatureX87, FeatureSlowUAMem16, FeatureMMX]>;		FeatureSlowUAMem16]>;
def : Proc<"i686", [FeatureX87, FeatureSlowUAMem16]>;		def : Proc<"i586", [ProcIntelOthers, FeatureX87,
def : Proc<"pentiumpro", [FeatureX87, FeatureSlowUAMem16, FeatureCMOV]>;		FeatureSlowUAMem16]>;
def : Proc<"pentium2", [FeatureX87, FeatureSlowUAMem16, FeatureMMX,		def : Proc<"pentium", [ProcIntelOthers, FeatureX87,
		FeatureSlowUAMem16]>;
		def : Proc<"pentium-mmx", [ProcIntelOthers, FeatureX87,
		FeatureSlowUAMem16, FeatureMMX]>;
		def : Proc<"i686", [ProcIntelOthers, FeatureX87,
		FeatureSlowUAMem16]>;
		def : Proc<"pentiumpro", [ProcIntelOthers, FeatureX87,
		FeatureSlowUAMem16, FeatureCMOV]>;
		def : Proc<"pentium2", [ProcIntelOthers, FeatureX87,
		FeatureSlowUAMem16, FeatureMMX,
FeatureCMOV, FeatureFXSR]>;		FeatureCMOV, FeatureFXSR]>;
def : Proc<"pentium3", [FeatureX87, FeatureSlowUAMem16, FeatureMMX,		def : Proc<"pentium3", [ProcIntelOthers, FeatureX87,
		FeatureSlowUAMem16, FeatureMMX,
FeatureSSE1, FeatureFXSR]>;		FeatureSSE1, FeatureFXSR]>;
def : Proc<"pentium3m", [FeatureX87, FeatureSlowUAMem16, FeatureMMX,		def : Proc<"pentium3m", [ProcIntelOthers, FeatureX87,
		FeatureSlowUAMem16, FeatureMMX,
FeatureSSE1, FeatureFXSR, FeatureSlowBTMem]>;		FeatureSSE1, FeatureFXSR, FeatureSlowBTMem]>;

// Enable the PostRAScheduler for SSE2 and SSE3 class cpus.		// Enable the PostRAScheduler for SSE2 and SSE3 class cpus.
// The intent is to enable it for pentium4 which is the current default		// The intent is to enable it for pentium4 which is the current default
// processor in a vanilla 32-bit clang compilation when no specific		// processor in a vanilla 32-bit clang compilation when no specific
// architecture is specified. This generally gives a nice performance		// architecture is specified. This generally gives a nice performance
// increase on silvermont, with largely neutral behavior on other		// increase on silvermont, with largely neutral behavior on other
// contemporary large core processors.		// contemporary large core processors.
// pentium-m, pentium4m, prescott and nocona are included as a preventative		// pentium-m, pentium4m, prescott and nocona are included as a preventative
// measure to avoid performance surprises, in case clang's default cpu		// measure to avoid performance surprises, in case clang's default cpu
// changes slightly.		// changes slightly.

def : ProcessorModel<"pentium-m", GenericPostRAModel,		def : ProcessorModel<"pentium-m", GenericPostRAModel,
[FeatureX87, FeatureSlowUAMem16, FeatureMMX,		[ProcIntelOthers, FeatureX87,
		FeatureSlowUAMem16, FeatureMMX,
FeatureSSE2, FeatureFXSR, FeatureSlowBTMem]>;		FeatureSSE2, FeatureFXSR, FeatureSlowBTMem]>;

def : ProcessorModel<"pentium4", GenericPostRAModel,		def : ProcessorModel<"pentium4", GenericPostRAModel,
[FeatureX87, FeatureSlowUAMem16, FeatureMMX,		[ProcIntelOthers, FeatureX87,
		FeatureSlowUAMem16, FeatureMMX,
FeatureSSE2, FeatureFXSR]>;		FeatureSSE2, FeatureFXSR]>;

def : ProcessorModel<"pentium4m", GenericPostRAModel,		def : ProcessorModel<"pentium4m", GenericPostRAModel,
[FeatureX87, FeatureSlowUAMem16, FeatureMMX,		[ProcIntelOthers, FeatureX87,
		FeatureSlowUAMem16, FeatureMMX,
FeatureSSE2, FeatureFXSR, FeatureSlowBTMem]>;		FeatureSSE2, FeatureFXSR, FeatureSlowBTMem]>;

// Intel Quark.		// Intel Quark.
def : Proc<"lakemont", []>;		def : Proc<"lakemont", [ProcIntelOthers]>;

// Intel Core Duo.		// Intel Core Duo.
def : ProcessorModel<"yonah", SandyBridgeModel,		def : ProcessorModel<"yonah", SandyBridgeModel,
[FeatureX87, FeatureSlowUAMem16, FeatureMMX, FeatureSSE3,		[ProcIntelOthers, FeatureX87,
		FeatureSlowUAMem16, FeatureMMX, FeatureSSE3,
FeatureFXSR, FeatureSlowBTMem]>;		FeatureFXSR, FeatureSlowBTMem]>;

// NetBurst.		// NetBurst.
def : ProcessorModel<"prescott", GenericPostRAModel,		def : ProcessorModel<"prescott", GenericPostRAModel,
[FeatureX87, FeatureSlowUAMem16, FeatureMMX, FeatureSSE3,		[ProcIntelOthers, FeatureX87,
		FeatureSlowUAMem16, FeatureMMX, FeatureSSE3,
FeatureFXSR, FeatureSlowBTMem]>;		FeatureFXSR, FeatureSlowBTMem]>;
def : ProcessorModel<"nocona", GenericPostRAModel, [		def : ProcessorModel<"nocona", GenericPostRAModel, [
		ProcIntelOthers,
FeatureX87,		FeatureX87,
FeatureSlowUAMem16,		FeatureSlowUAMem16,
FeatureMMX,		FeatureMMX,
FeatureSSE3,		FeatureSSE3,
FeatureFXSR,		FeatureFXSR,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
FeatureSlowBTMem		FeatureSlowBTMem
]>;		]>;

// Intel Core 2 Solo/Duo.		// Intel Core 2 Solo/Duo.
def : ProcessorModel<"core2", SandyBridgeModel, [		def : ProcessorModel<"core2", SandyBridgeModel, [
		ProcIntelOthers,
FeatureX87,		FeatureX87,
FeatureSlowUAMem16,		FeatureSlowUAMem16,
FeatureMMX,		FeatureMMX,
FeatureSSSE3,		FeatureSSSE3,
FeatureFXSR,		FeatureFXSR,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
FeatureSlowBTMem,		FeatureSlowBTMem,
FeatureLAHFSAHF		FeatureLAHFSAHF
]>;		]>;
def : ProcessorModel<"penryn", SandyBridgeModel, [		def : ProcessorModel<"penryn", SandyBridgeModel, [
		ProcIntelOthers,
FeatureX87,		FeatureX87,
FeatureSlowUAMem16,		FeatureSlowUAMem16,
FeatureMMX,		FeatureMMX,
FeatureSSE41,		FeatureSSE41,
FeatureFXSR,		FeatureFXSR,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
FeatureSlowBTMem,		FeatureSlowBTMem,
FeatureLAHFSAHF		FeatureLAHFSAHF
[70 lines not shown]	class GoldmontProc<string Name> : ProcessorModel<Name, SLMModel, [
FeatureXSAVEC,		FeatureXSAVEC,
FeatureXSAVES,		FeatureXSAVES,
FeatureCLFLUSHOPT		FeatureCLFLUSHOPT
]>;		]>;
def : GoldmontProc<"goldmont">;		def : GoldmontProc<"goldmont">;

// "Arrandale" along with corei3 and corei5		// "Arrandale" along with corei3 and corei5
class NehalemProc<string Name> : ProcessorModel<Name, SandyBridgeModel, [		class NehalemProc<string Name> : ProcessorModel<Name, SandyBridgeModel, [
		ProcIntelOthers,
FeatureX87,		FeatureX87,
FeatureMMX,		FeatureMMX,
FeatureSSE42,		FeatureSSE42,
FeatureFXSR,		FeatureFXSR,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
FeatureSlowBTMem,		FeatureSlowBTMem,
FeaturePOPCNT,		FeaturePOPCNT,
FeatureLAHFSAHF		FeatureLAHFSAHF
]>;		]>;
def : NehalemProc<"nehalem">;		def : NehalemProc<"nehalem">;
def : NehalemProc<"corei7">;		def : NehalemProc<"corei7">;

// Westmere is a similar machine to nehalem with some additional features.		// Westmere is a similar machine to nehalem with some additional features.
// Westmere is the corei3/i5/i7 path from nehalem to sandybridge		// Westmere is the corei3/i5/i7 path from nehalem to sandybridge
class WestmereProc<string Name> : ProcessorModel<Name, SandyBridgeModel, [		class WestmereProc<string Name> : ProcessorModel<Name, SandyBridgeModel, [
		ProcIntelOthers,
FeatureX87,		FeatureX87,
FeatureMMX,		FeatureMMX,
FeatureSSE42,		FeatureSSE42,
FeatureFXSR,		FeatureFXSR,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
FeatureSlowBTMem,		FeatureSlowBTMem,
FeaturePOPCNT,		FeaturePOPCNT,
FeatureAES,		FeatureAES,
[29 lines not shown]	def SNBFeatures : ProcessorFeatures<[], [
FeatureLAHFSAHF,		FeatureLAHFSAHF,
FeatureSlow3OpsLEA,		FeatureSlow3OpsLEA,
FeatureFastScalarFSQRT,		FeatureFastScalarFSQRT,
FeatureFastSHLDRotate		FeatureFastSHLDRotate
]>;		]>;

class SandyBridgeProc<string Name> : ProcModel<Name, SandyBridgeModel,		class SandyBridgeProc<string Name> : ProcModel<Name, SandyBridgeModel,
SNBFeatures.Value, [		SNBFeatures.Value, [
		ProcIntelOthers,
FeatureSlowBTMem,		FeatureSlowBTMem,
FeatureSlowUAMem32		FeatureSlowUAMem32
]>;		]>;
def : SandyBridgeProc<"sandybridge">;		def : SandyBridgeProc<"sandybridge">;
def : SandyBridgeProc<"corei7-avx">; // Legacy alias.		def : SandyBridgeProc<"corei7-avx">; // Legacy alias.

def IVBFeatures : ProcessorFeatures<SNBFeatures.Value, [		def IVBFeatures : ProcessorFeatures<SNBFeatures.Value, [
FeatureRDRAND,		FeatureRDRAND,
FeatureF16C,		FeatureF16C,
FeatureFSGSBase		FeatureFSGSBase
]>;		]>;

class IvyBridgeProc<string Name> : ProcModel<Name, SandyBridgeModel,		class IvyBridgeProc<string Name> : ProcModel<Name, SandyBridgeModel,
IVBFeatures.Value, [		IVBFeatures.Value, [
		ProcIntelOthers,
FeatureSlowBTMem,		FeatureSlowBTMem,
FeatureSlowUAMem32		FeatureSlowUAMem32
]>;		]>;
def : IvyBridgeProc<"ivybridge">;		def : IvyBridgeProc<"ivybridge">;
def : IvyBridgeProc<"core-avx-i">; // Legacy alias.		def : IvyBridgeProc<"core-avx-i">; // Legacy alias.

def HSWFeatures : ProcessorFeatures<IVBFeatures.Value, [		def HSWFeatures : ProcessorFeatures<IVBFeatures.Value, [
FeatureAVX2,		FeatureAVX2,
FeatureBMI,		FeatureBMI,
FeatureBMI2,		FeatureBMI2,
FeatureERMSB,		FeatureERMSB,
FeatureFMA,		FeatureFMA,
FeatureLZCNT,		FeatureLZCNT,
FeatureMOVBE,		FeatureMOVBE,
FeatureSlowIncDec		FeatureSlowIncDec
]>;		]>;

class HaswellProc<string Name> : ProcModel<Name, HaswellModel,		class HaswellProc<string Name> : ProcModel<Name, HaswellModel,
HSWFeatures.Value, []>;		HSWFeatures.Value, [
		ProcIntelHSW
		]>;
def : HaswellProc<"haswell">;		def : HaswellProc<"haswell">;
def : HaswellProc<"core-avx2">; // Legacy alias.		def : HaswellProc<"core-avx2">; // Legacy alias.

def BDWFeatures : ProcessorFeatures<HSWFeatures.Value, [		def BDWFeatures : ProcessorFeatures<HSWFeatures.Value, [
		ProcIntelBDW,
FeatureADX,		FeatureADX,
FeatureRDSEED		FeatureRDSEED
]>;		]>;
class BroadwellProc<string Name> : ProcModel<Name, HaswellModel,		class BroadwellProc<string Name> : ProcModel<Name, HaswellModel,
BDWFeatures.Value, []>;		BDWFeatures.Value, []>;
def : BroadwellProc<"broadwell">;		def : BroadwellProc<"broadwell">;

def SKLFeatures : ProcessorFeatures<BDWFeatures.Value, [		def SKLFeatures : ProcessorFeatures<BDWFeatures.Value, [
FeatureMPX,		FeatureMPX,
FeatureRTM,		FeatureRTM,
FeatureXSAVEC,		FeatureXSAVEC,
FeatureXSAVES,		FeatureXSAVES,
FeatureSGX,		FeatureSGX,
FeatureCLFLUSHOPT,		FeatureCLFLUSHOPT,
FeatureFastVectorFSQRT		FeatureFastVectorFSQRT
]>;		]>;

// FIXME: define SKL model		// FIXME: define SKL model
class SkylakeClientProc<string Name> : ProcModel<Name, HaswellModel,		class SkylakeClientProc<string Name> : ProcModel<Name, HaswellModel,
SKLFeatures.Value, []>;		SKLFeatures.Value, [
		ProcIntelSKL
		]>;
def : SkylakeClientProc<"skylake">;		def : SkylakeClientProc<"skylake">;

// FIXME: define KNL model		// FIXME: define KNL model
class KnightsLandingProc<string Name> : ProcModel<Name, HaswellModel,		class KnightsLandingProc<string Name> : ProcModel<Name, HaswellModel,
IVBFeatures.Value, [		IVBFeatures.Value, [
		ProcIntelKNL,
FeatureAVX512,		FeatureAVX512,
FeatureERI,		FeatureERI,
FeatureCDI,		FeatureCDI,
FeaturePFI,		FeaturePFI,
FeaturePREFETCHWT1,		FeaturePREFETCHWT1,
FeatureADX,		FeatureADX,
FeatureRDSEED,		FeatureRDSEED,
FeatureMOVBE,		FeatureMOVBE,
[12 lines not shown]	def SKXFeatures : ProcessorFeatures<SKLFeatures.Value, [
FeatureBWI,		FeatureBWI,
FeatureVLX,		FeatureVLX,
FeaturePKU,		FeaturePKU,
FeatureCLWB		FeatureCLWB
]>;		]>;

// FIXME: define SKX model		// FIXME: define SKX model
class SkylakeServerProc<string Name> : ProcModel<Name, HaswellModel,		class SkylakeServerProc<string Name> : ProcModel<Name, HaswellModel,
SKXFeatures.Value, []>;		SKXFeatures.Value, [
		ProcIntelSKX
		]>;
def : SkylakeServerProc<"skylake-avx512">;		def : SkylakeServerProc<"skylake-avx512">;
def : SkylakeServerProc<"skx">; // Legacy alias.		def : SkylakeServerProc<"skx">; // Legacy alias.

def CNLFeatures : ProcessorFeatures<SKXFeatures.Value, [		def CNLFeatures : ProcessorFeatures<SKXFeatures.Value, [
FeatureVBMI,		FeatureVBMI,
FeatureIFMA,		FeatureIFMA,
FeatureSHA		FeatureSHA
]>;		]>;

class CannonlakeProc<string Name> : ProcModel<Name, HaswellModel,		class CannonlakeProc<string Name> : ProcModel<Name, HaswellModel,
CNLFeatures.Value, []>;		CNLFeatures.Value, [
		ProcIntelCNL
		]>;
def : CannonlakeProc<"cannonlake">;		def : CannonlakeProc<"cannonlake">;

// AMD CPUs.		// AMD CPUs.

def : Proc<"k6", [FeatureX87, FeatureSlowUAMem16, FeatureMMX]>;		def : Proc<"k6", [FeatureX87, FeatureSlowUAMem16, FeatureMMX]>;
def : Proc<"k6-2", [FeatureX87, FeatureSlowUAMem16, Feature3DNow]>;		def : Proc<"k6-2", [FeatureX87, FeatureSlowUAMem16, Feature3DNow]>;
def : Proc<"k6-3", [FeatureX87, FeatureSlowUAMem16, Feature3DNow]>;		def : Proc<"k6-3", [FeatureX87, FeatureSlowUAMem16, Feature3DNow]>;
def : Proc<"athlon", [FeatureX87, FeatureSlowUAMem16, Feature3DNowA,		def : Proc<"athlon", [FeatureX87, FeatureSlowUAMem16, Feature3DNowA,
[312 lines not shown]

lib/Target/X86/X86Subtarget.h

[52 lines not shown]	enum X86SSEEnum {
NoSSE, SSE1, SSE2, SSE3, SSSE3, SSE41, SSE42, AVX, AVX2, AVX512F		NoSSE, SSE1, SSE2, SSE3, SSSE3, SSE41, SSE42, AVX, AVX2, AVX512F
};		};

enum X863DNowEnum {		enum X863DNowEnum {
NoThreeDNow, MMX, ThreeDNow, ThreeDNowA		NoThreeDNow, MMX, ThreeDNow, ThreeDNowA
};		};

enum X86ProcFamilyEnum {		enum X86ProcFamilyEnum {
Others, IntelAtom, IntelSLM, IntelGLM		Others,
		IntelAtom,
		IntelSLM,
		IntelGLM,
		IntelHaswell,
		IntelBroadwell,
		IntelSkylake,
		IntelKNL,
		IntelSKX,
		IntelCannonlake
};		};

/// X86 processor family: Intel Atom, and others		/// X86 processor family: Intel Atom, and others
X86ProcFamilyEnum X86ProcFamily;		X86ProcFamilyEnum X86ProcFamily;

/// Which PIC style to use		/// Which PIC style to use
PICStyles::Style PICStyle;		PICStyles::Style PICStyle;

const TargetMachine &TM;		const TargetMachine &TM;

/// SSE1, SSE2, SSE3, SSSE3, SSE41, SSE42, or none supported.		/// SSE1, SSE2, SSE3, SSSE3, SSE41, SSE42, or none supported.
X86SSEEnum X86SSELevel;		X86SSEEnum X86SSELevel;
[253 lines not shown]	private:
bool In64BitMode;		bool In64BitMode;

/// True if compiling for 32-bit, false for 16-bit or 64-bit.		/// True if compiling for 32-bit, false for 16-bit or 64-bit.
bool In32BitMode;		bool In32BitMode;

/// True if compiling for 16-bit, false for 32-bit or 64-bit.		/// True if compiling for 16-bit, false for 32-bit or 64-bit.
bool In16BitMode;		bool In16BitMode;

		/// Contains the Overhead of gather\scatter instructions
		int GatherOverhead;
		int ScatterOverhead;

X86SelectionDAGInfo TSInfo;		X86SelectionDAGInfo TSInfo;
// Ordering here is important. X86InstrInfo initializes X86RegisterInfo which		// Ordering here is important. X86InstrInfo initializes X86RegisterInfo which
// X86TargetLowering needs.		// X86TargetLowering needs.
X86InstrInfo InstrInfo;		X86InstrInfo InstrInfo;
X86TargetLowering TLInfo;		X86TargetLowering TLInfo;
X86FrameLowering FrameLowering;		X86FrameLowering FrameLowering;

public:		public:
[129 lines not shown]	public:
bool hasLAHFSAHF() const { return HasLAHFSAHF; }		bool hasLAHFSAHF() const { return HasLAHFSAHF; }
bool hasMWAITX() const { return HasMWAITX; }		bool hasMWAITX() const { return HasMWAITX; }
bool hasCLZERO() const { return HasCLZERO; }		bool hasCLZERO() const { return HasCLZERO; }
bool isBTMemSlow() const { return IsBTMemSlow; }		bool isBTMemSlow() const { return IsBTMemSlow; }
bool isSHLDSlow() const { return IsSHLDSlow; }		bool isSHLDSlow() const { return IsSHLDSlow; }
bool isPMULLDSlow() const { return IsPMULLDSlow; }		bool isPMULLDSlow() const { return IsPMULLDSlow; }
bool isUnalignedMem16Slow() const { return IsUAMem16Slow; }		bool isUnalignedMem16Slow() const { return IsUAMem16Slow; }
bool isUnalignedMem32Slow() const { return IsUAMem32Slow; }		bool isUnalignedMem32Slow() const { return IsUAMem32Slow; }
		int getGatherOverhead() const { return GatherOverhead; }
		int getScatterOverhead() const { return ScatterOverhead; }
bool hasSSEUnalignedMem() const { return HasSSEUnalignedMem; }		bool hasSSEUnalignedMem() const { return HasSSEUnalignedMem; }
bool hasCmpxchg16b() const { return HasCmpxchg16b; }		bool hasCmpxchg16b() const { return HasCmpxchg16b; }
bool useLeaForSP() const { return UseLeaForSP; }		bool useLeaForSP() const { return UseLeaForSP; }
bool hasFastPartialYMMorZMMWrite() const {		bool hasFastPartialYMMorZMMWrite() const {
return HasFastPartialYMMorZMMWrite;		return HasFastPartialYMMorZMMWrite;
}		}
bool hasFastScalarFSQRT() const { return HasFastScalarFSQRT; }		bool hasFastScalarFSQRT() const { return HasFastScalarFSQRT; }
bool hasFastVectorFSQRT() const { return HasFastVectorFSQRT; }		bool hasFastVectorFSQRT() const { return HasFastVectorFSQRT; }
[16 lines not shown]	public:
bool hasBWI() const { return HasBWI; }		bool hasBWI() const { return HasBWI; }
bool hasVLX() const { return HasVLX; }		bool hasVLX() const { return HasVLX; }
bool hasPKU() const { return HasPKU; }		bool hasPKU() const { return HasPKU; }
bool hasMPX() const { return HasMPX; }		bool hasMPX() const { return HasMPX; }
bool hasCLFLUSHOPT() const { return HasCLFLUSHOPT; }		bool hasCLFLUSHOPT() const { return HasCLFLUSHOPT; }

bool isXRaySupported() const override { return is64Bit(); }		bool isXRaySupported() const override { return is64Bit(); }

		X86ProcFamilyEnum getProcFamily() const { return X86ProcFamily; }

		/// TODO: to be removed later and replaced with suitable properties
bool isAtom() const { return X86ProcFamily == IntelAtom; }		bool isAtom() const { return X86ProcFamily == IntelAtom; }
bool isSLM() const { return X86ProcFamily == IntelSLM; }		bool isSLM() const { return X86ProcFamily == IntelSLM; }
bool useSoftFloat() const { return UseSoftFloat; }		bool useSoftFloat() const { return UseSoftFloat; }

/// Use mfence if we have SSE2 or we're on x86-64 (even if we asked for		/// Use mfence if we have SSE2 or we're on x86-64 (even if we asked for
/// no-sse2). There isn't any reason to disable it if the target processor		/// no-sse2). There isn't any reason to disable it if the target processor
/// supports it.		/// supports it.
bool hasMFence() const { return hasSSE2() || is64Bit(); }		bool hasMFence() const { return hasSSE2() || is64Bit(); }

const Triple &getTargetTriple() const { return TargetTriple; }		const Triple &getTargetTriple() const { return TargetTriple; }

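The inline thread above on `isAtom()`/`isSLM()` proposes replacing family predicates with named properties. A minimal sketch of that idea follows. `ToySubtarget`, its `ProcFamily` enum, and the values 1 and 2 are hypothetical illustrations, not LLVM's `X86Subtarget` or its real tuning numbers.

```cpp
#include <cassert>

// Hypothetical stand-in for a subtarget; not LLVM's X86Subtarget.
enum class ProcFamily { Others, IntelAtom, IntelSLM };

class ToySubtarget {
  // Illustrative default; a real value would come from uArch data.
  unsigned MaxInterleaveFactor = 2;

public:
  explicit ToySubtarget(ProcFamily Family) {
    // The family is inspected once, at initialization time.
    if (Family == ProcFamily::IntelAtom)
      MaxInterleaveFactor = 1;
  }

  // Callers query the property and never see the family itself.
  unsigned getMaxInterleaveFactor() const { return MaxInterleaveFactor; }
};
```

With this shape, a cost-model caller would ask `getMaxInterleaveFactor()` instead of branching on `isAtom()`, so adding a new family only touches the initialization code.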

lib/Target/X86/X86Subtarget.cpp

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "X86.h"		#include "X86.h"

#ifdef LLVM_BUILD_GLOBAL_ISEL		#ifdef LLVM_BUILD_GLOBAL_ISEL
#include "X86CallLowering.h"		#include "X86CallLowering.h"
#include "X86LegalizerInfo.h"		#include "X86LegalizerInfo.h"
#include "X86RegisterBankInfo.h"		#include "X86RegisterBankInfo.h"
#endif		#endif
		delena: Gather is available since Haswell (AVX2 set). So technically, we can generate Gathers on all AVX2 processors. But the overhead on HSW is high. Skylake Client processor has faster Gathers than HSW and performance is similar to Skylake Server (AVX-512). The specified overhead is relative to the Load operation. "2" is the number provided by Intel architects, we are already using it to calculate GS cost.
		  if (X86ProcFamily == IntelSkylake || hasAVX512)
		    GatherOverhead = 2;
		  if (hasAVX512) // SKX and KNL fail here
		    ScatterOverhead = 2;
#include "X86Subtarget.h"		#include "X86Subtarget.h"
#include "MCTargetDesc/X86BaseInfo.h"		#include "MCTargetDesc/X86BaseInfo.h"
#include "X86TargetMachine.h"		#include "X86TargetMachine.h"
		delena: Please remove.
#include "llvm/ADT/Triple.h"		#include "llvm/ADT/Triple.h"
		delena: "we are already using it to calculate GS cost" -> "This parameter is used for cost estimation of Gather Op and comparison with other alternatives."
#ifdef LLVM_BUILD_GLOBAL_ISEL		#ifdef LLVM_BUILD_GLOBAL_ISEL
#include "llvm/CodeGen/GlobalISel/CallLowering.h"		#include "llvm/CodeGen/GlobalISel/CallLowering.h"
#include "llvm/CodeGen/GlobalISel/InstructionSelect.h"		#include "llvm/CodeGen/GlobalISel/InstructionSelect.h"
#include "llvm/CodeGen/GlobalISel/Legalizer.h"		#include "llvm/CodeGen/GlobalISel/Legalizer.h"
#include "llvm/CodeGen/GlobalISel/RegBankSelect.h"		#include "llvm/CodeGen/GlobalISel/RegBankSelect.h"
#endif		#endif
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/ConstantRange.h"		#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalValue.h"		#include "llvm/IR/GlobalValue.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/CodeGen.h"		#include "llvm/Support/CodeGen.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
		delena: Please remove the "else", the values are already initialized.
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
#include <cassert>		#include <cassert>
#include <string>		#include <string>

#if defined(_MSC_VER)		#if defined(_MSC_VER)
#include <intrin.h>		#include <intrin.h>
#endif		#endif
▲ Show 20 Lines • Show All 210 Lines • ▼ Show 20 Lines	void X86Subtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) {

// Stack alignment is 16 bytes on Darwin, Linux, kFreeBSD and Solaris (both		// Stack alignment is 16 bytes on Darwin, Linux, kFreeBSD and Solaris (both
// 32 and 64 bit) and for all 64-bit targets.		// 32 and 64 bit) and for all 64-bit targets.
if (StackAlignOverride)		if (StackAlignOverride)
stackAlignment = StackAlignOverride;		stackAlignment = StackAlignOverride;
else if (isTargetDarwin() || isTargetLinux() || isTargetSolaris() ||		else if (isTargetDarwin() || isTargetLinux() || isTargetSolaris() ||
isTargetKFreeBSD() || In64BitMode)		isTargetKFreeBSD() || In64BitMode)
stackAlignment = 16;		stackAlignment = 16;

		switch(X86ProcFamily) {
		craig.topper: Why can't this just be done by comparing the CPU string? What did we get by creating an enum that is named the same as the CPU string?
		magabari (author): The X86ProcFamily enum already existed; we have just added more values. In general I think it's better to parse the CPU strings in one place and then use the enum values, instead of comparing the strings all the time.
		case IntelSkylake:
		GatherOverhead = 2;
		RKSimon: You went to all that trouble of adding the other Intel CPUs and then just need IntelSkylake and IntelSKX?
		magabari (author): This is only one property. Even HSW has Gather, and we will add its overhead later (currently we don't recommend generating gathers for HSW). There are other properties that will need to return different values based on the family, so this patch just gets the infrastructure ready for them. I will upload my patch soon; it uses the "getGatherOverhead" property from the subtarget.
		ScatterOverhead = 1024; // not relevant for AVX2
		delena: Mohamed, you have to add ScatterOverhead for SKX.
		break;
		case IntelSKX:
		GatherOverhead = 2;
		filcab: `GatherOverhead` is `unsigned`. Why set it to `INT_MAX`?
		ScatterOverhead = 2;
		filcab: This could just be encoded in a `bool ShouldUseGather`/`ShouldAvoidGather` or similar. What does having a non-boolean get us now? Why `2`? I'm ok with getting a `bool` now and changing it to `int`/`unsigned`/`whatever` in the future, but only if we need to.
		delena: Gather is available since Haswell (AVX2 set). So technically, we can generate Gathers on all AVX2 processors. But the overhead on HSW is high; Mohamed put int_max, and it will be replaced later with another number. Skylake Client processor has faster Gathers than HSW and performance is similar to Skylake Server (AVX-512). The specified overhead is relative to the Load operation. "2" is the number provided by Intel architects, and we are already using it to calculate GS cost. Note that we already have "GSOverhead = 2" inside X86TTI today. The problem is that TTI can't distinguish between HSW and SKL. They both have Gather, but the cost is different. I assume that TTI should provide a cost for Gather for all AVX2 and AVX-512 processors, but again, it is not one value for all. The vectorizer compares this cost with the scalar and interleave variants and chooses the best solution.
		break;
		default:
		// Currently picking high overheads for other targets so that they are not selected
		// TODO: need to get uArch overheads for hsw\bdw
		// FIXME: giving 1024 as a max int because it may overflow in the CM calculation causing a
		// wrong decision or negative values, maybe need to move to FP?
		GatherOverhead = 1024;
		ScatterOverhead = 1024;
		}
}		}
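The FIXME in the hunk above and the `INT_MAX` question in the inline comments both concern what happens when a "never pick this" overhead is fed into cost arithmetic. The sketch below restates the per-family switch in a standalone form and shows why a large but finite sentinel is safer than the maximum integer. The names `overheadsFor`, `rawGatherCost`, and `fitsInInt`, and the formula "overhead plus one load per vector element", are assumptions made for illustration; they are not the code in `X86TargetTransformInfo.cpp`.

```cpp
#include <cassert>
#include <climits>

// Hypothetical, reduced family list; the patch adds many more values.
enum class ProcFamily { Others, IntelHaswell, IntelSkylake, IntelSKX };

struct Overheads {
  unsigned Gather;
  unsigned Scatter;
};

// Large enough to lose every cost comparison, small enough not to overflow.
const unsigned HighOverhead = 1024;

// Mirrors the switch in the diff: overheads are relative to a load.
inline Overheads overheadsFor(ProcFamily Family) {
  switch (Family) {
  case ProcFamily::IntelSkylake:
    return {2u, HighOverhead}; // AVX2 only: scatter is not available
  case ProcFamily::IntelSKX:
    return {2u, 2u};
  default:
    return {HighOverhead, HighOverhead};
  }
}

// Assumed cost shape: fixed overhead plus one load per element.
inline unsigned rawGatherCost(unsigned Overhead, unsigned VF,
                              unsigned LoadCost) {
  return Overhead + VF * LoadCost;
}

// True if the cost can be stored in a signed int without changing value.
inline bool fitsInInt(unsigned Cost) {
  return Cost <= static_cast<unsigned>(INT_MAX);
}
```

Under these assumptions, `rawGatherCost(HighOverhead, 16, 1)` is 1040 and still fits in an `int`, whereas starting from `INT_MAX` the sum no longer fits, which is the "negative values" hazard the FIXME describes once the cost is held in a signed type.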

void X86Subtarget::initializeEnvironment() {		void X86Subtarget::initializeEnvironment() {
X86SSELevel = NoSSE;		X86SSELevel = NoSSE;
X863DNowLevel = NoThreeDNow;		X863DNowLevel = NoThreeDNow;
HasX87 = false;		HasX87 = false;
HasCMov = false;		HasCMov = false;
HasX86_64 = false;		HasX86_64 = false;