This is an archive of the discontinued LLVM Phabricator instance.

Differential D35348

Adding all X86 Processor families which can help initializing several uArch properties
ClosedPublic

Authored by magabari on Jul 13 2017, 4:08 AM.

Details

Reviewers

delena
zvi
craig.topper
RKSimon
aaboud
dorit

Commits

rGe9aebf26af9c: [X86] Adding X86 Processor Families
rL313132: [X86] Adding X86 Processor Families

Summary

Adds the x86 processor families so that several microarchitecture (uArch) properties can be initialized based on the family.
As an example, this patch shows how the gather cost can be initialized from the processor family.

Event Timeline

magabari created this revision.Jul 13 2017, 4:08 AM

magabari updated this revision to Diff 106400.Jul 13 2017, 4:21 AM

I really don't like this approach - we've been getting closer and closer to killing off X86ProcFamilyEnum for years now, moving instead to a mixture of feature bits and scheduler-cost-driven decisions that works in a more general manner.

Assuming you intend to use getGatherOverhead() inside X86TTIImpl::getGatherScatterOpCost, might it not be better to just provide a Fast/Slow Gather feature bit, or (even better) finally start trying to use the scheduler model data within X86TTIImpl?

I agree with you. We also considered using the scheduling model (or any other target micro-architecture database) from TTI. It requires a detailed design and is part of our long-term plans.
We have already started feeding micro-architecture details to LLVM, step by step, and it will take time to complete.
The costs kept in X86TargetTransformInfo.cpp should also be moved to a .td file, which will require a special interface for communication between IR and MachineInstructions.
In the meantime, using the processor family as a property allows tuning the compiler for a specific processor; otherwise we would tie all patches like "gather-enabling" into one chain, waiting until all the .td files are in place.

At the very least: This is adding a ton of unused and unneeded details. If we end up needing to dispatch on processor family, then maybe we can add (some of) these.
Otherwise, I don't think it's reasonable to add all these enum values.

lib/Target/X86/X86Subtarget.cpp
35	`GatherOverhead` is `unsigned`. Why set it to `INT_MAX`?
36	This could just be encoded in a `bool ShouldUseGather`/`ShouldAvoidGather` or similar. What does having a non-boolean get us now? Why `2`? I'm ok with getting a `bool` now and changing it to `int`/`unsigned`/`whatever` in the future, but only if we need to.
lib/Target/X86/X86Subtarget.h
19	Stray change.
21	What's a "suitable property", btw?

delena added inline comments.Jul 13 2017, 9:27 AM

lib/Target/X86/X86Subtarget.cpp
36	Gather is available since Haswell (AVX2 set). So technically, we can generate Gathers on all AVX2 processors. But the overhead on HSW is high; Mohamed put INT_MAX, which will be replaced later with another number. The Skylake Client processor has faster Gathers than HSW, and its performance is similar to Skylake Server (AVX-512). The specified overhead is relative to the Load operation. "2" is the number provided by Intel architects, and we are already using it to calculate the GS cost: it is the "GSOverhead = 2" we have today inside X86TTI. The problem is that TTI can't distinguish between HSW and SKL. They both have Gather, but the cost is different. I assume that TTI should provide a cost for Gather for all AVX2 and AVX-512 processors, but again, it is not one value for all. The vectorizer compares this cost with the scalar and interleave variants and chooses the best solution.
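The comparison described in the comment above can be sketched roughly as follows. This is only an illustration: `MemStrategy`, `Candidate`, `gatherCost` and `pickCheapest` are hypothetical names, and the cost formula (overhead times load cost times lane count) is an assumption made for the example, not the formula used by X86TTI or the vectorizer.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>

// Hypothetical names throughout; this is not the vectorizer's real code.
enum class MemStrategy { Scalarize, Interleave, Gather };

struct Candidate {
  MemStrategy Strategy;
  unsigned Cost; // abstract cost units, lower is better
};

// Assumed formula: the overhead is relative to a plain load, so it scales
// the load cost once per vector lane.
unsigned gatherCost(unsigned GatherOverhead, unsigned LoadCost,
                    unsigned Lanes) {
  return GatherOverhead * LoadCost * Lanes;
}

// Return the cheapest strategy; on ties the earlier candidate wins.
// Precondition: the list is not empty.
MemStrategy pickCheapest(std::initializer_list<Candidate> Candidates) {
  return std::min_element(Candidates.begin(), Candidates.end(),
                          [](const Candidate &A, const Candidate &B) {
                            return A.Cost < B.Cost;
                          })
      ->Strategy;
}
```

With an overhead of 2 the gather variant can win against the alternatives; with a deliberately high overhead such as 1024 it never does, which is the effect the default values in this patch rely on.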

If we were to go the route of adding a feature bit for every CPU, I wonder if we should just convert the CPU name into an enum in the target independent subtarget classes. Going through a feature bit to create an enum is just making the number of feature bits larger. Feature bits are stored in a std::bitset that, last I checked, was 160 bits and is stored in some of the tablegen generated data structures. Due to limitations in various places, this max size of 160 is determined by the target with the most feature bits. I believe it got as high as it is because ARM or AArch64 has added a feature bit per CPU similar to this patch.

Another option may be to allow an enum value to be passed as a separate input to ProcModel that would get stored in a separate field in the target independent subtarget. This would also avoid the feature bit pressure.
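A rough sketch of the trade-off between the two options, under stated assumptions: the struct names, `SkylakeBit`, and the family list are hypothetical, and 160 is simply the bitset width recalled in the comment above, not a value checked against any LLVM release.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Width taken from the comment above; illustrative only.
constexpr std::size_t MaxSubtargetFeatures = 160;

enum class ProcFamily { Others, Haswell, Broadwell, Skylake, KNL, SKX };

// Option A: each CPU family occupies one slot of the shared feature bitset.
struct FamilyAsFeatureBit {
  std::bitset<MaxSubtargetFeatures> Features;
};

// Option B: the family lives in a separate enum field, so adding a family
// does not consume any feature bits.
struct FamilyAsEnumField {
  std::bitset<MaxSubtargetFeatures> Features;
  ProcFamily Family = ProcFamily::Others;
};

// Hypothetical bit index reserved for Skylake under option A.
constexpr std::size_t SkylakeBit = 100;

FamilyAsFeatureBit makeSkylakeA() {
  FamilyAsFeatureBit S;
  S.Features.set(SkylakeBit); // one feature bit spent on the CPU identity
  return S;
}

FamilyAsEnumField makeSkylakeB() {
  FamilyAsEnumField S;
  S.Family = ProcFamily::Skylake; // no feature bit spent
  return S;
}
```

Under option B the bitset stays reserved for real ISA and tuning features, which is the "feature bit pressure" the comment refers to.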

I still don't get why this can't be achieved with a single 'FeatureFastGather' feature bit, just used by Skylake CPU target - then X86TTIImpl::getGatherScatterOpCost can just call ST->hasFastGather() to decide whether to use gathers or not.

Or if you want to keep the overhead as a cost multiplier you can do something like:

unsigned getGatherOverhead() const { return HasFastGather ? 2 : UINT_MAX; }

These would be much easier to remove when you've had time to sort out the scheduler models.
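For illustration, the single-feature-bit variant suggested above might be sketched like this. `SubtargetSketch` and `gatherIsCandidate` are hypothetical stand-ins, not the real X86Subtarget or X86TTIImpl; only `hasFastGather()` and `getGatherOverhead()` follow the names used in the comment.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for the subtarget; in LLVM the flag would be set
// from a TableGen SubtargetFeature rather than by hand.
struct SubtargetSketch {
  bool HasFastGather = false;

  bool hasFastGather() const { return HasFastGather; }

  // Overhead as a cost multiplier, with UINT_MAX as the "do not use" value.
  unsigned getGatherOverhead() const { return HasFastGather ? 2 : UINT_MAX; }
};

// Hypothetical cost-side check: test for the sentinel instead of feeding
// UINT_MAX into further arithmetic.
bool gatherIsCandidate(const SubtargetSketch &ST) {
  return ST.getGatherOverhead() != UINT_MAX;
}
```

Checking for the sentinel before doing arithmetic sidesteps the wraparound concern that comes up later in this review.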

In D35348#808219, @RKSimon wrote:
I still don't get why this can't be achieved with a single 'FeatureFastGather' feature bit, just used by Skylake CPU target - then X86TTIImpl::getGatherScatterOpCost can just call ST->hasFastGather() to decide whether to use gathers or not.

Or if you want to keep the overhead as a cost multiplier you can do something like:
unsigned getGatherOverhead() const { return HasFastGather ? 2 : UINT_MAX; }
These would be much easier to remove when you've had time to sort out the scheduler models.

In order to set different numbers for different machines: each processor that supports Gather should be able to provide a number, not only NoGather/SlowGather/FastGather.

In D35348#808263, @delena wrote:

In order to set different numbers for different machines: each processor that supports Gather should be able to provide a number, not only NoGather/SlowGather/FastGather.

So how many numbers are we talking about? Is it just Skylake, SKX and Cannonlake? (KNL?) I guess we don't want to handle weaker AVX2 CPUs so we don't need a feature bit for those (default getGatherOverhead() to UINT_MAX)

Whatever happens, almost all those new processor enums are not required.

igorb added a subscriber: igorb.Jul 13 2017, 12:51 PM

removing unnecessary processor
fixing uint issue

lib/Target/X86/X86Subtarget.h
21	It depends on the usage of isAtom/isSLM. For example, in X86TargetTransformInfo.cpp::getMaxInterleaveFactor the property will be getMaxInterleaveFactor.

Before you continue with this, is there any chance that you can create a phab for the follow on patches that are dependent on this please? To get a better idea of what you are needing this for.

lib/Target/X86/X86Subtarget.cpp
31	You went to all that trouble of adding the other Intel CPUs and then just need IntelSkylake and IntelSKX?

delena added inline comments.Jul 19 2017, 6:34 AM

lib/Target/X86/X86Subtarget.cpp
32	Mohamed, you have to add ScatterOverhead for SKX.

magabari added inline comments.Jul 22 2017, 11:40 PM

lib/Target/X86/X86Subtarget.cpp
31	This is only one property. Even HSW has Gather, and we will add its overhead later (currently we don't recommend generating gathers for HSW). There are other properties that will need to return different values based on the family, so my patch is just getting the infrastructure ready for this. I will upload my patch soon; it uses the "getGatherOverhead" property from the subtarget.

Adding ScatterOverhead and updating the max overhead.
The max overhead can't be INT_MAX or UINT_MAX because it would overflow in the cost model (CM) calculation and cause a wrong decision.
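The overflow concern can be sketched with a small example. The formula below (overhead times lanes, plus the cost of the rest of the loop) and the names `loopCost` and `picksVector` are assumptions for illustration only; the real cost model calculation is not shown in this review. Unsigned 32-bit arithmetic is used so the wraparound is well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical total cost of a vectorized loop body: the gather part plus
// everything else. Unsigned arithmetic wraps modulo 2^32.
std::uint32_t loopCost(std::uint32_t GatherOverhead, std::uint32_t Lanes,
                       std::uint32_t OtherCost) {
  return GatherOverhead * Lanes + OtherCost;
}

// Hypothetical decision: vectorize when the vector loop looks cheaper.
bool picksVector(std::uint32_t GatherOverhead, std::uint32_t Lanes,
                 std::uint32_t OtherCost, std::uint32_t ScalarCost) {
  return loopCost(GatherOverhead, Lanes, OtherCost) < ScalarCost;
}

// The largest representable overhead, standing in for UINT_MAX.
constexpr std::uint32_t MaxOverhead =
    std::numeric_limits<std::uint32_t>::max();
```

In this sketch a "maximal" overhead wraps around to a tiny cost and the gather version is wrongly preferred, while a large but bounded value such as 1024 keeps the comparison meaningful.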

delena added inline comments.Jul 23 2017, 4:44 AM

lib/Target/X86/X86Subtarget.h
19	You can use Others instead of Generic.

replacing Generic with Others

craig.topper added inline comments.Jul 24 2017, 10:34 PM

lib/Target/X86/X86Subtarget.cpp
29	Why can't this just be done by comparing the CPU string? What did we get by creating an enum that is named the same as the CPU string?

delena added inline comments.Jul 25 2017, 10:42 AM

lib/Target/X86/X86.td
22–1	You don't need Others here, I think.
23	Why do you need this change?

magabari marked 2 inline comments as done.Jul 27 2017, 12:54 AM

magabari added inline comments.

lib/Target/X86/X86Subtarget.cpp
29	The X86ProcFamily enum already existed; we have just added more values. In general I think it's better to parse the CPU strings in one place and then use the enum values, instead of comparing the strings all the time.
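The "parse once, then use the enum" idea could look roughly like the sketch below. Note that the patch itself sets the family through TableGen feature definitions, not through string matching; `parseProcFamily` is a hypothetical helper, and only CPU names that appear in the X86.td diff on this page are mapped.

```cpp
#include <cassert>
#include <string>

enum X86ProcFamilyEnum {
  Others,
  IntelAtom,
  IntelSLM,
  IntelGLM,
  IntelHaswell,
  IntelBroadwell,
  IntelSkylake,
  IntelKNL,
  IntelSKX,
  IntelCannonlake
};

// Hypothetical helper: translate the CPU string in exactly one place.
// Atom, SLM and GLM are left out because their CPU names are not shown in
// the diff; any unknown name falls back to Others.
X86ProcFamilyEnum parseProcFamily(const std::string &CPU) {
  if (CPU == "haswell" || CPU == "core-avx2")
    return IntelHaswell;
  if (CPU == "broadwell")
    return IntelBroadwell;
  if (CPU == "skylake")
    return IntelSkylake;
  if (CPU == "knl")
    return IntelKNL;
  if (CPU == "skylake-avx512" || CPU == "skx")
    return IntelSKX;
  if (CPU == "cannonlake")
    return IntelCannonlake;
  return Others;
}
```

Later code can then compare against enum values, so aliases such as "skx" and "skylake-avx512" only need to be handled once.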

magabari updated this revision to Diff 108429.Jul 27 2017, 12:55 AM

ping

ping ping

magabari added a reviewer: dorit.Aug 14 2017, 3:46 AM

RKSimon added inline comments.Aug 14 2017, 3:56 AM

lib/Target/X86/X86Subtarget.h
18–1	Do this in X86Subtarget::initializeEnvironment() ?
21	Do this in X86Subtarget::initializeEnvironment() ?

magabari updated this revision to Diff 111943.Aug 21 2017, 3:49 AM

magabari marked 2 inline comments as done.

grosser added a subscriber: grosser.Aug 26 2017, 12:09 AM

Simon, could you please take a look at the changes and see if it's okay now?

delena added inline comments.Aug 27 2017, 5:11 AM

lib/Target/X86/X86Subtarget.cpp
287	Gather is available since Haswell (AVX2 set). So technically, we can generate Gathers on all AVX2 processors. But the overhead on HSW is high. Skylake Client processor has faster Gathers than HSW and performance is similar to Skylake Server (AVX-512). The specified overhead is relative to the Load operation. "2" is the number provided by Intel architects, we are already using it to calculate GS cost.
	if (X86ProcFamily == IntelSkylake || hasAVX512)
	  GatherOverhead = 2;
	if (hasAVX512) // SKX and KNL fail here
	  ScatterOverhead = 2;
290	Please remove.
306	Please remove the "else", the values are already initialized.
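Taken together, the three inline comments above suggest initializing both overheads to the high default once and then overriding them with two plain `if` statements. A minimal sketch of that shape, assuming a free function and an `Overheads` struct that are hypothetical (the patch stores these values as X86Subtarget members):

```cpp
#include <cassert>

enum X86ProcFamilyEnum {
  Others,
  IntelAtom,
  IntelSLM,
  IntelGLM,
  IntelHaswell,
  IntelBroadwell,
  IntelSkylake,
  IntelKNL,
  IntelSKX,
  IntelCannonlake
};

// Hypothetical container; 1024 mirrors the default used in the diff.
struct Overheads {
  int Gather = 1024;
  int Scatter = 1024;
};

// Defaults stay in place unless a condition below overrides them, so no
// trailing "else" branch is needed.
Overheads pickOverheads(X86ProcFamilyEnum Family, bool HasAVX512) {
  Overheads O;
  if (Family == IntelSkylake || HasAVX512)
    O.Gather = 2;
  if (HasAVX512)
    O.Scatter = 2;
  return O;
}
```

Skylake client gets the fast gather value but keeps the high scatter default, since scatter is only enabled under the AVX-512 condition here.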

aymanmus added a subscriber: aymanmus.Aug 31 2017, 12:55 AM

fixing Elena comments

magabari added a comment.Aug 31 2017, 1:05 AM

This comment was removed by magabari.

LGTM + a minor comment fix

lib/Target/X86/X86Subtarget.cpp
291	"we are already using it to calculate GS cost" -> "This parameter is used for cost estimation of Gather Op and comparison with other alternatives."

This revision is now accepted and ready to land.Aug 31 2017, 1:31 AM

fixed comment

In D35348#853420, @magabari wrote:

Simon, could you please take a look at the changes and see if it's okay now?

LGTM - I still think this patch only needs to add IntelSkylake but it's a lot better than it was.

Closed by commit rL313132: [X86] Adding X86 Processor Families (authored by magabari). Sep 13 2017, 2:02 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                               Size
lib/Target/X86/X86.td              30 lines
lib/Target/X86/X86Subtarget.h      20 lines
lib/Target/X86/X86Subtarget.cpp    27 lines

Diff 111943

lib/Target/X86/X86.td

@@ 14 lines not shown @@
 // Get the target-independent interfaces which we are implementing...
 //
 include "llvm/Target/Target.td"

 //===----------------------------------------------------------------------===//
 // X86 Subtarget state
 //

 def Mode64Bit : SubtargetFeature<"64bit-mode", "In64BitMode", "true",
                                   "64-bit mode (x86_64)">;
 def Mode32Bit : SubtargetFeature<"32bit-mode", "In32BitMode", "true",
                                   "32-bit mode (80386)">;
 def Mode16Bit : SubtargetFeature<"16bit-mode", "In16BitMode", "true",
                                   "16-bit mode (i8086)">;

 //===----------------------------------------------------------------------===//
 // X86 Subtarget features
@@ 265 lines not shown @@
 include "X86Schedule.td"

 def ProcIntelAtom : SubtargetFeature<"atom", "X86ProcFamily", "IntelAtom",
                     "Intel Atom processors">;
 def ProcIntelSLM  : SubtargetFeature<"slm", "X86ProcFamily", "IntelSLM",
                     "Intel Silvermont processors">;
 def ProcIntelGLM  : SubtargetFeature<"glm", "X86ProcFamily", "IntelGLM",
                     "Intel Goldmont processors">;
+def ProcIntelHSW  : SubtargetFeature<"haswell", "X86ProcFamily",
+                    "IntelHaswell", "Intel Haswell processors">;
+def ProcIntelBDW  : SubtargetFeature<"broadwell", "X86ProcFamily",
+                    "IntelBroadwell", "Intel Broadwell processors">;
+def ProcIntelSKL  : SubtargetFeature<"skylake", "X86ProcFamily",
+                    "IntelSkylake", "Intel Skylake processors">;
+def ProcIntelKNL  : SubtargetFeature<"knl", "X86ProcFamily",
+                    "IntelKNL", "Intel Knights Landing processors">;
+def ProcIntelSKX  : SubtargetFeature<"skx", "X86ProcFamily",
+                    "IntelSKX", "Intel Skylake Server processors">;
+def ProcIntelCNL  : SubtargetFeature<"cannonlake", "X86ProcFamily",
+                    "IntelCannonlake", "Intel Cannonlake processors">;

 class Proc<string Name, list<SubtargetFeature> Features>
  : ProcessorModel<Name, GenericModel, Features>;

 def : Proc<"generic", [FeatureX87, FeatureSlowUAMem16]>;
 def : Proc<"i386", [FeatureX87, FeatureSlowUAMem16]>;
 def : Proc<"i486", [FeatureX87, FeatureSlowUAMem16]>;
 def : Proc<"i586", [FeatureX87, FeatureSlowUAMem16]>;
@@ 237 lines not shown @@ def HSWFeatures : ProcessorFeatures<IVBFeatures.Value, [
   FeatureERMSB,
   FeatureFMA,
   FeatureLZCNT,
   FeatureMOVBE,
   FeatureSlowIncDec
 ]>;

 class HaswellProc<string Name> : ProcModel<Name, HaswellModel,
-                                           HSWFeatures.Value, []>;
+                                           HSWFeatures.Value, [
+  ProcIntelHSW
+]>;
 def : HaswellProc<"haswell">;
 def : HaswellProc<"core-avx2">; // Legacy alias.

 def BDWFeatures : ProcessorFeatures<HSWFeatures.Value, [
+  ProcIntelBDW,
   FeatureADX,
   FeatureRDSEED
 ]>;
 class BroadwellProc<string Name> : ProcModel<Name, HaswellModel,
                                              BDWFeatures.Value, []>;
 def : BroadwellProc<"broadwell">;

 def SKLFeatures : ProcessorFeatures<BDWFeatures.Value, [
   FeatureMPX,
   FeatureRTM,
   FeatureXSAVEC,
   FeatureXSAVES,
   FeatureSGX,
   FeatureCLFLUSHOPT,
   FeatureFastVectorFSQRT
 ]>;

 // FIXME: define SKL model
 class SkylakeClientProc<string Name> : ProcModel<Name, HaswellModel,
-                                                 SKLFeatures.Value, []>;
+                                                 SKLFeatures.Value, [
+  ProcIntelSKL
+]>;
 def : SkylakeClientProc<"skylake">;

 // FIXME: define KNL model
 class KnightsLandingProc<string Name> : ProcModel<Name, HaswellModel,
                                                   IVBFeatures.Value, [
+  ProcIntelKNL,
   FeatureAVX512,
   FeatureERI,
   FeatureCDI,
   FeaturePFI,
   FeaturePREFETCHWT1,
   FeatureADX,
   FeatureRDSEED,
   FeatureMOVBE,
@@ 12 lines not shown @@ def SKXFeatures : ProcessorFeatures<SKLFeatures.Value, [
   FeatureBWI,
   FeatureVLX,
   FeaturePKU,
   FeatureCLWB
 ]>;

 // FIXME: define SKX model
 class SkylakeServerProc<string Name> : ProcModel<Name, HaswellModel,
-                                                 SKXFeatures.Value, []>;
+                                                 SKXFeatures.Value, [
+  ProcIntelSKX
+]>;
 def : SkylakeServerProc<"skylake-avx512">;
 def : SkylakeServerProc<"skx">; // Legacy alias.

 def CNLFeatures : ProcessorFeatures<SKXFeatures.Value, [
   FeatureVBMI,
   FeatureIFMA,
   FeatureSHA
 ]>;

 class CannonlakeProc<string Name> : ProcModel<Name, HaswellModel,
-                                              CNLFeatures.Value, []>;
+                                              CNLFeatures.Value, [
+  ProcIntelCNL
+]>;
 def : CannonlakeProc<"cannonlake">;

 // AMD CPUs.

 def : Proc<"k6", [FeatureX87, FeatureSlowUAMem16, FeatureMMX]>;
 def : Proc<"k6-2", [FeatureX87, FeatureSlowUAMem16, Feature3DNow]>;
 def : Proc<"k6-3", [FeatureX87, FeatureSlowUAMem16, Feature3DNow]>;
 def : Proc<"athlon", [FeatureX87, FeatureSlowUAMem16, Feature3DNowA,
@@ 318 lines not shown @@

lib/Target/X86/X86Subtarget.h

@@ 10 lines not shown @@
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_LIB_TARGET_X86_X86SUBTARGET_H
 #define LLVM_LIB_TARGET_X86_X86SUBTARGET_H

 #include "X86FrameLowering.h"
 #include "X86ISelLowering.h"
 #include "X86InstrInfo.h"
 #include "X86SelectionDAGInfo.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Triple.h"
 #include "llvm/CodeGen/GlobalISel/CallLowering.h"
 #include "llvm/CodeGen/GlobalISel/InstructionSelector.h"
 #include "llvm/CodeGen/GlobalISel/LegalizerInfo.h"
 #include "llvm/CodeGen/GlobalISel/RegisterBankInfo.h"
 #include "llvm/IR/CallingConv.h"
 #include "llvm/MC/MCInstrItineraries.h"
 #include "llvm/Target/TargetMachine.h"
@@ 26 lines not shown @@ enum X86SSEEnum {
     NoSSE, SSE1, SSE2, SSE3, SSSE3, SSE41, SSE42, AVX, AVX2, AVX512F
   };

   enum X863DNowEnum {
     NoThreeDNow, MMX, ThreeDNow, ThreeDNowA
   };

   enum X86ProcFamilyEnum {
-    Others, IntelAtom, IntelSLM, IntelGLM
+    Others,
+    IntelAtom,
+    IntelSLM,
+    IntelGLM,
+    IntelHaswell,
+    IntelBroadwell,
+    IntelSkylake,
+    IntelKNL,
+    IntelSKX,
+    IntelCannonlake
   };

   /// X86 processor family: Intel Atom, and others
   X86ProcFamilyEnum X86ProcFamily;

   /// Which PIC style to use
   PICStyles::Style PICStyle;

@@ 258 lines not shown @@ private:
   bool In64BitMode;

   /// True if compiling for 32-bit, false for 16-bit or 64-bit.
   bool In32BitMode;

   /// True if compiling for 16-bit, false for 32-bit or 64-bit.
   bool In16BitMode;

+  /// Contains the Overhead of gather\scatter instructions
+  int GatherOverhead;
+  int ScatterOverhead;
+
   X86SelectionDAGInfo TSInfo;
   // Ordering here is important. X86InstrInfo initializes X86RegisterInfo which
   // X86TargetLowering needs.
   X86InstrInfo InstrInfo;
   X86TargetLowering TLInfo;
   X86FrameLowering FrameLowering;

 public:
@@ 126 lines not shown @@ public:
   bool hasLAHFSAHF() const { return HasLAHFSAHF; }
   bool hasMWAITX() const { return HasMWAITX; }
   bool hasCLZERO() const { return HasCLZERO; }
   bool isBTMemSlow() const { return IsBTMemSlow; }
   bool isSHLDSlow() const { return IsSHLDSlow; }
   bool isPMULLDSlow() const { return IsPMULLDSlow; }
   bool isUnalignedMem16Slow() const { return IsUAMem16Slow; }
   bool isUnalignedMem32Slow() const { return IsUAMem32Slow; }
+  int getGatherOverhead() const { return GatherOverhead; }
+  int getScatterOverhead() const { return ScatterOverhead; }
   bool hasSSEUnalignedMem() const { return HasSSEUnalignedMem; }
   bool hasCmpxchg16b() const { return HasCmpxchg16b; }
   bool useLeaForSP() const { return UseLeaForSP; }
   bool hasFastPartialYMMorZMMWrite() const {
     return HasFastPartialYMMorZMMWrite;
   }
   bool hasFastScalarFSQRT() const { return HasFastScalarFSQRT; }
   bool hasFastVectorFSQRT() const { return HasFastVectorFSQRT; }
@@ 16 lines not shown @@ public:
   bool hasBWI() const { return HasBWI; }
   bool hasVLX() const { return HasVLX; }
   bool hasPKU() const { return HasPKU; }
   bool hasMPX() const { return HasMPX; }
   bool hasCLFLUSHOPT() const { return HasCLFLUSHOPT; }

   bool isXRaySupported() const override { return is64Bit(); }

+  X86ProcFamilyEnum getProcFamily() const { return X86ProcFamily; }
+
+  /// TODO: to be removed later and replaced with suitable properties
   bool isAtom() const { return X86ProcFamily == IntelAtom; }
   bool isSLM() const { return X86ProcFamily == IntelSLM; }
   bool useSoftFloat() const { return UseSoftFloat; }

   /// Use mfence if we have SSE2 or we're on x86-64 (even if we asked for
   /// no-sse2). There isn't any reason to disable it if the target processor
   /// supports it.
   bool hasMFence() const { return hasSSE2() || is64Bit(); }
@@ 143 lines not shown @@

lib/Target/X86/X86Subtarget.cpp

@@ 20 lines not shown @@
 #include "X86TargetMachine.h"
 #include "llvm/ADT/Triple.h"
 #include "llvm/CodeGen/GlobalISel/CallLowering.h"
 #include "llvm/CodeGen/GlobalISel/InstructionSelect.h"
 #include "llvm/CodeGen/GlobalISel/Legalizer.h"
 #include "llvm/CodeGen/GlobalISel/RegBankSelect.h"
 #include "llvm/IR/Attributes.h"
 #include "llvm/IR/ConstantRange.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/GlobalValue.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/CodeGen.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Target/TargetMachine.h"
 #include <cassert>
 #include <string>

 #if defined(_MSC_VER)
 #include <intrin.h>
 #endif

@@ 233 lines not shown @@ void X86Subtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) {

   // Stack alignment is 16 bytes on Darwin, Linux, kFreeBSD and Solaris (both
   // 32 and 64 bit) and for all 64-bit targets.
   if (StackAlignOverride)
     stackAlignment = StackAlignOverride;
   else if (isTargetDarwin() || isTargetLinux() || isTargetSolaris() ||
            isTargetKFreeBSD() || In64BitMode)
     stackAlignment = 16;
+
+  switch(X86ProcFamily) {
+  case IntelSkylake:
+    GatherOverhead = 2;
+    ScatterOverhead = 1024; // not relevant for AVX2
+    break;
+  case IntelSKX:
+    GatherOverhead = 2;
+    ScatterOverhead = 2;
+    break;
+  default:
+    // Currently picking high overheads for other targets in order not to be selected
+    // TODO: need to get uArch overheads for hsw\bdw
+    // FIXME: giving 1024 as a max int because it may overflow in the CM calucation causing a
+    // wrong desicion or negative values, maybe need to move to FP?
+    if (hasAVX512()) {
+      GatherOverhead = 2;
+      ScatterOverhead = 2;
+    }
+    else {
+      GatherOverhead = 1024;
+      ScatterOverhead = 1024;
+    }
+  }
 }

 void X86Subtarget::initializeEnvironment() {
   X86SSELevel = NoSSE;
   X863DNowLevel = NoThreeDNow;
   HasX87 = false;
   HasCMov = false;
   HasX86_64 = false;
@@ 61 lines not shown @@ void X86Subtarget::initializeEnvironment() {
   LEAUsesAG = false;
   SlowLEA = false;
   Slow3OpsLEA = false;
   SlowIncDec = false;
   stackAlignment = 4;
   // FIXME: this is a known good value for Yonah. How about others?
   MaxInlineSizeThreshold = 128;
   UseSoftFloat = false;
+  X86ProcFamily = Others;
+  GatherOverhead = 1024;
+  ScatterOverhead = 1024;
 }

 X86Subtarget &X86Subtarget::initializeSubtargetDependencies(StringRef CPU,
                                                             StringRef FS) {
   initializeEnvironment();
   initSubtargetFeatures(CPU, FS);
   return *this;
 }
@@ 53 lines not shown @@