This is an archive of the discontinued LLVM Phabricator instance.


Differential D69786

[X86] Add support for -mvzeroupper and -mno-vzeroupper to match gcc.
Closed, Public

Authored by craig.topper on Nov 3 2019, 10:58 PM.


Details

Reviewers

RKSimon
spatel
davezarzycki

Commits

rGb2b6a54f847f: [X86] Add support for -mvzeroupper and -mno-vzeroupper to match gcc

Summary

-mvzeroupper will force the vzeroupper insertion pass to run on
CPUs that normally wouldn't. -mno-vzeroupper disables it on CPUs
where it normally runs.
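The sketch below is a minimal C++ illustration of how the two flags interact with a CPU's defaults. It assumes, as the driver test in this revision checks, that clang forwards -mvzeroupper as "-target-feature +vzeroupper" and -mno-vzeroupper as "-target-feature -vzeroupper". It also assumes a simple model in which these entries are applied on top of the CPU's default features and the last one wins. flagToTargetFeature and applyFeatures are hypothetical helpers, not clang or LLVM APIs.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: map a driver flag to a cc1 target-feature string,
// "-m<name>" -> "+<name>" and "-mno-<name>" -> "-<name>".
std::string flagToTargetFeature(const std::string &flag) {
  assert(flag.rfind("-m", 0) == 0 && "expected an -m flag");
  const std::string noPrefix = "-mno-";
  if (flag.rfind(noPrefix, 0) == 0)
    return "-" + flag.substr(noPrefix.size());
  return "+" + flag.substr(2); // strip the leading "-m"
}

// Hypothetical helper: start from the CPU's default features and apply each
// flag in command-line order, so a later flag overrides an earlier one.
std::set<std::string> applyFeatures(std::set<std::string> features,
                                    const std::vector<std::string> &flags) {
  for (const std::string &flag : flags) {
    const std::string feature = flagToTargetFeature(flag);
    const std::string name = feature.substr(1);
    if (feature[0] == '+')
      features.insert(name);
    else
      features.erase(name);
  }
  return features;
}
```

Under this model, a CPU whose defaults lack vzeroupper (BTVER2 in this revision) gains it with -mvzeroupper, a CPU that has it loses it with -mno-vzeroupper, and passing both flags leaves only the later one in effect.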

To support this with clang's default feature handling, we need a
vzeroupper feature flag in X86.td. Because this flag has the
opposite polarity of the fast-partial-ymm-or-zmm-write flag we
previously used to disable the pass, we now need to add it to
every CPU except KNL/KNM and BTVER2 to keep the behavior
identical.

Remove the fast-partial-ymm-or-zmm-write feature, which is no longer used.
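The polarity flip described above can be checked with a small model. This sketch assumes the old pass was skipped whenever fast-partial-ymm-or-zmm-write was set and the new pass runs only when vzeroupper is set, ignoring the separate AVX check. The names oldShouldInsert, newShouldInsert, migrate and preservesBehavior are hypothetical, and any feature sets you feed it are stand-ins rather than the real X86.td lists.

```cpp
#include <cassert>
#include <set>
#include <string>

using FeatureSet = std::set<std::string>;

// Old gate (assumed): the pass was skipped on CPUs that set
// "fast-partial-ymm-or-zmm-write".
bool oldShouldInsert(const FeatureSet &f) {
  return f.count("fast-partial-ymm-or-zmm-write") == 0;
}

// New gate (assumed): the pass runs only on CPUs that set "vzeroupper".
bool newShouldInsert(const FeatureSet &f) {
  return f.count("vzeroupper") != 0;
}

// Hypothetical migration of one CPU's feature list: drop the old flag and
// add "vzeroupper" exactly when the old flag was absent.
FeatureSet migrate(FeatureSet f) {
  const bool hadFastPartialWrite =
      f.erase("fast-partial-ymm-or-zmm-write") != 0;
  if (!hadFastPartialWrite)
    f.insert("vzeroupper");
  return f;
}

// Behavior is unchanged for a CPU if the old and new gates agree on it.
bool preservesBehavior(const FeatureSet &oldFeatures) {
  return oldShouldInsert(oldFeatures) == newShouldInsert(migrate(oldFeatures));
}
```

Inverting the flag is why the patch touches almost every CPU definition: each CPU that lacked the old flag must now list FeatureInsertVZEROUPPER, while KNL/KNM and BTVER2 simply drop FeatureFastPartialYMMorZMMWrite.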

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 40461
Build 40568: arc lint + arc unit

Event Timeline

craig.topper created this revision. (Nov 3 2019, 10:58 PM)

Herald added projects: Restricted Project, Restricted Project. (Nov 3 2019, 10:58 PM)

Herald added subscribers: cfe-commits, jfb, hiraditya.

craig.topper added a reviewer: davezarzycki. (Nov 3 2019, 10:58 PM)

Harbormaster completed remote builds in B40461: Diff 227648. (Nov 3 2019, 11:01 PM)

LGTM. Thanks! I'll close my differential proposal then.

This revision is now accepted and ready to land. (Nov 3 2019, 11:27 PM)

Actually, wait, what does it mean for a CPU without AVX to have FeatureInsertVZEROUPPER?

In D69786#1731942, @davezarzycki wrote:

Actually, wait, what does it mean for a CPU without AVX to have FeatureInsertVZEROUPPER?

It means we’ll do vzeroupper insertion if you add -mavx to the command line and ymm registers are used. At minimum we need to have it set for pentium4, x86-64, core2, and maybe some others, since those are the default CPUs on some platforms and we want vzeroupper if someone uses a default CPU and adds -mavx.
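A simplified model of this answer, assuming the pass acts only when both AVX and the vzeroupper feature are enabled and the function actually uses YMM registers. insertsVZeroUpper and withAVX are hypothetical helpers, not the real X86VZeroUpper.cpp code.

```cpp
#include <cassert>
#include <set>
#include <string>

using FeatureSet = std::set<std::string>;

// Hypothetical model: vzeroupper is inserted only when AVX is available,
// the "vzeroupper" feature is on, and the function dirtied the upper bits
// of a YMM register.
bool insertsVZeroUpper(const FeatureSet &features, bool usesYMM) {
  const bool hasAVX = features.count("avx") != 0;
  const bool wantsVZeroUpper = features.count("vzeroupper") != 0;
  return hasAVX && wantsVZeroUpper && usesYMM;
}

// Adds AVX to a CPU's defaults, as -mavx would in this simplified model.
FeatureSet withAVX(FeatureSet features) {
  features.insert("avx");
  return features;
}
```

With pentium4-like defaults that include vzeroupper but not avx, nothing is inserted; adding avx (as -mavx would) turns insertion on for functions that use YMM registers. A CPU that has AVX but no vzeroupper feature, as BTVER2 does after this change, never inserts it.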

davezarzycki mentioned this in D69500: [X86] NFC: Convert LLVM command-line flag to target attribute. (Nov 3 2019, 11:56 PM)

Closed by commit rGb2b6a54f847f: [X86] Add support for -mvzeroupper and -mno-vzeroupper to match gcc (authored by craig.topper). (Nov 4 2019, 11:07 AM)

This revision was automatically updated to reflect the committed changes.

Revision Contents

- clang/docs/ReleaseNotes.rst: 4 lines
- clang/include/clang/Driver/Options.td: 2 lines
- clang/test/Driver/x86-target-features.c: 5 lines
- llvm/docs/ReleaseNotes.rst: 11 lines
- llvm/lib/Target/X86/X86.td: 107 lines
- llvm/lib/Target/X86/X86Subtarget.h: 10 lines
- llvm/lib/Target/X86/X86TargetTransformInfo.h: 2 lines
- llvm/lib/Target/X86/X86VZeroUpper.cpp: 2 lines
- llvm/test/CodeGen/X86/avx-vzeroupper.ll: 109 lines

Diff 227648

clang/docs/ReleaseNotes.rst

	...
	New Compiler Flags			New Compiler Flags
	------------------			------------------

	- The -fgnuc-version= flag now controls the value of ``__GNUC__`` and related			- The -fgnuc-version= flag now controls the value of ``__GNUC__`` and related
	macros. This flag does not enable or disable any GCC extensions implemented in			macros. This flag does not enable or disable any GCC extensions implemented in
	Clang. Setting the version to zero causes Clang to leave ``__GNUC__`` and			Clang. Setting the version to zero causes Clang to leave ``__GNUC__`` and
	other GNU-namespaced macros, such as ``__GXX_WEAK__``, undefined.			other GNU-namespaced macros, such as ``__GXX_WEAK__``, undefined.

				- vzeroupper insertion on X86 targets can now be disabled with -mno-vzeroupper.
				You can also force vzeroupper insertion to be used on CPUs that normally
				wouldn't with -mvzeroupper.

	Deprecated Compiler Flags			Deprecated Compiler Flags
	-------------------------			-------------------------

	The following options are deprecated and ignored. They will be removed in			The following options are deprecated and ignored. They will be removed in
	future versions of Clang.			future versions of Clang.

	- -mmpx used to enable the __MPX__ preprocessor define for the Intel MPX			- -mmpx used to enable the __MPX__ preprocessor define for the Intel MPX
	instructions. There were no MPX intrinsics.			instructions. There were no MPX intrinsics.
	...

clang/include/clang/Driver/Options.td

	...
	def mxsaveopt : Flag<["-"], "mxsaveopt">, Group<m_x86_Features_Group>;			def mxsaveopt : Flag<["-"], "mxsaveopt">, Group<m_x86_Features_Group>;
	def mno_xsaveopt : Flag<["-"], "mno-xsaveopt">, Group<m_x86_Features_Group>;			def mno_xsaveopt : Flag<["-"], "mno-xsaveopt">, Group<m_x86_Features_Group>;
	def mxsaves : Flag<["-"], "mxsaves">, Group<m_x86_Features_Group>;			def mxsaves : Flag<["-"], "mxsaves">, Group<m_x86_Features_Group>;
	def mno_xsaves : Flag<["-"], "mno-xsaves">, Group<m_x86_Features_Group>;			def mno_xsaves : Flag<["-"], "mno-xsaves">, Group<m_x86_Features_Group>;
	def mshstk : Flag<["-"], "mshstk">, Group<m_x86_Features_Group>;			def mshstk : Flag<["-"], "mshstk">, Group<m_x86_Features_Group>;
	def mno_shstk : Flag<["-"], "mno-shstk">, Group<m_x86_Features_Group>;			def mno_shstk : Flag<["-"], "mno-shstk">, Group<m_x86_Features_Group>;
	def mretpoline_external_thunk : Flag<["-"], "mretpoline-external-thunk">, Group<m_x86_Features_Group>;			def mretpoline_external_thunk : Flag<["-"], "mretpoline-external-thunk">, Group<m_x86_Features_Group>;
	def mno_retpoline_external_thunk : Flag<["-"], "mno-retpoline-external-thunk">, Group<m_x86_Features_Group>;			def mno_retpoline_external_thunk : Flag<["-"], "mno-retpoline-external-thunk">, Group<m_x86_Features_Group>;
				def mvzeroupper : Flag<["-"], "mvzeroupper">, Group<m_x86_Features_Group>;
				def mno_vzeroupper : Flag<["-"], "mno-vzeroupper">, Group<m_x86_Features_Group>;

	// These are legacy user-facing driver-level option spellings. They are always			// These are legacy user-facing driver-level option spellings. They are always
	// aliases for options that are spelled using the more common Unix / GNU flag			// aliases for options that are spelled using the more common Unix / GNU flag
	// style of double-dash and equals-joined flags.			// style of double-dash and equals-joined flags.
	def gcc_toolchain_legacy_spelling : Separate<["-"], "gcc-toolchain">, Alias<gcc_toolchain>;			def gcc_toolchain_legacy_spelling : Separate<["-"], "gcc-toolchain">, Alias<gcc_toolchain>;
	def target_legacy_spelling : Separate<["-"], "target">, Alias<target>;			def target_legacy_spelling : Separate<["-"], "target">, Alias<target>;

	// Special internal option to handle -Xlinker --no-demangle.			// Special internal option to handle -Xlinker --no-demangle.
	...

clang/test/Driver/x86-target-features.c

	...
	// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mno-avx512bf16 %s -### -o %t.o 2>&1 \| FileCheck -check-prefix=NO-AVX512BF16 %s			// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mno-avx512bf16 %s -### -o %t.o 2>&1 \| FileCheck -check-prefix=NO-AVX512BF16 %s
	// AVX512BF16: "-target-feature" "+avx512bf16"			// AVX512BF16: "-target-feature" "+avx512bf16"
	// NO-AVX512BF16: "-target-feature" "-avx512bf16"			// NO-AVX512BF16: "-target-feature" "-avx512bf16"

	// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -menqcmd %s -### -o %t.o 2>&1 \| FileCheck --check-prefix=ENQCMD %s			// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -menqcmd %s -### -o %t.o 2>&1 \| FileCheck --check-prefix=ENQCMD %s
	// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mno-enqcmd %s -### -o %t.o 2>&1 \| FileCheck --check-prefix=NO-ENQCMD %s			// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mno-enqcmd %s -### -o %t.o 2>&1 \| FileCheck --check-prefix=NO-ENQCMD %s
	// ENQCMD: "-target-feature" "+enqcmd"			// ENQCMD: "-target-feature" "+enqcmd"
	// NO-ENQCMD: "-target-feature" "-enqcmd"			// NO-ENQCMD: "-target-feature" "-enqcmd"

				// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mvzeroupper %s -### -o %t.o 2>&1 \| FileCheck --check-prefix=VZEROUPPER %s
				// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mno-vzeroupper %s -### -o %t.o 2>&1 \| FileCheck --check-prefix=NO-VZEROUPPER %s
				// VZEROUPPER: "-target-feature" "+vzeroupper"
				// NO-VZEROUPPER: "-target-feature" "-vzeroupper"

llvm/docs/ReleaseNotes.rst

	...
	* v32i8 and v64i8 vectors with AVX512F enabled, but AVX512BW disabled will now			* v32i8 and v64i8 vectors with AVX512F enabled, but AVX512BW disabled will now
	be passed in ZMM registers for calls and returns. Previously they were passed			be passed in ZMM registers for calls and returns. Previously they were passed
	in two YMM registers. Old behavior can be enabled by passing			in two YMM registers. Old behavior can be enabled by passing
	-x86-enable-old-knl-abi			-x86-enable-old-knl-abi
	* -mprefer-vector-width=256 is now the default behavior skylake-avx512 and later			* -mprefer-vector-width=256 is now the default behavior skylake-avx512 and later
	Intel CPUs. This tries to limit the use of 512-bit registers which can cause a			Intel CPUs. This tries to limit the use of 512-bit registers which can cause a
	decrease in CPU frequency on these CPUs. This can be re-enabled by passing			decrease in CPU frequency on these CPUs. This can be re-enabled by passing
	-mprefer-vector-width=512 to clang or passing -mattr=-prefer-256-bit to llc.			-mprefer-vector-width=512 to clang or passing -mattr=-prefer-256-bit to llc.
				* Deprecated the mpx feature flag for the Intel MPX instructions. There were no
				intrinsics for this feature. This change only this effects the results
				returned by getHostCPUFeatures on CPUs that implement the MPX instructions.
				* The feature flag fast-partial-ymm-or-zmm-write which previously disabled
				vzeroupper insertion has been removed. It has been replaced with a vzeroupper
				feature flag which has the opposite polarity. So -vzeroupper has the same
				effect as +fast-partial-ymm-or-zmm-write.

	Changes to the AMDGPU Target			Changes to the AMDGPU Target
	-----------------------------			-----------------------------

	Changes to the AVR Target			Changes to the AVR Target
	-----------------------------			-----------------------------

	During this release ...			During this release ...

	* Deprecated the mpx feature flag for the Intel MPX instructions. There were no
	intrinsics for this feature. This change only this effects the results
	returned by getHostCPUFeatures on CPUs that implement the MPX instructions.

	Changes to the WebAssembly Target			Changes to the WebAssembly Target
	---------------------------------			---------------------------------

	During this release ...			During this release ...


	Changes to the OCaml bindings			Changes to the OCaml bindings
	-----------------------------			-----------------------------
	...

llvm/lib/Target/X86/X86.td

...
def FeaturePCONFIG : SubtargetFeature<"pconfig", "HasPCONFIG", "true",		def FeaturePCONFIG : SubtargetFeature<"pconfig", "HasPCONFIG", "true",
"platform configuration instruction">;		"platform configuration instruction">;
// On recent X86 (port bound) processors, its preferable to combine to a single shuffle		// On recent X86 (port bound) processors, its preferable to combine to a single shuffle
// using a variable mask over multiple fixed shuffles.		// using a variable mask over multiple fixed shuffles.
def FeatureFastVariableShuffle		def FeatureFastVariableShuffle
: SubtargetFeature<"fast-variable-shuffle",		: SubtargetFeature<"fast-variable-shuffle",
"HasFastVariableShuffle",		"HasFastVariableShuffle",
"true", "Shuffles with variable masks are fast">;		"true", "Shuffles with variable masks are fast">;
// On some X86 processors, there is no performance hazard to writing only the		// On some X86 processors, a vzeroupper instruction should be inserted after
// lower parts of a YMM or ZMM register without clearing the upper part.		// using ymm/zmm registers before executing code that may use SSE instructions.
def FeatureFastPartialYMMorZMMWrite		def FeatureInsertVZEROUPPER
: SubtargetFeature<"fast-partial-ymm-or-zmm-write",		: SubtargetFeature<"vzeroupper",
"HasFastPartialYMMorZMMWrite",		"InsertVZEROUPPER",
"true", "Partial writes to YMM/ZMM registers are fast">;		"true", "Should insert vzeroupper instructions">;
// FeatureFastScalarFSQRT should be enabled if scalar FSQRT has shorter latency		// FeatureFastScalarFSQRT should be enabled if scalar FSQRT has shorter latency
// than the corresponding NR code. FeatureFastVectorFSQRT should be enabled if		// than the corresponding NR code. FeatureFastVectorFSQRT should be enabled if
// vector FSQRT has higher throughput than the corresponding NR code.		// vector FSQRT has higher throughput than the corresponding NR code.
// The idea is that throughput bound code is likely to be vectorized, so for		// The idea is that throughput bound code is likely to be vectorized, so for
// vectorized code we should care about the throughput of SQRT operations.		// vectorized code we should care about the throughput of SQRT operations.
// But if the code is scalar that probably means that the code has some kind of		// But if the code is scalar that probably means that the code has some kind of
// dependency and we should care more about reducing the latency.		// dependency and we should care more about reducing the latency.
def FeatureFastScalarFSQRT		def FeatureFastScalarFSQRT
...
list<SubtargetFeature> NHMInheritableFeatures = [FeatureX87,		list<SubtargetFeature> NHMInheritableFeatures = [FeatureX87,
FeatureMMX,		FeatureMMX,
FeatureSSE42,		FeatureSSE42,
FeatureFXSR,		FeatureFXSR,
FeatureNOPL,		FeatureNOPL,
Feature64Bit,		Feature64Bit,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
FeaturePOPCNT,		FeaturePOPCNT,
FeatureLAHFSAHF,		FeatureLAHFSAHF,
FeatureMacroFusion];		FeatureMacroFusion,
		FeatureInsertVZEROUPPER];
list<SubtargetFeature> NHMSpecificFeatures = [];		list<SubtargetFeature> NHMSpecificFeatures = [];
list<SubtargetFeature> NHMFeatures =		list<SubtargetFeature> NHMFeatures =
!listconcat(NHMInheritableFeatures, NHMSpecificFeatures);		!listconcat(NHMInheritableFeatures, NHMSpecificFeatures);

// Westmere		// Westmere
list<SubtargetFeature> WSMAdditionalFeatures = [FeaturePCLMUL];		list<SubtargetFeature> WSMAdditionalFeatures = [FeaturePCLMUL];
list<SubtargetFeature> WSMSpecificFeatures = [];		list<SubtargetFeature> WSMSpecificFeatures = [];
list<SubtargetFeature> WSMInheritableFeatures =		list<SubtargetFeature> WSMInheritableFeatures =
...
list<SubtargetFeature> AtomInheritableFeatures = [FeatureX87,		list<SubtargetFeature> AtomInheritableFeatures = [FeatureX87,
FeatureMMX,		FeatureMMX,
FeatureSSSE3,		FeatureSSSE3,
FeatureFXSR,		FeatureFXSR,
FeatureNOPL,		FeatureNOPL,
Feature64Bit,		Feature64Bit,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
FeatureMOVBE,		FeatureMOVBE,
FeatureSlowTwoMemOps,		FeatureSlowTwoMemOps,
FeatureLAHFSAHF];		FeatureLAHFSAHF,
		FeatureInsertVZEROUPPER];
list<SubtargetFeature> AtomSpecificFeatures = [ProcIntelAtom,		list<SubtargetFeature> AtomSpecificFeatures = [ProcIntelAtom,
FeatureSlowUAMem16,		FeatureSlowUAMem16,
FeatureLEAForSP,		FeatureLEAForSP,
FeatureSlowDivide32,		FeatureSlowDivide32,
FeatureSlowDivide64,		FeatureSlowDivide64,
FeatureLEAUsesAG,		FeatureLEAUsesAG,
FeaturePadShortFunctions];		FeaturePadShortFunctions];
list<SubtargetFeature> AtomFeatures =		list<SubtargetFeature> AtomFeatures =
...
list<SubtargetFeature> KNLFeatures = [FeatureX87,		list<SubtargetFeature> KNLFeatures = [FeatureX87,
FeatureMOVBE,		FeatureMOVBE,
FeatureLZCNT,		FeatureLZCNT,
FeatureBMI,		FeatureBMI,
FeatureBMI2,		FeatureBMI2,
FeatureFMA,		FeatureFMA,
FeaturePRFCHW,		FeaturePRFCHW,
FeaturePreferMaskRegisters,		FeaturePreferMaskRegisters,
FeatureSlowTwoMemOps,		FeatureSlowTwoMemOps,
FeatureFastPartialYMMorZMMWrite,
FeatureHasFastGather,		FeatureHasFastGather,
FeatureSlowPMADDWD];		FeatureSlowPMADDWD];
// TODO Add AVX5124FMAPS/AVX5124VNNIW features		// TODO Add AVX5124FMAPS/AVX5124VNNIW features
list<SubtargetFeature> KNMFeatures =		list<SubtargetFeature> KNMFeatures =
!listconcat(KNLFeatures, [FeatureVPOPCNTDQ]);		!listconcat(KNLFeatures, [FeatureVPOPCNTDQ]);

// Barcelona		// Barcelona
list<SubtargetFeature> BarcelonaInheritableFeatures = [FeatureX87,		list<SubtargetFeature> BarcelonaInheritableFeatures = [FeatureX87,
FeatureCMPXCHG8B,		FeatureCMPXCHG8B,
FeatureSSE4A,		FeatureSSE4A,
Feature3DNowA,		Feature3DNowA,
FeatureFXSR,		FeatureFXSR,
FeatureNOPL,		FeatureNOPL,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
FeatureLZCNT,		FeatureLZCNT,
FeaturePOPCNT,		FeaturePOPCNT,
FeatureSlowSHLD,		FeatureSlowSHLD,
FeatureLAHFSAHF,		FeatureLAHFSAHF,
FeatureCMOV,		FeatureCMOV,
Feature64Bit,		Feature64Bit,
FeatureFastScalarShiftMasks];		FeatureFastScalarShiftMasks,
		FeatureInsertVZEROUPPER];
list<SubtargetFeature> BarcelonaFeatures = BarcelonaInheritableFeatures;		list<SubtargetFeature> BarcelonaFeatures = BarcelonaInheritableFeatures;

// Bobcat		// Bobcat
list<SubtargetFeature> BtVer1InheritableFeatures = [FeatureX87,		list<SubtargetFeature> BtVer1InheritableFeatures = [FeatureX87,
FeatureCMPXCHG8B,		FeatureCMPXCHG8B,
FeatureCMOV,		FeatureCMOV,
FeatureMMX,		FeatureMMX,
FeatureSSSE3,		FeatureSSSE3,
FeatureSSE4A,		FeatureSSE4A,
FeatureFXSR,		FeatureFXSR,
FeatureNOPL,		FeatureNOPL,
Feature64Bit,		Feature64Bit,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
FeaturePRFCHW,		FeaturePRFCHW,
FeatureLZCNT,		FeatureLZCNT,
FeaturePOPCNT,		FeaturePOPCNT,
FeatureSlowSHLD,		FeatureSlowSHLD,
FeatureLAHFSAHF,		FeatureLAHFSAHF,
FeatureFast15ByteNOP,		FeatureFast15ByteNOP,
FeatureFastScalarShiftMasks,		FeatureFastScalarShiftMasks,
FeatureFastVectorShiftMasks];		FeatureFastVectorShiftMasks];
list<SubtargetFeature> BtVer1Features = BtVer1InheritableFeatures;		list<SubtargetFeature> BtVer1SpecificFeatures = [FeatureInsertVZEROUPPER];
		list<SubtargetFeature> BtVer1Features =
		!listconcat(BtVer1InheritableFeatures, BtVer1SpecificFeatures);

// Jaguar		// Jaguar
list<SubtargetFeature> BtVer2AdditionalFeatures = [FeatureAVX,		list<SubtargetFeature> BtVer2AdditionalFeatures = [FeatureAVX,
FeatureAES,		FeatureAES,
FeaturePCLMUL,		FeaturePCLMUL,
FeatureBMI,		FeatureBMI,
FeatureF16C,		FeatureF16C,
FeatureMOVBE,		FeatureMOVBE,
FeatureXSAVE,		FeatureXSAVE,
FeatureXSAVEOPT];		FeatureXSAVEOPT];
list<SubtargetFeature> BtVer2SpecificFeatures = [FeatureFastLZCNT,		list<SubtargetFeature> BtVer2SpecificFeatures = [FeatureFastLZCNT,
FeatureFastBEXTR,		FeatureFastBEXTR,
FeatureFastPartialYMMorZMMWrite,
FeatureFastHorizontalOps];		FeatureFastHorizontalOps];
list<SubtargetFeature> BtVer2InheritableFeatures =		list<SubtargetFeature> BtVer2InheritableFeatures =
!listconcat(BtVer1InheritableFeatures, BtVer2AdditionalFeatures);		!listconcat(BtVer1InheritableFeatures, BtVer2AdditionalFeatures);
list<SubtargetFeature> BtVer2Features =		list<SubtargetFeature> BtVer2Features =
!listconcat(BtVer2InheritableFeatures, BtVer2SpecificFeatures);		!listconcat(BtVer2InheritableFeatures, BtVer2SpecificFeatures);

// Bulldozer		// Bulldozer
list<SubtargetFeature> BdVer1InheritableFeatures = [FeatureX87,		list<SubtargetFeature> BdVer1InheritableFeatures = [FeatureX87,
...
FeatureLZCNT,		FeatureLZCNT,
FeaturePOPCNT,		FeaturePOPCNT,
FeatureXSAVE,		FeatureXSAVE,
FeatureLWP,		FeatureLWP,
FeatureSlowSHLD,		FeatureSlowSHLD,
FeatureLAHFSAHF,		FeatureLAHFSAHF,
FeatureFast11ByteNOP,		FeatureFast11ByteNOP,
FeatureFastScalarShiftMasks,		FeatureFastScalarShiftMasks,
FeatureBranchFusion];		FeatureBranchFusion,
		FeatureInsertVZEROUPPER];
list<SubtargetFeature> BdVer1Features = BdVer1InheritableFeatures;		list<SubtargetFeature> BdVer1Features = BdVer1InheritableFeatures;

// PileDriver		// PileDriver
list<SubtargetFeature> BdVer2AdditionalFeatures = [FeatureF16C,		list<SubtargetFeature> BdVer2AdditionalFeatures = [FeatureF16C,
FeatureBMI,		FeatureBMI,
FeatureTBM,		FeatureTBM,
FeatureFMA,		FeatureFMA,
FeatureFastBEXTR];		FeatureFastBEXTR];
...
list<SubtargetFeature> ZNFeatures = [FeatureADX,		list<SubtargetFeature> ZNFeatures = [FeatureADX,
FeaturePCLMUL,		FeaturePCLMUL,
FeaturePOPCNT,		FeaturePOPCNT,
FeaturePRFCHW,		FeaturePRFCHW,
FeatureRDRAND,		FeatureRDRAND,
FeatureRDSEED,		FeatureRDSEED,
FeatureSHA,		FeatureSHA,
FeatureSSE4A,		FeatureSSE4A,
FeatureSlowSHLD,		FeatureSlowSHLD,
		FeatureInsertVZEROUPPER,
FeatureX87,		FeatureX87,
FeatureXSAVE,		FeatureXSAVE,
FeatureXSAVEC,		FeatureXSAVEC,
FeatureXSAVEOPT,		FeatureXSAVEOPT,
FeatureXSAVES];		FeatureXSAVES];
list<SubtargetFeature> ZN2AdditionalFeatures = [FeatureCLWB,		list<SubtargetFeature> ZN2AdditionalFeatures = [FeatureCLWB,
FeatureRDPID,		FeatureRDPID,
FeatureWBNOINVD];		FeatureWBNOINVD];
list<SubtargetFeature> ZN2Features =		list<SubtargetFeature> ZN2Features =
!listconcat(ZNFeatures, ZN2AdditionalFeatures);		!listconcat(ZNFeatures, ZN2AdditionalFeatures);
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// X86 processors supported.		// X86 processors supported.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

class Proc<string Name, list<SubtargetFeature> Features>		class Proc<string Name, list<SubtargetFeature> Features>
: ProcessorModel<Name, GenericModel, Features>;		: ProcessorModel<Name, GenericModel, Features>;

// NOTE: CMPXCHG8B is here for legacy compatbility so that it is only disabled		// NOTE: CMPXCHG8B is here for legacy compatbility so that it is only disabled
// if i386/i486 is specifically requested.		// if i386/i486 is specifically requested.
def : Proc<"generic", [FeatureX87, FeatureSlowUAMem16,		def : Proc<"generic", [FeatureX87, FeatureSlowUAMem16,
FeatureCMPXCHG8B]>;		FeatureCMPXCHG8B, FeatureInsertVZEROUPPER]>;
def : Proc<"i386", [FeatureX87, FeatureSlowUAMem16]>;		def : Proc<"i386", [FeatureX87, FeatureSlowUAMem16,
def : Proc<"i486", [FeatureX87, FeatureSlowUAMem16]>;		FeatureInsertVZEROUPPER]>;
		def : Proc<"i486", [FeatureX87, FeatureSlowUAMem16,
		FeatureInsertVZEROUPPER]>;
def : Proc<"i586", [FeatureX87, FeatureSlowUAMem16,		def : Proc<"i586", [FeatureX87, FeatureSlowUAMem16,
FeatureCMPXCHG8B]>;		FeatureCMPXCHG8B, FeatureInsertVZEROUPPER]>;
def : Proc<"pentium", [FeatureX87, FeatureSlowUAMem16,		def : Proc<"pentium", [FeatureX87, FeatureSlowUAMem16,
FeatureCMPXCHG8B]>;		FeatureCMPXCHG8B, FeatureInsertVZEROUPPER]>;
def : Proc<"pentium-mmx", [FeatureX87, FeatureSlowUAMem16,		def : Proc<"pentium-mmx", [FeatureX87, FeatureSlowUAMem16,
FeatureCMPXCHG8B, FeatureMMX]>;		FeatureCMPXCHG8B, FeatureMMX,
		FeatureInsertVZEROUPPER]>;

def : Proc<"i686", [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,		def : Proc<"i686", [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,
FeatureCMOV]>;		FeatureCMOV, FeatureInsertVZEROUPPER]>;
def : Proc<"pentiumpro", [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,		def : Proc<"pentiumpro", [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,
FeatureCMOV, FeatureNOPL]>;		FeatureCMOV, FeatureNOPL, FeatureInsertVZEROUPPER]>;

def : Proc<"pentium2", [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,		def : Proc<"pentium2", [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,
FeatureMMX, FeatureCMOV, FeatureFXSR,		FeatureMMX, FeatureCMOV, FeatureFXSR,
FeatureNOPL]>;		FeatureNOPL, FeatureInsertVZEROUPPER]>;

foreach P = ["pentium3", "pentium3m"] in {		foreach P = ["pentium3", "pentium3m"] in {
def : Proc<P, [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,FeatureMMX,		def : Proc<P, [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,FeatureMMX,
FeatureSSE1, FeatureFXSR, FeatureNOPL, FeatureCMOV]>;		FeatureSSE1, FeatureFXSR, FeatureNOPL, FeatureCMOV,
		FeatureInsertVZEROUPPER]>;
}		}

// Enable the PostRAScheduler for SSE2 and SSE3 class cpus.		// Enable the PostRAScheduler for SSE2 and SSE3 class cpus.
// The intent is to enable it for pentium4 which is the current default		// The intent is to enable it for pentium4 which is the current default
// processor in a vanilla 32-bit clang compilation when no specific		// processor in a vanilla 32-bit clang compilation when no specific
// architecture is specified. This generally gives a nice performance		// architecture is specified. This generally gives a nice performance
// increase on silvermont, with largely neutral behavior on other		// increase on silvermont, with largely neutral behavior on other
// contemporary large core processors.		// contemporary large core processors.
// pentium-m, pentium4m, prescott and nocona are included as a preventative		// pentium-m, pentium4m, prescott and nocona are included as a preventative
// measure to avoid performance surprises, in case clang's default cpu		// measure to avoid performance surprises, in case clang's default cpu
// changes slightly.		// changes slightly.

def : ProcessorModel<"pentium-m", GenericPostRAModel,		def : ProcessorModel<"pentium-m", GenericPostRAModel,
[FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,		[FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,
FeatureMMX, FeatureSSE2, FeatureFXSR, FeatureNOPL,		FeatureMMX, FeatureSSE2, FeatureFXSR, FeatureNOPL,
FeatureCMOV]>;		FeatureCMOV, FeatureInsertVZEROUPPER]>;

foreach P = ["pentium4", "pentium4m"] in {		foreach P = ["pentium4", "pentium4m"] in {
def : ProcessorModel<P, GenericPostRAModel,		def : ProcessorModel<P, GenericPostRAModel,
[FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,		[FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,
FeatureMMX, FeatureSSE2, FeatureFXSR, FeatureNOPL,		FeatureMMX, FeatureSSE2, FeatureFXSR, FeatureNOPL,
FeatureCMOV]>;		FeatureCMOV, FeatureInsertVZEROUPPER]>;
}		}

// Intel Quark.		// Intel Quark.
def : Proc<"lakemont", []>;		def : Proc<"lakemont", [FeatureInsertVZEROUPPER]>;

// Intel Core Duo.		// Intel Core Duo.
def : ProcessorModel<"yonah", SandyBridgeModel,		def : ProcessorModel<"yonah", SandyBridgeModel,
[FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,		[FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,
FeatureMMX, FeatureSSE3, FeatureFXSR, FeatureNOPL,		FeatureMMX, FeatureSSE3, FeatureFXSR, FeatureNOPL,
FeatureCMOV]>;		FeatureCMOV, FeatureInsertVZEROUPPER]>;

// NetBurst.		// NetBurst.
def : ProcessorModel<"prescott", GenericPostRAModel,		def : ProcessorModel<"prescott", GenericPostRAModel,
[FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,		[FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,
FeatureMMX, FeatureSSE3, FeatureFXSR, FeatureNOPL,		FeatureMMX, FeatureSSE3, FeatureFXSR, FeatureNOPL,
FeatureCMOV]>;		FeatureCMOV, FeatureInsertVZEROUPPER]>;
def : ProcessorModel<"nocona", GenericPostRAModel, [		def : ProcessorModel<"nocona", GenericPostRAModel, [
FeatureX87,		FeatureX87,
FeatureSlowUAMem16,		FeatureSlowUAMem16,
FeatureCMPXCHG8B,		FeatureCMPXCHG8B,
FeatureCMOV,		FeatureCMOV,
FeatureMMX,		FeatureMMX,
FeatureSSE3,		FeatureSSE3,
FeatureFXSR,		FeatureFXSR,
FeatureNOPL,		FeatureNOPL,
Feature64Bit,		Feature64Bit,
FeatureCMPXCHG16B		FeatureCMPXCHG16B,
		FeatureInsertVZEROUPPER
]>;		]>;

// Intel Core 2 Solo/Duo.		// Intel Core 2 Solo/Duo.
def : ProcessorModel<"core2", SandyBridgeModel, [		def : ProcessorModel<"core2", SandyBridgeModel, [
FeatureX87,		FeatureX87,
FeatureSlowUAMem16,		FeatureSlowUAMem16,
FeatureCMPXCHG8B,		FeatureCMPXCHG8B,
FeatureCMOV,		FeatureCMOV,
FeatureMMX,		FeatureMMX,
FeatureSSSE3,		FeatureSSSE3,
FeatureFXSR,		FeatureFXSR,
FeatureNOPL,		FeatureNOPL,
Feature64Bit,		Feature64Bit,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
FeatureLAHFSAHF,		FeatureLAHFSAHF,
FeatureMacroFusion		FeatureMacroFusion,
		FeatureInsertVZEROUPPER
]>;		]>;
def : ProcessorModel<"penryn", SandyBridgeModel, [		def : ProcessorModel<"penryn", SandyBridgeModel, [
FeatureX87,		FeatureX87,
FeatureSlowUAMem16,		FeatureSlowUAMem16,
FeatureCMPXCHG8B,		FeatureCMPXCHG8B,
FeatureCMOV,		FeatureCMOV,
FeatureMMX,		FeatureMMX,
FeatureSSE41,		FeatureSSE41,
FeatureFXSR,		FeatureFXSR,
FeatureNOPL,		FeatureNOPL,
Feature64Bit,		Feature64Bit,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
FeatureLAHFSAHF,		FeatureLAHFSAHF,
FeatureMacroFusion		FeatureMacroFusion,
		FeatureInsertVZEROUPPER
]>;		]>;

// Atom CPUs.		// Atom CPUs.
foreach P = ["bonnell", "atom"] in {		foreach P = ["bonnell", "atom"] in {
def : ProcessorModel<P, AtomModel, ProcessorFeatures.AtomFeatures>;		def : ProcessorModel<P, AtomModel, ProcessorFeatures.AtomFeatures>;
}		}

foreach P = ["silvermont", "slm"] in {		foreach P = ["silvermont", "slm"] in {
...
def : ProcessorModel<"icelake-server", SkylakeServerModel,		def : ProcessorModel<"icelake-server", SkylakeServerModel,
ProcessorFeatures.ICXFeatures>;		ProcessorFeatures.ICXFeatures>;
def : ProcessorModel<"tigerlake", SkylakeServerModel,		def : ProcessorModel<"tigerlake", SkylakeServerModel,
ProcessorFeatures.TGLFeatures>;		ProcessorFeatures.TGLFeatures>;

// AMD CPUs.		// AMD CPUs.

def : Proc<"k6", [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,		def : Proc<"k6", [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,
FeatureMMX]>;		FeatureMMX, FeatureInsertVZEROUPPER]>;
def : Proc<"k6-2", [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,		def : Proc<"k6-2", [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,
Feature3DNow]>;		Feature3DNow, FeatureInsertVZEROUPPER]>;
def : Proc<"k6-3", [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,		def : Proc<"k6-3", [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,
Feature3DNow]>;		Feature3DNow, FeatureInsertVZEROUPPER]>;

foreach P = ["athlon", "athlon-tbird"] in {		foreach P = ["athlon", "athlon-tbird"] in {
def : Proc<P, [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B, FeatureCMOV,		def : Proc<P, [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B, FeatureCMOV,
Feature3DNowA, FeatureNOPL, FeatureSlowSHLD]>;		Feature3DNowA, FeatureNOPL, FeatureSlowSHLD,
		FeatureInsertVZEROUPPER]>;
}		}

foreach P = ["athlon-4", "athlon-xp", "athlon-mp"] in {		foreach P = ["athlon-4", "athlon-xp", "athlon-mp"] in {
def : Proc<P, [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B, FeatureCMOV,		def : Proc<P, [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B, FeatureCMOV,
FeatureSSE1, Feature3DNowA, FeatureFXSR, FeatureNOPL,		FeatureSSE1, Feature3DNowA, FeatureFXSR, FeatureNOPL,
FeatureSlowSHLD]>;		FeatureSlowSHLD, FeatureInsertVZEROUPPER]>;
}		}

foreach P = ["k8", "opteron", "athlon64", "athlon-fx"] in {		foreach P = ["k8", "opteron", "athlon64", "athlon-fx"] in {
def : Proc<P, [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,		def : Proc<P, [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,
FeatureSSE2, Feature3DNowA, FeatureFXSR, FeatureNOPL,		FeatureSSE2, Feature3DNowA, FeatureFXSR, FeatureNOPL,
Feature64Bit, FeatureSlowSHLD, FeatureCMOV,		Feature64Bit, FeatureSlowSHLD, FeatureCMOV,
FeatureFastScalarShiftMasks]>;		FeatureFastScalarShiftMasks, FeatureInsertVZEROUPPER]>;
}		}

foreach P = ["k8-sse3", "opteron-sse3", "athlon64-sse3"] in {		foreach P = ["k8-sse3", "opteron-sse3", "athlon64-sse3"] in {
def : Proc<P, [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B, FeatureSSE3,		def : Proc<P, [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B, FeatureSSE3,
Feature3DNowA, FeatureFXSR, FeatureNOPL, FeatureCMPXCHG16B,		Feature3DNowA, FeatureFXSR, FeatureNOPL, FeatureCMPXCHG16B,
FeatureSlowSHLD, FeatureCMOV, Feature64Bit,		FeatureSlowSHLD, FeatureCMOV, Feature64Bit,
FeatureFastScalarShiftMasks]>;		FeatureFastScalarShiftMasks, FeatureInsertVZEROUPPER]>;
}		}

foreach P = ["amdfam10", "barcelona"] in {		foreach P = ["amdfam10", "barcelona"] in {
def : Proc<P, ProcessorFeatures.BarcelonaFeatures>;		def : Proc<P, ProcessorFeatures.BarcelonaFeatures>;
}		}

// Bobcat		// Bobcat
def : Proc<"btver1", ProcessorFeatures.BtVer1Features>;		def : Proc<"btver1", ProcessorFeatures.BtVer1Features>;
// Jaguar		// Jaguar
def : ProcessorModel<"btver2", BtVer2Model, ProcessorFeatures.BtVer2Features>;		def : ProcessorModel<"btver2", BtVer2Model, ProcessorFeatures.BtVer2Features>;

// Bulldozer		// Bulldozer
def : ProcessorModel<"bdver1", BdVer2Model, ProcessorFeatures.BdVer1Features>;		def : ProcessorModel<"bdver1", BdVer2Model, ProcessorFeatures.BdVer1Features>;
// Piledriver		// Piledriver
def : ProcessorModel<"bdver2", BdVer2Model, ProcessorFeatures.BdVer2Features>;		def : ProcessorModel<"bdver2", BdVer2Model, ProcessorFeatures.BdVer2Features>;
// Steamroller		// Steamroller
def : Proc<"bdver3", ProcessorFeatures.BdVer3Features>;		def : Proc<"bdver3", ProcessorFeatures.BdVer3Features>;
// Excavator		// Excavator
def : Proc<"bdver4", ProcessorFeatures.BdVer4Features>;		def : Proc<"bdver4", ProcessorFeatures.BdVer4Features>;

def : ProcessorModel<"znver1", Znver1Model, ProcessorFeatures.ZNFeatures>;		def : ProcessorModel<"znver1", Znver1Model, ProcessorFeatures.ZNFeatures>;
def : ProcessorModel<"znver2", Znver1Model, ProcessorFeatures.ZN2Features>;		def : ProcessorModel<"znver2", Znver1Model, ProcessorFeatures.ZN2Features>;

def : Proc<"geode", [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,		def : Proc<"geode", [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,
Feature3DNowA]>;		Feature3DNowA, FeatureInsertVZEROUPPER]>;

def : Proc<"winchip-c6", [FeatureX87, FeatureSlowUAMem16, FeatureMMX]>;		def : Proc<"winchip-c6", [FeatureX87, FeatureSlowUAMem16, FeatureMMX,
def : Proc<"winchip2", [FeatureX87, FeatureSlowUAMem16, Feature3DNow]>;		FeatureInsertVZEROUPPER]>;
def : Proc<"c3", [FeatureX87, FeatureSlowUAMem16, Feature3DNow]>;		def : Proc<"winchip2", [FeatureX87, FeatureSlowUAMem16, Feature3DNow,
		FeatureInsertVZEROUPPER]>;
		def : Proc<"c3", [FeatureX87, FeatureSlowUAMem16, Feature3DNow,
		FeatureInsertVZEROUPPER]>;
def : Proc<"c3-2", [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,		def : Proc<"c3-2", [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,
FeatureMMX, FeatureSSE1, FeatureFXSR,		FeatureMMX, FeatureSSE1, FeatureFXSR,
FeatureCMOV]>;		FeatureCMOV, FeatureInsertVZEROUPPER]>;

// We also provide a generic 64-bit specific x86 processor model which tries to		// We also provide a generic 64-bit specific x86 processor model which tries to
// be good for modern chips without enabling instruction set encodings past the		// be good for modern chips without enabling instruction set encodings past the
// basic SSE2 and 64-bit ones. It disables slow things from any mainstream and		// basic SSE2 and 64-bit ones. It disables slow things from any mainstream and
// modern 64-bit x86 chip, and enables features that are generally beneficial.		// modern 64-bit x86 chip, and enables features that are generally beneficial.
//		//
// We currently use the Sandy Bridge model as the default scheduling model as		// We currently use the Sandy Bridge model as the default scheduling model as
// we use it across Nehalem, Westmere, Sandy Bridge, and Ivy Bridge which		// we use it across Nehalem, Westmere, Sandy Bridge, and Ivy Bridge which
// covers a huge swath of x86 processors. If there are specific scheduling		// covers a huge swath of x86 processors. If there are specific scheduling
// knobs which need to be tuned differently for AMD chips, we might consider		// knobs which need to be tuned differently for AMD chips, we might consider
// forming a common base for them.		// forming a common base for them.
def : ProcessorModel<"x86-64", SandyBridgeModel, [		def : ProcessorModel<"x86-64", SandyBridgeModel, [
FeatureX87,		FeatureX87,
FeatureCMPXCHG8B,		FeatureCMPXCHG8B,
FeatureCMOV,		FeatureCMOV,
FeatureMMX,		FeatureMMX,
FeatureSSE2,		FeatureSSE2,
FeatureFXSR,		FeatureFXSR,
FeatureNOPL,		FeatureNOPL,
Feature64Bit,		Feature64Bit,
FeatureSlow3OpsLEA,		FeatureSlow3OpsLEA,
FeatureSlowIncDec,		FeatureSlowIncDec,
FeatureMacroFusion		FeatureMacroFusion,
		FeatureInsertVZEROUPPER
]>;		]>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Calling Conventions		// Calling Conventions
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

include "X86CallingConv.td"		include "X86CallingConv.td"

[… 59 unchanged lines not shown …]
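The TableGen hunk above adds FeatureInsertVZEROUPPER to almost every processor because the new flag is opt-in, whereas fast-partial-ymm-or-zmm-write was opt-out. Below is a minimal C++ sketch of the resulting defaults. It assumes, as the summary states, that KNL, KNM and BTVER2 are the only CPUs that carried the old flag. The helper names are hypothetical: LLVM derives these defaults from the Proc records, not from code like this.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper: the new opt-in default for a given -mcpu name.
// Only the CPUs that previously opted out via fast-partial-ymm-or-zmm-write
// are left without FeatureInsertVZEROUPPER.
bool defaultInsertVZEROUPPER(const std::string &CPU) {
  static const std::set<std::string> OptedOut = {"knl", "knm", "btver2"};
  return OptedOut.count(CPU) == 0;
}

// Hypothetical helper: the old opt-out default, which is simply the inverse.
// true meant "partial YMM/ZMM writes are cheap, skip the vzeroupper pass".
bool defaultFastPartialYMMorZMMWrite(const std::string &CPU) {
  return !defaultInsertVZEROUPPER(CPU);
}
```

Because the defaults are exact inverses, every CPU keeps its previous behavior. Non-AVX entries such as k6-2 also receive the flag, which only matters if AVX is enabled for them.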

llvm/lib/Target/X86/X86Subtarget.h

[… 250 unchanged lines not shown …]	protected:

/// True if LZCNT/TZCNT instructions have a false dependency on the destination register.		/// True if LZCNT/TZCNT instructions have a false dependency on the destination register.
bool HasLZCNTFalseDeps = false;		bool HasLZCNTFalseDeps = false;

/// True if its preferable to combine to a single shuffle using a variable		/// True if its preferable to combine to a single shuffle using a variable
/// mask over multiple fixed shuffles.		/// mask over multiple fixed shuffles.
bool HasFastVariableShuffle = false;		bool HasFastVariableShuffle = false;

/// True if there is no performance penalty to writing only the lower parts		/// True if vzeroupper instructions should be inserted after code that uses
/// of a YMM or ZMM register without clearing the upper part.		/// ymm or zmm registers.
bool HasFastPartialYMMorZMMWrite = false;		bool InsertVZEROUPPER = false;

/// True if there is no performance penalty for writing NOPs with up to		/// True if there is no performance penalty for writing NOPs with up to
/// 11 bytes.		/// 11 bytes.
bool HasFast11ByteNOP = false;		bool HasFast11ByteNOP = false;

/// True if there is no performance penalty for writing NOPs with up to		/// True if there is no performance penalty for writing NOPs with up to
/// 15 bytes.		/// 15 bytes.
bool HasFast15ByteNOP = false;		bool HasFast15ByteNOP = false;
[… 383 unchanged lines not shown …]	public:
bool hasSSEUnalignedMem() const { return HasSSEUnalignedMem; }		bool hasSSEUnalignedMem() const { return HasSSEUnalignedMem; }
bool hasCmpxchg16b() const { return HasCmpxchg16b && is64Bit(); }		bool hasCmpxchg16b() const { return HasCmpxchg16b && is64Bit(); }
bool useLeaForSP() const { return UseLeaForSP; }		bool useLeaForSP() const { return UseLeaForSP; }
bool hasPOPCNTFalseDeps() const { return HasPOPCNTFalseDeps; }		bool hasPOPCNTFalseDeps() const { return HasPOPCNTFalseDeps; }
bool hasLZCNTFalseDeps() const { return HasLZCNTFalseDeps; }		bool hasLZCNTFalseDeps() const { return HasLZCNTFalseDeps; }
bool hasFastVariableShuffle() const {		bool hasFastVariableShuffle() const {
return HasFastVariableShuffle;		return HasFastVariableShuffle;
}		}
bool hasFastPartialYMMorZMMWrite() const {		bool insertVZEROUPPER() const { return InsertVZEROUPPER; }
return HasFastPartialYMMorZMMWrite;
}
bool hasFastGather() const { return HasFastGather; }		bool hasFastGather() const { return HasFastGather; }
bool hasFastScalarFSQRT() const { return HasFastScalarFSQRT; }		bool hasFastScalarFSQRT() const { return HasFastScalarFSQRT; }
bool hasFastVectorFSQRT() const { return HasFastVectorFSQRT; }		bool hasFastVectorFSQRT() const { return HasFastVectorFSQRT; }
bool hasFastLZCNT() const { return HasFastLZCNT; }		bool hasFastLZCNT() const { return HasFastLZCNT; }
bool hasFastSHLDRotate() const { return HasFastSHLDRotate; }		bool hasFastSHLDRotate() const { return HasFastSHLDRotate; }
bool hasFastBEXTR() const { return HasFastBEXTR; }		bool hasFastBEXTR() const { return HasFastBEXTR; }
bool hasFastHorizontalOps() const { return HasFastHorizontalOps; }		bool hasFastHorizontalOps() const { return HasFastHorizontalOps; }
bool hasFastScalarShiftMasks() const { return HasFastScalarShiftMasks; }		bool hasFastScalarShiftMasks() const { return HasFastScalarShiftMasks; }
[… 214 unchanged lines not shown …]
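The header replaces the HasFastPartialYMMorZMMWrite field with InsertVZEROUPPER. The new field defaults to false and is set either by the CPU's feature list or by an explicit -mattr entry. The sketch below is a hypothetical, much-reduced stand-in for the subtarget; MiniSubtarget and applyFeatureString are illustrative names, not LLVM APIs. It shows how a later "-vzeroupper" entry overrides a CPU default that enables the flag, which is the behavior the -mattr=+avx,-vzeroupper RUN line in the test depends on.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical, heavily simplified stand-in for X86Subtarget. Only the two
// fields consulted by the vzeroupper pass are modelled.
struct MiniSubtarget {
  bool HasAVX = false;
  bool InsertVZEROUPPER = false; // defaults to false, like the real field

  bool hasAVX() const { return HasAVX; }
  bool insertVZEROUPPER() const { return InsertVZEROUPPER; }

  // Apply a comma-separated, -mattr style string such as "+avx,-vzeroupper".
  // Entries are applied in order, so they override whatever the CPU set.
  void applyFeatureString(const std::string &Features) {
    std::stringstream SS(Features);
    std::string Item;
    while (std::getline(SS, Item, ',')) {
      if (Item.size() < 2 || (Item[0] != '+' && Item[0] != '-'))
        continue; // this sketch silently skips malformed entries
      const bool Enable = Item[0] == '+';
      const std::string Name = Item.substr(1);
      if (Name == "avx")
        HasAVX = Enable;
      else if (Name == "vzeroupper")
        InsertVZEROUPPER = Enable;
    }
  }
};
```

The real subtarget also handles implied features (for example, +avx implies the SSE levels), which this sketch deliberately ignores.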

llvm/lib/Target/X86/X86TargetTransformInfo.h

[… 45 unchanged lines not shown …]	const FeatureBitset InlineFeatureIgnoreList = {
X86::FeatureLAHFSAHF,		X86::FeatureLAHFSAHF,

// Codegen control options.		// Codegen control options.
X86::FeatureFast11ByteNOP,		X86::FeatureFast11ByteNOP,
X86::FeatureFast15ByteNOP,		X86::FeatureFast15ByteNOP,
X86::FeatureFastBEXTR,		X86::FeatureFastBEXTR,
X86::FeatureFastHorizontalOps,		X86::FeatureFastHorizontalOps,
X86::FeatureFastLZCNT,		X86::FeatureFastLZCNT,
X86::FeatureFastPartialYMMorZMMWrite,
X86::FeatureFastScalarFSQRT,		X86::FeatureFastScalarFSQRT,
X86::FeatureFastSHLDRotate,		X86::FeatureFastSHLDRotate,
X86::FeatureFastScalarShiftMasks,		X86::FeatureFastScalarShiftMasks,
X86::FeatureFastVectorShiftMasks,		X86::FeatureFastVectorShiftMasks,
X86::FeatureFastVariableShuffle,		X86::FeatureFastVariableShuffle,
X86::FeatureFastVectorFSQRT,		X86::FeatureFastVectorFSQRT,
X86::FeatureLEAForSP,		X86::FeatureLEAForSP,
X86::FeatureLEAUsesAG,		X86::FeatureLEAUsesAG,
[… 10 unchanged lines not shown …]	const FeatureBitset InlineFeatureIgnoreList = {
X86::FeatureSlowIncDec,		X86::FeatureSlowIncDec,
X86::FeatureSlowLEA,		X86::FeatureSlowLEA,
X86::FeatureSlowPMADDWD,		X86::FeatureSlowPMADDWD,
X86::FeatureSlowPMULLD,		X86::FeatureSlowPMULLD,
X86::FeatureSlowSHLD,		X86::FeatureSlowSHLD,
X86::FeatureSlowTwoMemOps,		X86::FeatureSlowTwoMemOps,
X86::FeatureSlowUAMem16,		X86::FeatureSlowUAMem16,
X86::FeaturePreferMaskRegisters,		X86::FeaturePreferMaskRegisters,
		X86::FeatureInsertVZEROUPPER,

// Perf-tuning flags.		// Perf-tuning flags.
X86::FeatureHasFastGather,		X86::FeatureHasFastGather,
X86::FeatureSlowUAMem32,		X86::FeatureSlowUAMem32,

// Based on whether user set the -mprefer-vector-width command line.		// Based on whether user set the -mprefer-vector-width command line.
X86::FeaturePrefer128Bit,		X86::FeaturePrefer128Bit,
X86::FeaturePrefer256Bit,		X86::FeaturePrefer256Bit,
[… 130 unchanged lines not shown …]
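Adding X86::FeatureInsertVZEROUPPER to InlineFeatureIgnoreList keeps the flag from affecting inlining decisions. Our understanding is that the X86 inline-compatibility check masks the ignored bits out of both the caller and the callee, then requires the callee's remaining features to be a subset of the caller's. The code that consumes this list is not shown in this diff, so treat that as an assumption. The sketch below uses std::bitset with made-up feature indices; these do not match the real X86::Feature* enumerators.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical feature indices, for illustration only.
enum Feature : std::size_t { AVX, AVX2, InsertVZEROUPPER, SlowUAMem16, NumFeatures };
using FeatureBits = std::bitset<static_cast<std::size_t>(NumFeatures)>;

// Tuning-only flags that should not block inlining.
FeatureBits ignoreList() {
  FeatureBits B;
  B.set(InsertVZEROUPPER);
  B.set(SlowUAMem16);
  return B;
}

// After masking out ignored flags, the callee may be inlined only if every
// remaining feature it needs is also available in the caller.
bool areInlineCompatible(const FeatureBits &Caller, const FeatureBits &Callee) {
  const FeatureBits Ignore = ignoreList();
  const FeatureBits RealCaller = Caller & ~Ignore;
  const FeatureBits RealCallee = Callee & ~Ignore;
  return (RealCaller & RealCallee) == RealCallee;
}
```

With this masking, a function built with -mno-vzeroupper can still inline a callee built with the default tuning. A genuine ISA mismatch, such as a callee that needs AVX2, still blocks inlining.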

llvm/lib/Target/X86/X86VZeroUpper.cpp

[… 273 unchanged lines not shown …]	void VZeroUpperInserter::processBasicBlock(MachineBasicBlock &MBB) {

BlockStates[MBB.getNumber()].ExitState = CurState;		BlockStates[MBB.getNumber()].ExitState = CurState;
}		}

/// Loop over all of the basic blocks, inserting vzeroupper instructions before		/// Loop over all of the basic blocks, inserting vzeroupper instructions before
/// function calls.		/// function calls.
bool VZeroUpperInserter::runOnMachineFunction(MachineFunction &MF) {		bool VZeroUpperInserter::runOnMachineFunction(MachineFunction &MF) {
const X86Subtarget &ST = MF.getSubtarget<X86Subtarget>();		const X86Subtarget &ST = MF.getSubtarget<X86Subtarget>();
if (!ST.hasAVX() || ST.hasFastPartialYMMorZMMWrite())		if (!ST.hasAVX() || !ST.insertVZEROUPPER())
return false;		return false;
TII = ST.getInstrInfo();		TII = ST.getInstrInfo();
MachineRegisterInfo &MRI = MF.getRegInfo();		MachineRegisterInfo &MRI = MF.getRegInfo();
EverMadeChange = false;		EverMadeChange = false;
IsX86INTR = MF.getFunction().getCallingConv() == CallingConv::X86_INTR;		IsX86INTR = MF.getFunction().getCallingConv() == CallingConv::X86_INTR;

bool FnHasLiveInYmmOrZmm = checkFnHasLiveInYmmOrZmm(MRI);		bool FnHasLiveInYmmOrZmm = checkFnHasLiveInYmmOrZmm(MRI);

[… 60 unchanged lines not shown …]
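The only functional change in the pass is the early-exit condition, and its polarity flips along with the flag. This bears on the review question about non-AVX CPUs that carry FeatureInsertVZEROUPPER. The AVX test comes first, so on those CPUs the flag has no effect unless AVX is enabled separately, for example with -mattr=+avx. Both conditions are sketched below as pure functions; oldGate and newGate are illustrative names.

```cpp
#include <cassert>

// Before this patch: run unless AVX is missing or the CPU claims that
// partial YMM/ZMM writes are cheap.
bool oldGate(bool HasAVX, bool FastPartialYMMorZMMWrite) {
  return HasAVX && !FastPartialYMMorZMMWrite;
}

// After this patch: run only if AVX is available and vzeroupper insertion
// is enabled, either by the CPU default or by -mvzeroupper / +vzeroupper.
bool newGate(bool HasAVX, bool InsertVZEROUPPER) {
  return HasAVX && InsertVZEROUPPER;
}
```

When the new flag is the inverse of the old one, the two gates agree on every input. Without AVX, both return false whatever the tuning flag says.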

llvm/test/CodeGen/X86/avx-vzeroupper.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -x86-use-vzeroupper -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=ALL --check-prefix=VZ --check-prefix=AVX			; RUN: llc < %s -x86-use-vzeroupper -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=ALL --check-prefix=VZ --check-prefix=AVX
	; RUN: llc < %s -x86-use-vzeroupper -mtriple=x86_64-unknown-unknown -mattr=+avx512f | FileCheck %s --check-prefix=ALL --check-prefix=VZ --check-prefix=AVX512			; RUN: llc < %s -x86-use-vzeroupper -mtriple=x86_64-unknown-unknown -mattr=+avx512f | FileCheck %s --check-prefix=ALL --check-prefix=VZ --check-prefix=AVX512
	; RUN: llc < %s -x86-use-vzeroupper -mtriple=x86_64-unknown-unknown -mattr=+avx,+fast-partial-ymm-or-zmm-write | FileCheck %s --check-prefix=ALL --check-prefix=NO-VZ --check-prefix=FAST-ymm-zmm			; RUN: llc < %s -x86-use-vzeroupper -mtriple=x86_64-unknown-unknown -mattr=+avx,-vzeroupper | FileCheck %s --check-prefix=ALL --check-prefix=NO-VZ --check-prefix=DISABLE-VZ
	; RUN: llc < %s -x86-use-vzeroupper -mtriple=x86_64-unknown-unknown -mcpu=bdver2 | FileCheck %s --check-prefix=ALL --check-prefix=NO-VZ --check-prefix=BDVER2			; RUN: llc < %s -x86-use-vzeroupper -mtriple=x86_64-unknown-unknown -mcpu=bdver2 | FileCheck %s --check-prefix=ALL --check-prefix=NO-VZ --check-prefix=BDVER2
	; RUN: llc < %s -x86-use-vzeroupper -mtriple=x86_64-unknown-unknown -mcpu=btver2 | FileCheck %s --check-prefix=ALL --check-prefix=NO-VZ --check-prefix=BTVER2			; RUN: llc < %s -x86-use-vzeroupper -mtriple=x86_64-unknown-unknown -mcpu=btver2 | FileCheck %s --check-prefix=ALL --check-prefix=NO-VZ --check-prefix=BTVER2

	declare i32 @foo()			declare i32 @foo()
	declare <4 x float> @do_sse(<4 x float>)			declare <4 x float> @do_sse(<4 x float>)
	declare <8 x float> @do_avx(<8 x float>)			declare <8 x float> @do_avx(<8 x float>)
	declare <4 x float> @llvm.x86.avx.vextractf128.ps.256(<8 x float>, i8) nounwind readnone			declare <4 x float> @llvm.x86.avx.vextractf128.ps.256(<8 x float>, i8) nounwind readnone
	@x = common global <4 x float> zeroinitializer, align 16			@x = common global <4 x float> zeroinitializer, align 16
	[… 26 unchanged lines not shown …]
	; VZ-NEXT: callq do_sse			; VZ-NEXT: callq do_sse
	; VZ-NEXT: vmovaps %xmm0, {{.*}}(%rip)			; VZ-NEXT: vmovaps %xmm0, {{.*}}(%rip)
	; VZ-NEXT: callq do_sse			; VZ-NEXT: callq do_sse
	; VZ-NEXT: vmovaps %xmm0, {{.*}}(%rip)			; VZ-NEXT: vmovaps %xmm0, {{.*}}(%rip)
	; VZ-NEXT: vmovups (%rsp), %ymm0 # 32-byte Reload			; VZ-NEXT: vmovups (%rsp), %ymm0 # 32-byte Reload
	; VZ-NEXT: addq $56, %rsp			; VZ-NEXT: addq $56, %rsp
	; VZ-NEXT: retq			; VZ-NEXT: retq
	;			;
	; FAST-ymm-zmm-LABEL: test01:			; DISABLE-VZ-LABEL: test01:
	; FAST-ymm-zmm: # %bb.0:			; DISABLE-VZ: # %bb.0:
	; FAST-ymm-zmm-NEXT: subq $56, %rsp			; DISABLE-VZ-NEXT: subq $56, %rsp
	; FAST-ymm-zmm-NEXT: vmovups %ymm2, (%rsp) # 32-byte Spill			; DISABLE-VZ-NEXT: vmovups %ymm2, (%rsp) # 32-byte Spill
	; FAST-ymm-zmm-NEXT: vmovaps {{.*}}(%rip), %xmm0			; DISABLE-VZ-NEXT: vmovaps {{.*}}(%rip), %xmm0
	; FAST-ymm-zmm-NEXT: callq do_sse			; DISABLE-VZ-NEXT: callq do_sse
	; FAST-ymm-zmm-NEXT: vmovaps %xmm0, {{.*}}(%rip)			; DISABLE-VZ-NEXT: vmovaps %xmm0, {{.*}}(%rip)
	; FAST-ymm-zmm-NEXT: callq do_sse			; DISABLE-VZ-NEXT: callq do_sse
	; FAST-ymm-zmm-NEXT: vmovaps %xmm0, {{.*}}(%rip)			; DISABLE-VZ-NEXT: vmovaps %xmm0, {{.*}}(%rip)
	; FAST-ymm-zmm-NEXT: vmovups (%rsp), %ymm0 # 32-byte Reload			; DISABLE-VZ-NEXT: vmovups (%rsp), %ymm0 # 32-byte Reload
	; FAST-ymm-zmm-NEXT: addq $56, %rsp			; DISABLE-VZ-NEXT: addq $56, %rsp
	; FAST-ymm-zmm-NEXT: retq			; DISABLE-VZ-NEXT: retq
	;			;
	; BDVER2-LABEL: test01:			; BDVER2-LABEL: test01:
	; BDVER2: # %bb.0:			; BDVER2: # %bb.0:
	; BDVER2-NEXT: subq $56, %rsp			; BDVER2-NEXT: subq $56, %rsp
	; BDVER2-NEXT: vmovaps {{.*}}(%rip), %xmm0			; BDVER2-NEXT: vmovaps {{.*}}(%rip), %xmm0
	; BDVER2-NEXT: vmovups %ymm2, (%rsp) # 32-byte Spill			; BDVER2-NEXT: vmovups %ymm2, (%rsp) # 32-byte Spill
	; BDVER2-NEXT: vzeroupper			; BDVER2-NEXT: vzeroupper
	; BDVER2-NEXT: callq do_sse			; BDVER2-NEXT: callq do_sse
	[… 11 unchanged lines not shown …]
	; BTVER2-NEXT: vmovups %ymm2, (%rsp) # 32-byte Spill			; BTVER2-NEXT: vmovups %ymm2, (%rsp) # 32-byte Spill
	; BTVER2-NEXT: callq do_sse			; BTVER2-NEXT: callq do_sse
	; BTVER2-NEXT: vmovaps %xmm0, {{.*}}(%rip)			; BTVER2-NEXT: vmovaps %xmm0, {{.*}}(%rip)
	; BTVER2-NEXT: callq do_sse			; BTVER2-NEXT: callq do_sse
	; BTVER2-NEXT: vmovaps %xmm0, {{.*}}(%rip)			; BTVER2-NEXT: vmovaps %xmm0, {{.*}}(%rip)
	; BTVER2-NEXT: vmovups (%rsp), %ymm0 # 32-byte Reload			; BTVER2-NEXT: vmovups (%rsp), %ymm0 # 32-byte Reload
	; BTVER2-NEXT: addq $56, %rsp			; BTVER2-NEXT: addq $56, %rsp
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq
				; DISABLE-VZ # %bb.0:
	%tmp = load <4 x float>, <4 x float>* @x, align 16			%tmp = load <4 x float>, <4 x float>* @x, align 16
	%call = tail call <4 x float> @do_sse(<4 x float> %tmp) nounwind			%call = tail call <4 x float> @do_sse(<4 x float> %tmp) nounwind
	store <4 x float> %call, <4 x float>* @x, align 16			store <4 x float> %call, <4 x float>* @x, align 16
	%call2 = tail call <4 x float> @do_sse(<4 x float> %call) nounwind			%call2 = tail call <4 x float> @do_sse(<4 x float> %call) nounwind
	store <4 x float> %call2, <4 x float>* @x, align 16			store <4 x float> %call2, <4 x float>* @x, align 16
	ret <8 x float> %c			ret <8 x float> %c
	}			}

	;; Check that vzeroupper is emitted for tail calls.			;; Check that vzeroupper is emitted for tail calls.

	define <4 x float> @test02(<8 x float> %a, <8 x float> %b) nounwind {			define <4 x float> @test02(<8 x float> %a, <8 x float> %b) nounwind {
	; VZ-LABEL: test02:			; VZ-LABEL: test02:
	; VZ: # %bb.0:			; VZ: # %bb.0:
	; VZ-NEXT: vaddps %xmm1, %xmm0, %xmm0			; VZ-NEXT: vaddps %xmm1, %xmm0, %xmm0
	; VZ-NEXT: vzeroupper			; VZ-NEXT: vzeroupper
	; VZ-NEXT: jmp do_sse # TAILCALL			; VZ-NEXT: jmp do_sse # TAILCALL
	;			;
	; FAST-ymm-zmm-LABEL: test02:			; DISABLE-VZ-LABEL: test02:
	; FAST-ymm-zmm: # %bb.0:			; DISABLE-VZ: # %bb.0:
	; FAST-ymm-zmm-NEXT: vaddps %xmm1, %xmm0, %xmm0			; DISABLE-VZ-NEXT: vaddps %xmm1, %xmm0, %xmm0
	; FAST-ymm-zmm-NEXT: jmp do_sse # TAILCALL			; DISABLE-VZ-NEXT: jmp do_sse # TAILCALL
	;			;
	; BDVER2-LABEL: test02:			; BDVER2-LABEL: test02:
	; BDVER2: # %bb.0:			; BDVER2: # %bb.0:
	; BDVER2-NEXT: vaddps %xmm1, %xmm0, %xmm0			; BDVER2-NEXT: vaddps %xmm1, %xmm0, %xmm0
	; BDVER2-NEXT: vzeroupper			; BDVER2-NEXT: vzeroupper
	; BDVER2-NEXT: jmp do_sse # TAILCALL			; BDVER2-NEXT: jmp do_sse # TAILCALL
	;			;
	; BTVER2-LABEL: test02:			; BTVER2-LABEL: test02:
	[… 34 unchanged lines not shown …]
	; VZ-NEXT: callq do_sse			; VZ-NEXT: callq do_sse
	; VZ-NEXT: decl %ebx			; VZ-NEXT: decl %ebx
	; VZ-NEXT: jne .LBB3_3			; VZ-NEXT: jne .LBB3_3
	; VZ-NEXT: # %bb.4: # %for.end			; VZ-NEXT: # %bb.4: # %for.end
	; VZ-NEXT: addq $16, %rsp			; VZ-NEXT: addq $16, %rsp
	; VZ-NEXT: popq %rbx			; VZ-NEXT: popq %rbx
	; VZ-NEXT: retq			; VZ-NEXT: retq
	;			;
	; FAST-ymm-zmm-LABEL: test03:			; DISABLE-VZ-LABEL: test03:
	; FAST-ymm-zmm: # %bb.0: # %entry			; DISABLE-VZ: # %bb.0: # %entry
	; FAST-ymm-zmm-NEXT: pushq %rbx			; DISABLE-VZ-NEXT: pushq %rbx
	; FAST-ymm-zmm-NEXT: subq $16, %rsp			; DISABLE-VZ-NEXT: subq $16, %rsp
	; FAST-ymm-zmm-NEXT: vaddps %xmm1, %xmm0, %xmm0			; DISABLE-VZ-NEXT: vaddps %xmm1, %xmm0, %xmm0
	; FAST-ymm-zmm-NEXT: vmovaps %xmm0, (%rsp) # 16-byte Spill			; DISABLE-VZ-NEXT: vmovaps %xmm0, (%rsp) # 16-byte Spill
	; FAST-ymm-zmm-NEXT: .p2align 4, 0x90			; DISABLE-VZ-NEXT: .p2align 4, 0x90
	; FAST-ymm-zmm-NEXT: .LBB3_1: # %while.cond			; DISABLE-VZ-NEXT: .LBB3_1: # %while.cond
	; FAST-ymm-zmm-NEXT: # =>This Inner Loop Header: Depth=1			; DISABLE-VZ-NEXT: # =>This Inner Loop Header: Depth=1
	; FAST-ymm-zmm-NEXT: callq foo			; DISABLE-VZ-NEXT: callq foo
	; FAST-ymm-zmm-NEXT: testl %eax, %eax			; DISABLE-VZ-NEXT: testl %eax, %eax
	; FAST-ymm-zmm-NEXT: jne .LBB3_1			; DISABLE-VZ-NEXT: jne .LBB3_1
	; FAST-ymm-zmm-NEXT: # %bb.2: # %for.body.preheader			; DISABLE-VZ-NEXT: # %bb.2: # %for.body.preheader
	; FAST-ymm-zmm-NEXT: movl $4, %ebx			; DISABLE-VZ-NEXT: movl $4, %ebx
	; FAST-ymm-zmm-NEXT: vmovaps (%rsp), %xmm0 # 16-byte Reload			; DISABLE-VZ-NEXT: vmovaps (%rsp), %xmm0 # 16-byte Reload
	; FAST-ymm-zmm-NEXT: .p2align 4, 0x90			; DISABLE-VZ-NEXT: .p2align 4, 0x90
	; FAST-ymm-zmm-NEXT: .LBB3_3: # %for.body			; DISABLE-VZ-NEXT: .LBB3_3: # %for.body
	; FAST-ymm-zmm-NEXT: # =>This Inner Loop Header: Depth=1			; DISABLE-VZ-NEXT: # =>This Inner Loop Header: Depth=1
	; FAST-ymm-zmm-NEXT: callq do_sse			; DISABLE-VZ-NEXT: callq do_sse
	; FAST-ymm-zmm-NEXT: callq do_sse			; DISABLE-VZ-NEXT: callq do_sse
	; FAST-ymm-zmm-NEXT: vmovaps g+{{.*}}(%rip), %xmm0			; DISABLE-VZ-NEXT: vmovaps g+{{.*}}(%rip), %xmm0
	; FAST-ymm-zmm-NEXT: callq do_sse			; DISABLE-VZ-NEXT: callq do_sse
	; FAST-ymm-zmm-NEXT: decl %ebx			; DISABLE-VZ-NEXT: decl %ebx
	; FAST-ymm-zmm-NEXT: jne .LBB3_3			; DISABLE-VZ-NEXT: jne .LBB3_3
	; FAST-ymm-zmm-NEXT: # %bb.4: # %for.end			; DISABLE-VZ-NEXT: # %bb.4: # %for.end
	; FAST-ymm-zmm-NEXT: addq $16, %rsp			; DISABLE-VZ-NEXT: addq $16, %rsp
	; FAST-ymm-zmm-NEXT: popq %rbx			; DISABLE-VZ-NEXT: popq %rbx
	; FAST-ymm-zmm-NEXT: retq			; DISABLE-VZ-NEXT: retq
	;			;
	; BDVER2-LABEL: test03:			; BDVER2-LABEL: test03:
	; BDVER2: # %bb.0: # %entry			; BDVER2: # %bb.0: # %entry
	; BDVER2-NEXT: pushq %rbx			; BDVER2-NEXT: pushq %rbx
	; BDVER2-NEXT: subq $16, %rsp			; BDVER2-NEXT: subq $16, %rsp
	; BDVER2-NEXT: vaddps %xmm1, %xmm0, %xmm0			; BDVER2-NEXT: vaddps %xmm1, %xmm0, %xmm0
	; BDVER2-NEXT: vmovaps %xmm0, (%rsp) # 16-byte Spill			; BDVER2-NEXT: vmovaps %xmm0, (%rsp) # 16-byte Spill
	; BDVER2-NEXT: .p2align 4, 0x90			; BDVER2-NEXT: .p2align 4, 0x90
	[… 81 unchanged lines not shown …]
	; VZ-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0			; VZ-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
	; VZ-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; VZ-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; VZ-NEXT: callq do_avx			; VZ-NEXT: callq do_avx
	; VZ-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0			; VZ-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0
	; VZ-NEXT: popq %rax			; VZ-NEXT: popq %rax
	; VZ-NEXT: vzeroupper			; VZ-NEXT: vzeroupper
	; VZ-NEXT: retq			; VZ-NEXT: retq
	;			;
	; FAST-ymm-zmm-LABEL: test04:			; DISABLE-VZ-LABEL: test04:
	; FAST-ymm-zmm: # %bb.0:			; DISABLE-VZ: # %bb.0:
	; FAST-ymm-zmm-NEXT: pushq %rax			; DISABLE-VZ-NEXT: pushq %rax
	; FAST-ymm-zmm-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0			; DISABLE-VZ-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
	; FAST-ymm-zmm-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; DISABLE-VZ-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; FAST-ymm-zmm-NEXT: callq do_avx			; DISABLE-VZ-NEXT: callq do_avx
	; FAST-ymm-zmm-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0			; DISABLE-VZ-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0
	; FAST-ymm-zmm-NEXT: popq %rax			; DISABLE-VZ-NEXT: popq %rax
	; FAST-ymm-zmm-NEXT: retq			; DISABLE-VZ-NEXT: retq
	;			;
	; BDVER2-LABEL: test04:			; BDVER2-LABEL: test04:
	; BDVER2: # %bb.0:			; BDVER2: # %bb.0:
	; BDVER2-NEXT: pushq %rax			; BDVER2-NEXT: pushq %rax
	; BDVER2-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0			; BDVER2-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
	; BDVER2-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; BDVER2-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; BDVER2-NEXT: callq do_avx			; BDVER2-NEXT: callq do_avx
	; BDVER2-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0			; BDVER2-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0
	[… 19 unchanged lines not shown …]
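In test02, the VZ and BDVER2 check lines require a vzeroupper before the tail-call jump. The DISABLE-VZ lines, produced by the -mattr=+avx,-vzeroupper RUN line, require that it be absent. The helper below is an illustrative reimplementation of just that property over a list of assembly lines. It is not part of FileCheck or update_llc_test_checks.py, and its name is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true if a vzeroupper appears before the first tail-call jump.
// Scanning stops at the tail call, so a vzeroupper emitted later is ignored.
bool vzeroupperBeforeTailCall(const std::vector<std::string> &Asm) {
  for (const std::string &Line : Asm) {
    if (Line.find("# TAILCALL") != std::string::npos)
      return false; // reached the tail call without seeing vzeroupper
    if (Line.find("vzeroupper") != std::string::npos)
      return true;
  }
  return false;
}
```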
