This is an archive of the discontinued LLVM Phabricator instance.

[X86] Provide a separate feature bit for macro fusion support instead of basing it on the AVX flag
Closed, Public

Authored by craig.topper on Aug 29 2017, 3:59 PM.

Details

Reviewers

spatel
chandlerc
RKSimon
zvi

Commits

rG641e2af9e8c4: [X86] Provide a separate feature bit for macro fusion support instead of basing…
rL312097: [X86] Provide a separate feature bit for macro fusion support instead of basing…

Summary

Currently we determine whether macro fusion is supported based on the AVX flag, as a proxy for the processor being Sandy Bridge.

This is really strange now that AMD supports AVX. It also means that if the user explicitly disables AVX, we disable macro fusion.

This patch adds an explicit macro fusion feature. I've also enabled it for the generic 64-bit CPU (which doesn't have AVX).

This is probably another candidate for being in the MI layer, but for now I at least wanted to correct the overloading of the AVX feature.
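
To make the before/after difference concrete, here is a minimal sketch of the two checks. `ToySubtarget`, `fusesWithAVXProxy`, `fusesWithFeatureBit`, and `checkSummaryScenario` are hypothetical names made up for illustration; none of them are LLVM APIs, and the real check lives in X86MacroFusion.cpp.

```cpp
#include <cassert>

// Toy model only: ToySubtarget and the helpers below are hypothetical and are
// not LLVM APIs. They mirror the before/after logic described in the summary.
struct ToySubtarget {
  bool HasAVX;         // AVX is enabled for this compilation
  bool HasMacroFusion; // the explicit feature bit this patch introduces
};

// Before the patch: AVX stands in for "this CPU does macro fusion".
inline bool fusesWithAVXProxy(const ToySubtarget &ST) { return ST.HasAVX; }

// After the patch: macro fusion has its own bit, independent of AVX.
inline bool fusesWithFeatureBit(const ToySubtarget &ST) {
  return ST.HasMacroFusion;
}

// Scenario from the summary: a fusion-capable CPU where the user has
// explicitly disabled AVX.
inline void checkSummaryScenario() {
  const ToySubtarget ST = {/*HasAVX=*/false, /*HasMacroFusion=*/true};
  assert(!fusesWithAVXProxy(ST));  // old check wrongly turns fusion off
  assert(fusesWithFeatureBit(ST)); // new check keeps it on
}
```

The same toy subtarget also covers the generic 64-bit CPU case: no AVX, but the macro fusion bit is set.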

Diff Detail

Repository: rL LLVM

Event Timeline

craig.topper created this revision. Aug 29 2017, 3:59 PM

A quick skim of Agner's guide suggests that Core2, Nehalem, Westmere, Bulldozer, Piledriver, Steamroller, and Zen all do nearly full-blown macro fusion for cmp/test and a branch. Even the VIA Nano apparently does some of this....

I don't think we should restrict it to SNB. I think we should default it on everywhere we don't have explicit information that it doesn't occur (Silvermont, KNL, Athlon, etc.).

Anyways, several of the AMD chips listed support AVX and so this would be a regression there.

Add flag to bdver*, znver1, core2, penryn, nehalem, and westmere.
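
As a quick cross-check of that update against the X86.td hunks in this revision, the sketch below lists the CPU names whose definitions visibly gain FeatureMacroFusion. `gainsMacroFusionInVisibleHunks` and `checkVisibleHunks` are hypothetical helpers, not LLVM APIs, and the list deliberately covers only definitions that appear in the expanded parts of the diff; CPUs defined in collapsed regions are not represented either way.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper, not an LLVM API. The names are the CPU strings whose
// definitions visibly gain FeatureMacroFusion in the X86.td hunks of this
// revision (directly, or through NehalemProc / SNBFeatures). A false result
// only means "not visible in the expanded hunks", not "no macro fusion".
inline bool gainsMacroFusionInVisibleHunks(const std::string &CPU) {
  static const std::set<std::string> Visible = {
      "core2",  "penryn", "nehalem", "corei7", "westmere", "sandybridge",
      "bdver1", "bdver2", "bdver3",  "bdver4", "znver1",   "x86-64"};
  return Visible.count(CPU) != 0;
}

// Small self-check covering one Intel, one AMD, and the generic CPU.
inline void checkVisibleHunks() {
  assert(gainsMacroFusionInVisibleHunks("core2"));
  assert(gainsMacroFusionInVisibleHunks("znver1"));
  assert(gainsMacroFusionInVisibleHunks("x86-64"));
  assert(!gainsMacroFusionInVisibleHunks("not-a-cpu"));
}
```

Note that this is an opt-in list, which is what the diff implements, rather than the default-on approach suggested in the comment above.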

Harbormaster completed remote builds in B9760: Diff 113194. Aug 29 2017, 9:19 PM

LGTM!

This revision is now accepted and ready to land. Aug 29 2017, 9:26 PM

Closed by commit rL312097: [X86] Provide a separate feature bit for macro fusion support instead of basing… (authored by ctopper). Aug 29 2017, 9:36 PM

This revision was automatically updated to reflect the committed changes.

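Revision Contents

Path                                                  Size
llvm/trunk/lib/Target/X86/X86.td                      38 lines
llvm/trunk/lib/Target/X86/X86MacroFusion.cpp          6 lines
llvm/trunk/lib/Target/X86/X86Subtarget.h              4 lines
llvm/trunk/lib/Target/X86/X86Subtarget.cpp            1 line
llvm/trunk/test/CodeGen/X86/avx-select.ll             4 lines
llvm/trunk/test/CodeGen/X86/avx-splat.ll              2 lines
llvm/trunk/test/CodeGen/X86/avx512-mask-op.ll         4 lines
llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll          4 lines
llvm/trunk/test/CodeGen/X86/x86-cmov-converter.ll     2 lines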

Diff 113195

llvm/trunk/lib/Target/X86/X86.td

[282 lines not shown]
// Development Manual. This feature essentially means that REP MOVSB will copy		// Development Manual. This feature essentially means that REP MOVSB will copy
// using the largest available size instead of copying bytes one by one, making		// using the largest available size instead of copying bytes one by one, making
// it at least as fast as REPMOVS{W,D,Q}.		// it at least as fast as REPMOVS{W,D,Q}.
def FeatureERMSB		def FeatureERMSB
: SubtargetFeature<		: SubtargetFeature<
"ermsb", "HasERMSB", "true",		"ermsb", "HasERMSB", "true",
"REP MOVS/STOS are fast">;		"REP MOVS/STOS are fast">;

		// Sandy Bridge and newer processors have many instructions that can be
		// fused with conditional branches and pass through the CPU as a single
		// operation.
		def FeatureMacroFusion
		: SubtargetFeature<"macrofusion", "HasMacroFusion", "true",
		"Various instructions can be fused with conditional branches">;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// X86 processors supported.		// X86 processors supported.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

include "X86Schedule.td"		include "X86Schedule.td"

def ProcIntelAtom : SubtargetFeature<"atom", "X86ProcFamily", "IntelAtom",		def ProcIntelAtom : SubtargetFeature<"atom", "X86ProcFamily", "IntelAtom",
"Intel Atom processors">;		"Intel Atom processors">;
[68 lines not shown]
def : ProcessorModel<"core2", SandyBridgeModel, [		def : ProcessorModel<"core2", SandyBridgeModel, [
FeatureX87,		FeatureX87,
FeatureSlowUAMem16,		FeatureSlowUAMem16,
FeatureMMX,		FeatureMMX,
FeatureSSSE3,		FeatureSSSE3,
FeatureFXSR,		FeatureFXSR,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
FeatureSlowBTMem,		FeatureSlowBTMem,
FeatureLAHFSAHF		FeatureLAHFSAHF,
		FeatureMacroFusion
]>;		]>;
def : ProcessorModel<"penryn", SandyBridgeModel, [		def : ProcessorModel<"penryn", SandyBridgeModel, [
FeatureX87,		FeatureX87,
FeatureSlowUAMem16,		FeatureSlowUAMem16,
FeatureMMX,		FeatureMMX,
FeatureSSE41,		FeatureSSE41,
FeatureFXSR,		FeatureFXSR,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
FeatureSlowBTMem,		FeatureSlowBTMem,
FeatureLAHFSAHF		FeatureLAHFSAHF,
		FeatureMacroFusion
]>;		]>;

// Atom CPUs.		// Atom CPUs.
class BonnellProc<string Name> : ProcessorModel<Name, AtomModel, [		class BonnellProc<string Name> : ProcessorModel<Name, AtomModel, [
ProcIntelAtom,		ProcIntelAtom,
FeatureX87,		FeatureX87,
FeatureSlowUAMem16,		FeatureSlowUAMem16,
FeatureMMX,		FeatureMMX,
[69 lines not shown]
class NehalemProc<string Name> : ProcessorModel<Name, SandyBridgeModel, [		class NehalemProc<string Name> : ProcessorModel<Name, SandyBridgeModel, [
FeatureX87,		FeatureX87,
FeatureMMX,		FeatureMMX,
FeatureSSE42,		FeatureSSE42,
FeatureFXSR,		FeatureFXSR,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
FeatureSlowBTMem,		FeatureSlowBTMem,
FeaturePOPCNT,		FeaturePOPCNT,
FeatureLAHFSAHF		FeatureLAHFSAHF,
		FeatureMacroFusion
]>;		]>;
def : NehalemProc<"nehalem">;		def : NehalemProc<"nehalem">;
def : NehalemProc<"corei7">;		def : NehalemProc<"corei7">;

// Westmere is a similar machine to nehalem with some additional features.		// Westmere is a similar machine to nehalem with some additional features.
// Westmere is the corei3/i5/i7 path from nehalem to sandybridge		// Westmere is the corei3/i5/i7 path from nehalem to sandybridge
class WestmereProc<string Name> : ProcessorModel<Name, SandyBridgeModel, [		class WestmereProc<string Name> : ProcessorModel<Name, SandyBridgeModel, [
FeatureX87,		FeatureX87,
FeatureMMX,		FeatureMMX,
FeatureSSE42,		FeatureSSE42,
FeatureFXSR,		FeatureFXSR,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
FeatureSlowBTMem,		FeatureSlowBTMem,
FeaturePOPCNT,		FeaturePOPCNT,
FeatureAES,		FeatureAES,
FeaturePCLMUL,		FeaturePCLMUL,
FeatureLAHFSAHF		FeatureLAHFSAHF,
		FeatureMacroFusion
]>;		]>;
def : WestmereProc<"westmere">;		def : WestmereProc<"westmere">;

class ProcessorFeatures<list<SubtargetFeature> Inherited,		class ProcessorFeatures<list<SubtargetFeature> Inherited,
list<SubtargetFeature> NewFeatures> {		list<SubtargetFeature> NewFeatures> {
list<SubtargetFeature> Value = !listconcat(Inherited, NewFeatures);		list<SubtargetFeature> Value = !listconcat(Inherited, NewFeatures);
}		}

[14 lines not shown]	def SNBFeatures : ProcessorFeatures<[], [
FeatureAES,		FeatureAES,
FeatureSlowDivide64,		FeatureSlowDivide64,
FeaturePCLMUL,		FeaturePCLMUL,
FeatureXSAVE,		FeatureXSAVE,
FeatureXSAVEOPT,		FeatureXSAVEOPT,
FeatureLAHFSAHF,		FeatureLAHFSAHF,
FeatureSlow3OpsLEA,		FeatureSlow3OpsLEA,
FeatureFastScalarFSQRT,		FeatureFastScalarFSQRT,
FeatureFastSHLDRotate		FeatureFastSHLDRotate,
		FeatureMacroFusion
]>;		]>;

class SandyBridgeProc<string Name> : ProcModel<Name, SandyBridgeModel,		class SandyBridgeProc<string Name> : ProcModel<Name, SandyBridgeModel,
SNBFeatures.Value, [		SNBFeatures.Value, [
FeatureSlowBTMem,		FeatureSlowBTMem,
FeatureSlowUAMem32		FeatureSlowUAMem32
]>;		]>;
def : SandyBridgeProc<"sandybridge">;		def : SandyBridgeProc<"sandybridge">;
[198 lines not shown]	def : Proc<"bdver1", [
FeatureAVX,		FeatureAVX,
FeatureFXSR,		FeatureFXSR,
FeatureSSE4A,		FeatureSSE4A,
FeatureLZCNT,		FeatureLZCNT,
FeaturePOPCNT,		FeaturePOPCNT,
FeatureXSAVE,		FeatureXSAVE,
FeatureLWP,		FeatureLWP,
FeatureSlowSHLD,		FeatureSlowSHLD,
FeatureLAHFSAHF		FeatureLAHFSAHF,
		FeatureMacroFusion
]>;		]>;
// Piledriver		// Piledriver
def : Proc<"bdver2", [		def : Proc<"bdver2", [
FeatureX87,		FeatureX87,
FeatureXOP,		FeatureXOP,
FeatureFMA4,		FeatureFMA4,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
FeatureAES,		FeatureAES,
FeaturePRFCHW,		FeaturePRFCHW,
FeaturePCLMUL,		FeaturePCLMUL,
FeatureMMX,		FeatureMMX,
FeatureAVX,		FeatureAVX,
FeatureFXSR,		FeatureFXSR,
FeatureSSE4A,		FeatureSSE4A,
FeatureF16C,		FeatureF16C,
FeatureLZCNT,		FeatureLZCNT,
FeaturePOPCNT,		FeaturePOPCNT,
FeatureXSAVE,		FeatureXSAVE,
FeatureBMI,		FeatureBMI,
FeatureTBM,		FeatureTBM,
FeatureLWP,		FeatureLWP,
FeatureFMA,		FeatureFMA,
FeatureSlowSHLD,		FeatureSlowSHLD,
FeatureLAHFSAHF		FeatureLAHFSAHF,
		FeatureMacroFusion
]>;		]>;

// Steamroller		// Steamroller
def : Proc<"bdver3", [		def : Proc<"bdver3", [
FeatureX87,		FeatureX87,
FeatureXOP,		FeatureXOP,
FeatureFMA4,		FeatureFMA4,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
[10 lines not shown]	def : Proc<"bdver3", [
FeatureXSAVE,		FeatureXSAVE,
FeatureBMI,		FeatureBMI,
FeatureTBM,		FeatureTBM,
FeatureLWP,		FeatureLWP,
FeatureFMA,		FeatureFMA,
FeatureXSAVEOPT,		FeatureXSAVEOPT,
FeatureSlowSHLD,		FeatureSlowSHLD,
FeatureFSGSBase,		FeatureFSGSBase,
FeatureLAHFSAHF		FeatureLAHFSAHF,
		FeatureMacroFusion
]>;		]>;

// Excavator		// Excavator
def : Proc<"bdver4", [		def : Proc<"bdver4", [
FeatureX87,		FeatureX87,
FeatureMMX,		FeatureMMX,
FeatureAVX2,		FeatureAVX2,
FeatureFXSR,		FeatureFXSR,
[11 lines not shown]	def : Proc<"bdver4", [
FeatureBMI2,		FeatureBMI2,
FeatureTBM,		FeatureTBM,
FeatureLWP,		FeatureLWP,
FeatureFMA,		FeatureFMA,
FeatureXSAVEOPT,		FeatureXSAVEOPT,
FeatureSlowSHLD,		FeatureSlowSHLD,
FeatureFSGSBase,		FeatureFSGSBase,
FeatureLAHFSAHF,		FeatureLAHFSAHF,
FeatureMWAITX		FeatureMWAITX,
		FeatureMacroFusion
]>;		]>;

// Znver1		// Znver1
def: ProcessorModel<"znver1", Znver1Model, [		def: ProcessorModel<"znver1", Znver1Model, [
FeatureADX,		FeatureADX,
FeatureAES,		FeatureAES,
FeatureAVX2,		FeatureAVX2,
FeatureBMI,		FeatureBMI,
FeatureBMI2,		FeatureBMI2,
FeatureCLFLUSHOPT,		FeatureCLFLUSHOPT,
FeatureCLZERO,		FeatureCLZERO,
FeatureCMPXCHG16B,		FeatureCMPXCHG16B,
FeatureF16C,		FeatureF16C,
FeatureFMA,		FeatureFMA,
FeatureFSGSBase,		FeatureFSGSBase,
FeatureFXSR,		FeatureFXSR,
FeatureFastLZCNT,		FeatureFastLZCNT,
FeatureLAHFSAHF,		FeatureLAHFSAHF,
FeatureLZCNT,		FeatureLZCNT,
		FeatureMacroFusion,
FeatureMMX,		FeatureMMX,
FeatureMOVBE,		FeatureMOVBE,
FeatureMWAITX,		FeatureMWAITX,
FeaturePCLMUL,		FeaturePCLMUL,
FeaturePOPCNT,		FeaturePOPCNT,
FeaturePRFCHW,		FeaturePRFCHW,
FeatureRDRAND,		FeatureRDRAND,
FeatureRDSEED,		FeatureRDSEED,
[27 lines not shown]
def : ProcessorModel<"x86-64", SandyBridgeModel, [		def : ProcessorModel<"x86-64", SandyBridgeModel, [
FeatureX87,		FeatureX87,
FeatureMMX,		FeatureMMX,
FeatureSSE2,		FeatureSSE2,
FeatureFXSR,		FeatureFXSR,
Feature64Bit,		Feature64Bit,
FeatureSlow3OpsLEA,		FeatureSlow3OpsLEA,
FeatureSlowBTMem,		FeatureSlowBTMem,
FeatureSlowIncDec		FeatureSlowIncDec,
		FeatureMacroFusion
]>;		]>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Register File Description		// Register File Description
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

include "X86RegisterInfo.td"		include "X86RegisterInfo.td"
include "X86RegisterBanks.td"		include "X86RegisterBanks.td"
[67 lines not shown]

llvm/trunk/lib/Target/X86/X86MacroFusion.cpp

	[21 lines not shown]
	/// \brief Check if the instr pair, FirstMI and SecondMI, should be fused			/// \brief Check if the instr pair, FirstMI and SecondMI, should be fused
	/// together. Given SecondMI, when FirstMI is unspecified, then check if			/// together. Given SecondMI, when FirstMI is unspecified, then check if
	/// SecondMI may be part of a fused pair at all.			/// SecondMI may be part of a fused pair at all.
	static bool shouldScheduleAdjacent(const TargetInstrInfo &TII,			static bool shouldScheduleAdjacent(const TargetInstrInfo &TII,
	const TargetSubtargetInfo &TSI,			const TargetSubtargetInfo &TSI,
	const MachineInstr *FirstMI,			const MachineInstr *FirstMI,
	const MachineInstr &SecondMI) {			const MachineInstr &SecondMI) {
	const X86Subtarget &ST = static_cast<const X86Subtarget&>(TSI);			const X86Subtarget &ST = static_cast<const X86Subtarget&>(TSI);
	// Check if this processor supports macro-fusion. Since this is a minor			// Check if this processor supports macro-fusion.
	// heuristic, we haven't specifically reserved a feature. hasAVX is a decent			if (!ST.hasMacroFusion())
	// proxy for SandyBridge+.
	if (!ST.hasAVX())
	return false;			return false;

	enum {			enum {
	FuseTest,			FuseTest,
	FuseCmp,			FuseCmp,
	FuseInc			FuseInc
	} FuseKind;			} FuseKind;

	[161 lines not shown]

llvm/trunk/lib/Target/X86/X86Subtarget.h

[232 lines not shown]	protected:
bool HasSlowDivide64;		bool HasSlowDivide64;

/// True if LZCNT instruction is fast.		/// True if LZCNT instruction is fast.
bool HasFastLZCNT;		bool HasFastLZCNT;

/// True if SHLD based rotate is fast.		/// True if SHLD based rotate is fast.
bool HasFastSHLDRotate;		bool HasFastSHLDRotate;

		/// True if the processor supports macrofusion.
		bool HasMacroFusion;

/// True if the processor has enhanced REP MOVSB/STOSB.		/// True if the processor has enhanced REP MOVSB/STOSB.
bool HasERMSB;		bool HasERMSB;

/// True if the short functions should be padded to prevent		/// True if the short functions should be padded to prevent
/// a stall when returning too early.		/// a stall when returning too early.
bool PadShortFunctions;		bool PadShortFunctions;

/// True if two memory operand instructions should use a temporary register		/// True if two memory operand instructions should use a temporary register
[234 lines not shown]	public:
bool useLeaForSP() const { return UseLeaForSP; }		bool useLeaForSP() const { return UseLeaForSP; }
bool hasFastPartialYMMorZMMWrite() const {		bool hasFastPartialYMMorZMMWrite() const {
return HasFastPartialYMMorZMMWrite;		return HasFastPartialYMMorZMMWrite;
}		}
bool hasFastScalarFSQRT() const { return HasFastScalarFSQRT; }		bool hasFastScalarFSQRT() const { return HasFastScalarFSQRT; }
bool hasFastVectorFSQRT() const { return HasFastVectorFSQRT; }		bool hasFastVectorFSQRT() const { return HasFastVectorFSQRT; }
bool hasFastLZCNT() const { return HasFastLZCNT; }		bool hasFastLZCNT() const { return HasFastLZCNT; }
bool hasFastSHLDRotate() const { return HasFastSHLDRotate; }		bool hasFastSHLDRotate() const { return HasFastSHLDRotate; }
		bool hasMacroFusion() const { return HasMacroFusion; }
bool hasERMSB() const { return HasERMSB; }		bool hasERMSB() const { return HasERMSB; }
bool hasSlowDivide32() const { return HasSlowDivide32; }		bool hasSlowDivide32() const { return HasSlowDivide32; }
bool hasSlowDivide64() const { return HasSlowDivide64; }		bool hasSlowDivide64() const { return HasSlowDivide64; }
bool padShortFunctions() const { return PadShortFunctions; }		bool padShortFunctions() const { return PadShortFunctions; }
bool slowTwoMemOps() const { return SlowTwoMemOps; }		bool slowTwoMemOps() const { return SlowTwoMemOps; }
bool LEAusesAG() const { return LEAUsesAG; }		bool LEAusesAG() const { return LEAUsesAG; }
bool slowLEA() const { return SlowLEA; }		bool slowLEA() const { return SlowLEA; }
bool slow3OpsLEA() const { return Slow3OpsLEA; }		bool slow3OpsLEA() const { return Slow3OpsLEA; }
[166 lines not shown]

llvm/trunk/lib/Target/X86/X86Subtarget.cpp

[341 lines not shown]	void X86Subtarget::initializeEnvironment() {
HasSSEUnalignedMem = false;		HasSSEUnalignedMem = false;
HasCmpxchg16b = false;		HasCmpxchg16b = false;
UseLeaForSP = false;		UseLeaForSP = false;
HasFastPartialYMMorZMMWrite = false;		HasFastPartialYMMorZMMWrite = false;
HasFastScalarFSQRT = false;		HasFastScalarFSQRT = false;
HasFastVectorFSQRT = false;		HasFastVectorFSQRT = false;
HasFastLZCNT = false;		HasFastLZCNT = false;
HasFastSHLDRotate = false;		HasFastSHLDRotate = false;
		HasMacroFusion = false;
HasERMSB = false;		HasERMSB = false;
HasSlowDivide32 = false;		HasSlowDivide32 = false;
HasSlowDivide64 = false;		HasSlowDivide64 = false;
PadShortFunctions = false;		PadShortFunctions = false;
SlowTwoMemOps = false;		SlowTwoMemOps = false;
LEAUsesAG = false;		LEAUsesAG = false;
SlowLEA = false;		SlowLEA = false;
Slow3OpsLEA = false;		Slow3OpsLEA = false;
[66 lines not shown]
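
The X86.td entry names the feature "macrofusion" and ties it to the HasMacroFusion field, while initializeEnvironment() resets that field to false. A rough sketch of how a "+name"/"-name" feature string could end up in such a field follows. `ToyFeatures` and `applyFeatureString` are hypothetical and far simpler than the TableGen-generated ParseSubtargetFeatures; in particular, the `HasAVX` field is a simplification made for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Toy model only; ToyFeatures and applyFeatureString are hypothetical and are
// not LLVM APIs.
struct ToyFeatures {
  bool HasAVX = false;         // simplified stand-in for the AVX feature
  bool HasMacroFusion = false; // starts false, as in initializeEnvironment()
};

// Applies a comma-separated list such as "+macrofusion,-avx".
// Unknown names and entries without a '+' or '-' prefix are ignored.
inline ToyFeatures applyFeatureString(const std::string &Features) {
  ToyFeatures F;
  std::istringstream In(Features);
  std::string Item;
  while (std::getline(In, Item, ',')) {
    if (Item.size() < 2 || (Item[0] != '+' && Item[0] != '-'))
      continue;
    const bool Enable = Item[0] == '+';
    const std::string Name = Item.substr(1);
    if (Name == "avx")
      F.HasAVX = Enable;
    else if (Name == "macrofusion")
      F.HasMacroFusion = Enable;
  }
  return F;
}

// With a separate bit, turning AVX off no longer turns macro fusion off.
inline void checkIndependentBits() {
  const ToyFeatures F = applyFeatureString("+macrofusion,-avx");
  assert(F.HasMacroFusion);
  assert(!F.HasAVX);
}
```

Later entries win over earlier ones in this sketch, so "+macrofusion,-macrofusion" leaves the bit cleared.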

llvm/trunk/test/CodeGen/X86/avx-select.ll

	[10 lines not shown]
	; X86-NEXT: # BB#1:			; X86-NEXT: # BB#1:
	; X86-NEXT: vmovaps %ymm0, %ymm1			; X86-NEXT: vmovaps %ymm0, %ymm1
	; X86-NEXT: .LBB0_2:			; X86-NEXT: .LBB0_2:
	; X86-NEXT: vxorps %ymm1, %ymm0, %ymm0			; X86-NEXT: vxorps %ymm1, %ymm0, %ymm0
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: select00:			; X64-LABEL: select00:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: vxorps %xmm1, %xmm1, %xmm1
	; X64-NEXT: cmpl $255, %edi			; X64-NEXT: cmpl $255, %edi
				; X64-NEXT: vxorps %xmm1, %xmm1, %xmm1
	; X64-NEXT: je .LBB0_2			; X64-NEXT: je .LBB0_2
	; X64-NEXT: # BB#1:			; X64-NEXT: # BB#1:
	; X64-NEXT: vmovaps %ymm0, %ymm1			; X64-NEXT: vmovaps %ymm0, %ymm1
	; X64-NEXT: .LBB0_2:			; X64-NEXT: .LBB0_2:
	; X64-NEXT: vxorps %ymm1, %ymm0, %ymm0			; X64-NEXT: vxorps %ymm1, %ymm0, %ymm0
	; X64-NEXT: retq			; X64-NEXT: retq
	%cmpres = icmp eq i32 %a, 255			%cmpres = icmp eq i32 %a, 255
	%selres = select i1 %cmpres, <8 x i32> zeroinitializer, <8 x i32> %b			%selres = select i1 %cmpres, <8 x i32> zeroinitializer, <8 x i32> %b
	[10 lines not shown]
	; X86-NEXT: # BB#1:			; X86-NEXT: # BB#1:
	; X86-NEXT: vmovaps %ymm0, %ymm1			; X86-NEXT: vmovaps %ymm0, %ymm1
	; X86-NEXT: .LBB1_2:			; X86-NEXT: .LBB1_2:
	; X86-NEXT: vxorps %ymm1, %ymm0, %ymm0			; X86-NEXT: vxorps %ymm1, %ymm0, %ymm0
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: select01:			; X64-LABEL: select01:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: vxorps %xmm1, %xmm1, %xmm1
	; X64-NEXT: cmpl $255, %edi			; X64-NEXT: cmpl $255, %edi
				; X64-NEXT: vxorps %xmm1, %xmm1, %xmm1
	; X64-NEXT: je .LBB1_2			; X64-NEXT: je .LBB1_2
	; X64-NEXT: # BB#1:			; X64-NEXT: # BB#1:
	; X64-NEXT: vmovaps %ymm0, %ymm1			; X64-NEXT: vmovaps %ymm0, %ymm1
	; X64-NEXT: .LBB1_2:			; X64-NEXT: .LBB1_2:
	; X64-NEXT: vxorps %ymm1, %ymm0, %ymm0			; X64-NEXT: vxorps %ymm1, %ymm0, %ymm0
	; X64-NEXT: retq			; X64-NEXT: retq
	%cmpres = icmp eq i32 %a, 255			%cmpres = icmp eq i32 %a, 255
	%selres = select i1 %cmpres, <4 x i64> zeroinitializer, <4 x i64> %b			%selres = select i1 %cmpres, <4 x i64> zeroinitializer, <4 x i64> %b
	%res = xor <4 x i64> %b, %selres			%res = xor <4 x i64> %b, %selres
	ret <4 x i64> %res			ret <4 x i64> %res
	}			}

llvm/trunk/test/CodeGen/X86/avx-splat.ll

	[54 lines not shown]

	; Test this turns into a broadcast:			; Test this turns into a broadcast:
	; shuffle (scalar_to_vector (load (ptr + 4))), undef, <0, 0, 0, 0>			; shuffle (scalar_to_vector (load (ptr + 4))), undef, <0, 0, 0, 0>
	;			;
	define <8 x float> @funcE() nounwind {			define <8 x float> @funcE() nounwind {
	; CHECK-LABEL: funcE:			; CHECK-LABEL: funcE:
	; CHECK: # BB#0: # %for_exit499			; CHECK: # BB#0: # %for_exit499
	; CHECK-NEXT: xorl %eax, %eax			; CHECK-NEXT: xorl %eax, %eax
	; CHECK-NEXT: # implicit-def: %YMM0
	; CHECK-NEXT: testb %al, %al			; CHECK-NEXT: testb %al, %al
				; CHECK-NEXT: # implicit-def: %YMM0
	; CHECK-NEXT: jne .LBB4_2			; CHECK-NEXT: jne .LBB4_2
	; CHECK-NEXT: # BB#1: # %load.i1247			; CHECK-NEXT: # BB#1: # %load.i1247
	; CHECK-NEXT: pushq %rbp			; CHECK-NEXT: pushq %rbp
	; CHECK-NEXT: movq %rsp, %rbp			; CHECK-NEXT: movq %rsp, %rbp
	; CHECK-NEXT: andq $-32, %rsp			; CHECK-NEXT: andq $-32, %rsp
	; CHECK-NEXT: subq $1312, %rsp # imm = 0x520			; CHECK-NEXT: subq $1312, %rsp # imm = 0x520
	; CHECK-NEXT: vbroadcastss {{[0-9]+}}(%rsp), %ymm0			; CHECK-NEXT: vbroadcastss {{[0-9]+}}(%rsp), %ymm0
	; CHECK-NEXT: movq %rbp, %rsp			; CHECK-NEXT: movq %rbp, %rsp
	[101 lines not shown]

llvm/trunk/test/CodeGen/X86/avx512-mask-op.ll

	[686 lines not shown]
	; SKX-NEXT: LBB17_1:			; SKX-NEXT: LBB17_1:
	; SKX-NEXT: vpcmpgtd %zmm2, %zmm0, %k0			; SKX-NEXT: vpcmpgtd %zmm2, %zmm0, %k0
	; SKX-NEXT: vpmovm2b %k0, %xmm0			; SKX-NEXT: vpmovm2b %k0, %xmm0
	; SKX-NEXT: vzeroupper			; SKX-NEXT: vzeroupper
	; SKX-NEXT: retq			; SKX-NEXT: retq
	;			;
	; AVX512BW-LABEL: test8:			; AVX512BW-LABEL: test8:
	; AVX512BW: ## BB#0:			; AVX512BW: ## BB#0:
	; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512BW-NEXT: cmpl %esi, %edi			; AVX512BW-NEXT: cmpl %esi, %edi
				; AVX512BW-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512BW-NEXT: jg LBB17_1			; AVX512BW-NEXT: jg LBB17_1
	; AVX512BW-NEXT: ## BB#2:			; AVX512BW-NEXT: ## BB#2:
	; AVX512BW-NEXT: vpcmpltud %zmm2, %zmm1, %k0			; AVX512BW-NEXT: vpcmpltud %zmm2, %zmm1, %k0
	; AVX512BW-NEXT: jmp LBB17_3			; AVX512BW-NEXT: jmp LBB17_3
	; AVX512BW-NEXT: LBB17_1:			; AVX512BW-NEXT: LBB17_1:
	; AVX512BW-NEXT: vpcmpgtd %zmm2, %zmm0, %k0			; AVX512BW-NEXT: vpcmpgtd %zmm2, %zmm0, %k0
	; AVX512BW-NEXT: LBB17_3:			; AVX512BW-NEXT: LBB17_3:
	; AVX512BW-NEXT: vpmovm2b %k0, %zmm0			; AVX512BW-NEXT: vpmovm2b %k0, %zmm0
	; AVX512BW-NEXT: ## kill: %XMM0<def> %XMM0<kill> %ZMM0<kill>			; AVX512BW-NEXT: ## kill: %XMM0<def> %XMM0<kill> %ZMM0<kill>
	; AVX512BW-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512DQ-LABEL: test8:			; AVX512DQ-LABEL: test8:
	; AVX512DQ: ## BB#0:			; AVX512DQ: ## BB#0:
	; AVX512DQ-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512DQ-NEXT: cmpl %esi, %edi			; AVX512DQ-NEXT: cmpl %esi, %edi
				; AVX512DQ-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; AVX512DQ-NEXT: jg LBB17_1			; AVX512DQ-NEXT: jg LBB17_1
	; AVX512DQ-NEXT: ## BB#2:			; AVX512DQ-NEXT: ## BB#2:
	; AVX512DQ-NEXT: vpcmpltud %zmm2, %zmm1, %k0			; AVX512DQ-NEXT: vpcmpltud %zmm2, %zmm1, %k0
	; AVX512DQ-NEXT: jmp LBB17_3			; AVX512DQ-NEXT: jmp LBB17_3
	; AVX512DQ-NEXT: LBB17_1:			; AVX512DQ-NEXT: LBB17_1:
	; AVX512DQ-NEXT: vpcmpgtd %zmm2, %zmm0, %k0			; AVX512DQ-NEXT: vpcmpgtd %zmm2, %zmm0, %k0
	; AVX512DQ-NEXT: LBB17_3:			; AVX512DQ-NEXT: LBB17_3:
	; AVX512DQ-NEXT: vpmovm2d %k0, %zmm0			; AVX512DQ-NEXT: vpmovm2d %k0, %zmm0
	[3,189 lines not shown]

llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll

	[1,672 lines not shown]
	; VEX-NEXT: movq %rax, %rcx			; VEX-NEXT: movq %rax, %rcx
	; VEX-NEXT: shrq %rcx			; VEX-NEXT: shrq %rcx
	; VEX-NEXT: andl $1, %eax			; VEX-NEXT: andl $1, %eax
	; VEX-NEXT: orq %rcx, %rax			; VEX-NEXT: orq %rcx, %rax
	; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm0			; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm0
	; VEX-NEXT: vaddss %xmm0, %xmm0, %xmm0			; VEX-NEXT: vaddss %xmm0, %xmm0, %xmm0
	; VEX-NEXT: .LBB39_6:			; VEX-NEXT: .LBB39_6:
	; VEX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]			; VEX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
	; VEX-NEXT: vxorps %xmm1, %xmm1, %xmm1
	; VEX-NEXT: testq %rax, %rax			; VEX-NEXT: testq %rax, %rax
				; VEX-NEXT: vxorps %xmm1, %xmm1, %xmm1
	; VEX-NEXT: js .LBB39_8			; VEX-NEXT: js .LBB39_8
	; VEX-NEXT: # BB#7:			; VEX-NEXT: # BB#7:
	; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm1			; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm1
	; VEX-NEXT: .LBB39_8:			; VEX-NEXT: .LBB39_8:
	; VEX-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,0]			; VEX-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,0]
	; VEX-NEXT: retq			; VEX-NEXT: retq
	;			;
	; AVX512F-LABEL: uitofp_2i64_to_4f32:			; AVX512F-LABEL: uitofp_2i64_to_4f32:
	[218 lines not shown]
	; VEX-NEXT: movq %rax, %rcx			; VEX-NEXT: movq %rax, %rcx
	; VEX-NEXT: shrq %rcx			; VEX-NEXT: shrq %rcx
	; VEX-NEXT: andl $1, %eax			; VEX-NEXT: andl $1, %eax
	; VEX-NEXT: orq %rcx, %rax			; VEX-NEXT: orq %rcx, %rax
	; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm0			; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm0
	; VEX-NEXT: vaddss %xmm0, %xmm0, %xmm0			; VEX-NEXT: vaddss %xmm0, %xmm0, %xmm0
	; VEX-NEXT: .LBB41_6:			; VEX-NEXT: .LBB41_6:
	; VEX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]			; VEX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
	; VEX-NEXT: vxorps %xmm1, %xmm1, %xmm1
	; VEX-NEXT: testq %rax, %rax			; VEX-NEXT: testq %rax, %rax
				; VEX-NEXT: vxorps %xmm1, %xmm1, %xmm1
	; VEX-NEXT: js .LBB41_8			; VEX-NEXT: js .LBB41_8
	; VEX-NEXT: # BB#7:			; VEX-NEXT: # BB#7:
	; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm1			; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm1
	; VEX-NEXT: .LBB41_8:			; VEX-NEXT: .LBB41_8:
	; VEX-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,0]			; VEX-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,0]
	; VEX-NEXT: retq			; VEX-NEXT: retq
	;			;
	; AVX512F-LABEL: uitofp_4i64_to_4f32_undef:			; AVX512F-LABEL: uitofp_4i64_to_4f32_undef:
	[2,999 lines not shown]

llvm/trunk/test/CodeGen/X86/x86-cmov-converter.ll

	[290 lines not shown]
	;; ; true-value with false-value			;; ; true-value with false-value
	;; ; Phi instruction cannot use			;; ; Phi instruction cannot use
	;; ; previous Phi instruction result			;; ; previous Phi instruction result
	;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;			;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

	; CHECK-LABEL: Transform			; CHECK-LABEL: Transform
	; CHECK-NOT: cmov			; CHECK-NOT: cmov
	; CHECK: divl [[a:%[0-9a-z]*]]			; CHECK: divl [[a:%[0-9a-z]*]]
	; CHECK: cmpl [[a]], %eax
	; CHECK: movl $11, [[s1:%[0-9a-z]*]]			; CHECK: movl $11, [[s1:%[0-9a-z]*]]
	; CHECK: movl [[a]], [[s2:%[0-9a-z]*]]			; CHECK: movl [[a]], [[s2:%[0-9a-z]*]]
				; CHECK: cmpl [[a]], %edx
	; CHECK: ja [[SinkBB:.*]]			; CHECK: ja [[SinkBB:.*]]
	; CHECK: [[FalseBB:.*]]:			; CHECK: [[FalseBB:.*]]:
	; CHECK: movl $22, [[s1]]			; CHECK: movl $22, [[s1]]
	; CHECK: movl $22, [[s2]]			; CHECK: movl $22, [[s2]]
	; CHECK: [[SinkBB]]:			; CHECK: [[SinkBB]]:
	; CHECK: ja			; CHECK: ja

	define void @Transform(i32 %arr, i32 %arr2, i32 %a, i32 %b, i32 %c, i32 %n) #0 {			define void @Transform(i32 %arr, i32 %arr2, i32 %a, i32 %b, i32 %c, i32 %n) #0 {
	[183 lines not shown]
