Diff 528775

llvm/lib/Target/X86/X86.td

[416 lines elided]
def FeatureHardenSlsIJmp		def FeatureHardenSlsIJmp
: SubtargetFeature<		: SubtargetFeature<
"harden-sls-ijmp", "HardenSlsIJmp", "true",		"harden-sls-ijmp", "HardenSlsIJmp", "true",
"Harden against straight line speculation across indirect JMP instructions.">;		"Harden against straight line speculation across indirect JMP instructions.">;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// X86 Subtarget Tuning features		// X86 Subtarget Tuning features
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		def TuningPreferMovmskOverVTest : SubtargetFeature<"prefer-movmsk-over-vtest",
		pengfei: Should be `Is`?
		"PreferMovmskOverVTest", "true",
		goldstein.w.n: Personally, I think slowvtest is a confusing name because vtest also has a perf dropoff from SnB -> HSW. Maybe "PreferMovmskOverVTest" would be clearer?
		LuoYuanke (author): I just followed the previous naming convention. I'm open to "PreferMovmskOverVTest". @RKSimon and @pengfei, what's your opinion?
		pengfei: I think the name should start with `Tuning`, but `TuningPreferMovmskOverVTest` looks verbose.
		RKSimon: +1, TuningPreferMovmskOverVTest explains the purpose of the tuning flag better.
		"Prefer movmsk over vtest instruction">;

def TuningSlowSHLD : SubtargetFeature<"slow-shld", "IsSHLDSlow", "true",		def TuningSlowSHLD : SubtargetFeature<"slow-shld", "IsSHLDSlow", "true",
"SHLD instruction is slow">;		"SHLD instruction is slow">;

def TuningSlowPMULLD : SubtargetFeature<"slow-pmulld", "IsPMULLDSlow", "true",		def TuningSlowPMULLD : SubtargetFeature<"slow-pmulld", "IsPMULLDSlow", "true",
"PMULLD instruction is slow (compared to PMULLW/PMULHW and PMULUDQ)">;		"PMULLD instruction is slow (compared to PMULLW/PMULHW and PMULUDQ)">;

def TuningSlowPMADDWD : SubtargetFeature<"slow-pmaddwd", "IsPMADDWDSlow",		def TuningSlowPMADDWD : SubtargetFeature<"slow-pmaddwd", "IsPMADDWDSlow",
[728 lines elided] (context: list<SubtargetFeature> ADLAdditionalFeatures = [FeatureSERIALIZE,)
FeatureLZCNT,		FeatureLZCNT,
FeatureAVXVNNI,		FeatureAVXVNNI,
FeaturePKU,		FeaturePKU,
FeatureHRESET,		FeatureHRESET,
FeatureCLDEMOTE,		FeatureCLDEMOTE,
FeatureMOVDIRI,		FeatureMOVDIRI,
FeatureMOVDIR64B,		FeatureMOVDIR64B,
FeatureWAITPKG];		FeatureWAITPKG];
list<SubtargetFeature> ADLAdditionalTuning = [TuningPERMFalseDeps];		list<SubtargetFeature> ADLAdditionalTuning = [TuningPERMFalseDeps,
		TuningPreferMovmskOverVTest];
list<SubtargetFeature> ADLTuning = !listconcat(SKLTuning, ADLAdditionalTuning);		list<SubtargetFeature> ADLTuning = !listconcat(SKLTuning, ADLAdditionalTuning);
list<SubtargetFeature> ADLFeatures =		list<SubtargetFeature> ADLFeatures =
!listconcat(TRMFeatures, ADLAdditionalFeatures);		!listconcat(TRMFeatures, ADLAdditionalFeatures);

// Sierraforest		// Sierraforest
list<SubtargetFeature> SRFAdditionalFeatures = [FeatureCMPCCXADD,		list<SubtargetFeature> SRFAdditionalFeatures = [FeatureCMPCCXADD,
FeatureAVXIFMA,		FeatureAVXIFMA,
FeatureAVXNECONVERT,		FeatureAVXNECONVERT,
[662 lines elided]
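
The X86.td hunk above appends `TuningPreferMovmskOverVTest` to `ADLAdditionalTuning`, which is then concatenated onto `SKLTuning` with `!listconcat`. As a rough illustration only, the composition can be modelled in plain C++. The helper names (`listconcat`, `hasTuning`, `makeADLTuning`) are hypothetical, and the contents of `SKLTuning` are not shown in this diff, so a placeholder entry stands in for them.

```cpp
#include <algorithm>
#include <string>
#include <vector>

using FeatureList = std::vector<std::string>;

// Models TableGen's !listconcat for two lists: all of a, followed by all of b.
FeatureList listconcat(FeatureList a, const FeatureList &b) {
  a.insert(a.end(), b.begin(), b.end());
  return a;
}

// Returns true if the named tuning entry is present in the list.
bool hasTuning(const FeatureList &list, const std::string &name) {
  return std::find(list.begin(), list.end(), name) != list.end();
}

// Builds a stand-in for ADLTuning as composed after this patch.
FeatureList makeADLTuning() {
  // Placeholder: the real SKLTuning contents are not visible in this diff.
  FeatureList sklTuning = {"<SKLTuning entries not shown>"};
  FeatureList adlAdditionalTuning = {"TuningPERMFalseDeps",
                                     "TuningPreferMovmskOverVTest"};
  return listconcat(sklTuning, adlAdditionalTuning);
}
```

In this model the new flag ends up in the Alder Lake tuning list purely through the concatenation, without touching the inherited list.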

llvm/lib/Target/X86/X86ISelLowering.cpp


[32,648 lines elided]
	DAG.getConstant(CmpMask, DL, MVT::i32));			DAG.getConstant(CmpMask, DL, MVT::i32));
	}			}
	}			}

	// MOVMSK(PCMPEQ(X,0)) == -1 -> PTESTZ(X,X).			// MOVMSK(PCMPEQ(X,0)) == -1 -> PTESTZ(X,X).
	// MOVMSK(PCMPEQ(X,0)) != -1 -> !PTESTZ(X,X).			// MOVMSK(PCMPEQ(X,0)) != -1 -> !PTESTZ(X,X).
	// MOVMSK(PCMPEQ(X,Y)) == -1 -> PTESTZ(XOR(X,Y),XOR(X,Y)).			// MOVMSK(PCMPEQ(X,Y)) == -1 -> PTESTZ(XOR(X,Y),XOR(X,Y)).
	// MOVMSK(PCMPEQ(X,Y)) != -1 -> !PTESTZ(XOR(X,Y),XOR(X,Y)).			// MOVMSK(PCMPEQ(X,Y)) != -1 -> !PTESTZ(XOR(X,Y),XOR(X,Y)).
	if (IsAllOf && Subtarget.hasSSE41() && IsOneUse) {			if (IsAllOf && Subtarget.hasSSE41() && IsOneUse) {
				RKSimon: What about these folds?
				LuoYuanke (author): I'll take a look at it, and I prefer to implement it in a separate patch if MOVMSK is better.
				LuoYuanke (author): It seems this transform is good. If I revert the transform, I get the lit test change below.
				-; AVX-LABEL: movmsk_or_v2i64:
				-; AVX: # %bb.0:
				-; AVX-NEXT: vpxor %xmm1, %xmm0, %xmm0
				-; AVX-NEXT: vptest %xmm0, %xmm0
				-; AVX-NEXT: setne %al
				-; AVX-NEXT: retq
				+; AVX1OR2-LABEL: movmsk_or_v2i64:
				+; AVX1OR2: # %bb.0:
				+; AVX1OR2-NEXT: vpcmpeqq %xmm1, %xmm0, %xmm0
				+; AVX1OR2-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1
				+; AVX1OR2-NEXT: vtestpd %xmm1, %xmm0
				+; AVX1OR2-NEXT: setae %al
				+; AVX1OR2-NEXT: retq
	MVT TestVT = VecVT.is128BitVector() ? MVT::v2i64 : MVT::v4i64;			MVT TestVT = VecVT.is128BitVector() ? MVT::v2i64 : MVT::v4i64;
	SDValue BC = peekThroughBitcasts(Vec);			SDValue BC = peekThroughBitcasts(Vec);
	// Ensure MOVMSK was testing every signbit of BC.			// Ensure MOVMSK was testing every signbit of BC.
	if (BC.getValueType().getVectorNumElements() <= NumElts) {			if (BC.getValueType().getVectorNumElements() <= NumElts) {
	if (BC.getOpcode() == X86ISD::PCMPEQ) {			if (BC.getOpcode() == X86ISD::PCMPEQ) {
	SDValue V = DAG.getNode(ISD::XOR, SDLoc(BC), BC.getValueType(),			SDValue V = DAG.getNode(ISD::XOR, SDLoc(BC), BC.getValueType(),
	BC.getOperand(0), BC.getOperand(1));			BC.getOperand(0), BC.getOperand(1));
	V = DAG.getBitcast(TestVT, V);			V = DAG.getBitcast(TestVT, V);
[43 lines elided]
	// PMOVMSKB(PACKSSBW(LO(X), HI(X)))			// PMOVMSKB(PACKSSBW(LO(X), HI(X)))
	// -> PMOVMSKB(BITCAST_v32i8(X)) & 0xAAAAAAAA.			// -> PMOVMSKB(BITCAST_v32i8(X)) & 0xAAAAAAAA.
	if (CmpBits >= 16 && Subtarget.hasInt256() &&			if (CmpBits >= 16 && Subtarget.hasInt256() &&
	(IsAnyOf || (SignExt0 && SignExt1))) {			(IsAnyOf || (SignExt0 && SignExt1))) {
	if (SDValue Src = getSplitVectorSrc(VecOp0, VecOp1, true)) {			if (SDValue Src = getSplitVectorSrc(VecOp0, VecOp1, true)) {
	SDLoc DL(EFLAGS);			SDLoc DL(EFLAGS);
	SDValue Result = peekThroughBitcasts(Src);			SDValue Result = peekThroughBitcasts(Src);
	if (IsAllOf && Result.getOpcode() == X86ISD::PCMPEQ &&			if (IsAllOf && Result.getOpcode() == X86ISD::PCMPEQ &&
	Result.getValueType().getVectorNumElements() <= NumElts) {			Result.getValueType().getVectorNumElements() <= NumElts) {
				RKSimon: What about these folds?
				LuoYuanke (author): Ditto.
				LuoYuanke (author): It seems there is no lit test failure if I disable this code.
	SDValue V = DAG.getNode(ISD::XOR, DL, Result.getValueType(),			SDValue V = DAG.getNode(ISD::XOR, DL, Result.getValueType(),
	Result.getOperand(0), Result.getOperand(1));			Result.getOperand(0), Result.getOperand(1));
	V = DAG.getBitcast(MVT::v4i64, V);			V = DAG.getBitcast(MVT::v4i64, V);
	return DAG.getNode(X86ISD::PTEST, SDLoc(EFLAGS), MVT::i32, V, V);			return DAG.getNode(X86ISD::PTEST, SDLoc(EFLAGS), MVT::i32, V, V);
	}			}
	Result = DAG.getBitcast(MVT::v32i8, Result);			Result = DAG.getBitcast(MVT::v32i8, Result);
	Result = DAG.getNode(X86ISD::MOVMSK, DL, MVT::i32, Result);			Result = DAG.getNode(X86ISD::MOVMSK, DL, MVT::i32, Result);
	unsigned CmpMask = IsAnyOf ? 0 : 0xFFFFFFFF;			unsigned CmpMask = IsAnyOf ? 0 : 0xFFFFFFFF;
[34 lines elided]
	}			}
	}			}

	// MOVMSKPS(V) !=/== 0 -> TESTPS(V,V)			// MOVMSKPS(V) !=/== 0 -> TESTPS(V,V)
	// MOVMSKPD(V) !=/== 0 -> TESTPD(V,V)			// MOVMSKPD(V) !=/== 0 -> TESTPD(V,V)
	// MOVMSKPS(V) !=/== -1 -> TESTPS(V,V)			// MOVMSKPS(V) !=/== -1 -> TESTPS(V,V)
	// MOVMSKPD(V) !=/== -1 -> TESTPD(V,V)			// MOVMSKPD(V) !=/== -1 -> TESTPD(V,V)
	// iff every element is referenced.			// iff every element is referenced.
	if (NumElts <= CmpBits && Subtarget.hasAVX() && IsOneUse &&			if (NumElts <= CmpBits && Subtarget.hasAVX() &&
				!Subtarget.preferMovmskOverVTest() && IsOneUse &&
	(NumEltBits == 32 || NumEltBits == 64)) {			(NumEltBits == 32 || NumEltBits == 64)) {
	SDLoc DL(EFLAGS);			SDLoc DL(EFLAGS);
	MVT FloatSVT = MVT::getFloatingPointVT(NumEltBits);			MVT FloatSVT = MVT::getFloatingPointVT(NumEltBits);
	MVT FloatVT = MVT::getVectorVT(FloatSVT, NumElts);			MVT FloatVT = MVT::getVectorVT(FloatSVT, NumElts);
	MVT IntVT = FloatVT.changeVectorElementTypeToInteger();			MVT IntVT = FloatVT.changeVectorElementTypeToInteger();
	SDValue LHS = Vec;			SDValue LHS = Vec;
	SDValue RHS = IsAnyOf ? Vec : DAG.getAllOnesConstant(DL, IntVT);			SDValue RHS = IsAnyOf ? Vec : DAG.getAllOnesConstant(DL, IntVT);
	CC = IsAnyOf ? CC : (CC == X86::COND_E ? X86::COND_B : X86::COND_AE);			CC = IsAnyOf ? CC : (CC == X86::COND_E ? X86::COND_B : X86::COND_AE);
[11,442 lines elided]
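
The last hunk above gates the `MOVMSKPS(V) ==/!= 0 or -1 -> TESTPS` fold on `!Subtarget.preferMovmskOverVTest()`. The fold is only a choice between two equivalent instruction sequences, and that equivalence can be checked with a small scalar model. The sketch below is not LLVM code: `Vec4`, `movmskps`, `vtestps`, and `foldsAgree` are hypothetical names, and the model assumes the usual flag semantics (ZF set when no sign bit of `dest & src` is set, CF set when no sign bit of `~dest & src` is set).

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar model of a 4 x 32-bit vector register.
using Vec4 = std::array<std::uint32_t, 4>;

// MOVMSKPS: gather the sign bit of each lane into the low 4 bits.
unsigned movmskps(const Vec4 &v) {
  unsigned mask = 0;
  for (unsigned i = 0; i < 4; ++i)
    mask |= (v[i] >> 31) << i;
  return mask;
}

struct TestFlags {
  bool zf; // no sign bit set in (dest & src)
  bool cf; // no sign bit set in (~dest & src)
};

// VTESTPS modelled on sign bits only; it writes flags, not a register.
TestFlags vtestps(const Vec4 &dest, const Vec4 &src) {
  Vec4 andBits{}, andnBits{};
  for (unsigned i = 0; i < 4; ++i) {
    andBits[i] = dest[i] & src[i];
    andnBits[i] = ~dest[i] & src[i];
  }
  return {movmskps(andBits) == 0, movmskps(andnBits) == 0};
}

// Checks both forms of the fold for every combination of lane sign bits.
bool foldsAgree() {
  const Vec4 allOnes = {0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu};
  for (unsigned pattern = 0; pattern < 16; ++pattern) {
    Vec4 v{};
    for (unsigned i = 0; i < 4; ++i)
      v[i] = ((pattern >> i) & 1u) ? 0x80000000u : 0u;
    const unsigned mask = movmskps(v);
    // "none of": MOVMSK(V) == 0  <=>  ZF after TESTPS(V, V)
    if ((mask == 0) != vtestps(v, v).zf)
      return false;
    // "all of": MOVMSK(V) == 0xF  <=>  CF after TESTPS(V, all-ones)
    if ((mask == 0xFu) != vtestps(v, allOnes).cf)
      return false;
  }
  return true;
}
```

The second check corresponds to the `RHS = getAllOnesConstant` and `COND_E -> COND_B` rewrite in the hunk. Because both sequences compute the same predicate, picking between them is a per-CPU tuning decision rather than a correctness one.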

llvm/test/CodeGen/X86/combine-movmsk-avx.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX1			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX1
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX2
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=alderlake | FileCheck %s --check-prefixes=ADL			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2,+prefer-movmsk-over-vtest | FileCheck %s --check-prefixes=ADL

	declare i32 @llvm.x86.avx.movmsk.pd.256(<4 x double>)			declare i32 @llvm.x86.avx.movmsk.pd.256(<4 x double>)
	declare i32 @llvm.x86.avx.movmsk.ps.256(<8 x float>)			declare i32 @llvm.x86.avx.movmsk.ps.256(<8 x float>)

	; Use widest possible vector for movmsk comparisons (PR37087)			; Use widest possible vector for movmsk comparisons (PR37087)

	define i1 @movmskps_noneof_bitcast_v4f64(<4 x double> %a0) {			define i1 @movmskps_noneof_bitcast_v4f64(<4 x double> %a0) {
	; CHECK-LABEL: movmskps_noneof_bitcast_v4f64:			; CHECK-LABEL: movmskps_noneof_bitcast_v4f64:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vxorpd %xmm1, %xmm1, %xmm1			; CHECK-NEXT: vxorpd %xmm1, %xmm1, %xmm1
	; CHECK-NEXT: vcmpeqpd %ymm1, %ymm0, %ymm0			; CHECK-NEXT: vcmpeqpd %ymm1, %ymm0, %ymm0
	; CHECK-NEXT: vtestpd %ymm0, %ymm0			; CHECK-NEXT: vtestpd %ymm0, %ymm0
	; CHECK-NEXT: sete %al			; CHECK-NEXT: sete %al
	; CHECK-NEXT: vzeroupper			; CHECK-NEXT: vzeroupper
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	;			;
	; ADL-LABEL: movmskps_noneof_bitcast_v4f64:			; ADL-LABEL: movmskps_noneof_bitcast_v4f64:
	; ADL: # %bb.0:			; ADL: # %bb.0:
	; ADL-NEXT: vxorpd %xmm1, %xmm1, %xmm1			; ADL-NEXT: vxorpd %xmm1, %xmm1, %xmm1
	; ADL-NEXT: vcmpeqpd %ymm1, %ymm0, %ymm0			; ADL-NEXT: vcmpeqpd %ymm1, %ymm0, %ymm0
	; ADL-NEXT: vtestpd %ymm0, %ymm0			; ADL-NEXT: vmovmskpd %ymm0, %eax
				; ADL-NEXT: testl %eax, %eax
	; ADL-NEXT: sete %al			; ADL-NEXT: sete %al
	; ADL-NEXT: vzeroupper			; ADL-NEXT: vzeroupper
	; ADL-NEXT: retq			; ADL-NEXT: retq
	%1 = fcmp oeq <4 x double> %a0, zeroinitializer			%1 = fcmp oeq <4 x double> %a0, zeroinitializer
	%2 = sext <4 x i1> %1 to <4 x i64>			%2 = sext <4 x i1> %1 to <4 x i64>
	%3 = bitcast <4 x i64> %2 to <8 x float>			%3 = bitcast <4 x i64> %2 to <8 x float>
	%4 = tail call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %3)			%4 = tail call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %3)
	%5 = icmp eq i32 %4, 0			%5 = icmp eq i32 %4, 0
[20 lines elided]
	; AVX2-NEXT: setb %al			; AVX2-NEXT: setb %al
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; ADL-LABEL: movmskps_allof_bitcast_v4f64:			; ADL-LABEL: movmskps_allof_bitcast_v4f64:
	; ADL: # %bb.0:			; ADL: # %bb.0:
	; ADL-NEXT: vxorpd %xmm1, %xmm1, %xmm1			; ADL-NEXT: vxorpd %xmm1, %xmm1, %xmm1
	; ADL-NEXT: vcmpeqpd %ymm1, %ymm0, %ymm0			; ADL-NEXT: vcmpeqpd %ymm1, %ymm0, %ymm0
	; ADL-NEXT: vpcmpeqd %ymm1, %ymm1, %ymm1			; ADL-NEXT: vmovmskpd %ymm0, %eax
	; ADL-NEXT: vtestpd %ymm1, %ymm0			; ADL-NEXT: cmpl $15, %eax
	; ADL-NEXT: setb %al			; ADL-NEXT: sete %al
	; ADL-NEXT: vzeroupper			; ADL-NEXT: vzeroupper
	; ADL-NEXT: retq			; ADL-NEXT: retq
	%1 = fcmp oeq <4 x double> %a0, zeroinitializer			%1 = fcmp oeq <4 x double> %a0, zeroinitializer
	%2 = sext <4 x i1> %1 to <4 x i64>			%2 = sext <4 x i1> %1 to <4 x i64>
	%3 = bitcast <4 x i64> %2 to <8 x float>			%3 = bitcast <4 x i64> %2 to <8 x float>
	%4 = tail call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %3)			%4 = tail call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %3)
	%5 = icmp eq i32 %4, 255			%5 = icmp eq i32 %4, 255
	ret i1 %5			ret i1 %5
[125 lines elided]
	; CHECK-NEXT: vtestps %xmm0, %xmm0			; CHECK-NEXT: vtestps %xmm0, %xmm0
	; CHECK-NEXT: setne %al			; CHECK-NEXT: setne %al
	; CHECK-NEXT: negl %eax			; CHECK-NEXT: negl %eax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	;			;
	; ADL-LABEL: movmskps_concat_v4f32:			; ADL-LABEL: movmskps_concat_v4f32:
	; ADL: # %bb.0:			; ADL: # %bb.0:
	; ADL-NEXT: vorps %xmm1, %xmm0, %xmm0			; ADL-NEXT: vorps %xmm1, %xmm0, %xmm0
				; ADL-NEXT: vmovmskps %xmm0, %ecx
	; ADL-NEXT: xorl %eax, %eax			; ADL-NEXT: xorl %eax, %eax
	; ADL-NEXT: vtestps %xmm0, %xmm0			; ADL-NEXT: negl %ecx
	; ADL-NEXT: setne %al			; ADL-NEXT: sbbl %eax, %eax
	; ADL-NEXT: negl %eax
	; ADL-NEXT: retq			; ADL-NEXT: retq
	%1 = shufflevector <4 x float> %a0, <4 x float> %a1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			%1 = shufflevector <4 x float> %a0, <4 x float> %a1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	%2 = tail call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %1)			%2 = tail call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %1)
	%3 = icmp ne i32 %2, 0			%3 = icmp ne i32 %2, 0
	%4 = sext i1 %3 to i32			%4 = sext i1 %3 to i32
	ret i32 %4			ret i32 %4
	}			}

[25 lines elided]
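
In `movmskps_concat_v4f32` the new ADL output materialises `sext(mask != 0)` with `negl %ecx` followed by `sbbl %eax, %eax`. A minimal sketch of why that works, assuming the standard x86 behaviour that `NEG` sets CF unless its operand is zero and that `SBB r, r` computes `r - r - CF`. The function names here are hypothetical.

```cpp
#include <cstdint>

// NEG sets the carry flag unless the operand was zero.
bool carryAfterNeg(std::uint32_t x) { return x != 0; }

// SBB r, r computes r - r - CF, so the result is 0 or -1 whatever r held.
std::int32_t sbbSelf(bool cf) { return cf ? -1 : 0; }

// Models the sequence: vmovmskps -> negl -> sbbl, i.e. sext(mask != 0).
std::int32_t sextMaskNonZero(std::uint32_t mask) {
  return sbbSelf(carryAfterNeg(mask));
}
```

For a zero mask the result is 0, and for any non-zero mask it is -1, matching `sext i1 %3 to i32` in the IR.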

llvm/test/CodeGen/X86/combine-movmsk.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefixes=SSE,SSE2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefixes=SSE,SSE2
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.2 | FileCheck %s --check-prefixes=SSE,SSE42			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.2 | FileCheck %s --check-prefixes=SSE,SSE42
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=AVX			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=AVX
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefix=AVX			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefix=AVX
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=alderlake | FileCheck %s --check-prefixes=ADL			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2,+prefer-movmsk-over-vtest | FileCheck %s --check-prefixes=ADL
				RKSimon: Add a variant that tests the tuning flag directly: `; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2,+slow-vtest | FileCheck %s --check-prefix=ADL`

	declare i32 @llvm.x86.sse.movmsk.ps(<4 x float>)			declare i32 @llvm.x86.sse.movmsk.ps(<4 x float>)
	declare i32 @llvm.x86.sse2.movmsk.pd(<2 x double>)			declare i32 @llvm.x86.sse2.movmsk.pd(<2 x double>)
	declare i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8>)			declare i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8>)

	; Use widest possible vector for movmsk comparisons (PR37087)			; Use widest possible vector for movmsk comparisons (PR37087)

	define i1 @movmskps_noneof_bitcast_v2f64(<2 x double> %a0) {			define i1 @movmskps_noneof_bitcast_v2f64(<2 x double> %a0) {
[13 lines elided]
	; AVX-NEXT: vtestpd %xmm0, %xmm0			; AVX-NEXT: vtestpd %xmm0, %xmm0
	; AVX-NEXT: sete %al			; AVX-NEXT: sete %al
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; ADL-LABEL: movmskps_noneof_bitcast_v2f64:			; ADL-LABEL: movmskps_noneof_bitcast_v2f64:
	; ADL: # %bb.0:			; ADL: # %bb.0:
	; ADL-NEXT: vxorpd %xmm1, %xmm1, %xmm1			; ADL-NEXT: vxorpd %xmm1, %xmm1, %xmm1
	; ADL-NEXT: vcmpeqpd %xmm0, %xmm1, %xmm0			; ADL-NEXT: vcmpeqpd %xmm0, %xmm1, %xmm0
	; ADL-NEXT: vtestpd %xmm0, %xmm0			; ADL-NEXT: vmovmskpd %xmm0, %eax
				; ADL-NEXT: testl %eax, %eax
	; ADL-NEXT: sete %al			; ADL-NEXT: sete %al
	; ADL-NEXT: retq			; ADL-NEXT: retq
	%1 = fcmp oeq <2 x double> zeroinitializer, %a0			%1 = fcmp oeq <2 x double> zeroinitializer, %a0
	%2 = sext <2 x i1> %1 to <2 x i64>			%2 = sext <2 x i1> %1 to <2 x i64>
	%3 = bitcast <2 x i64> %2 to <4 x float>			%3 = bitcast <2 x i64> %2 to <4 x float>
	%4 = tail call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %3)			%4 = tail call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %3)
	%5 = icmp eq i32 %4, 0			%5 = icmp eq i32 %4, 0
	ret i1 %5			ret i1 %5
[17 lines elided]
	; AVX-NEXT: vtestpd %xmm1, %xmm0			; AVX-NEXT: vtestpd %xmm1, %xmm0
	; AVX-NEXT: setb %al			; AVX-NEXT: setb %al
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; ADL-LABEL: movmskps_allof_bitcast_v2f64:			; ADL-LABEL: movmskps_allof_bitcast_v2f64:
	; ADL: # %bb.0:			; ADL: # %bb.0:
	; ADL-NEXT: vxorpd %xmm1, %xmm1, %xmm1			; ADL-NEXT: vxorpd %xmm1, %xmm1, %xmm1
	; ADL-NEXT: vcmpeqpd %xmm0, %xmm1, %xmm0			; ADL-NEXT: vcmpeqpd %xmm0, %xmm1, %xmm0
	; ADL-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1			; ADL-NEXT: vmovmskpd %xmm0, %eax
	; ADL-NEXT: vtestpd %xmm1, %xmm0			; ADL-NEXT: cmpl $3, %eax
	; ADL-NEXT: setb %al			; ADL-NEXT: sete %al
	; ADL-NEXT: retq			; ADL-NEXT: retq
	%1 = fcmp oeq <2 x double> zeroinitializer, %a0			%1 = fcmp oeq <2 x double> zeroinitializer, %a0
	%2 = sext <2 x i1> %1 to <2 x i64>			%2 = sext <2 x i1> %1 to <2 x i64>
	%3 = bitcast <2 x i64> %2 to <4 x float>			%3 = bitcast <2 x i64> %2 to <4 x float>
	%4 = tail call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %3)			%4 = tail call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %3)
	%5 = icmp eq i32 %4, 15			%5 = icmp eq i32 %4, 15
	ret i1 %5			ret i1 %5
	}			}
[17 lines elided]
	; AVX-LABEL: pmovmskb_noneof_bitcast_v2i64:			; AVX-LABEL: pmovmskb_noneof_bitcast_v2i64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vtestpd %xmm0, %xmm0			; AVX-NEXT: vtestpd %xmm0, %xmm0
	; AVX-NEXT: sete %al			; AVX-NEXT: sete %al
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; ADL-LABEL: pmovmskb_noneof_bitcast_v2i64:			; ADL-LABEL: pmovmskb_noneof_bitcast_v2i64:
	; ADL: # %bb.0:			; ADL: # %bb.0:
	; ADL-NEXT: vtestpd %xmm0, %xmm0			; ADL-NEXT: vmovmskpd %xmm0, %eax
				; ADL-NEXT: testl %eax, %eax
	; ADL-NEXT: sete %al			; ADL-NEXT: sete %al
	; ADL-NEXT: retq			; ADL-NEXT: retq
	%1 = icmp sgt <2 x i64> zeroinitializer, %a0			%1 = icmp sgt <2 x i64> zeroinitializer, %a0
	%2 = sext <2 x i1> %1 to <2 x i64>			%2 = sext <2 x i1> %1 to <2 x i64>
	%3 = bitcast <2 x i64> %2 to <16 x i8>			%3 = bitcast <2 x i64> %2 to <16 x i8>
	%4 = tail call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %3)			%4 = tail call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %3)
	%5 = icmp eq i32 %4, 0			%5 = icmp eq i32 %4, 0
	ret i1 %5			ret i1 %5
[19 lines elided]
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1			; AVX-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1
	; AVX-NEXT: vtestpd %xmm1, %xmm0			; AVX-NEXT: vtestpd %xmm1, %xmm0
	; AVX-NEXT: setb %al			; AVX-NEXT: setb %al
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; ADL-LABEL: pmovmskb_allof_bitcast_v2i64:			; ADL-LABEL: pmovmskb_allof_bitcast_v2i64:
	; ADL: # %bb.0:			; ADL: # %bb.0:
	; ADL-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1			; ADL-NEXT: vmovmskpd %xmm0, %eax
	; ADL-NEXT: vtestpd %xmm1, %xmm0			; ADL-NEXT: cmpl $3, %eax
	; ADL-NEXT: setb %al			; ADL-NEXT: sete %al
	; ADL-NEXT: retq			; ADL-NEXT: retq
	%1 = icmp sgt <2 x i64> zeroinitializer, %a0			%1 = icmp sgt <2 x i64> zeroinitializer, %a0
	%2 = sext <2 x i1> %1 to <2 x i64>			%2 = sext <2 x i1> %1 to <2 x i64>
	%3 = bitcast <2 x i64> %2 to <16 x i8>			%3 = bitcast <2 x i64> %2 to <16 x i8>
	%4 = tail call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %3)			%4 = tail call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %3)
	%5 = icmp eq i32 %4, 65535			%5 = icmp eq i32 %4, 65535
	ret i1 %5			ret i1 %5
	}			}
[15 lines elided]
	; AVX-NEXT: vtestps %xmm0, %xmm0			; AVX-NEXT: vtestps %xmm0, %xmm0
	; AVX-NEXT: sete %al			; AVX-NEXT: sete %al
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; ADL-LABEL: pmovmskb_noneof_bitcast_v4f32:			; ADL-LABEL: pmovmskb_noneof_bitcast_v4f32:
	; ADL: # %bb.0:			; ADL: # %bb.0:
	; ADL-NEXT: vxorps %xmm1, %xmm1, %xmm1			; ADL-NEXT: vxorps %xmm1, %xmm1, %xmm1
	; ADL-NEXT: vcmpeqps %xmm1, %xmm0, %xmm0			; ADL-NEXT: vcmpeqps %xmm1, %xmm0, %xmm0
	; ADL-NEXT: vtestps %xmm0, %xmm0			; ADL-NEXT: vmovmskps %xmm0, %eax
				; ADL-NEXT: testl %eax, %eax
	; ADL-NEXT: sete %al			; ADL-NEXT: sete %al
	; ADL-NEXT: retq			; ADL-NEXT: retq
	%1 = fcmp oeq <4 x float> %a0, zeroinitializer			%1 = fcmp oeq <4 x float> %a0, zeroinitializer
	%2 = sext <4 x i1> %1 to <4 x i32>			%2 = sext <4 x i1> %1 to <4 x i32>
	%3 = bitcast <4 x i32> %2 to <16 x i8>			%3 = bitcast <4 x i32> %2 to <16 x i8>
	%4 = tail call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %3)			%4 = tail call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %3)
	%5 = icmp eq i32 %4, 0			%5 = icmp eq i32 %4, 0
	ret i1 %5			ret i1 %5
[17 lines elided]
	; AVX-NEXT: vtestps %xmm1, %xmm0			; AVX-NEXT: vtestps %xmm1, %xmm0
	; AVX-NEXT: setb %al			; AVX-NEXT: setb %al
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; ADL-LABEL: pmovmskb_allof_bitcast_v4f32:			; ADL-LABEL: pmovmskb_allof_bitcast_v4f32:
	; ADL: # %bb.0:			; ADL: # %bb.0:
	; ADL-NEXT: vxorps %xmm1, %xmm1, %xmm1			; ADL-NEXT: vxorps %xmm1, %xmm1, %xmm1
	; ADL-NEXT: vcmpeqps %xmm1, %xmm0, %xmm0			; ADL-NEXT: vcmpeqps %xmm1, %xmm0, %xmm0
	; ADL-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1			; ADL-NEXT: vmovmskps %xmm0, %eax
	; ADL-NEXT: vtestps %xmm1, %xmm0			; ADL-NEXT: cmpl $15, %eax
	; ADL-NEXT: setb %al			; ADL-NEXT: sete %al
	; ADL-NEXT: retq			; ADL-NEXT: retq
	%1 = fcmp oeq <4 x float> %a0, zeroinitializer			%1 = fcmp oeq <4 x float> %a0, zeroinitializer
	%2 = sext <4 x i1> %1 to <4 x i32>			%2 = sext <4 x i1> %1 to <4 x i32>
	%3 = bitcast <4 x i32> %2 to <16 x i8>			%3 = bitcast <4 x i32> %2 to <16 x i8>
	%4 = tail call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %3)			%4 = tail call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %3)
	%5 = icmp eq i32 %4, 65535			%5 = icmp eq i32 %4, 65535
	ret i1 %5			ret i1 %5
	}			}
[287 lines elided]
	; AVX-NEXT: vtestps %xmm1, %xmm0			; AVX-NEXT: vtestps %xmm1, %xmm0
	; AVX-NEXT: sbbl %eax, %eax			; AVX-NEXT: sbbl %eax, %eax
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; ADL-LABEL: movmskps_ptest_numelts_mismatch:			; ADL-LABEL: movmskps_ptest_numelts_mismatch:
	; ADL: # %bb.0:			; ADL: # %bb.0:
	; ADL-NEXT: vpxor %xmm1, %xmm1, %xmm1			; ADL-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; ADL-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0			; ADL-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0
	; ADL-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1			; ADL-NEXT: vmovmskps %xmm0, %ecx
	; ADL-NEXT: xorl %eax, %eax			; ADL-NEXT: xorl %eax, %eax
	; ADL-NEXT: vtestps %xmm1, %xmm0			; ADL-NEXT: cmpl $15, %ecx
	; ADL-NEXT: sbbl %eax, %eax			; ADL-NEXT: sete %al
				; ADL-NEXT: negl %eax
	; ADL-NEXT: retq			; ADL-NEXT: retq
	%1 = icmp eq <16 x i8> %a0, zeroinitializer			%1 = icmp eq <16 x i8> %a0, zeroinitializer
	%2 = sext <16 x i1> %1 to <16 x i8>			%2 = sext <16 x i1> %1 to <16 x i8>
	%3 = bitcast <16 x i8> %2 to <4 x float>			%3 = bitcast <16 x i8> %2 to <4 x float>
	%4 = tail call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %3)			%4 = tail call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %3)
	%5 = icmp eq i32 %4, 15			%5 = icmp eq i32 %4, 15
	%6 = sext i1 %5 to i32			%6 = sext i1 %5 to i32
	ret i32 %6			ret i32 %6
	}			}

This is an archive of the discontinued LLVM Phabricator instance.

[X86] Prefer vmovmsk instead of vtest for alderlake.
Closed, Public

