This is an archive of the discontinued LLVM Phabricator instance.

[X86] Prefer KORTEST on Knights Landing or later for memcmp()
Closed • Public

Authored by davezarzycki on Oct 17 2019, 9:24 PM.


Details

Reviewers

craig.topper
RKSimon
spatel

Summary

PTEST and especially MOVMSK instructions are slow on Knights Landing or later. As a bonus, this patch increases instruction parallelism by emitting:

KORTEST(PCMPNEQ(a, b), PCMPNEQ(c, d)) == 0

Instead of:

KORTEST(AND(PCMPEQ(a, b), PCMPEQ(c, d))) == ~0
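The two forms above are equivalent; only the tested flag changes. KORTESTW ORs its two mask operands and sets ZF when the result is all zeros and CF when it is all ones. Below is a minimal scalar model in C++, not LLVM code: each 512-bit operand is treated as 16 dword lanes, `cmpMask` stands in for a VPCMPD-style compare into a k-register, and the names (`cmpMask`, `kortestw`, `allEqualNew`, `allEqualOld`) are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One 512-bit operand modeled as 16 dword lanes (as in vpcmpneqd on zmm).
using Lanes = std::array<std::uint32_t, 16>;

// Build a 16-bit k-mask: bit i is set when lane i satisfies the predicate
// (lanes equal if wantEqual is true, lanes different otherwise).
std::uint16_t cmpMask(const Lanes &x, const Lanes &y, bool wantEqual) {
  std::uint16_t mask = 0;
  for (int i = 0; i < 16; ++i)
    if ((x[i] == y[i]) == wantEqual)
      mask = static_cast<std::uint16_t>(mask | (1u << i));
  return mask;
}

// KORTESTW k1, k2: OR the masks; ZF if the result is 0, CF if it is 0xFFFF.
struct KortestFlags {
  bool zf;
  bool cf;
};

KortestFlags kortestw(std::uint16_t k1, std::uint16_t k2) {
  std::uint16_t r = static_cast<std::uint16_t>(k1 | k2);
  return {r == 0, r == 0xFFFF};
}

// New form: KORTEST(PCMPNEQ(a, b), PCMPNEQ(c, d)) == 0, i.e. test ZF.
bool allEqualNew(const Lanes &a, const Lanes &b, const Lanes &c,
                 const Lanes &d) {
  return kortestw(cmpMask(a, b, false), cmpMask(c, d, false)).zf;
}

// Old form: KORTEST(AND(PCMPEQ(a, b), PCMPEQ(c, d))) == ~0, i.e. test CF
// with the ANDed mask passed as both operands.
bool allEqualOld(const Lanes &a, const Lanes &b, const Lanes &c,
                 const Lanes &d) {
  std::uint16_t k =
      static_cast<std::uint16_t>(cmpMask(a, b, true) & cmpMask(c, d, true));
  return kortestw(k, k).cf;
}
```

With `a == b` and `c == d` both functions return true; flipping a single lane of `d` makes the not-equal mask for `(c, d)` non-zero, and both return false.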

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

davezarzycki created this revision. Oct 17 2019, 9:24 PM

craig.topper added inline comments. Oct 17 2019, 10:07 PM

lib/Target/X86/X86ISelLowering.cpp
42635	WidenVector isn't officially zeroing the upper bits. It's inserting into an undef vector. The assembly for the test cases is coming out correctly, but I think we really need to explicitly put 0s in the upper bits in the DAG.

No longer leave the upper vector register as UNDEF.
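Why the upper lanes must be explicit zeros rather than UNDEF: without VLX, a 128-bit operand is compared as a full 512-bit zmm register (16 dword lanes), so the 12 lanes above the real data also feed into the KORTEST result. The sketch below is an illustrative C++ model, not the patch itself. `widen128` and `neqMask` are hypothetical names, the byte-to-lane mapping assumes little-endian order as on x86, and a non-zero `upperFill` that differs between the operands stands in for UNDEF, whose contents the compiler may not assume to match.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// A zmm register modeled as 16 dword lanes.
using Zmm = std::array<std::uint32_t, 16>;

// Model of the NeedZExt path without BWI: bitcast a 128-bit scalar (16 raw
// bytes) to v4i32, then insert it at index 0 of a v16i32 whose remaining
// lanes hold `upperFill`. The patch uses 0; any other value mimics UNDEF.
Zmm widen128(const unsigned char (&bytes)[16], std::uint32_t upperFill) {
  Zmm v;
  v.fill(upperFill);
  std::memcpy(v.data(), bytes, sizeof bytes);  // lanes 0..3, little-endian
  return v;
}

// vpcmpneqd-style mask: bit i is set when lane i differs.
std::uint16_t neqMask(const Zmm &x, const Zmm &y) {
  std::uint16_t mask = 0;
  for (int i = 0; i < 16; ++i)
    if (x[i] != y[i])
      mask = static_cast<std::uint16_t>(mask | (1u << i));
  return mask;
}
```

With zero fill on both sides, equal inputs produce an empty mask; with mismatched fill, bits 4 through 15 are set even though the 16 bytes being compared are identical.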

RKSimon mentioned this in rL375215: [X86] Regenerate memcmp tests and add X64-AVX512 common prefix. Oct 18 2019, 3:04 AM

RKSimon mentioned this in rGef04598e1473: [X86] Regenerate memcmp tests and add X64-AVX512 common prefix.

Rebased the patch after r375215

Ping. I think this is ready to land. What am I missing? Thanks!

Sorry for the delay; a lot of us have been traveling and at the developers' conference.

lib/Target/X86/X86ISelLowering.cpp
42612	This isn't ScalarToVector. It's Vector to wider Vector, right?
42620	You can just use getIntPtrConstant here. VecIdxVT will return pointer type. That's consistent with other insert_subvectors.
lib/Target/X86/X86TargetTransformInfo.h
88	I think this should be with the CodeGen control options. The FeaturePrefer128Bit/256Bit were special because they are properties of the CPUs and they can be implied by a function attribute. I can't explain why SlowUAMem32 and SlowUAMem16 are in separate sections....

Thanks for getting back to me. This isn't urgent so please enjoy the conference!

lib/Target/X86/X86ISelLowering.cpp
42612	The memcmp expansion creates large scalar values that are normally bitcast to a vector with this closure. In the case of Xeon Phi, it may also widen the vector. If you have a better name for the closure, I'll happily rename it.
lib/Target/X86/X86TargetTransformInfo.h
88	Interesting. I was seriously considering naming this "SlowPTESTAndMOVMSK" to be consistent with the "Fast" and "Slow" pattern for fast/slow instructions. I can make this a CodeGen control option if you want, but please help me understand why slow PTEST/MOVMSK instructions (a.k.a. "prefer mask registers") are different from the other slow feature flags. Thanks!

craig.topper added inline comments. Oct 24 2019, 10:48 AM

lib/Target/X86/X86ISelLowering.cpp
42612	I missed that the bitcast is in here too. So it is scalar to vector. Sorry about that.
lib/Target/X86/X86TargetTransformInfo.h
88	It's not different; that's why I wanted it grouped with the Fast/Slow flags.

Updated to address the last round of feedback. Also, this is rebased on top of D69222 so that we can see more of the net code gen changes.

davezarzycki mentioned this in D69222: [X86] NFC: expand inline memcmp test coverage. Oct 26 2019, 5:58 AM

LGTM

This revision is now accepted and ready to land. Oct 26 2019, 10:19 AM

11c920207afa92ad13fdf72daba14c9af336293a

Revision Contents

Path                                          Size
llvm/lib/Target/X86/X86.td                    5 lines
llvm/lib/Target/X86/X86ISelLowering.cpp       67 lines
llvm/lib/Target/X86/X86Subtarget.h            4 lines
llvm/lib/Target/X86/X86TargetTransformInfo.h  1 line
llvm/test/CodeGen/X86/memcmp.ll               260 lines
llvm/test/CodeGen/X86/setcc-wide-types.ll     48 lines
Diff 225577

lib/Target/X86/X86.td

@@ ... @@
 def FeaturePrefer128Bit
     : SubtargetFeature<"prefer-128-bit", "Prefer128Bit", "true",
                        "Prefer 128-bit AVX instructions">;

 def FeaturePrefer256Bit
     : SubtargetFeature<"prefer-256-bit", "Prefer256Bit", "true",
                        "Prefer 256-bit AVX instructions">;

+def FeaturePreferMaskRegisters
+    : SubtargetFeature<"prefer-mask-registers", "PreferMaskRegisters", "true",
+                       "Prefer AVX512 mask registers over PTEST/MOVMSK">;
+
 // Lower indirect calls using a special construct called a `retpoline` to
 // mitigate potential Spectre v2 attacks against them.
 def FeatureRetpolineIndirectCalls
     : SubtargetFeature<
           "retpoline-indirect-calls", "UseRetpolineIndirectCalls", "true",
           "Remove speculation of indirect calls from the generated code">;

 // Lower indirect branches and switches either using conditional branch trees
@@ ... @@ list<SubtargetFeature> KNLFeatures = [FeatureX87,
   FeatureADX,
   FeatureRDSEED,
   FeatureMOVBE,
   FeatureLZCNT,
   FeatureBMI,
   FeatureBMI2,
   FeatureFMA,
   FeaturePRFCHW,
+  FeaturePreferMaskRegisters,
   FeatureSlowTwoMemOps,
   FeatureFastPartialYMMorZMMWrite,
   FeatureHasFastGather,
   FeatureSlowPMADDWD];
 // TODO Add AVX5124FMAPS/AVX5124VNNIW features
 list<SubtargetFeature> KNMFeatures =
   !listconcat(KNLFeatures, [FeatureVPOPCNTDQ]);

lib/Target/X86/X86ISelLowering.cpp


@@ ... @@ static SDValue combineVectorSizedSetCCEquality(SDNode *SetCC, SelectionDAG &DAG,
   if ((!IsVectorBitCastCheap(X) || !IsVectorBitCastCheap(Y)) &&
       !IsOrXorXorCCZero)
     return SDValue();

   EVT VT = SetCC->getValueType(0);
   SDLoc DL(SetCC);
   bool HasAVX = Subtarget.hasAVX();

-  // Use XOR (plus OR) and PTEST after SSE4.1 and before AVX512.
+  // Use XOR (plus OR) and PTEST after SSE4.1 for 128/256-bit operands.
+  // Use PCMPNEQ (plus OR) and KORTEST for 512-bit operands.
   // Otherwise use PCMPEQ (plus AND) and mask testing.
   if ((OpSize == 128 && Subtarget.hasSSE2()) ||
       (OpSize == 256 && HasAVX) ||
       (OpSize == 512 && Subtarget.useAVX512Regs())) {
     bool HasPT = Subtarget.hasSSE41();

+    // PTEST and MOVMSK are slow on Knights Landing and Knights Mill and widened
+    // vector registers are essentially free. (Technically, widening registers
+    // prevents load folding, but the tradeoff is worth it.)
+    bool PreferKOT = Subtarget.preferMaskRegisters();
+    bool NeedZExt = PreferKOT && !Subtarget.hasVLX() && OpSize != 512;
+
     EVT VecVT = MVT::v16i8;
-    EVT CmpVT = MVT::v16i8;
-    if (OpSize == 256)
-      VecVT = CmpVT = MVT::v32i8;
-    if (OpSize == 512) {
+    EVT CmpVT = PreferKOT ? MVT::v16i1 : VecVT;
+    if (OpSize == 256) {
+      VecVT = MVT::v32i8;
+      CmpVT = PreferKOT ? MVT::v32i1 : VecVT;
+    }
+    EVT CastVT = VecVT;
+    if (OpSize == 512 || NeedZExt) {
       if (Subtarget.hasBWI()) {
         VecVT = MVT::v64i8;
         CmpVT = MVT::v64i1;
+        if (OpSize == 512)
+          CastVT = VecVT;
       } else {
         VecVT = MVT::v16i32;
         CmpVT = MVT::v16i1;
+        CastVT = OpSize == 512 ? VecVT :
+                 OpSize == 256 ? MVT::v8i32 : MVT::v4i32;
       }
     }

+    auto ScalarToVector = [&](SDValue X) -> SDValue {
+      X = DAG.getBitcast(CastVT, X);
+      if (!NeedZExt)
+        return X;
+      const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+      MVT VecIdxVT = TLI.getVectorIdxTy(DAG.getDataLayout());
+      return DAG.getNode(ISD::INSERT_SUBVECTOR, DL, VecVT,
+                         DAG.getConstant(0, DL, VecVT), X,
+                         DAG.getConstant(0, DL, VecIdxVT));
+    };
+
     SDValue Cmp;
     if (IsOrXorXorCCZero) {
       // This is a bitwise-combined equality comparison of 2 pairs of vectors:
       // setcc i128 (or (xor A, B), (xor C, D)), 0, eq|ne
       // Use 2 vector equality compares and 'and' the results before doing a
       // MOVMSK.
-      SDValue A = DAG.getBitcast(VecVT, X.getOperand(0).getOperand(0));
-      SDValue B = DAG.getBitcast(VecVT, X.getOperand(0).getOperand(1));
-      SDValue C = DAG.getBitcast(VecVT, X.getOperand(1).getOperand(0));
-      SDValue D = DAG.getBitcast(VecVT, X.getOperand(1).getOperand(1));
-      if (VecVT == CmpVT && HasPT) {
+      SDValue A = ScalarToVector(X.getOperand(0).getOperand(0));
+      SDValue B = ScalarToVector(X.getOperand(0).getOperand(1));
+      SDValue C = ScalarToVector(X.getOperand(1).getOperand(0));
+      SDValue D = ScalarToVector(X.getOperand(1).getOperand(1));
+      if (VecVT != CmpVT) {
+        SDValue Cmp1 = DAG.getSetCC(DL, CmpVT, A, B, ISD::SETNE);
+        SDValue Cmp2 = DAG.getSetCC(DL, CmpVT, C, D, ISD::SETNE);
+        Cmp = DAG.getNode(ISD::OR, DL, CmpVT, Cmp1, Cmp2);
+      } else if (HasPT) {
         SDValue Cmp1 = DAG.getNode(ISD::XOR, DL, VecVT, A, B);
         SDValue Cmp2 = DAG.getNode(ISD::XOR, DL, VecVT, C, D);
         Cmp = DAG.getNode(ISD::OR, DL, VecVT, Cmp1, Cmp2);
       } else {
         SDValue Cmp1 = DAG.getSetCC(DL, CmpVT, A, B, ISD::SETEQ);
         SDValue Cmp2 = DAG.getSetCC(DL, CmpVT, C, D, ISD::SETEQ);
         Cmp = DAG.getNode(ISD::AND, DL, CmpVT, Cmp1, Cmp2);
       }
     } else {
-      SDValue VecX = DAG.getBitcast(VecVT, X);
-      SDValue VecY = DAG.getBitcast(VecVT, Y);
-      if (VecVT == CmpVT && HasPT) {
+      SDValue VecX = ScalarToVector(X);
+      SDValue VecY = ScalarToVector(Y);
+      if (VecVT != CmpVT) {
+        Cmp = DAG.getSetCC(DL, CmpVT, VecX, VecY, ISD::SETNE);
+      } else if (HasPT) {
         Cmp = DAG.getNode(ISD::XOR, DL, VecVT, VecX, VecY);
       } else {
         Cmp = DAG.getSetCC(DL, CmpVT, VecX, VecY, ISD::SETEQ);
       }
     }
-    // For 512-bits we want to emit a setcc that will lower to kortest.
+    // AVX512 should emit a setcc that will lower to kortest.
     if (VecVT != CmpVT) {
-      EVT KRegVT = CmpVT == MVT::v64i1 ? MVT::i64 : MVT::i16;
-      SDValue Mask = DAG.getAllOnesConstant(DL, KRegVT);
-      return DAG.getSetCC(DL, VT, DAG.getBitcast(KRegVT, Cmp), Mask, CC);
+      EVT KRegVT = CmpVT == MVT::v64i1 ? MVT::i64 :
+                   CmpVT == MVT::v32i1 ? MVT::i32 : MVT::i16;
+      return DAG.getSetCC(DL, VT, DAG.getBitcast(KRegVT, Cmp),
+                          DAG.getConstant(0, DL, KRegVT), CC);
     }
     if (HasPT) {
       SDValue BCCmp = DAG.getBitcast(OpSize == 256 ? MVT::v4i64 : MVT::v2i64,
                                      Cmp);
       SDValue PT = DAG.getNode(X86ISD::PTEST, DL, MVT::i32, BCCmp, BCCmp);
       X86::CondCode X86CC = CC == ISD::SETEQ ? X86::COND_E : X86::COND_NE;
       SDValue SetCC = getSETCC(X86CC, PT, DL, DAG);
       return DAG.getNode(ISD::TRUNCATE, DL, VT, SetCC.getValue(0));

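As a cross-check on the diff above, here is a plain C++ mirror of the type selection that the patched `combineVectorSizedSetCCEquality` performs. It is a sketch under stated assumptions: MVTs are modeled as strings, `SubtargetModel` and `chooseTypes` are hypothetical stand-ins for the real `X86Subtarget` queries, and the enclosing OpSize/ISA guard is assumed to have already passed. On a KNL-like configuration (no BWI, no VLX, prefer-mask-registers) it predicts a `v16i32` compare into a 16-bit mask for both 16- and 32-byte operands, which is consistent with the `vpcmpneqd ... %zmm` / `kortestw` lines in the X64-AVX512Fk checks.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the subtarget queries used by the patch.
struct SubtargetModel {
  bool hasSSE41;
  bool hasBWI;
  bool hasVLX;
  bool preferMaskRegisters;
};

// Value types modeled as strings such as "v16i8"; KRegVT stays empty when
// no k-register compare is emitted.
struct Choice {
  std::string VecVT, CmpVT, CastVT, KRegVT, Lowering;
  bool NeedZExt;
};

// Mirrors the patched selection for OpSize in {128, 256, 512}.
Choice chooseTypes(unsigned OpSize, const SubtargetModel &ST) {
  bool PreferKOT = ST.preferMaskRegisters;
  bool NeedZExt = PreferKOT && !ST.hasVLX && OpSize != 512;

  std::string VecVT = "v16i8";
  std::string CmpVT = PreferKOT ? std::string("v16i1") : VecVT;
  if (OpSize == 256) {
    VecVT = "v32i8";
    CmpVT = PreferKOT ? std::string("v32i1") : VecVT;
  }
  std::string CastVT = VecVT;
  if (OpSize == 512 || NeedZExt) {
    if (ST.hasBWI) {
      VecVT = "v64i8";
      CmpVT = "v64i1";
      if (OpSize == 512)
        CastVT = VecVT;
    } else {
      VecVT = "v16i32";
      CmpVT = "v16i1";
      CastVT = OpSize == 512 ? VecVT
                             : std::string(OpSize == 256 ? "v8i32" : "v4i32");
    }
  }

  std::string KRegVT, Lowering;
  if (VecVT != CmpVT) {
    // Mask result bitcast to an integer and compared against 0 (KORTEST).
    KRegVT = CmpVT == "v64i1" ? "i64" : CmpVT == "v32i1" ? "i32" : "i16";
    Lowering = "kortest";
  } else if (ST.hasSSE41) {
    Lowering = "ptest";
  } else {
    Lowering = "movmsk";
  }
  return {VecVT, CmpVT, CastVT, KRegVT, Lowering, NeedZExt};
}
```

For example, `chooseTypes(128, {true, false, false, true})` yields `VecVT == "v16i32"`, `CastVT == "v4i32"` and `KRegVT == "i16"`, while the same call without `preferMaskRegisters` stays on the `v16i8` XOR + PTEST path.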
lib/Target/X86/X86Subtarget.h

@@ ... @@ protected:
   unsigned MaxInlineSizeThreshold = 128;

   /// Indicates target prefers 128 bit instructions.
   bool Prefer128Bit = false;

   /// Indicates target prefers 256 bit instructions.
   bool Prefer256Bit = false;

+  /// Indicates target prefers AVX512 mask registers.
+  bool PreferMaskRegisters = false;
+
   /// Threeway branch is profitable in this subtarget.
   bool ThreewayBranchProfitable = false;

   /// What processor and OS we're targeting.
   Triple TargetTriple;

   /// GlobalISel related APIs.
   std::unique_ptr<CallLowering> CallLoweringInfo;
@@ ... @@ public:
   bool threewayBranchProfitable() const { return ThreewayBranchProfitable; }
   bool hasINVPCID() const { return HasINVPCID; }
   bool hasENQCMD() const { return HasENQCMD; }
   bool useRetpolineIndirectCalls() const { return UseRetpolineIndirectCalls; }
   bool useRetpolineIndirectBranches() const {
     return UseRetpolineIndirectBranches;
   }
   bool useRetpolineExternalThunk() const { return UseRetpolineExternalThunk; }
+  bool preferMaskRegisters() const { return PreferMaskRegisters; }

   unsigned getPreferVectorWidth() const { return PreferVectorWidth; }
   unsigned getRequiredVectorWidth() const { return RequiredVectorWidth; }

   // Helper functions to determine when we should allow widening to 512-bit
   // during codegen.
   // TODO: Currently we're always allowing widening on CPUs without VLX,
   // because for many cases we don't have a better option.

lib/Target/X86/X86TargetTransformInfo.h

@@ ... @@ const FeatureBitset InlineFeatureIgnoreList = {

       // Perf-tuning flags.
       X86::FeatureHasFastGather,
       X86::FeatureSlowUAMem32,

       // Based on whether user set the -mprefer-vector-width command line.
       X86::FeaturePrefer128Bit,
       X86::FeaturePrefer256Bit,
+      X86::FeaturePreferMaskRegisters,

       // CPU name enums. These just follow CPU string.
       X86::ProcIntelAtom,
       X86::ProcIntelGLM,
       X86::ProcIntelGLP,
       X86::ProcIntelSLM,
       X86::ProcIntelTRM,
   };

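Some context on the list being edited here: as I understand `X86TTIImpl::areInlineCompatible` in LLVM of this period, it clears the bits in `InlineFeatureIgnoreList` from both the caller's and the callee's feature sets, then requires the callee's remaining features to be a subset of the caller's. Listing `FeaturePreferMaskRegisters` there therefore keeps a pure tuning difference from blocking inlining. The sketch below models that rule with `std::bitset`; the `Feature` enum is hypothetical and much smaller than the real generated feature list.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical feature indices; the real list is generated from X86.td.
enum Feature : std::size_t {
  FeatureAVX512F,
  FeatureAVX512BW,
  FeaturePreferMaskRegisters,
  NumFeatures
};
using FeatureSet = std::bitset<NumFeatures>;

// After masking out the ignored tuning flags, every feature the callee
// requires must also be enabled in the caller.
bool areInlineCompatible(const FeatureSet &caller, const FeatureSet &callee,
                         const FeatureSet &ignoreList) {
  FeatureSet realCaller = caller & ~ignoreList;
  FeatureSet realCallee = callee & ~ignoreList;
  return (realCaller & realCallee) == realCallee;
}
```

A callee that differs from its caller only in `prefer-mask-registers` is compatible once that bit is in the ignore list, but not with an empty ignore list; a callee that needs AVX512BW is still rejected if the caller lacks it.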
test/CodeGen/X86/memcmp.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=cmov | FileCheck %s --check-prefix=X86 --check-prefix=X86-NOSSE
 ; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+sse | FileCheck %s --check-prefix=X86 --check-prefix=SSE --check-prefix=X86-SSE1
 ; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X86 --check-prefix=SSE --check-prefix=X86-SSE2
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefix=X64 --check-prefix=X64-SSE2
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX --check-prefix=X64-AVX1
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx2 | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX --check-prefix=X64-AVX2
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx512f | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX512F
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx512f,prefer-mask-registers | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX512Fk
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx512bw | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX512BW

 ; This tests codegen time inlining/optimization of memcmp
 ; rdar://6480398

 @.str = private constant [65 x i8] c"0123456789012345678901234567890123456789012345678901234567890123\00", align 1

 declare i32 @memcmp(i8*, i8*, i64)
@@ ... @@
 ;
 ; X64-AVX-LABEL: length16_eq:
 ; X64-AVX: # %bb.0:
 ; X64-AVX-NEXT: vmovdqu (%rdi), %xmm0
 ; X64-AVX-NEXT: vpxor (%rsi), %xmm0, %xmm0
 ; X64-AVX-NEXT: vptest %xmm0, %xmm0
 ; X64-AVX-NEXT: setne %al
 ; X64-AVX-NEXT: retq
+;
+; X64-AVX512F-LABEL: length16_eq:
+; X64-AVX512F: # %bb.0:
+; X64-AVX512F-NEXT: vmovdqu (%rdi), %xmm0
+; X64-AVX512F-NEXT: vpxor (%rsi), %xmm0, %xmm0
+; X64-AVX512F-NEXT: vptest %xmm0, %xmm0
+; X64-AVX512F-NEXT: setne %al
+; X64-AVX512F-NEXT: retq
+;
+; X64-AVX512Fk-LABEL: length16_eq:
+; X64-AVX512Fk: # %bb.0:
+; X64-AVX512Fk-NEXT: vmovdqu (%rdi), %xmm0
+; X64-AVX512Fk-NEXT: vmovdqu (%rsi), %xmm1
+; X64-AVX512Fk-NEXT: vpcmpneqd %zmm1, %zmm0, %k0
+; X64-AVX512Fk-NEXT: kortestw %k0, %k0
+; X64-AVX512Fk-NEXT: setne %al
+; X64-AVX512Fk-NEXT: vzeroupper
+; X64-AVX512Fk-NEXT: retq
+;
+; X64-AVX512BW-LABEL: length16_eq:
+; X64-AVX512BW: # %bb.0:
+; X64-AVX512BW-NEXT: vmovdqu (%rdi), %xmm0
+; X64-AVX512BW-NEXT: vpxor (%rsi), %xmm0, %xmm0
+; X64-AVX512BW-NEXT: vptest %xmm0, %xmm0
+; X64-AVX512BW-NEXT: setne %al
+; X64-AVX512BW-NEXT: retq
   %call = tail call i32 @memcmp(i8* %x, i8* %y, i64 16) nounwind
   %cmp = icmp ne i32 %call, 0
   ret i1 %cmp
 }

 define i1 @length16_eq_const(i8* %X) nounwind {
 ; X86-NOSSE-LABEL: length16_eq_const:
 ; X86-NOSSE: # %bb.0:
@@ ... @@
 ;
 ; X64-AVX-LABEL: length16_eq_const:
 ; X64-AVX: # %bb.0:
 ; X64-AVX-NEXT: vmovdqu (%rdi), %xmm0
 ; X64-AVX-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0
 ; X64-AVX-NEXT: vptest %xmm0, %xmm0
 ; X64-AVX-NEXT: sete %al
 ; X64-AVX-NEXT: retq
+;
+; X64-AVX512F-LABEL: length16_eq_const:
+; X64-AVX512F: # %bb.0:
+; X64-AVX512F-NEXT: vmovdqu (%rdi), %xmm0
+; X64-AVX512F-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0
+; X64-AVX512F-NEXT: vptest %xmm0, %xmm0
+; X64-AVX512F-NEXT: sete %al
+; X64-AVX512F-NEXT: retq
+;
+; X64-AVX512Fk-LABEL: length16_eq_const:
+; X64-AVX512Fk: # %bb.0:
+; X64-AVX512Fk-NEXT: vmovdqu (%rdi), %xmm0
+; X64-AVX512Fk-NEXT: vmovdqa {{.*#+}} xmm1 = [858927408,926299444,825243960,892613426]
+; X64-AVX512Fk-NEXT: vpcmpneqd %zmm1, %zmm0, %k0
+; X64-AVX512Fk-NEXT: kortestw %k0, %k0
+; X64-AVX512Fk-NEXT: sete %al
+; X64-AVX512Fk-NEXT: vzeroupper
+; X64-AVX512Fk-NEXT: retq
+;
+; X64-AVX512BW-LABEL: length16_eq_const:
+; X64-AVX512BW: # %bb.0:
+; X64-AVX512BW-NEXT: vmovdqu (%rdi), %xmm0
+; X64-AVX512BW-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0
+; X64-AVX512BW-NEXT: vptest %xmm0, %xmm0
+; X64-AVX512BW-NEXT: sete %al
+; X64-AVX512BW-NEXT: retq
   %m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([65 x i8], [65 x i8]* @.str, i32 0, i32 0), i64 16) nounwind
   %c = icmp eq i32 %m, 0
   ret i1 %c
 }

 ; PR33914 - https://bugs.llvm.org/show_bug.cgi?id=33914

 define i32 @length24(i8* %X, i8* %Y) nounwind {
@@ ... @@
 ; X64-SSE2-NEXT: pmovmskb %xmm2, %eax
 ; X64-SSE2-NEXT: cmpl $65535, %eax # imm = 0xFFFF
 ; X64-SSE2-NEXT: sete %al
 ; X64-SSE2-NEXT: retq
 ;
 ; X64-AVX-LABEL: length24_eq:
 ; X64-AVX: # %bb.0:
 ; X64-AVX-NEXT: vmovdqu (%rdi), %xmm0
-; X64-AVX-NEXT: vmovq 16(%rdi), %xmm1
-; X64-AVX-NEXT: vmovq 16(%rsi), %xmm2
+; X64-AVX-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
+; X64-AVX-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
 ; X64-AVX-NEXT: vpxor %xmm2, %xmm1, %xmm1
 ; X64-AVX-NEXT: vpxor (%rsi), %xmm0, %xmm0
 ; X64-AVX-NEXT: vpor %xmm1, %xmm0, %xmm0
 ; X64-AVX-NEXT: vptest %xmm0, %xmm0
 ; X64-AVX-NEXT: sete %al
 ; X64-AVX-NEXT: retq
+;
+; X64-AVX512F-LABEL: length24_eq:
+; X64-AVX512F: # %bb.0:
+; X64-AVX512F-NEXT: vmovdqu (%rdi), %xmm0
+; X64-AVX512F-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
+; X64-AVX512F-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
+; X64-AVX512F-NEXT: vpxor %xmm2, %xmm1, %xmm1
+; X64-AVX512F-NEXT: vpxor (%rsi), %xmm0, %xmm0
+; X64-AVX512F-NEXT: vpor %xmm1, %xmm0, %xmm0
+; X64-AVX512F-NEXT: vptest %xmm0, %xmm0
+; X64-AVX512F-NEXT: sete %al
+; X64-AVX512F-NEXT: retq
+;
+; X64-AVX512Fk-LABEL: length24_eq:
+; X64-AVX512Fk: # %bb.0:
+; X64-AVX512Fk-NEXT: vmovdqu (%rdi), %xmm0
+; X64-AVX512Fk-NEXT: vmovdqu (%rsi), %xmm1
+; X64-AVX512Fk-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
+; X64-AVX512Fk-NEXT: vmovq {{.*#+}} xmm3 = mem[0],zero
+; X64-AVX512Fk-NEXT: vpcmpneqd %zmm3, %zmm2, %k0
+; X64-AVX512Fk-NEXT: vpcmpneqd %zmm1, %zmm0, %k1
+; X64-AVX512Fk-NEXT: kortestw %k0, %k1
+; X64-AVX512Fk-NEXT: sete %al
+; X64-AVX512Fk-NEXT: vzeroupper
+; X64-AVX512Fk-NEXT: retq
+;
+; X64-AVX512BW-LABEL: length24_eq:
+; X64-AVX512BW: # %bb.0:
+; X64-AVX512BW-NEXT: vmovdqu (%rdi), %xmm0
+; X64-AVX512BW-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
+; X64-AVX512BW-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
+; X64-AVX512BW-NEXT: vpxor %xmm2, %xmm1, %xmm1
+; X64-AVX512BW-NEXT: vpxor (%rsi), %xmm0, %xmm0
+; X64-AVX512BW-NEXT: vpor %xmm1, %xmm0, %xmm0
+; X64-AVX512BW-NEXT: vptest %xmm0, %xmm0
+; X64-AVX512BW-NEXT: sete %al
+; X64-AVX512BW-NEXT: retq
   %call = tail call i32 @memcmp(i8* %x, i8* %y, i64 24) nounwind
   %cmp = icmp eq i32 %call, 0
   ret i1 %cmp
 }

 define i1 @length24_eq_const(i8* %X) nounwind {
 ; X86-NOSSE-LABEL: length24_eq_const:
 ; X86-NOSSE: # %bb.0:
@@ ... @@
	; X64-SSE2-NEXT: pmovmskb %xmm0, %eax			; X64-SSE2-NEXT: pmovmskb %xmm0, %eax
	; X64-SSE2-NEXT: cmpl $65535, %eax # imm = 0xFFFF			; X64-SSE2-NEXT: cmpl $65535, %eax # imm = 0xFFFF
	; X64-SSE2-NEXT: setne %al			; X64-SSE2-NEXT: setne %al
	; X64-SSE2-NEXT: retq			; X64-SSE2-NEXT: retq
	;			;
	; X64-AVX-LABEL: length24_eq_const:			; X64-AVX-LABEL: length24_eq_const:
	; X64-AVX: # %bb.0:			; X64-AVX: # %bb.0:
	; X64-AVX-NEXT: vmovdqu (%rdi), %xmm0			; X64-AVX-NEXT: vmovdqu (%rdi), %xmm0
	; X64-AVX-NEXT: vmovq 16(%rdi), %xmm1			; X64-AVX-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
	; X64-AVX-NEXT: vpxor {{.*}}(%rip), %xmm1, %xmm1			; X64-AVX-NEXT: vpxor {{.*}}(%rip), %xmm1, %xmm1
	; X64-AVX-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0			; X64-AVX-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0
	; X64-AVX-NEXT: vpor %xmm1, %xmm0, %xmm0			; X64-AVX-NEXT: vpor %xmm1, %xmm0, %xmm0
	; X64-AVX-NEXT: vptest %xmm0, %xmm0			; X64-AVX-NEXT: vptest %xmm0, %xmm0
	; X64-AVX-NEXT: setne %al			; X64-AVX-NEXT: setne %al
	; X64-AVX-NEXT: retq			; X64-AVX-NEXT: retq
				;
				; X64-AVX512F-LABEL: length24_eq_const:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX512F-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
				; X64-AVX512F-NEXT: vpxor {{.*}}(%rip), %xmm1, %xmm1
				; X64-AVX512F-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0
				; X64-AVX512F-NEXT: vpor %xmm1, %xmm0, %xmm0
				; X64-AVX512F-NEXT: vptest %xmm0, %xmm0
				; X64-AVX512F-NEXT: setne %al
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512Fk-LABEL: length24_eq_const:
				; X64-AVX512Fk: # %bb.0:
				; X64-AVX512Fk-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX512Fk-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
				; X64-AVX512Fk-NEXT: vmovdqa {{.*#+}} xmm2 = [959985462,858927408,0,0]
				; X64-AVX512Fk-NEXT: vpcmpneqd %zmm2, %zmm1, %k0
				; X64-AVX512Fk-NEXT: vmovdqa {{.*#+}} xmm1 = [858927408,926299444,825243960,892613426]
				; X64-AVX512Fk-NEXT: vpcmpneqd %zmm1, %zmm0, %k1
				; X64-AVX512Fk-NEXT: kortestw %k0, %k1
				; X64-AVX512Fk-NEXT: setne %al
				; X64-AVX512Fk-NEXT: vzeroupper
				; X64-AVX512Fk-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length24_eq_const:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX512BW-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
				; X64-AVX512BW-NEXT: vpxor {{.*}}(%rip), %xmm1, %xmm1
				; X64-AVX512BW-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0
				; X64-AVX512BW-NEXT: vpor %xmm1, %xmm0, %xmm0
				; X64-AVX512BW-NEXT: vptest %xmm0, %xmm0
				; X64-AVX512BW-NEXT: setne %al
				; X64-AVX512BW-NEXT: retq
	%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([65 x i8], [65 x i8]* @.str, i32 0, i32 0), i64 24) nounwind			%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([65 x i8], [65 x i8]* @.str, i32 0, i32 0), i64 24) nounwind
	%c = icmp ne i32 %m, 0			%c = icmp ne i32 %m, 0
	ret i1 %c			ret i1 %c
	}			}

	define i32 @length32(i8* %X, i8* %Y) nounwind {			define i32 @length32(i8* %X, i8* %Y) nounwind {
	; X86-LABEL: length32:			; X86-LABEL: length32:
	; X86: # %bb.0:			; X86: # %bb.0:
	[79 lines not shown]
	; X64-AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0			; X64-AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0
	; X64-AVX1-NEXT: vptest %xmm0, %xmm0			; X64-AVX1-NEXT: vptest %xmm0, %xmm0
	; X64-AVX1-NEXT: sete %al			; X64-AVX1-NEXT: sete %al
	; X64-AVX1-NEXT: retq			; X64-AVX1-NEXT: retq
	;			;
	; X64-AVX2-LABEL: length32_eq:			; X64-AVX2-LABEL: length32_eq:
	; X64-AVX2: # %bb.0:			; X64-AVX2: # %bb.0:
	; X64-AVX2-NEXT: vmovdqu (%rdi), %ymm0			; X64-AVX2-NEXT: vmovdqu (%rdi), %ymm0
	; X64-AVX2-NEXT: vpxor (%rsi), %ymm0, %ymm0			; X64-AVX2-NEXT: vpxor (%rsi), %ymm0, %ymm0
	; X64-AVX2-NEXT: vptest %ymm0, %ymm0			; X64-AVX2-NEXT: vptest %ymm0, %ymm0
	; X64-AVX2-NEXT: sete %al			; X64-AVX2-NEXT: sete %al
	; X64-AVX2-NEXT: vzeroupper			; X64-AVX2-NEXT: vzeroupper
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
				;
				; X64-AVX512F-LABEL: length32_eq:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX512F-NEXT: vpxor (%rsi), %ymm0, %ymm0
				; X64-AVX512F-NEXT: vptest %ymm0, %ymm0
				; X64-AVX512F-NEXT: sete %al
				; X64-AVX512F-NEXT: vzeroupper
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512Fk-LABEL: length32_eq:
				; X64-AVX512Fk: # %bb.0:
				; X64-AVX512Fk-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX512Fk-NEXT: vmovdqu (%rsi), %ymm1
				; X64-AVX512Fk-NEXT: vpcmpneqd %zmm1, %zmm0, %k0
				; X64-AVX512Fk-NEXT: kortestw %k0, %k0
				; X64-AVX512Fk-NEXT: sete %al
				; X64-AVX512Fk-NEXT: vzeroupper
				; X64-AVX512Fk-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length32_eq:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX512BW-NEXT: vpxor (%rsi), %ymm0, %ymm0
				; X64-AVX512BW-NEXT: vptest %ymm0, %ymm0
				; X64-AVX512BW-NEXT: sete %al
				; X64-AVX512BW-NEXT: vzeroupper
				; X64-AVX512BW-NEXT: retq
	%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 32) nounwind			%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 32) nounwind
	%cmp = icmp eq i32 %call, 0			%cmp = icmp eq i32 %call, 0
	ret i1 %cmp			ret i1 %cmp
	}			}

	define i1 @length32_eq_prefer128(i8* %x, i8* %y) nounwind "prefer-vector-width"="128" {			define i1 @length32_eq_prefer128(i8* %x, i8* %y) nounwind "prefer-vector-width"="128" {
	; X86-NOSSE-LABEL: length32_eq_prefer128:			; X86-NOSSE-LABEL: length32_eq_prefer128:
	; X86-NOSSE: # %bb.0:			; X86-NOSSE: # %bb.0:
	[54 lines not shown]
	; X64-AVX-NEXT: vmovdqu (%rdi), %xmm0			; X64-AVX-NEXT: vmovdqu (%rdi), %xmm0
	; X64-AVX-NEXT: vmovdqu 16(%rdi), %xmm1			; X64-AVX-NEXT: vmovdqu 16(%rdi), %xmm1
	; X64-AVX-NEXT: vpxor 16(%rsi), %xmm1, %xmm1			; X64-AVX-NEXT: vpxor 16(%rsi), %xmm1, %xmm1
	; X64-AVX-NEXT: vpxor (%rsi), %xmm0, %xmm0			; X64-AVX-NEXT: vpxor (%rsi), %xmm0, %xmm0
	; X64-AVX-NEXT: vpor %xmm1, %xmm0, %xmm0			; X64-AVX-NEXT: vpor %xmm1, %xmm0, %xmm0
	; X64-AVX-NEXT: vptest %xmm0, %xmm0			; X64-AVX-NEXT: vptest %xmm0, %xmm0
	; X64-AVX-NEXT: sete %al			; X64-AVX-NEXT: sete %al
	; X64-AVX-NEXT: retq			; X64-AVX-NEXT: retq
				;
				; X64-AVX512F-LABEL: length32_eq_prefer128:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX512F-NEXT: vmovdqu 16(%rdi), %xmm1
				; X64-AVX512F-NEXT: vpxor 16(%rsi), %xmm1, %xmm1
				; X64-AVX512F-NEXT: vpxor (%rsi), %xmm0, %xmm0
				; X64-AVX512F-NEXT: vpor %xmm1, %xmm0, %xmm0
				; X64-AVX512F-NEXT: vptest %xmm0, %xmm0
				; X64-AVX512F-NEXT: sete %al
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512Fk-LABEL: length32_eq_prefer128:
				; X64-AVX512Fk: # %bb.0:
				; X64-AVX512Fk-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX512Fk-NEXT: vmovdqu 16(%rdi), %xmm1
				; X64-AVX512Fk-NEXT: vmovdqu (%rsi), %xmm2
				; X64-AVX512Fk-NEXT: vmovdqu 16(%rsi), %xmm3
				; X64-AVX512Fk-NEXT: vpcmpneqd %zmm3, %zmm1, %k0
				; X64-AVX512Fk-NEXT: vpcmpneqd %zmm2, %zmm0, %k1
				; X64-AVX512Fk-NEXT: kortestw %k0, %k1
				; X64-AVX512Fk-NEXT: sete %al
				; X64-AVX512Fk-NEXT: vzeroupper
				; X64-AVX512Fk-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length32_eq_prefer128:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX512BW-NEXT: vmovdqu 16(%rdi), %xmm1
				; X64-AVX512BW-NEXT: vpxor 16(%rsi), %xmm1, %xmm1
				; X64-AVX512BW-NEXT: vpxor (%rsi), %xmm0, %xmm0
				; X64-AVX512BW-NEXT: vpor %xmm1, %xmm0, %xmm0
				; X64-AVX512BW-NEXT: vptest %xmm0, %xmm0
				; X64-AVX512BW-NEXT: sete %al
				; X64-AVX512BW-NEXT: retq
	%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 32) nounwind			%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 32) nounwind
	%cmp = icmp eq i32 %call, 0			%cmp = icmp eq i32 %call, 0
	ret i1 %cmp			ret i1 %cmp
	}			}

	define i1 @length32_eq_const(i8* %X) nounwind {			define i1 @length32_eq_const(i8* %X) nounwind {
	; X86-NOSSE-LABEL: length32_eq_const:			; X86-NOSSE-LABEL: length32_eq_const:
	; X86-NOSSE: # %bb.0:			; X86-NOSSE: # %bb.0:
	[58 lines not shown]
	; X64-AVX2-LABEL: length32_eq_const:			; X64-AVX2-LABEL: length32_eq_const:
	; X64-AVX2: # %bb.0:			; X64-AVX2: # %bb.0:
	; X64-AVX2-NEXT: vmovdqu (%rdi), %ymm0			; X64-AVX2-NEXT: vmovdqu (%rdi), %ymm0
	; X64-AVX2-NEXT: vpxor {{.*}}(%rip), %ymm0, %ymm0			; X64-AVX2-NEXT: vpxor {{.*}}(%rip), %ymm0, %ymm0
	; X64-AVX2-NEXT: vptest %ymm0, %ymm0			; X64-AVX2-NEXT: vptest %ymm0, %ymm0
	; X64-AVX2-NEXT: setne %al			; X64-AVX2-NEXT: setne %al
	; X64-AVX2-NEXT: vzeroupper			; X64-AVX2-NEXT: vzeroupper
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
				;
				; X64-AVX512F-LABEL: length32_eq_const:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX512F-NEXT: vpxor {{.*}}(%rip), %ymm0, %ymm0
				; X64-AVX512F-NEXT: vptest %ymm0, %ymm0
				; X64-AVX512F-NEXT: setne %al
				; X64-AVX512F-NEXT: vzeroupper
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512Fk-LABEL: length32_eq_const:
				; X64-AVX512Fk: # %bb.0:
				; X64-AVX512Fk-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX512Fk-NEXT: vmovdqa {{.*#+}} ymm1 = [858927408,926299444,825243960,892613426,959985462,858927408,926299444,825243960]
				; X64-AVX512Fk-NEXT: vpcmpneqd %zmm1, %zmm0, %k0
				; X64-AVX512Fk-NEXT: kortestw %k0, %k0
				; X64-AVX512Fk-NEXT: setne %al
				; X64-AVX512Fk-NEXT: vzeroupper
				; X64-AVX512Fk-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length32_eq_const:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX512BW-NEXT: vpxor {{.*}}(%rip), %ymm0, %ymm0
				; X64-AVX512BW-NEXT: vptest %ymm0, %ymm0
				; X64-AVX512BW-NEXT: setne %al
				; X64-AVX512BW-NEXT: vzeroupper
				; X64-AVX512BW-NEXT: retq
	%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([65 x i8], [65 x i8]* @.str, i32 0, i32 0), i64 32) nounwind			%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([65 x i8], [65 x i8]* @.str, i32 0, i32 0), i64 32) nounwind
	%c = icmp ne i32 %m, 0			%c = icmp ne i32 %m, 0
	ret i1 %c			ret i1 %c
	}			}

	define i32 @length64(i8* %X, i8* %Y) nounwind {			define i32 @length64(i8* %X, i8* %Y) nounwind {
	; X86-LABEL: length64:			; X86-LABEL: length64:
	; X86: # %bb.0:			; X86: # %bb.0:
	[56 lines not shown]
	; X64-AVX2-NEXT: vptest %ymm0, %ymm0			; X64-AVX2-NEXT: vptest %ymm0, %ymm0
	; X64-AVX2-NEXT: setne %al			; X64-AVX2-NEXT: setne %al
	; X64-AVX2-NEXT: vzeroupper			; X64-AVX2-NEXT: vzeroupper
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512F-LABEL: length64_eq:			; X64-AVX512F-LABEL: length64_eq:
	; X64-AVX512F: # %bb.0:			; X64-AVX512F: # %bb.0:
	; X64-AVX512F-NEXT: vmovdqu64 (%rdi), %zmm0			; X64-AVX512F-NEXT: vmovdqu64 (%rdi), %zmm0
	; X64-AVX512F-NEXT: vpcmpeqd (%rsi), %zmm0, %k0			; X64-AVX512F-NEXT: vpcmpneqd (%rsi), %zmm0, %k0
	; X64-AVX512F-NEXT: kortestw %k0, %k0			; X64-AVX512F-NEXT: kortestw %k0, %k0
	; X64-AVX512F-NEXT: setae %al			; X64-AVX512F-NEXT: setne %al
	; X64-AVX512F-NEXT: vzeroupper			; X64-AVX512F-NEXT: vzeroupper
	; X64-AVX512F-NEXT: retq			; X64-AVX512F-NEXT: retq
	;			;
				; X64-AVX512Fk-LABEL: length64_eq:
				; X64-AVX512Fk: # %bb.0:
				; X64-AVX512Fk-NEXT: vmovdqu64 (%rdi), %zmm0
				; X64-AVX512Fk-NEXT: vpcmpneqd (%rsi), %zmm0, %k0
				; X64-AVX512Fk-NEXT: kortestw %k0, %k0
				; X64-AVX512Fk-NEXT: setne %al
				; X64-AVX512Fk-NEXT: vzeroupper
				; X64-AVX512Fk-NEXT: retq
				;
	; X64-AVX512BW-LABEL: length64_eq:			; X64-AVX512BW-LABEL: length64_eq:
	; X64-AVX512BW: # %bb.0:			; X64-AVX512BW: # %bb.0:
	; X64-AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm0			; X64-AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm0
	; X64-AVX512BW-NEXT: vpcmpeqb (%rsi), %zmm0, %k0			; X64-AVX512BW-NEXT: vpcmpneqb (%rsi), %zmm0, %k0
	; X64-AVX512BW-NEXT: kortestq %k0, %k0			; X64-AVX512BW-NEXT: kortestq %k0, %k0
	; X64-AVX512BW-NEXT: setae %al			; X64-AVX512BW-NEXT: setne %al
	; X64-AVX512BW-NEXT: vzeroupper			; X64-AVX512BW-NEXT: vzeroupper
	; X64-AVX512BW-NEXT: retq			; X64-AVX512BW-NEXT: retq
	%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 64) nounwind			%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 64) nounwind
	%cmp = icmp ne i32 %call, 0			%cmp = icmp ne i32 %call, 0
	ret i1 %cmp			ret i1 %cmp
	}			}

	define i1 @length64_eq_const(i8* %X) nounwind {			define i1 @length64_eq_const(i8* %X) nounwind {
	[41 lines not shown]
	; X64-AVX2-NEXT: vptest %ymm0, %ymm0			; X64-AVX2-NEXT: vptest %ymm0, %ymm0
	; X64-AVX2-NEXT: sete %al			; X64-AVX2-NEXT: sete %al
	; X64-AVX2-NEXT: vzeroupper			; X64-AVX2-NEXT: vzeroupper
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512F-LABEL: length64_eq_const:			; X64-AVX512F-LABEL: length64_eq_const:
	; X64-AVX512F: # %bb.0:			; X64-AVX512F: # %bb.0:
	; X64-AVX512F-NEXT: vmovdqu64 (%rdi), %zmm0			; X64-AVX512F-NEXT: vmovdqu64 (%rdi), %zmm0
	; X64-AVX512F-NEXT: vpcmpeqd {{.*}}(%rip), %zmm0, %k0			; X64-AVX512F-NEXT: vpcmpneqd {{.*}}(%rip), %zmm0, %k0
	; X64-AVX512F-NEXT: kortestw %k0, %k0			; X64-AVX512F-NEXT: kortestw %k0, %k0
	; X64-AVX512F-NEXT: setb %al			; X64-AVX512F-NEXT: sete %al
	; X64-AVX512F-NEXT: vzeroupper			; X64-AVX512F-NEXT: vzeroupper
	; X64-AVX512F-NEXT: retq			; X64-AVX512F-NEXT: retq
	;			;
				; X64-AVX512Fk-LABEL: length64_eq_const:
				; X64-AVX512Fk: # %bb.0:
				; X64-AVX512Fk-NEXT: vmovdqu64 (%rdi), %zmm0
				; X64-AVX512Fk-NEXT: vpcmpneqd {{.*}}(%rip), %zmm0, %k0
				; X64-AVX512Fk-NEXT: kortestw %k0, %k0
				; X64-AVX512Fk-NEXT: sete %al
				; X64-AVX512Fk-NEXT: vzeroupper
				; X64-AVX512Fk-NEXT: retq
				;
	; X64-AVX512BW-LABEL: length64_eq_const:			; X64-AVX512BW-LABEL: length64_eq_const:
	; X64-AVX512BW: # %bb.0:			; X64-AVX512BW: # %bb.0:
	; X64-AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm0			; X64-AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm0
	; X64-AVX512BW-NEXT: vpcmpeqb {{.*}}(%rip), %zmm0, %k0			; X64-AVX512BW-NEXT: vpcmpneqb {{.*}}(%rip), %zmm0, %k0
	; X64-AVX512BW-NEXT: kortestq %k0, %k0			; X64-AVX512BW-NEXT: kortestq %k0, %k0
	; X64-AVX512BW-NEXT: setb %al			; X64-AVX512BW-NEXT: sete %al
	; X64-AVX512BW-NEXT: vzeroupper			; X64-AVX512BW-NEXT: vzeroupper
	; X64-AVX512BW-NEXT: retq			; X64-AVX512BW-NEXT: retq
	%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([65 x i8], [65 x i8]* @.str, i32 0, i32 0), i64 64) nounwind			%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([65 x i8], [65 x i8]* @.str, i32 0, i32 0), i64 64) nounwind
	%c = icmp eq i32 %m, 0			%c = icmp eq i32 %m, 0
	ret i1 %c			ret i1 %c
	}			}

	; This checks that we do not do stupid things with huge sizes.			; This checks that we do not do stupid things with huge sizes.
	[85 lines not shown]

test/CodeGen/X86/setcc-wide-types.ll

	[395 lines not shown]
	; AVX2-NEXT: xorl %eax, %eax			; AVX2-NEXT: xorl %eax, %eax
	; AVX2-NEXT: orq %rdx, %rdi			; AVX2-NEXT: orq %rdx, %rdi
	; AVX2-NEXT: setne %al			; AVX2-NEXT: setne %al
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512F-LABEL: ne_i512:			; AVX512F-LABEL: ne_i512:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vpcmpeqd %zmm1, %zmm0, %k0			; AVX512F-NEXT: vpcmpneqd %zmm1, %zmm0, %k0
	; AVX512F-NEXT: xorl %eax, %eax			; AVX512F-NEXT: xorl %eax, %eax
	; AVX512F-NEXT: kortestw %k0, %k0			; AVX512F-NEXT: kortestw %k0, %k0
	; AVX512F-NEXT: setae %al			; AVX512F-NEXT: setne %al
	; AVX512F-NEXT: vzeroupper			; AVX512F-NEXT: vzeroupper
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: ne_i512:			; AVX512BW-LABEL: ne_i512:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpcmpeqb %zmm1, %zmm0, %k0			; AVX512BW-NEXT: vpcmpneqb %zmm1, %zmm0, %k0
	; AVX512BW-NEXT: xorl %eax, %eax			; AVX512BW-NEXT: xorl %eax, %eax
	; AVX512BW-NEXT: kortestq %k0, %k0			; AVX512BW-NEXT: kortestq %k0, %k0
	; AVX512BW-NEXT: setae %al			; AVX512BW-NEXT: setne %al
	; AVX512BW-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%bcx = bitcast <8 x i64> %x to i512			%bcx = bitcast <8 x i64> %x to i512
	%bcy = bitcast <8 x i64> %y to i512			%bcy = bitcast <8 x i64> %y to i512
	%cmp = icmp ne i512 %bcx, %bcy			%cmp = icmp ne i512 %bcx, %bcy
	%zext = zext i1 %cmp to i32			%zext = zext i1 %cmp to i32
	ret i32 %zext			ret i32 %zext
	}			}
	[162 lines not shown]
	; AVX2-NEXT: xorl %eax, %eax			; AVX2-NEXT: xorl %eax, %eax
	; AVX2-NEXT: orq %rdx, %rdi			; AVX2-NEXT: orq %rdx, %rdi
	; AVX2-NEXT: sete %al			; AVX2-NEXT: sete %al
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512F-LABEL: eq_i512:			; AVX512F-LABEL: eq_i512:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vpcmpeqd %zmm1, %zmm0, %k0			; AVX512F-NEXT: vpcmpneqd %zmm1, %zmm0, %k0
	; AVX512F-NEXT: xorl %eax, %eax			; AVX512F-NEXT: xorl %eax, %eax
	; AVX512F-NEXT: kortestw %k0, %k0			; AVX512F-NEXT: kortestw %k0, %k0
	; AVX512F-NEXT: setb %al			; AVX512F-NEXT: sete %al
	; AVX512F-NEXT: vzeroupper			; AVX512F-NEXT: vzeroupper
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: eq_i512:			; AVX512BW-LABEL: eq_i512:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpcmpeqb %zmm1, %zmm0, %k0			; AVX512BW-NEXT: vpcmpneqb %zmm1, %zmm0, %k0
	; AVX512BW-NEXT: xorl %eax, %eax			; AVX512BW-NEXT: xorl %eax, %eax
	; AVX512BW-NEXT: kortestq %k0, %k0			; AVX512BW-NEXT: kortestq %k0, %k0
	; AVX512BW-NEXT: setb %al			; AVX512BW-NEXT: sete %al
	; AVX512BW-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%bcx = bitcast <8 x i64> %x to i512			%bcx = bitcast <8 x i64> %x to i512
	%bcy = bitcast <8 x i64> %y to i512			%bcy = bitcast <8 x i64> %y to i512
	%cmp = icmp eq i512 %bcx, %bcy			%cmp = icmp eq i512 %bcx, %bcy
	%zext = zext i1 %cmp to i32			%zext = zext i1 %cmp to i32
	ret i32 %zext			ret i32 %zext
	}			}
	[398 lines not shown]
	; NO512-NEXT: orq %rcx, %rdx			; NO512-NEXT: orq %rcx, %rdx
	; NO512-NEXT: setne %al			; NO512-NEXT: setne %al
	; NO512-NEXT: retq			; NO512-NEXT: retq
	;			;
	; AVX512F-LABEL: ne_i512_pair:			; AVX512F-LABEL: ne_i512_pair:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vmovdqu64 (%rdi), %zmm0			; AVX512F-NEXT: vmovdqu64 (%rdi), %zmm0
	; AVX512F-NEXT: vmovdqu64 64(%rdi), %zmm1			; AVX512F-NEXT: vmovdqu64 64(%rdi), %zmm1
	; AVX512F-NEXT: vpcmpeqd (%rsi), %zmm0, %k1			; AVX512F-NEXT: vpcmpneqd 64(%rsi), %zmm1, %k0
	; AVX512F-NEXT: vpcmpeqd 64(%rsi), %zmm1, %k0 {%k1}			; AVX512F-NEXT: vpcmpneqd (%rsi), %zmm0, %k1
	; AVX512F-NEXT: xorl %eax, %eax			; AVX512F-NEXT: xorl %eax, %eax
	; AVX512F-NEXT: kortestw %k0, %k0			; AVX512F-NEXT: kortestw %k0, %k1
	; AVX512F-NEXT: setae %al			; AVX512F-NEXT: setne %al
	; AVX512F-NEXT: vzeroupper			; AVX512F-NEXT: vzeroupper
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: ne_i512_pair:			; AVX512BW-LABEL: ne_i512_pair:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm0			; AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm0
	; AVX512BW-NEXT: vmovdqu64 64(%rdi), %zmm1			; AVX512BW-NEXT: vmovdqu64 64(%rdi), %zmm1
	; AVX512BW-NEXT: vpcmpeqb (%rsi), %zmm0, %k1			; AVX512BW-NEXT: vpcmpneqb 64(%rsi), %zmm1, %k0
	; AVX512BW-NEXT: vpcmpeqb 64(%rsi), %zmm1, %k0 {%k1}			; AVX512BW-NEXT: vpcmpneqb (%rsi), %zmm0, %k1
	; AVX512BW-NEXT: xorl %eax, %eax			; AVX512BW-NEXT: xorl %eax, %eax
	; AVX512BW-NEXT: kortestq %k0, %k0			; AVX512BW-NEXT: kortestq %k0, %k1
	; AVX512BW-NEXT: setae %al			; AVX512BW-NEXT: setne %al
	; AVX512BW-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%a0 = load i512, i512* %a			%a0 = load i512, i512* %a
	%b0 = load i512, i512* %b			%b0 = load i512, i512* %b
	%xor1 = xor i512 %a0, %b0			%xor1 = xor i512 %a0, %b0
	%ap1 = getelementptr i512, i512* %a, i512 1			%ap1 = getelementptr i512, i512* %a, i512 1
	%bp1 = getelementptr i512, i512* %b, i512 1			%bp1 = getelementptr i512, i512* %b, i512 1
	%a1 = load i512, i512* %ap1			%a1 = load i512, i512* %ap1
	[61 lines not shown]
	; NO512-NEXT: orq %rcx, %rdx			; NO512-NEXT: orq %rcx, %rdx
	; NO512-NEXT: sete %al			; NO512-NEXT: sete %al
	; NO512-NEXT: retq			; NO512-NEXT: retq
	;			;
	; AVX512F-LABEL: eq_i512_pair:			; AVX512F-LABEL: eq_i512_pair:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vmovdqu64 (%rdi), %zmm0			; AVX512F-NEXT: vmovdqu64 (%rdi), %zmm0
	; AVX512F-NEXT: vmovdqu64 64(%rdi), %zmm1			; AVX512F-NEXT: vmovdqu64 64(%rdi), %zmm1
	; AVX512F-NEXT: vpcmpeqd (%rsi), %zmm0, %k1			; AVX512F-NEXT: vpcmpneqd 64(%rsi), %zmm1, %k0
	; AVX512F-NEXT: vpcmpeqd 64(%rsi), %zmm1, %k0 {%k1}			; AVX512F-NEXT: vpcmpneqd (%rsi), %zmm0, %k1
	; AVX512F-NEXT: xorl %eax, %eax			; AVX512F-NEXT: xorl %eax, %eax
	; AVX512F-NEXT: kortestw %k0, %k0			; AVX512F-NEXT: kortestw %k0, %k1
	; AVX512F-NEXT: setb %al			; AVX512F-NEXT: sete %al
	; AVX512F-NEXT: vzeroupper			; AVX512F-NEXT: vzeroupper
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: eq_i512_pair:			; AVX512BW-LABEL: eq_i512_pair:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm0			; AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm0
	; AVX512BW-NEXT: vmovdqu64 64(%rdi), %zmm1			; AVX512BW-NEXT: vmovdqu64 64(%rdi), %zmm1
	; AVX512BW-NEXT: vpcmpeqb (%rsi), %zmm0, %k1			; AVX512BW-NEXT: vpcmpneqb 64(%rsi), %zmm1, %k0
	; AVX512BW-NEXT: vpcmpeqb 64(%rsi), %zmm1, %k0 {%k1}			; AVX512BW-NEXT: vpcmpneqb (%rsi), %zmm0, %k1
	; AVX512BW-NEXT: xorl %eax, %eax			; AVX512BW-NEXT: xorl %eax, %eax
	; AVX512BW-NEXT: kortestq %k0, %k0			; AVX512BW-NEXT: kortestq %k0, %k1
	; AVX512BW-NEXT: setb %al			; AVX512BW-NEXT: sete %al
	; AVX512BW-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%a0 = load i512, i512* %a			%a0 = load i512, i512* %a
	%b0 = load i512, i512* %b			%b0 = load i512, i512* %b
	%xor1 = xor i512 %a0, %b0			%xor1 = xor i512 %a0, %b0
	%ap1 = getelementptr i512, i512* %a, i512 1			%ap1 = getelementptr i512, i512* %a, i512 1
	%bp1 = getelementptr i512, i512* %b, i512 1			%bp1 = getelementptr i512, i512* %b, i512 1
	%a1 = load i512, i512* %ap1			%a1 = load i512, i512* %ap1
	[191 lines not shown]