This is an archive of the discontinued LLVM Phabricator instance.

[x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality
ClosedPublic

Authored by spatel on Mar 23 2017, 9:17 AM.

Details

Summary

This is the payoff for D31156 - if a target has efficient comparison instructions for vector-sized equality, we can replace memcmp calls with inline code that is both smaller and faster.

Seems like we're missing a load folding opportunity on the first test, but that's a separate problem.

I can enable the 32-byte case for AVX2 as an immediate follow-up, but I want to make sure this part looks ok before adding that.
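As a concrete example of the pattern this targets (a sketch; the function name is mine): a memcmp call used purely as a 16-byte equality test, where only the zero/nonzero result matters.

```cpp
#include <cstring>

// The kind of call this patch inlines: memcmp used only to test
// 16-byte equality, so the sign/ordering of the result is irrelevant
// and the libcall can be replaced with a vector compare.
bool eq16(const void *a, const void *b) {
  return std::memcmp(a, b, 16) == 0;
}
```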

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision. Mar 23 2017, 9:17 AM
efriedma added inline comments.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6098 ↗(On Diff #92816)

What's the point of performing the load in a vector type if you're going to immediately bitcast the result to an integer type? IIRC DAGCombine will fold this away.

test/CodeGen/X86/memcmp.ll
104 ↗(On Diff #92816)

What's the performance of this compared to using integer registers? (movq+xorq+movq+xorq+orq).

spatel added inline comments. Mar 23 2017, 1:40 PM
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6098 ↗(On Diff #92816)

I actually had it loading i128 to start, but I saw 2 problems:

  1. The i128 loads+bitcasts weren't converted to vector loads directly. Legalization for x86-64 split this into i64 loads, and then we had to rely on the combiner to merge the loads. At the least, I think this would be slower to compile since it caused more nodes to be created and folded. At worst, we might not put the loads back together properly and that would lead to poor code. It wasn't clear to me that I could add a generic combine to do that either since some targets might not want that.
  2. It wasn't honest to use i128 loads and bypass the isTypeLegal() check. We could make the TLI hook more specialized to account for that - have it confirm that loads of a given type/size are fast, so it's truly just a memcmp hook. But given the first problem, I got scared away.
test/CodeGen/X86/memcmp.ll
104 ↗(On Diff #92816)

Hmm...didn't consider that option since movmsk has been fast for a long time and scalar always needs more ops. We'd need to separate x86-32 from x86-64 too. I'll try to get some real numbers.

spatel added inline comments. Mar 23 2017, 2:32 PM
test/CodeGen/X86/memcmp.ll
104 ↗(On Diff #92816)

I benchmarked the 2 sequences shown below and the libcall. On Haswell with macOS, I'm seeing more wobble in these numbers than I can explain, but:

memcmp : 34485936 cycles for 1048576 iterations (32.89 cycles/iter).
vec cmp : 5245888 cycles for 1048576 iterations (5.00 cycles/iter).
xor cmp : 5247940 cycles for 1048576 iterations (5.00 cycles/iter).

On Ubuntu with AMD Jaguar:

memcmp : 21150343 cycles for 1048576 iterations (20.17 cycles/iter).
vec cmp : 9988395 cycles for 1048576 iterations (9.53 cycles/iter).
xor cmp : 9471849 cycles for 1048576 iterations (9.03 cycles/iter).

.align  6, 0x90
.global _cmp16vec
_cmp16vec:
movdqu (%rsi), %xmm0
movdqu (%rdi), %xmm1
pcmpeqb %xmm0, %xmm1
pmovmskb %xmm1, %eax
cmpl $65535, %eax
setne %al
movzbl  %al, %eax
retq

.align  6, 0x90
.global _cmp16scalar
_cmp16scalar:
movq  (%rsi), %rax
movq  8(%rsi), %rcx
xorq  (%rdi), %rax
xorq  8(%rdi), %rcx
orq %rax, %rcx
setne %al
movzbl  %al, %eax
retq
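The two asm sequences above correspond roughly to the following C++ (my sketch, using SSE2 intrinsics and unaligned 64-bit loads via memcpy; it assumes an SSE2-capable x86 target and both return nonzero iff the 16-byte blocks differ, matching the setne result in the asm):

```cpp
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2 intrinsics

// Rough equivalent of _cmp16vec: pcmpeqb + pmovmskb + cmp $65535.
int cmp16vec(const void *a, const void *b) {
  __m128i va = _mm_loadu_si128((const __m128i *)a);
  __m128i vb = _mm_loadu_si128((const __m128i *)b);
  __m128i eq = _mm_cmpeq_epi8(va, vb);     // 0xFF in each equal byte lane
  return _mm_movemask_epi8(eq) != 0xFFFF;  // all 16 lanes equal -> 0
}

// Rough equivalent of _cmp16scalar: movq + xorq pairs folded with orq.
int cmp16scalar(const void *a, const void *b) {
  uint64_t a0, a1, b0, b1;
  std::memcpy(&a0, a, 8);
  std::memcpy(&a1, (const char *)a + 8, 8);
  std::memcpy(&b0, b, 8);
  std::memcpy(&b1, (const char *)b + 8, 8);
  return ((a0 ^ b0) | (a1 ^ b1)) != 0;
}
```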
spatel added inline comments. Mar 23 2017, 3:36 PM
test/CodeGen/X86/memcmp.ll
104 ↗(On Diff #92816)

(clang produces the xor sequence if you just write int x(__int128_t *x, __int128_t *y) { return *x == *y; }.)

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6098 ↗(On Diff #92816)

If DAGCombine doesn't fold it away, this is fine, I guess. Maybe let the target specify the type to use, in case some target wants to use a type that isn't <4 x i32>?

spatel added inline comments. Mar 24 2017, 9:56 AM
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6098 ↗(On Diff #92816)

Yes - that would be better. We can cycle through the possible simple types (including i128), and the target can let us know what works.

Also, your example using "__int128_t" probably explains why we saw/expected different things after this step in the DAG. If the loads are aligned, then we will legalize these to v16i8 loads for an SSE2 target, but not if they are unaligned as I was seeing in my experiments.

spatel updated this revision to Diff 92974. Mar 24 2017, 10:49 AM

Patch updated:
Check all of the 16-byte simple value types before giving up.

Eli pointed me to D28637 (which I hadn't seen of course!) - a general solution for memcmp transformation. Not sure if this specialization still makes sense given that patch, but since I already made the edits, I'll post it.

Even with that patch, we probably still want a similar target hook. Might as well finish/merge this now, then make sure we continue to generate the same efficient code when x86 transitions to the new memcmp lowering.

lib/Target/X86/X86ISelLowering.h
819 ↗(On Diff #92974)

It probably makes sense to make this take a size in bytes, and return a VT, rather than calling this with every possible VT.

spatel marked an inline comment as done. Mar 24 2017, 1:42 PM
spatel added inline comments.
lib/Target/X86/X86ISelLowering.h
819 ↗(On Diff #92974)

Yep - that makes the patch simpler.

spatel updated this revision to Diff 93004. Mar 24 2017, 2:05 PM
spatel marked an inline comment as done.

Patch updated:
Have the TLI hook return the preferred operand (load) type for a given bitwidth, so we don't have to cycle through all of those when transforming the memcmp().

I'm using EVT instead of MVT in the hook anticipating that we extend this to 256-bit types for AVX2. In that case, we'd use i256 which isn't an MVT / simple type, so we'd have to switch it at that point unless I'm misunderstanding how these things work.
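The shape of the "size in, type out" hook being discussed might look something like this (an illustrative, self-contained model, not the actual LLVM interface; the enum, struct, and method names are all mine):

```cpp
// Toy stand-in for LLVM's MVT simple-type enum.
enum class SimpleVT { INVALID, v16i8, v32i8 };

struct ToyTargetLowering {
  bool hasSSE2 = true;
  bool hasAVX2 = false;

  // Given a memcmp operand width in bits, return the preferred legal
  // load type for an equality-only comparison, or INVALID to keep the
  // libcall. Mirrors the "take a size, return a VT" design above.
  SimpleVT preferredMemcmpEqType(unsigned numBits) const {
    if (numBits == 256 && hasAVX2)
      return SimpleVT::v32i8;  // the 32-byte AVX2 follow-up case
    if (numBits == 128 && hasSSE2)
      return SimpleVT::v16i8;  // pcmpeqb + pmovmskb path
    return SimpleVT::INVALID;
  }
};
```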

spatel updated this revision to Diff 93010. Mar 24 2017, 2:26 PM

Patch updated:
On 2nd thought, that EVT/MVT argument makes no sense. The returned type from the hook is always going to be an MVT because it will be a supported type in order to be fast. Using MVT makes the code a bit cleaner since we don't have to pass a context around for those.

efriedma added inline comments. Mar 24 2017, 2:44 PM
lib/Target/X86/X86ISelLowering.cpp
4646 ↗(On Diff #93010)

Maybe check isTypeLegal(MVT::v16i8) instead? hasSSE2() doesn't mean what you want it to.

4650 ↗(On Diff #93010)

Maybe also 64-bit types (on a 32-bit target).

test/CodeGen/X86/memcmp.ll
2 ↗(On Diff #93010)

Could you regenerate this test so it also compiles for a 32-bit target?

spatel updated this revision to Diff 93019. Mar 24 2017, 3:45 PM
spatel marked 3 inline comments as done.

Patch updated:

  1. Added 32-bit target testing in rL298744
  2. Don't use hasSSE2() in the x86 override - that won't work if we're in soft-float mode (nice catch!).
  3. Add TODO comment to handle 64-bit type on x86 32-bit target.
efriedma accepted this revision. Mar 24 2017, 4:32 PM

LGTM.

This revision is now accepted and ready to land. Mar 24 2017, 4:33 PM
This revision was automatically updated to reflect the committed changes.