This is an archive of the discontinued LLVM Phabricator instance.

Differential D69044

[X86] Allow up to 4 loads per inline memcmp()
Abandoned (Public)

Authored by davezarzycki on Oct 16 2019, 8:28 AM.

Details

Reviewers

craig.topper
spatel
RKSimon
courbet
gchatelet

Summary

This effectively relands r308322 / D35067, but sidesteps the PR33914 regression by increasing the load count only for memcmp() calls where the user cares about equality (not which operand is greater or lesser).

This patch also generalizes combineVectorSizedSetCCEquality() to handle nontrivial memcmp() expansion pass results.
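The generalization can be pictured on a toy expression tree. The sketch below uses a hypothetical `Node` type (not LLVM's `SDNode`) and repeats the recursion of the patch's `isOrXorXorTree()` helper: the root must be an OR, interior ORs are walked, and every leaf must be an XOR. With four load pairs the expansion yields OR(OR(XOR, XOR), OR(XOR, XOR)), which the pre-patch check (one OR whose two operands are both XORs) does not match.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a SelectionDAG node; only the opcode and two
// operands matter here. This is not LLVM's SDNode.
enum class Op { Or, Xor, Load };

struct Node {
  Op op;
  std::shared_ptr<Node> lhs, rhs;
};

using NodeRef = std::shared_ptr<Node>;

NodeRef make(Op op, NodeRef lhs = nullptr, NodeRef rhs = nullptr) {
  return std::make_shared<Node>(Node{op, std::move(lhs), std::move(rhs)});
}

// Same recursion as the patch's isOrXorXorTree(): the root must be an OR,
// interior ORs are walked recursively, and every leaf must be an XOR.
bool isOrXorXorTree(const NodeRef &x, bool root = true) {
  if (x->op == Op::Or)
    return isOrXorXorTree(x->lhs, false) && isOrXorXorTree(x->rhs, false);
  if (root)
    return false;
  return x->op == Op::Xor;
}
```

A bare XOR at the root is rejected, as is an OR whose leaf is a plain load rather than an XOR.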

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

davezarzycki created this revision. Oct 16 2019, 8:28 AM

Herald added subscribers: jsji, MaskRay, kbarton, nemanjai. Oct 16 2019, 8:28 AM

Can we make the x86 change to combineVectorSizedSetCCEquality() independently and before the change to TargetLowering?

Given that the previous attempt was reverted only because of perf, it would be good to show some perf data here in the proposal, either a micro-benchmark or something more substantial. cc'ing @courbet in case there's already a test harness in place for that.

I haven't looked at this in a while, so I wonder if we now have the infrastructure within memcmp expansion to create the partial vector code with 'ptest' shown here:
https://bugs.llvm.org/show_bug.cgi?id=33914

Herald added a subscriber: wuzish. Oct 18 2019, 5:17 AM

Also worth mentioning: one of the suspects for the regression in PR33914 was a trailing cmov. That's gone now*, so we might want to implement the simpler fix (expand everything to 4) and re-check perf.

*Replaced with x86 hackery:

    setae %al
    addl %eax, %eax
    decl %eax
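The three instructions turn the carry flag into a 1 or -1 result without a cmov. A minimal C++ model, assuming %eax was already zeroed, the two operands are already known to differ (the equal case is handled earlier), and they are compared as unsigned integers, could look like this; `tristateResult` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// Model of "setae %al; addl %eax, %eax; decl %eax". Assumes %eax was zeroed
// beforehand, that a != b, and that the operands compare as unsigned values.
int32_t tristateResult(uint64_t a, uint64_t b) {
  int32_t eax = (a >= b) ? 1 : 0;  // setae %al: 1 if a >= b (unsigned), else 0
  eax += eax;                      // addl %eax, %eax: 0 or 2
  eax -= 1;                        // decl %eax: -1 or 1
  return eax;
}
```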

In reply to @spatel (D69044#1714314):

Hi @spatel,

Thanks for the feedback and yes we can separate the changes. A few thoughts:

The inlined memcmp is much smarter than glibc's memcmp these days, at least for pure equality comparisons. In particular, the compiler's overlapping load optimization is really nice (see D55263).

This change proposal intentionally sidesteps more complex memcmps where the return result is tristate (greater, lesser, or equal), not binary (equal versus not). The tristate memcmp is what regressed in PR33914.

It would be easy to contrive a microbenchmark that makes any libc memcmp look very bad and the inlined memcmp look very good. This would be fun, but not informative or actionable.

If I were to design a somewhat interesting and quasi-realistic microbenchmark, I might create a carefully crafted test that hammers on an llvm::StringSwitch where the cases need more than two load pairs to be inlined.
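As a rough model of the equality-only expansion discussed here, the sketch below compares 16 to 64 bytes with at most four 16-byte load pairs: each pair is XORed, the results are ORed together, and the last load is shifted back so it ends exactly at byte n and may overlap the previous one (the overlapping-load idea from D55263). `V128`, `load16` and `equal16to64` are hypothetical names; a 16-byte XMM load is modeled as two 64-bit halves, and this is not the code LLVM actually emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 16-byte "vector" (stand-in for an XMM register).
struct V128 { uint64_t lo, hi; };

static V128 load16(const unsigned char *p) {
  V128 v;
  std::memcpy(&v.lo, p, 8);
  std::memcpy(&v.hi, p + 8, 8);
  return v;
}

// Equivalent of memcmp(a, b, n) == 0 for 16 <= n <= 64, using ceil(n / 16)
// (so at most four) 16-byte load pairs combined with XOR/OR.
bool equal16to64(const unsigned char *a, const unsigned char *b, size_t n) {
  assert(n >= 16 && n <= 64);
  uint64_t acc = 0;
  for (size_t off = 0;; off += 16) {
    if (off + 16 > n)
      off = n - 16;  // final load overlaps the previous one
    V128 x = load16(a + off), y = load16(b + off);
    acc |= (x.lo ^ y.lo) | (x.hi ^ y.hi);
    if (off + 16 == n)
      break;
  }
  return acc == 0;  // equal iff no differing bits were accumulated
}
```

Only a single test of the accumulated value against zero is needed at the end, which is why the equality case is cheaper to expand than the tristate one.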

This all being said, and if I might be totally honest, I'd like to observe two things:

The size of inlined memcmps tends to have a log-normal distribution with a small mean/median/variance. In other words, the vast majority of inlined memcmps (especially on AVX or AVX512 CPUs) don't need more than two load pairs.

That said, the specialization that a higher max load pair count allows matters more on simpler CPUs with smaller vectors (if any) and fewer micro-architectural tricks to mask the cost of libc's dynamic dispatch.

Therefore, I would argue that the max load pair count should be derived, not fixed. For example, I think the following pseudo-code would yield reasonable results across the semi-recent history of Intel's product line: 2 * CACHELINE_SIZE / PREFERRED_VECTOR_SIZE
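Plugging typical sizes into that formula shows how it scales. This sketch assumes a 64-byte cache line and sizes given in bytes; `derivedMaxLoadPairs` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>

// Sketch of: max load pairs = 2 * CACHELINE_SIZE / PREFERRED_VECTOR_SIZE,
// with both sizes in bytes.
unsigned derivedMaxLoadPairs(unsigned cacheLineBytes, unsigned vectorBytes) {
  return 2 * cacheLineBytes / vectorBytes;
}
```

With 64-byte lines this gives 8 load pairs for 16-byte (SSE2) vectors, 4 for 32-byte (AVX2) vectors and 2 for 64-byte (AVX-512) vectors, i.e. more inlining headroom on CPUs with narrower vectors.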

I'd further argue that the compiler shouldn't assume that "max load pairs per block" being less than "max load pairs" is predictable by the branch predictor, but that's a separate discussion.

Your thoughts would be appreciated. Thanks!

In reply to @davezarzycki (D69044#1714609):

I agree that a derived setting would be better than hard-coding. Exactly what that formula should be, I don't know...

Lazy question (can't tell from the test diffs): are we ignoring -mprefer-vector-width in these expansions? If so, we're almost certainly going to create fallout.

There's a lot to unravel for memcmp, so let's break this down into possible patches/steps:

1. Add a bunch of x86 tests (all of the new tests here can be committed with baseline codegen, so we'll just see diffs as we enable more expansions). Include some -prefer-vector-width override RUNs.
2. Enhance the x86 lowering.
3. Change/adjust the MaxLoads settings (probably fine to start as shown here currently with a simple tweak of the hardcoded value).

The staging sounds fine. As an aside, this patch is exposing a bug in EVEX address/displacement generation:

length256_eq:
    vmovdqu64 -128(%rdi), %zmm0
    vmovdqu64 -64(%rdi), %zmm1
    vmovdqu64 (%rdi), %zmm2
    vmovdqu64 64(%rdi), %zmm3

The above code should be:

length256_eq:
    vmovdqu64 (%rdi), %zmm0
    vmovdqu64 64(%rdi), %zmm1
    vmovdqu64 128(%rdi), %zmm2
    vmovdqu64 192(%rdi), %zmm3

Any tips on how to debug this?

Oh, and the memcmp() expansion only honors -mprefer-vector-width if the width is 256 or 512. I don't think any part of the X86 code gen actually honors 128 when 256 or 512 is possible.

In D69044#1715641, @davezarzycki wrote:

Oh, and the memcmp() expansion only honors -mprefer-vector-width if the width is 256 or 512. I don't think any part of the X86 code gen actually honors 128 when 256 or 512 is possible.

Nothing disables 256-bit types the way we do for 512. But X86TargetLowering::getOptimalMemOpType() honors -mprefer-vector-width=128

The negative offsets on length256_eq are coming from the memcmp expansion IR.

*** IR Dump After Expand memcmp() to load/stores ***

define i1 @length256_eq(i8* %x, i8* %y) #0 {
  %1 = bitcast i8* %x to i512*
  %2 = bitcast i8* %y to i512*
  %3 = load i512, i512* %1
  %4 = load i512, i512* %2
  %5 = xor i512 %3, %4
  %6 = getelementptr i8, i8* %x, i8 64
  %7 = bitcast i8* %6 to i512*
  %8 = getelementptr i8, i8* %y, i8 64
  %9 = bitcast i8* %8 to i512*
  %10 = load i512, i512* %7
  %11 = load i512, i512* %9
  %12 = xor i512 %10, %11
  %13 = getelementptr i8, i8* %x, i8 -128
  %14 = bitcast i8* %13 to i512*
  %15 = getelementptr i8, i8* %y, i8 -128
  %16 = bitcast i8* %15 to i512*
  %17 = load i512, i512* %14
  %18 = load i512, i512* %16
  %19 = xor i512 %17, %18
  %20 = getelementptr i8, i8* %x, i8 -64
  %21 = bitcast i8* %20 to i512*
  %22 = getelementptr i8, i8* %y, i8 -64
  %23 = bitcast i8* %22 to i512*
  %24 = load i512, i512* %21
  %25 = load i512, i512* %23
  %26 = xor i512 %24, %25
  %27 = or i512 %5, %12
  %28 = or i512 %19, %26
  %29 = or i512 %27, %28
  %30 = icmp ne i512 %29, 0
  %31 = zext i1 %30 to i32
  %cmp = icmp ne i32 %31, 0
  ret i1 %cmp
}

Seems to be because the GEPs only use i8 as their index type, so the offset 128 wrapped around to -128 and 192 to -64.
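The wraparound is ordinary two's-complement truncation. This sketch (hypothetical helper names) shows what an 8-bit signed index does to the offsets of the 256-byte case, compared with a 64-bit index:

```cpp
#include <cassert>
#include <cstdint>

// Offset as it ends up in an 8-bit signed GEP index: truncated to 8 bits and
// reinterpreted as signed. The uint8_t -> int8_t conversion is modular in
// C++20 and two's complement on mainstream compilers before that.
int64_t offsetAsI8(uint64_t offsetBytes) {
  return static_cast<int8_t>(static_cast<uint8_t>(offsetBytes));
}

// The same offset held in a 64-bit index (as with CreateConstGEP1_64 in the
// proposed fix): unchanged for offsets that fit.
int64_t offsetAsI64(uint64_t offsetBytes) {
  return static_cast<int64_t>(offsetBytes);
}
```

Offsets 0 and 64 survive the 8-bit index, but 128 and 192 become -128 and -64, matching the bad `vmovdqu64` displacements.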

This seems to fix it. I haven't checked whether it affects any existing tests.

diff --git a/llvm/lib/CodeGen/ExpandMemCmp.cpp b/llvm/lib/CodeGen/ExpandMemCmp.cpp
index 9916f2de041..0539db58a61 100644
--- a/llvm/lib/CodeGen/ExpandMemCmp.cpp
+++ b/llvm/lib/CodeGen/ExpandMemCmp.cpp
@@ -264,9 +264,9 @@ Value *MemCmpExpansion::getPtrToElementAtOffset(Value *Source,
                                                 uint64_t OffsetBytes) {
   if (OffsetBytes > 0) {
     auto *ByteType = Type::getInt8Ty(CI->getContext());
-    Source = Builder.CreateGEP(
+    Source = Builder.CreateConstGEP1_64(
         ByteType, Builder.CreateBitCast(Source, ByteType->getPointerTo()),
-        ConstantInt::get(ByteType, OffsetBytes));
+        OffsetBytes);
   }
   return Builder.CreateBitCast(Source, LoadSizeType->getPointerTo());
 }

@craig.topper – That does seem to fix the bug. Thanks!

Just FYI everybody, I built LLVM+clang+lld with this change, and I don't see any evidence of more than two AVX512 load pairs being generated outside of one LLVM unit test (which failed until Craig's patch). And yes, one could argue that LLVM/clang/lld aren't representative of normal code, but let's pause on that. I just wanted to test a large source base.

More so, the ratios are consistent with my assertion that inline memcmp sizes have a log-normal distribution (with small mean/median/variance). For example, clang has ~3300 XMM load pairs, ~230 YMM load pairs, and ~40 ZMM load pairs. These values are approximate because the script I wrote might have missed something.

davezarzycki mentioned this in D69222: [X86] NFC: expand inline memcmp test coverage. Oct 21 2019, 2:45 AM

davezarzycki mentioned this in D69507: [X86] Make memcmp vector lowering handle arbitrary expansions. Oct 29 2019, 10:54 PM

Some testing results.

I built llvm+clang twice, both with core2 as the target CPU: once without this change and once with it. I checked the 4-load-pair clang's assembly to confirm that at least some memcmps generated three or more XMM load pairs. That said, more than two XMM load pairs was uncommon. I then ran perf stat against clang while it compiled X86ISelLowering.cpp (which takes about 37 seconds on my Xeon 8168 with turbo disabled).

In terms of "wall clock" performance, allowing up to four load pairs is lost in the noise. (At best, there might be a 0.082% difference.) The 2-load-pair clang executed 0.027% more instructions than the 4-load-pair clang, and almost 0.03% more branches. Both of these seem plausible given the dynamic overhead of libc's memcmp().

Separately, I started writing a microbenchmark that used llvm::StringSwitch, but it didn't feel right. Two (potentially overlapping) XMM registers can cover all sizes up to 32 bytes. That's big enough for the majority of real-world scenarios.
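A simplified count makes the point concrete. Assuming 16-byte (XMM) maximum loads, this sketch compares greedy non-overlapping power-of-two loads with a scheme whose final 16-byte load may overlap its predecessor. The helper names are hypothetical, and this is not ExpandMemCmp's exact selection logic.

```cpp
#include <cassert>
#include <cstddef>

// Load pairs for an n-byte equality compare using only non-overlapping
// loads of 16, 8, 4, 2 or 1 bytes, always taking the largest that fits.
unsigned greedyLoadPairs(std::size_t n) {
  unsigned count = 0;
  for (std::size_t size = 16; size >= 1; size /= 2) {
    count += static_cast<unsigned>(n / size);
    n %= size;
  }
  return count;
}

// Load pairs when every load is 16 bytes and the last one may be shifted
// back to overlap its predecessor (requires n >= 16): ceil(n / 16).
unsigned overlappingLoadPairs(std::size_t n) {
  assert(n >= 16);
  return static_cast<unsigned>((n + 15) / 16);
}
```

For 31 bytes the greedy split needs five load pairs (16 + 8 + 4 + 2 + 1) while overlapping loads need two; every size from 16 to 32 bytes fits in two overlapping pairs, and a limit of four would reach 64 bytes.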

Overall, I've changed my mind about this proposal. I think the time and place for 4 (or more) load pairs was in the pre-vector (and therefore pre-64-bit) era, where going from 2 scalar load pairs to 4 scalar load pairs was a bigger win because the load sizes were so tiny.

I suppose we could enable four load pairs on pre-SSE machines if people care. Otherwise, and unless there are objections, I'll close this proposal in a few days.

In reply to @davezarzycki (D69044#1726941):

Thanks for running the experiments. I don't have a motivating case to change the current setting, so no objection from me. Adding Clement and Guillaume as reviewers in case they have data/thoughts.

I don't remember cases where we had very large constant compares (though we do have quite a lot of small ones). I'll run our internal benchmarks with this change.

Following up on D69044#1728571:

I've run our benchmarks and see no improvement from the change.

In reply to @courbet (D69044#1733886):

Thanks. I think we're almost ready to close this. Do your benchmarks test pre-SSE2 CPUs? In particular, 32-bit CPUs? Otherwise, as long as SSE vector registers are available, two load pairs cover the majority of inline memcmp scenarios.

No, we only have SSE2 and above.

I think we reasoned our way out of this.

Revision Contents

Path (size)

llvm/include/llvm/CodeGen/TargetLowering.h (8 lines)
llvm/lib/CodeGen/TargetLoweringBase.cpp (5 lines)
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (2 lines)
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (2 lines)
llvm/lib/Target/PowerPC/PPCISelLowering.cpp (3 lines)
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp (2 lines)
llvm/lib/Target/X86/X86ISelLowering.cpp (58 lines)
llvm/lib/Target/X86/X86TargetTransformInfo.cpp (4 lines)
llvm/test/CodeGen/X86/memcmp.ll (1448 lines)
llvm/test/Transforms/ExpandMemCmp/X86/memcmp.ll (244 lines)

Diff 225232

include/llvm/CodeGen/TargetLowering.h

   }

   /// Get maximum # of load operations permitted for memcmp
   ///
   /// This function returns the maximum number of load operations permitted
   /// to replace a call to memcmp. The value is set by the target at the
   /// performance threshold for such a replacement. If OptSize is true,
   /// return the limit for functions that have OptSize attribute.
-  unsigned getMaxExpandSizeMemcmp(bool OptSize) const {
+  unsigned getMaxExpandSizeMemcmp(bool OptSize, bool Equality) const {
+    if (Equality)
+      return OptSize ? MaxLoadsPerMemcmpEqOptSize : MaxLoadsPerMemcmpEq;
     return OptSize ? MaxLoadsPerMemcmpOptSize : MaxLoadsPerMemcmp;
   }

   /// Get maximum # of store operations permitted for llvm.memmove
   ///
   /// This function returns the maximum number of store operations permitted
   /// to replace a call to llvm.memmove. The value is set by the target at the
   /// performance threshold for such a replacement. If OptSize is true,
...
   /// largest load operations first, followed by smaller ones, if necessary, per
   /// alignment restrictions. For example, loading 7 bytes on a 32-bit machine
   /// with 32-bit alignment would result in one 4-byte load, a one 2-byte load
   /// and one 1-byte load. This only applies to copying a constant array of
   /// constant size.
   unsigned MaxLoadsPerMemcmp;
   /// Likewise for functions with the OptSize attribute.
   unsigned MaxLoadsPerMemcmpOptSize;
+  /// The maximum when only equality matters (memcmp() == or != 0).
+  unsigned MaxLoadsPerMemcmpEq;
+  /// Likewise for functions with the OptSize attribute.
+  unsigned MaxLoadsPerMemcmpEqOptSize;

   /// \brief Specify maximum number of store instructions per memmove call.
   ///
   /// When lowering \@llvm.memmove this field specifies the maximum number of
   /// store instructions that may be substituted for a call to memmove. Targets
   /// must set this value based on the cost threshold for that target. Targets
   /// should assume that the memmove will be done using as many of the largest
   /// store operations first, followed by smaller ones, if necessary, per
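To summarize which limit applies, here is a stand-alone model of the patched `getMaxExpandSizeMemcmp()` using the X86 values this revision sets in X86ISelLowering.cpp (2/2 for three-way results, 4/2 for equality-only). `MemcmpLimits` is a hypothetical struct for illustration only; in LLVM these fields are protected members of `TargetLoweringBase`.

```cpp
#include <cassert>

// Hypothetical model; field names and X86 values are taken from the diff.
struct MemcmpLimits {
  unsigned MaxLoadsPerMemcmp = 2;
  unsigned MaxLoadsPerMemcmpOptSize = 2;
  unsigned MaxLoadsPerMemcmpEq = 4;
  unsigned MaxLoadsPerMemcmpEqOptSize = 2;

  // Same selection as the patched getMaxExpandSizeMemcmp(); the targets'
  // enableMemCmpExpansion() passes its IsZeroCmp flag as Equality.
  unsigned getMaxExpandSizeMemcmp(bool OptSize, bool Equality) const {
    if (Equality)
      return OptSize ? MaxLoadsPerMemcmpEqOptSize : MaxLoadsPerMemcmpEq;
    return OptSize ? MaxLoadsPerMemcmpOptSize : MaxLoadsPerMemcmp;
  }
};
```

Only the equality-only, non-OptSize case gets the higher limit of four; every other combination stays at two on X86.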

lib/CodeGen/TargetLoweringBase.cpp

 }

 /// NOTE: The TargetMachine owns TLOF.
 TargetLoweringBase::TargetLoweringBase(const TargetMachine &tm) : TM(tm) {
   initActions();

   // Perform these initializations only once.
   MaxStoresPerMemset = MaxStoresPerMemcpy = MaxStoresPerMemmove =
-      MaxLoadsPerMemcmp = 8;
+      MaxLoadsPerMemcmp = MaxLoadsPerMemcmpEq = 8;
   MaxGluedStoresPerMemcpy = 0;
   MaxStoresPerMemsetOptSize = MaxStoresPerMemcpyOptSize =
-      MaxStoresPerMemmoveOptSize = MaxLoadsPerMemcmpOptSize = 4;
+      MaxStoresPerMemmoveOptSize = MaxLoadsPerMemcmpEqOptSize =
+          MaxLoadsPerMemcmpOptSize = 4;
   UseUnderscoreSetJmp = false;
   UseUnderscoreLongJmp = false;
   HasMultipleConditionRegisters = false;
   HasExtractBitsInsn = false;
   JumpIsExpensive = JumpIsExpensiveOverride;
   PredictableSelectIsExpensive = false;
   EnableExtLdPromotion = false;
   StackPointerRegisterToSaveRestore = 0;

lib/Target/AArch64/AArch64ISelLowering.cpp

   MaxStoresPerMemcpy = Subtarget->requiresStrictAlign()
     ? MaxStoresPerMemcpyOptSize : 16;

   MaxStoresPerMemmoveOptSize = MaxStoresPerMemmove = 4;

   MaxLoadsPerMemcmpOptSize = 4;
   MaxLoadsPerMemcmp = Subtarget->requiresStrictAlign()
     ? MaxLoadsPerMemcmpOptSize : 8;
+  MaxLoadsPerMemcmpEqOptSize = MaxLoadsPerMemcmpOptSize;
+  MaxLoadsPerMemcmpEq = MaxLoadsPerMemcmp;

   setStackPointerRegisterToSaveRestore(AArch64::SP);

   setSchedulingPreference(Sched::Hybrid);

   EnableExtLdPromotion = true;

   // Set required alignment.

lib/Target/AArch64/AArch64TargetTransformInfo.cpp

   }
   return BaseT::getCmpSelInstrCost(Opcode, ValTy, CondTy, I);
 }

 AArch64TTIImpl::TTI::MemCmpExpansionOptions
 AArch64TTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const {
   TTI::MemCmpExpansionOptions Options;
   Options.AllowOverlappingLoads = !ST->requiresStrictAlign();
-  Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
+  Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize, IsZeroCmp);
   Options.NumLoadsPerBlock = Options.MaxNumLoads;
   // TODO: Though vector loads usually perform well on AArch64, in some targets
   // they may wake up the FP unit, which raises the power consumption. Perhaps
   // they could be used with no holds barred (-O3).
   Options.LoadSizes = {8, 4, 2, 1};
   return Options;
 }

lib/Target/PowerPC/PPCISelLowering.cpp

   } else if (Subtarget.getDarwinDirective() == PPC::DIR_A2) {
     // The A2 also benefits from (very) aggressive inlining of memcpy and
     // friends. The overhead of a the function call, even when warm, can be
     // over one hundred cycles.
     MaxStoresPerMemset = 128;
     MaxStoresPerMemcpy = 128;
     MaxStoresPerMemmove = 128;
     MaxLoadsPerMemcmp = 128;
+    MaxLoadsPerMemcmpEq = 128;
   } else {
     MaxLoadsPerMemcmp = 8;
+    MaxLoadsPerMemcmpEq = 8;
     MaxLoadsPerMemcmpOptSize = 4;
+    MaxLoadsPerMemcmpEqOptSize = 4;
   }
 }

 /// getMaxByValAlign - Helper for getByValTypeAlignment to determine
 /// the desired ByVal argument alignment.
 static void getMaxByValAlign(Type *Ty, unsigned &MaxAlign,
                              unsigned MaxMaxAlign) {
   if (MaxAlign == MaxMaxAlign)

lib/Target/PowerPC/PPCTargetTransformInfo.cpp

   return LoopHasReductions;
 }

 PPCTTIImpl::TTI::MemCmpExpansionOptions
 PPCTTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const {
   TTI::MemCmpExpansionOptions Options;
   Options.LoadSizes = {8, 4, 2, 1};
-  Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
+  Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize, IsZeroCmp);
   return Options;
 }

 bool PPCTTIImpl::enableInterleavedAccessVectorization() {
   return true;
 }

 unsigned PPCTTIImpl::getNumberOfRegisters(unsigned ClassID) const {

lib/Target/X86/X86ISelLowering.cpp

   MaxStoresPerMemmove = 8; // For @llvm.memmove -> sequence of stores
   MaxStoresPerMemmoveOptSize = 4;

   // TODO: These control memcmp expansion in CGP and could be raised higher, but
   // that needs to benchmarked and balanced with the potential use of vector
   // load/store types (PR33329, PR33914).
   MaxLoadsPerMemcmp = 2;
   MaxLoadsPerMemcmpOptSize = 2;
+  MaxLoadsPerMemcmpEq = 4;
+  MaxLoadsPerMemcmpEqOptSize = 2;

   // Set loop alignment to 2^ExperimentalPrefLoopAlignment bytes (default: 2^4).
   setPrefLoopAlignment(Align(1ULL << ExperimentalPrefLoopAlignment));

   // An out-of-order CPU can speculatively execute past a predictable branch,
   // but a conditional move could be stalled by an expensive earlier operation.
   PredictableSelectIsExpensive = Subtarget.getSchedModel().isOutOfOrder();
   EnableExtLdPromotion = true;
▲ Show 20 Lines • Show All 40,604 Lines • ▼ Show 20 Lines	if ((N00.isUndef() \|\| DAG.MaskedValueIsZero(N00, ZeroMask)) &&
(N01.isUndef() \|\| DAG.MaskedValueIsZero(N01, ZeroMask))) {		(N01.isUndef() \|\| DAG.MaskedValueIsZero(N01, ZeroMask))) {
return concatSubVectors(N00, N01, DAG, dl);		return concatSubVectors(N00, N01, DAG, dl);
}		}
}		}

return SDValue();		return SDValue();
}		}

		/// Recursive helper for combineVectorSizedSetCCEquality() to see if we have a
		/// recognizable memcmp expansion.
		static bool isOrXorXorTree(SDValue X, bool Root = true) {
		if (X.getOpcode() == ISD::OR)
		return isOrXorXorTree(X.getOperand(0), false) &&
		isOrXorXorTree(X.getOperand(1), false);
		if (Root)
		return false;
		return X.getOpcode() == ISD::XOR;
		}

		/// Recursive helper for combineVectorSizedSetCCEquality() to emit the memcmp
		/// expansion.
		static SDValue emitOrXorXorTree(SDValue X, SDLoc &DL, SelectionDAG &DAG,
		EVT VecVT, EVT CmpVT, bool HasPT) {
		if (X.getOpcode() == ISD::OR) {
		SDValue A = emitOrXorXorTree(X.getOperand(0), DL, DAG, VecVT, CmpVT, HasPT);
		SDValue B = emitOrXorXorTree(X.getOperand(1), DL, DAG, VecVT, CmpVT, HasPT);
		if (VecVT == CmpVT && HasPT)
		return DAG.getNode(ISD::OR, DL, VecVT, A, B);
		return DAG.getNode(ISD::AND, DL, CmpVT, A, B);
		} else if (X.getOpcode() == ISD::XOR) {
		SDValue A = DAG.getBitcast(VecVT, X.getOperand(0));
		SDValue B = DAG.getBitcast(VecVT, X.getOperand(1));
		if (VecVT == CmpVT && HasPT)
		return DAG.getNode(ISD::XOR, DL, VecVT, A, B);
		return DAG.getSetCC(DL, CmpVT, A, B, ISD::SETEQ);
		}
		llvm_unreachable("Impossible");
		}

/// Try to map a 128-bit or larger integer comparison to vector instructions		/// Try to map a 128-bit or larger integer comparison to vector instructions
/// before type legalization splits it up into chunks.		/// before type legalization splits it up into chunks.
static SDValue combineVectorSizedSetCCEquality(SDNode *SetCC, SelectionDAG &DAG,		static SDValue combineVectorSizedSetCCEquality(SDNode *SetCC, SelectionDAG &DAG,
const X86Subtarget &Subtarget) {		const X86Subtarget &Subtarget) {
ISD::CondCode CC = cast<CondCodeSDNode>(SetCC->getOperand(2))->get();		ISD::CondCode CC = cast<CondCodeSDNode>(SetCC->getOperand(2))->get();
assert((CC == ISD::SETNE \|\| CC == ISD::SETEQ) && "Bad comparison predicate");		assert((CC == ISD::SETNE \|\| CC == ISD::SETEQ) && "Bad comparison predicate");

// We're looking for an oversized integer equality comparison.		// We're looking for an oversized integer equality comparison.
SDValue X = SetCC->getOperand(0);		SDValue X = SetCC->getOperand(0);
SDValue Y = SetCC->getOperand(1);		SDValue Y = SetCC->getOperand(1);
EVT OpVT = X.getValueType();		EVT OpVT = X.getValueType();
unsigned OpSize = OpVT.getSizeInBits();		unsigned OpSize = OpVT.getSizeInBits();
if (!OpVT.isScalarInteger() || OpSize < 128)		if (!OpVT.isScalarInteger() || OpSize < 128)
return SDValue();		return SDValue();

// Ignore a comparison with zero because that gets special treatment in		// Ignore a comparison with zero because that gets special treatment in
// EmitTest(). But make an exception for the special case of a pair of		// EmitTest(). But make an exception for the special case of a pair of
// logically-combined vector-sized operands compared to zero. This pattern may		// logically-combined vector-sized operands compared to zero. This pattern may
// be generated by the memcmp expansion pass with oversized integer compares		// be generated by the memcmp expansion pass with oversized integer compares
// (see PR33325).		// (see PR33325).
bool IsOrXorXorCCZero = isNullConstant(Y) && X.getOpcode() == ISD::OR &&		bool IsOrXorXorTreeCCZero = isNullConstant(Y) && isOrXorXorTree(X);
X.getOperand(0).getOpcode() == ISD::XOR &&		if (isNullConstant(Y) && !IsOrXorXorTreeCCZero)
X.getOperand(1).getOpcode() == ISD::XOR;
if (isNullConstant(Y) && !IsOrXorXorCCZero)
return SDValue();		return SDValue();

// Don't perform this combine if constructing the vector will be expensive.		// Don't perform this combine if constructing the vector will be expensive.
auto IsVectorBitCastCheap = [](SDValue X) {		auto IsVectorBitCastCheap = [](SDValue X) {
X = peekThroughBitcasts(X);		X = peekThroughBitcasts(X);
return isa<ConstantSDNode>(X) || X.getValueType().isVector() ||		return isa<ConstantSDNode>(X) || X.getValueType().isVector() ||
X.getOpcode() == ISD::LOAD;		X.getOpcode() == ISD::LOAD;
};		};
if ((!IsVectorBitCastCheap(X) || !IsVectorBitCastCheap(Y)) &&		if ((!IsVectorBitCastCheap(X) || !IsVectorBitCastCheap(Y)) &&
!IsOrXorXorCCZero)		!IsOrXorXorTreeCCZero)
return SDValue();		return SDValue();

EVT VT = SetCC->getValueType(0);		EVT VT = SetCC->getValueType(0);
SDLoc DL(SetCC);		SDLoc DL(SetCC);
bool HasAVX = Subtarget.hasAVX();		bool HasAVX = Subtarget.hasAVX();

// Use XOR (plus OR) and PTEST after SSE4.1 and before AVX512.		// Use XOR (plus OR) and PTEST after SSE4.1 and before AVX512.
// Otherwise use PCMPEQ (plus AND) and mask testing.		// Otherwise use PCMPEQ (plus AND) and mask testing.
		bool DoZext = false;
if ((OpSize == 128 && Subtarget.hasSSE2()) ||		if ((OpSize == 128 && Subtarget.hasSSE2()) ||
(OpSize == 256 && HasAVX) ||		(OpSize == 256 && HasAVX) ||
(OpSize == 512 && Subtarget.useAVX512Regs())) {		(OpSize == 512 && Subtarget.useAVX512Regs())) {
bool HasPT = Subtarget.hasSSE41();		bool HasPT = Subtarget.hasSSE41();
EVT VecVT = MVT::v16i8;		EVT VecVT = MVT::v16i8;
EVT CmpVT = MVT::v16i8;		EVT CmpVT = MVT::v16i8;
if (OpSize == 256)		if (OpSize == 256)
VecVT = CmpVT = MVT::v32i8;		VecVT = CmpVT = MVT::v32i8;
if (OpSize == 512) {		if (OpSize == 512) {
if (Subtarget.hasBWI()) {		if (Subtarget.hasBWI()) {
VecVT = MVT::v64i8;		VecVT = MVT::v64i8;
CmpVT = MVT::v64i1;		CmpVT = MVT::v64i1;
} else {		} else {
VecVT = MVT::v16i32;		VecVT = MVT::v16i32;
CmpVT = MVT::v16i1;		CmpVT = MVT::v16i1;
}		}
}		}

SDValue Cmp;		SDValue Cmp;
if (IsOrXorXorCCZero) {		if (IsOrXorXorTreeCCZero) {
// This is a bitwise-combined equality comparison of 2 pairs of vectors:		// This is a bitwise-combined equality comparison of 2 pairs of vectors:
// setcc i128 (or (xor A, B), (xor C, D)), 0, eq|ne		// setcc i128 (or (xor A, B), (xor C, D)), 0, eq|ne
// Use 2 vector equality compares and 'and' the results before doing a		// Use 2 vector equality compares and 'and' the results before doing a
// MOVMSK.		// MOVMSK.
SDValue A = DAG.getBitcast(VecVT, X.getOperand(0).getOperand(0));		Cmp = emitOrXorXorTree(X, DL, DAG, VecVT, CmpVT, HasPT);
SDValue B = DAG.getBitcast(VecVT, X.getOperand(0).getOperand(1));
SDValue C = DAG.getBitcast(VecVT, X.getOperand(1).getOperand(0));
SDValue D = DAG.getBitcast(VecVT, X.getOperand(1).getOperand(1));
if (VecVT == CmpVT && HasPT) {
SDValue Cmp1 = DAG.getNode(ISD::XOR, DL, VecVT, A, B);
SDValue Cmp2 = DAG.getNode(ISD::XOR, DL, VecVT, C, D);
Cmp = DAG.getNode(ISD::OR, DL, VecVT, Cmp1, Cmp2);
} else {
SDValue Cmp1 = DAG.getSetCC(DL, CmpVT, A, B, ISD::SETEQ);
SDValue Cmp2 = DAG.getSetCC(DL, CmpVT, C, D, ISD::SETEQ);
Cmp = DAG.getNode(ISD::AND, DL, CmpVT, Cmp1, Cmp2);
}
} else {		} else {
SDValue VecX = DAG.getBitcast(VecVT, X);		SDValue VecX = DAG.getBitcast(VecVT, X);
SDValue VecY = DAG.getBitcast(VecVT, Y);		SDValue VecY = DAG.getBitcast(VecVT, Y);
if (VecVT == CmpVT && HasPT) {		if (VecVT == CmpVT && HasPT) {
Cmp = DAG.getNode(ISD::XOR, DL, VecVT, VecX, VecY);		Cmp = DAG.getNode(ISD::XOR, DL, VecVT, VecX, VecY);
} else {		} else {
Cmp = DAG.getSetCC(DL, CmpVT, VecX, VecY, ISD::SETEQ);		Cmp = DAG.getSetCC(DL, CmpVT, VecX, VecY, ISD::SETEQ);
}		}
...
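The tree shape accepted by the new isOrXorXorTree() helper can be shown outside LLVM with a toy expression tree. This is a minimal sketch and not LLVM code. `Op`, `Node`, `leaf()`, `xorPair()` and `make()` are hypothetical stand-ins for ISD opcodes and SDValue; only the recursion mirrors the helper. A lone XOR at the root is rejected, as the `Root` parameter enforces. Any OR tree whose non-OR nodes are all XORs is accepted, including an OR of four XORs and unbalanced trees.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-ins for ISD opcodes and SDValue; only the tree shape
// matters for this check.
enum class Op { Or, Xor, Load };

struct Node {
  Op Opcode;
  std::shared_ptr<Node> LHS, RHS; // null for leaves
};
using NodeRef = std::shared_ptr<Node>;

// Hypothetical leaf (e.g. a load) with no operands.
NodeRef leaf() { return std::make_shared<Node>(Node{Op::Load, nullptr, nullptr}); }

// Hypothetical binary node constructor.
NodeRef make(Op Opc, NodeRef L, NodeRef R) {
  return std::make_shared<Node>(Node{Opc, std::move(L), std::move(R)});
}

// Hypothetical shorthand for (xor load, load), i.e. one compared pair.
NodeRef xorPair() { return make(Op::Xor, leaf(), leaf()); }

// Same recursion as isOrXorXorTree(): the root must be an OR, interior nodes
// may be ORs, and every non-OR node below the root must be an XOR.
bool isOrXorXorTree(const NodeRef &X, bool Root = true) {
  if (X->Opcode == Op::Or)
    return isOrXorXorTree(X->LHS, false) && isOrXorXorTree(X->RHS, false);
  if (Root)
    return false;
  return X->Opcode == Op::Xor;
}
```

For example, `make(Op::Or, make(Op::Or, xorPair(), xorPair()), xorPair())` (three pairs, unbalanced) is accepted. `make(Op::Or, xorPair(), leaf())` is rejected because one OR operand is not an XOR.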

lib/Target/X86/X86TargetTransformInfo.cpp

...	bool X86TTIImpl::areFunctionArgsABICompatible(

return TM.getSubtarget<X86Subtarget>(*Caller).useAVX512Regs() ==		return TM.getSubtarget<X86Subtarget>(*Caller).useAVX512Regs() ==
TM.getSubtarget<X86Subtarget>(*Callee).useAVX512Regs();		TM.getSubtarget<X86Subtarget>(*Callee).useAVX512Regs();
}		}

X86TTIImpl::TTI::MemCmpExpansionOptions		X86TTIImpl::TTI::MemCmpExpansionOptions
X86TTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const {		X86TTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const {
TTI::MemCmpExpansionOptions Options;		TTI::MemCmpExpansionOptions Options;
Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);		Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize, IsZeroCmp);
Options.NumLoadsPerBlock = 2;		Options.NumLoadsPerBlock = 4;
if (IsZeroCmp) {		if (IsZeroCmp) {
// Only enable vector loads for equality comparison. Right now the vector		// Only enable vector loads for equality comparison. Right now the vector
// version is not as fast for three way compare (see #33329).		// version is not as fast for three way compare (see #33329).
const unsigned PreferredWidth = ST->getPreferVectorWidth();		const unsigned PreferredWidth = ST->getPreferVectorWidth();
if (PreferredWidth >= 512 && ST->hasAVX512()) Options.LoadSizes.push_back(64);		if (PreferredWidth >= 512 && ST->hasAVX512()) Options.LoadSizes.push_back(64);
if (PreferredWidth >= 256 && ST->hasAVX2()) Options.LoadSizes.push_back(32);		if (PreferredWidth >= 256 && ST->hasAVX2()) Options.LoadSizes.push_back(32);
if (PreferredWidth >= 128 && ST->hasSSE2()) Options.LoadSizes.push_back(16);		if (PreferredWidth >= 128 && ST->hasSSE2()) Options.LoadSizes.push_back(16);
// All GPR and vector loads can be unaligned. SIMD compare requires integer		// All GPR and vector loads can be unaligned. SIMD compare requires integer
...
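At source level, the equality-only expansion that a larger NumLoadsPerBlock permits works in three steps. It XORs corresponding chunks of the two buffers, ORs the differences together, and tests the result against zero. The sketch below illustrates this and is not the pass's implementation. `memeq15()` and `load32()` are hypothetical names, and the code hard-codes the 15-byte case on a 32-bit target. In memcmp.ll, the X86 check lines for that case (length15_eq) load four 4-byte words per operand, at offsets 0, 4, 8 and 11. This XOR/OR form only answers "equal or not", which matches the patch raising the load count only when IsZeroCmp is set.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Unaligned 4-byte load; memcpy avoids alignment and aliasing problems.
static inline std::uint32_t load32(const unsigned char *P) {
  std::uint32_t V;
  std::memcpy(&V, P, sizeof V);
  return V;
}

// True iff memcmp(A, B, 15) == 0. Four loads per operand at offsets 0, 4, 8
// and 11; the last load overlaps byte 11 with the third one, which is
// harmless because only equality is being tested.
bool memeq15(const unsigned char *A, const unsigned char *B) {
  std::uint32_t D0 = load32(A + 0) ^ load32(B + 0);
  std::uint32_t D1 = load32(A + 4) ^ load32(B + 4);
  std::uint32_t D2 = load32(A + 8) ^ load32(B + 8);
  std::uint32_t D3 = load32(A + 11) ^ load32(B + 11);
  // OR all differences together; any nonzero bit means a mismatch.
  return ((D0 | D1) | (D2 | D3)) == 0;
}
```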

test/CodeGen/X86/memcmp.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=cmov | FileCheck %s --check-prefix=X86 --check-prefix=X86-NOSSE			; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=cmov | FileCheck %s --check-prefix=X86 --check-prefix=X86-NOSSE
	; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+sse | FileCheck %s --check-prefix=X86 --check-prefix=SSE --check-prefix=X86-SSE1			; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+sse | FileCheck %s --check-prefix=X86 --check-prefix=SSE --check-prefix=X86-SSE1
	; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X86 --check-prefix=SSE --check-prefix=X86-SSE2			; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X86 --check-prefix=SSE --check-prefix=X86-SSE2
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefix=X64 --check-prefix=X64-SSE2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefix=X64 --check-prefix=X64-SSE2
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX --check-prefix=X64-AVX1			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX --check-prefix=X64-AVX1
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx2 | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX --check-prefix=X64-AVX2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx2 | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX --check-prefix=X64-AVX2
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx512f | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX512F			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx512f | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX512F
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx512bw | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX512BW			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx512bw | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX512BW

	; This tests codegen time inlining/optimization of memcmp			; This tests codegen time inlining/optimization of memcmp
	; rdar://6480398			; rdar://6480398

	@.str = private constant [65 x i8] c"0123456789012345678901234567890123456789012345678901234567890123\00", align 1			@.str = private constant [513 x i8] c"01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901\00", align 1

	declare i32 @memcmp(i8*, i8*, i64)			declare i32 @memcmp(i8*, i8*, i64)

	define i32 @length0(i8* %X, i8* %Y) nounwind {			define i32 @length0(i8* %X, i8* %Y) nounwind {
	; X86-LABEL: length0:			; X86-LABEL: length0:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: xorl %eax, %eax			; X86-NEXT: xorl %eax, %eax
	; X86-NEXT: retl			; X86-NEXT: retl
	...
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: length2_eq_const:			; X64-LABEL: length2_eq_const:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movzwl (%rdi), %eax			; X64-NEXT: movzwl (%rdi), %eax
	; X64-NEXT: cmpl $12849, %eax # imm = 0x3231			; X64-NEXT: cmpl $12849, %eax # imm = 0x3231
	; X64-NEXT: setne %al			; X64-NEXT: setne %al
	; X64-NEXT: retq			; X64-NEXT: retq
	%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([65 x i8], [65 x i8]* @.str, i32 0, i32 1), i64 2) nounwind			%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([513 x i8], [513 x i8]* @.str, i32 0, i32 1), i64 2) nounwind
	%c = icmp ne i32 %m, 0			%c = icmp ne i32 %m, 0
	ret i1 %c			ret i1 %c
	}			}

	define i1 @length2_eq_nobuiltin_attr(i8* %X, i8* %Y) nounwind {			define i1 @length2_eq_nobuiltin_attr(i8* %X, i8* %Y) nounwind {
	; X86-LABEL: length2_eq_nobuiltin_attr:			; X86-LABEL: length2_eq_nobuiltin_attr:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: pushl $0			; X86-NEXT: pushl $0
	...
	; X86-NEXT: sete %al			; X86-NEXT: sete %al
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: length4_eq_const:			; X64-LABEL: length4_eq_const:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: cmpl $875770417, (%rdi) # imm = 0x34333231			; X64-NEXT: cmpl $875770417, (%rdi) # imm = 0x34333231
	; X64-NEXT: sete %al			; X64-NEXT: sete %al
	; X64-NEXT: retq			; X64-NEXT: retq
	%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([65 x i8], [65 x i8]* @.str, i32 0, i32 1), i64 4) nounwind			%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([513 x i8], [513 x i8]* @.str, i32 0, i32 1), i64 4) nounwind
	%c = icmp eq i32 %m, 0			%c = icmp eq i32 %m, 0
	ret i1 %c			ret i1 %c
	}			}

	define i32 @length5(i8* %X, i8* %Y) nounwind {			define i32 @length5(i8* %X, i8* %Y) nounwind {
	; X86-LABEL: length5:			; X86-LABEL: length5:
	; X86: # %bb.0: # %loadbb			; X86: # %bb.0: # %loadbb
	; X86-NEXT: pushl %esi			; X86-NEXT: pushl %esi
	...
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: length8_eq_const:			; X64-LABEL: length8_eq_const:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movabsq $3978425819141910832, %rax # imm = 0x3736353433323130			; X64-NEXT: movabsq $3978425819141910832, %rax # imm = 0x3736353433323130
	; X64-NEXT: cmpq %rax, (%rdi)			; X64-NEXT: cmpq %rax, (%rdi)
	; X64-NEXT: setne %al			; X64-NEXT: setne %al
	; X64-NEXT: retq			; X64-NEXT: retq
	%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([65 x i8], [65 x i8]* @.str, i32 0, i32 0), i64 8) nounwind			%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([513 x i8], [513 x i8]* @.str, i32 0, i32 0), i64 8) nounwind
	%c = icmp ne i32 %m, 0			%c = icmp ne i32 %m, 0
	ret i1 %c			ret i1 %c
	}			}

	define i1 @length9_eq(i8* %X, i8* %Y) nounwind {			define i1 @length9_eq(i8* %X, i8* %Y) nounwind {
	; X86-LABEL: length9_eq:			; X86-LABEL: length9_eq:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: pushl $0			; X86-NEXT: pushl %esi
	; X86-NEXT: pushl $9			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: movl (%ecx), %edx
	; X86-NEXT: calll memcmp			; X86-NEXT: movl 4(%ecx), %esi
	; X86-NEXT: addl $16, %esp			; X86-NEXT: xorl (%eax), %edx
	; X86-NEXT: testl %eax, %eax			; X86-NEXT: xorl 4(%eax), %esi
				; X86-NEXT: orl %edx, %esi
				; X86-NEXT: movb 8(%ecx), %cl
				; X86-NEXT: xorb 8(%eax), %cl
				; X86-NEXT: movzbl %cl, %eax
				; X86-NEXT: orl %esi, %eax
	; X86-NEXT: sete %al			; X86-NEXT: sete %al
				; X86-NEXT: popl %esi
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: length9_eq:			; X64-LABEL: length9_eq:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movq (%rdi), %rax			; X64-NEXT: movq (%rdi), %rax
	; X64-NEXT: xorq (%rsi), %rax			; X64-NEXT: xorq (%rsi), %rax
	; X64-NEXT: movb 8(%rdi), %cl			; X64-NEXT: movb 8(%rdi), %cl
	; X64-NEXT: xorb 8(%rsi), %cl			; X64-NEXT: xorb 8(%rsi), %cl
	; X64-NEXT: movzbl %cl, %ecx			; X64-NEXT: movzbl %cl, %ecx
	; X64-NEXT: orq %rax, %rcx			; X64-NEXT: orq %rax, %rcx
	; X64-NEXT: sete %al			; X64-NEXT: sete %al
	; X64-NEXT: retq			; X64-NEXT: retq
	%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 9) nounwind			%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 9) nounwind
	%c = icmp eq i32 %m, 0			%c = icmp eq i32 %m, 0
	ret i1 %c			ret i1 %c
	}			}

	define i1 @length10_eq(i8* %X, i8* %Y) nounwind {			define i1 @length10_eq(i8* %X, i8* %Y) nounwind {
	; X86-LABEL: length10_eq:			; X86-LABEL: length10_eq:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: pushl $0			; X86-NEXT: pushl %esi
	; X86-NEXT: pushl $10			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: movl (%ecx), %edx
	; X86-NEXT: calll memcmp			; X86-NEXT: movl 4(%ecx), %esi
	; X86-NEXT: addl $16, %esp			; X86-NEXT: xorl (%eax), %edx
	; X86-NEXT: testl %eax, %eax			; X86-NEXT: xorl 4(%eax), %esi
				; X86-NEXT: orl %edx, %esi
				; X86-NEXT: movzwl 8(%ecx), %ecx
				; X86-NEXT: xorw 8(%eax), %cx
				; X86-NEXT: movzwl %cx, %eax
				; X86-NEXT: orl %esi, %eax
	; X86-NEXT: sete %al			; X86-NEXT: sete %al
				; X86-NEXT: popl %esi
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: length10_eq:			; X64-LABEL: length10_eq:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movq (%rdi), %rax			; X64-NEXT: movq (%rdi), %rax
	; X64-NEXT: xorq (%rsi), %rax			; X64-NEXT: xorq (%rsi), %rax
	; X64-NEXT: movzwl 8(%rdi), %ecx			; X64-NEXT: movzwl 8(%rdi), %ecx
	; X64-NEXT: xorw 8(%rsi), %cx			; X64-NEXT: xorw 8(%rsi), %cx
	; X64-NEXT: movzwl %cx, %ecx			; X64-NEXT: movzwl %cx, %ecx
	; X64-NEXT: orq %rax, %rcx			; X64-NEXT: orq %rax, %rcx
	; X64-NEXT: sete %al			; X64-NEXT: sete %al
	; X64-NEXT: retq			; X64-NEXT: retq
	%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 10) nounwind			%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 10) nounwind
	%c = icmp eq i32 %m, 0			%c = icmp eq i32 %m, 0
	ret i1 %c			ret i1 %c
	}			}

	define i1 @length11_eq(i8* %X, i8* %Y) nounwind {			define i1 @length11_eq(i8* %X, i8* %Y) nounwind {
	; X86-LABEL: length11_eq:			; X86-LABEL: length11_eq:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: pushl $0			; X86-NEXT: pushl %esi
	; X86-NEXT: pushl $11			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: movl (%ecx), %edx
	; X86-NEXT: calll memcmp			; X86-NEXT: movl 4(%ecx), %esi
	; X86-NEXT: addl $16, %esp			; X86-NEXT: xorl (%eax), %edx
	; X86-NEXT: testl %eax, %eax			; X86-NEXT: xorl 4(%eax), %esi
				; X86-NEXT: orl %edx, %esi
				; X86-NEXT: movl 7(%ecx), %ecx
				; X86-NEXT: xorl 7(%eax), %ecx
				; X86-NEXT: orl %esi, %ecx
	; X86-NEXT: sete %al			; X86-NEXT: sete %al
				; X86-NEXT: popl %esi
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: length11_eq:			; X64-LABEL: length11_eq:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movq (%rdi), %rax			; X64-NEXT: movq (%rdi), %rax
	; X64-NEXT: movq 3(%rdi), %rcx			; X64-NEXT: movq 3(%rdi), %rcx
	; X64-NEXT: xorq (%rsi), %rax			; X64-NEXT: xorq (%rsi), %rax
	; X64-NEXT: xorq 3(%rsi), %rcx			; X64-NEXT: xorq 3(%rsi), %rcx
	; X64-NEXT: orq %rax, %rcx			; X64-NEXT: orq %rax, %rcx
	; X64-NEXT: sete %al			; X64-NEXT: sete %al
	; X64-NEXT: retq			; X64-NEXT: retq
	%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 11) nounwind			%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 11) nounwind
	%c = icmp eq i32 %m, 0			%c = icmp eq i32 %m, 0
	ret i1 %c			ret i1 %c
	}			}

	define i1 @length12_eq(i8* %X, i8* %Y) nounwind {			define i1 @length12_eq(i8* %X, i8* %Y) nounwind {
	; X86-LABEL: length12_eq:			; X86-LABEL: length12_eq:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: pushl $0			; X86-NEXT: pushl %esi
	; X86-NEXT: pushl $12			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: movl (%ecx), %edx
	; X86-NEXT: calll memcmp			; X86-NEXT: movl 4(%ecx), %esi
	; X86-NEXT: addl $16, %esp			; X86-NEXT: xorl (%eax), %edx
	; X86-NEXT: testl %eax, %eax			; X86-NEXT: xorl 4(%eax), %esi
				; X86-NEXT: orl %edx, %esi
				; X86-NEXT: movl 8(%ecx), %ecx
				; X86-NEXT: xorl 8(%eax), %ecx
				; X86-NEXT: orl %esi, %ecx
	; X86-NEXT: setne %al			; X86-NEXT: setne %al
				; X86-NEXT: popl %esi
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: length12_eq:			; X64-LABEL: length12_eq:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movq (%rdi), %rax			; X64-NEXT: movq (%rdi), %rax
	; X64-NEXT: xorq (%rsi), %rax			; X64-NEXT: xorq (%rsi), %rax
	; X64-NEXT: movl 8(%rdi), %ecx			; X64-NEXT: movl 8(%rdi), %ecx
	; X64-NEXT: xorl 8(%rsi), %ecx			; X64-NEXT: xorl 8(%rsi), %ecx
	...
	; X64-NEXT: retq			; X64-NEXT: retq
	%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 12) nounwind			%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 12) nounwind
	ret i32 %m			ret i32 %m
	}			}

	define i1 @length13_eq(i8* %X, i8* %Y) nounwind {			define i1 @length13_eq(i8* %X, i8* %Y) nounwind {
	; X86-LABEL: length13_eq:			; X86-LABEL: length13_eq:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: pushl $0			; X86-NEXT: pushl %esi
	; X86-NEXT: pushl $13			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: movl (%ecx), %edx
	; X86-NEXT: calll memcmp			; X86-NEXT: movl 4(%ecx), %esi
	; X86-NEXT: addl $16, %esp			; X86-NEXT: xorl (%eax), %edx
	; X86-NEXT: testl %eax, %eax			; X86-NEXT: xorl 4(%eax), %esi
				; X86-NEXT: orl %edx, %esi
				; X86-NEXT: movl 8(%ecx), %edx
				; X86-NEXT: xorl 8(%eax), %edx
				; X86-NEXT: movb 12(%ecx), %cl
				; X86-NEXT: xorb 12(%eax), %cl
				; X86-NEXT: movzbl %cl, %eax
				; X86-NEXT: orl %edx, %eax
				; X86-NEXT: orl %esi, %eax
	; X86-NEXT: sete %al			; X86-NEXT: sete %al
				; X86-NEXT: popl %esi
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: length13_eq:			; X64-LABEL: length13_eq:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movq (%rdi), %rax			; X64-NEXT: movq (%rdi), %rax
	; X64-NEXT: movq 5(%rdi), %rcx			; X64-NEXT: movq 5(%rdi), %rcx
	; X64-NEXT: xorq (%rsi), %rax			; X64-NEXT: xorq (%rsi), %rax
	; X64-NEXT: xorq 5(%rsi), %rcx			; X64-NEXT: xorq 5(%rsi), %rcx
	; X64-NEXT: orq %rax, %rcx			; X64-NEXT: orq %rax, %rcx
	; X64-NEXT: sete %al			; X64-NEXT: sete %al
	; X64-NEXT: retq			; X64-NEXT: retq
	%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 13) nounwind			%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 13) nounwind
	%c = icmp eq i32 %m, 0			%c = icmp eq i32 %m, 0
	ret i1 %c			ret i1 %c
	}			}

	define i1 @length14_eq(i8* %X, i8* %Y) nounwind {			define i1 @length14_eq(i8* %X, i8* %Y) nounwind {
	; X86-LABEL: length14_eq:			; X86-LABEL: length14_eq:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: pushl $0			; X86-NEXT: pushl %esi
	; X86-NEXT: pushl $14			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: movl (%ecx), %edx
	; X86-NEXT: calll memcmp			; X86-NEXT: movl 4(%ecx), %esi
	; X86-NEXT: addl $16, %esp			; X86-NEXT: xorl (%eax), %edx
	; X86-NEXT: testl %eax, %eax			; X86-NEXT: xorl 4(%eax), %esi
				; X86-NEXT: orl %edx, %esi
				; X86-NEXT: movl 8(%ecx), %edx
				; X86-NEXT: xorl 8(%eax), %edx
				; X86-NEXT: movzwl 12(%ecx), %ecx
				; X86-NEXT: xorw 12(%eax), %cx
				; X86-NEXT: movzwl %cx, %eax
				; X86-NEXT: orl %edx, %eax
				; X86-NEXT: orl %esi, %eax
	; X86-NEXT: sete %al			; X86-NEXT: sete %al
				; X86-NEXT: popl %esi
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: length14_eq:			; X64-LABEL: length14_eq:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movq (%rdi), %rax			; X64-NEXT: movq (%rdi), %rax
	; X64-NEXT: movq 6(%rdi), %rcx			; X64-NEXT: movq 6(%rdi), %rcx
	; X64-NEXT: xorq (%rsi), %rax			; X64-NEXT: xorq (%rsi), %rax
	; X64-NEXT: xorq 6(%rsi), %rcx			; X64-NEXT: xorq 6(%rsi), %rcx
	; X64-NEXT: orq %rax, %rcx			; X64-NEXT: orq %rax, %rcx
	; X64-NEXT: sete %al			; X64-NEXT: sete %al
	; X64-NEXT: retq			; X64-NEXT: retq
	%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 14) nounwind			%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 14) nounwind
	%c = icmp eq i32 %m, 0			%c = icmp eq i32 %m, 0
	ret i1 %c			ret i1 %c
	}			}

	define i1 @length15_eq(i8* %X, i8* %Y) nounwind {			define i1 @length15_eq(i8* %X, i8* %Y) nounwind {
	; X86-LABEL: length15_eq:			; X86-LABEL: length15_eq:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: pushl $0			; X86-NEXT: pushl %esi
	; X86-NEXT: pushl $15			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: movl (%ecx), %edx
	; X86-NEXT: calll memcmp			; X86-NEXT: movl 4(%ecx), %esi
	; X86-NEXT: addl $16, %esp			; X86-NEXT: xorl (%eax), %edx
	; X86-NEXT: testl %eax, %eax			; X86-NEXT: xorl 4(%eax), %esi
				; X86-NEXT: orl %edx, %esi
				; X86-NEXT: movl 8(%ecx), %edx
				; X86-NEXT: xorl 8(%eax), %edx
				; X86-NEXT: movl 11(%ecx), %ecx
				; X86-NEXT: xorl 11(%eax), %ecx
				; X86-NEXT: orl %edx, %ecx
				; X86-NEXT: orl %esi, %ecx
	; X86-NEXT: sete %al			; X86-NEXT: sete %al
				; X86-NEXT: popl %esi
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: length15_eq:			; X64-LABEL: length15_eq:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movq (%rdi), %rax			; X64-NEXT: movq (%rdi), %rax
	; X64-NEXT: movq 7(%rdi), %rcx			; X64-NEXT: movq 7(%rdi), %rcx
	; X64-NEXT: xorq (%rsi), %rax			; X64-NEXT: xorq (%rsi), %rax
	; X64-NEXT: xorq 7(%rsi), %rcx			; X64-NEXT: xorq 7(%rsi), %rcx
	...
	; X64-NEXT: retq			; X64-NEXT: retq
	%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 16) nounwind			%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 16) nounwind
	ret i32 %m			ret i32 %m
	}			}

	define i1 @length16_eq(i8* %x, i8* %y) nounwind {			define i1 @length16_eq(i8* %x, i8* %y) nounwind {
	; X86-NOSSE-LABEL: length16_eq:			; X86-NOSSE-LABEL: length16_eq:
	; X86-NOSSE: # %bb.0:			; X86-NOSSE: # %bb.0:
	; X86-NOSSE-NEXT: pushl $0			; X86-NOSSE-NEXT: pushl %esi
	; X86-NOSSE-NEXT: pushl $16			; X86-NOSSE-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NOSSE-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NOSSE-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-NOSSE-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NOSSE-NEXT: movl (%ecx), %edx
	; X86-NOSSE-NEXT: calll memcmp			; X86-NOSSE-NEXT: movl 4(%ecx), %esi
	; X86-NOSSE-NEXT: addl $16, %esp			; X86-NOSSE-NEXT: xorl (%eax), %edx
	; X86-NOSSE-NEXT: testl %eax, %eax			; X86-NOSSE-NEXT: xorl 4(%eax), %esi
				; X86-NOSSE-NEXT: orl %edx, %esi
				; X86-NOSSE-NEXT: movl 8(%ecx), %edx
				; X86-NOSSE-NEXT: xorl 8(%eax), %edx
				; X86-NOSSE-NEXT: movl 12(%ecx), %ecx
				; X86-NOSSE-NEXT: xorl 12(%eax), %ecx
				; X86-NOSSE-NEXT: orl %edx, %ecx
				; X86-NOSSE-NEXT: orl %esi, %ecx
	; X86-NOSSE-NEXT: setne %al			; X86-NOSSE-NEXT: setne %al
				; X86-NOSSE-NEXT: popl %esi
	; X86-NOSSE-NEXT: retl			; X86-NOSSE-NEXT: retl
	;			;
	; X86-SSE1-LABEL: length16_eq:			; X86-SSE1-LABEL: length16_eq:
	; X86-SSE1: # %bb.0:			; X86-SSE1: # %bb.0:
	; X86-SSE1-NEXT: pushl $0			; X86-SSE1-NEXT: pushl %esi
	; X86-SSE1-NEXT: pushl $16			; X86-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-SSE1-NEXT: pushl {{[0-9]+}}(%esp)			; X86-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-SSE1-NEXT: pushl {{[0-9]+}}(%esp)			; X86-SSE1-NEXT: movl (%ecx), %edx
	; X86-SSE1-NEXT: calll memcmp			; X86-SSE1-NEXT: movl 4(%ecx), %esi
	; X86-SSE1-NEXT: addl $16, %esp			; X86-SSE1-NEXT: xorl (%eax), %edx
	; X86-SSE1-NEXT: testl %eax, %eax			; X86-SSE1-NEXT: xorl 4(%eax), %esi
				; X86-SSE1-NEXT: orl %edx, %esi
				; X86-SSE1-NEXT: movl 8(%ecx), %edx
				; X86-SSE1-NEXT: xorl 8(%eax), %edx
				; X86-SSE1-NEXT: movl 12(%ecx), %ecx
				; X86-SSE1-NEXT: xorl 12(%eax), %ecx
				; X86-SSE1-NEXT: orl %edx, %ecx
				; X86-SSE1-NEXT: orl %esi, %ecx
	; X86-SSE1-NEXT: setne %al			; X86-SSE1-NEXT: setne %al
				; X86-SSE1-NEXT: popl %esi
	; X86-SSE1-NEXT: retl			; X86-SSE1-NEXT: retl
	;			;
	; X86-SSE2-LABEL: length16_eq:			; X86-SSE2-LABEL: length16_eq:
	; X86-SSE2: # %bb.0:			; X86-SSE2: # %bb.0:
	; X86-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-SSE2-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X86-SSE2-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-SSE2-NEXT: movdqu (%ecx), %xmm0			; X86-SSE2-NEXT: movdqu (%ecx), %xmm0
	; X86-SSE2-NEXT: movdqu (%eax), %xmm1			; X86-SSE2-NEXT: movdqu (%eax), %xmm1
	...
	;			;
	; X64-AVX-LABEL: length16_eq:			; X64-AVX-LABEL: length16_eq:
	; X64-AVX: # %bb.0:			; X64-AVX: # %bb.0:
	; X64-AVX-NEXT: vmovdqu (%rdi), %xmm0			; X64-AVX-NEXT: vmovdqu (%rdi), %xmm0
	; X64-AVX-NEXT: vpxor (%rsi), %xmm0, %xmm0			; X64-AVX-NEXT: vpxor (%rsi), %xmm0, %xmm0
	; X64-AVX-NEXT: vptest %xmm0, %xmm0			; X64-AVX-NEXT: vptest %xmm0, %xmm0
	; X64-AVX-NEXT: setne %al			; X64-AVX-NEXT: setne %al
	; X64-AVX-NEXT: retq			; X64-AVX-NEXT: retq
				;
				; X64-AVX512F-LABEL: length16_eq:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX512F-NEXT: vpxor (%rsi), %xmm0, %xmm0
				; X64-AVX512F-NEXT: vptest %xmm0, %xmm0
				; X64-AVX512F-NEXT: setne %al
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length16_eq:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX512BW-NEXT: vpxor (%rsi), %xmm0, %xmm0
				; X64-AVX512BW-NEXT: vptest %xmm0, %xmm0
				; X64-AVX512BW-NEXT: setne %al
				; X64-AVX512BW-NEXT: retq
	%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 16) nounwind			%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 16) nounwind
	%cmp = icmp ne i32 %call, 0			%cmp = icmp ne i32 %call, 0
	ret i1 %cmp			ret i1 %cmp
	}			}

	define i1 @length16_eq_const(i8* %X) nounwind {			define i1 @length16_eq_const(i8* %X) nounwind {
	; X86-NOSSE-LABEL: length16_eq_const:			; X86-NOSSE-LABEL: length16_eq_const:
	; X86-NOSSE: # %bb.0:			; X86-NOSSE: # %bb.0:
	; X86-NOSSE-NEXT: pushl $0			; X86-NOSSE-NEXT: pushl %esi
	; X86-NOSSE-NEXT: pushl $16			; X86-NOSSE-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NOSSE-NEXT: pushl $.L.str			; X86-NOSSE-NEXT: movl $858927408, %ecx # imm = 0x33323130
	; X86-NOSSE-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NOSSE-NEXT: xorl (%eax), %ecx
	; X86-NOSSE-NEXT: calll memcmp			; X86-NOSSE-NEXT: movl $926299444, %edx # imm = 0x37363534
	; X86-NOSSE-NEXT: addl $16, %esp			; X86-NOSSE-NEXT: xorl 4(%eax), %edx
	; X86-NOSSE-NEXT: testl %eax, %eax			; X86-NOSSE-NEXT: orl %ecx, %edx
				; X86-NOSSE-NEXT: movl $825243960, %ecx # imm = 0x31303938
				; X86-NOSSE-NEXT: xorl 8(%eax), %ecx
				; X86-NOSSE-NEXT: movl $892613426, %esi # imm = 0x35343332
				; X86-NOSSE-NEXT: xorl 12(%eax), %esi
				; X86-NOSSE-NEXT: orl %ecx, %esi
				; X86-NOSSE-NEXT: orl %edx, %esi
	; X86-NOSSE-NEXT: sete %al			; X86-NOSSE-NEXT: sete %al
				; X86-NOSSE-NEXT: popl %esi
	; X86-NOSSE-NEXT: retl			; X86-NOSSE-NEXT: retl
	;			;
	; X86-SSE1-LABEL: length16_eq_const:			; X86-SSE1-LABEL: length16_eq_const:
	; X86-SSE1: # %bb.0:			; X86-SSE1: # %bb.0:
	; X86-SSE1-NEXT: pushl $0			; X86-SSE1-NEXT: pushl %esi
	; X86-SSE1-NEXT: pushl $16			; X86-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-SSE1-NEXT: pushl $.L.str			; X86-SSE1-NEXT: movl $858927408, %ecx # imm = 0x33323130
	; X86-SSE1-NEXT: pushl {{[0-9]+}}(%esp)			; X86-SSE1-NEXT: xorl (%eax), %ecx
	; X86-SSE1-NEXT: calll memcmp			; X86-SSE1-NEXT: movl $926299444, %edx # imm = 0x37363534
	; X86-SSE1-NEXT: addl $16, %esp			; X86-SSE1-NEXT: xorl 4(%eax), %edx
	; X86-SSE1-NEXT: testl %eax, %eax			; X86-SSE1-NEXT: orl %ecx, %edx
				; X86-SSE1-NEXT: movl $825243960, %ecx # imm = 0x31303938
				; X86-SSE1-NEXT: xorl 8(%eax), %ecx
				; X86-SSE1-NEXT: movl $892613426, %esi # imm = 0x35343332
				; X86-SSE1-NEXT: xorl 12(%eax), %esi
				; X86-SSE1-NEXT: orl %ecx, %esi
				; X86-SSE1-NEXT: orl %edx, %esi
	; X86-SSE1-NEXT: sete %al			; X86-SSE1-NEXT: sete %al
				; X86-SSE1-NEXT: popl %esi
	; X86-SSE1-NEXT: retl			; X86-SSE1-NEXT: retl
	;			;
	; X86-SSE2-LABEL: length16_eq_const:			; X86-SSE2-LABEL: length16_eq_const:
	; X86-SSE2: # %bb.0:			; X86-SSE2: # %bb.0:
	; X86-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-SSE2-NEXT: movdqu (%eax), %xmm0			; X86-SSE2-NEXT: movdqu (%eax), %xmm0
	; X86-SSE2-NEXT: pcmpeqb {{\.LCPI.*}}, %xmm0			; X86-SSE2-NEXT: pcmpeqb {{\.LCPI.*}}, %xmm0
	; X86-SSE2-NEXT: pmovmskb %xmm0, %eax			; X86-SSE2-NEXT: pmovmskb %xmm0, %eax
	...
	;			;
	; X64-AVX-LABEL: length16_eq_const:			; X64-AVX-LABEL: length16_eq_const:
	; X64-AVX: # %bb.0:			; X64-AVX: # %bb.0:
	; X64-AVX-NEXT: vmovdqu (%rdi), %xmm0			; X64-AVX-NEXT: vmovdqu (%rdi), %xmm0
	; X64-AVX-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0			; X64-AVX-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0
	; X64-AVX-NEXT: vptest %xmm0, %xmm0			; X64-AVX-NEXT: vptest %xmm0, %xmm0
	; X64-AVX-NEXT: sete %al			; X64-AVX-NEXT: sete %al
	; X64-AVX-NEXT: retq			; X64-AVX-NEXT: retq
	%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([65 x i8], [65 x i8]* @.str, i32 0, i32 0), i64 16) nounwind			;
				; X64-AVX512F-LABEL: length16_eq_const:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX512F-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0
				; X64-AVX512F-NEXT: vptest %xmm0, %xmm0
				; X64-AVX512F-NEXT: sete %al
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length16_eq_const:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX512BW-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0
				; X64-AVX512BW-NEXT: vptest %xmm0, %xmm0
				; X64-AVX512BW-NEXT: sete %al
				; X64-AVX512BW-NEXT: retq
				%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([513 x i8], [513 x i8]* @.str, i32 0, i32 0), i64 16) nounwind
	%c = icmp eq i32 %m, 0			%c = icmp eq i32 %m, 0
	ret i1 %c			ret i1 %c
	}			}
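As a rough illustration (not LLVM's code), the new X86-SSE1 sequence for length16_eq_const above can be written in C. The immediates 0x33323130, 0x37363534, 0x31303938 and 0x35343332 are the little-endian encodings of "0123", "4567", "8901" and "2345". The sketch therefore assumes that the first 16 bytes of @.str are "0123456789012345" and that the host is little-endian. `load32` and `length16_eq_const_model` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: an unaligned 32-bit load, like `movl off(%eax), %reg`. */
static uint32_t load32(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Models length16_eq_const on X86-SSE1: memcmp(x, "0123456789012345", 16) == 0.
   The immediates are copied from the CHECK lines and assume a little-endian host. */
int length16_eq_const_model(const void *x) {
    const unsigned char *p = x;
    uint32_t d0 = load32(p + 0)  ^ 0x33323130u;  /* "0123" */
    uint32_t d1 = load32(p + 4)  ^ 0x37363534u;  /* "4567" */
    uint32_t d2 = load32(p + 8)  ^ 0x31303938u;  /* "8901" */
    uint32_t d3 = load32(p + 12) ^ 0x35343332u;  /* "2345" */
    /* Any set bit means some byte differed: OR-reduce, then test for zero (sete). */
    return ((d0 | d1) | (d2 | d3)) == 0;
}
```

XOR-ing each chunk with its expected value and OR-ing the results reduces the whole comparison to a single zero test. That is why the expansion can end in one `sete`.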

	; PR33914 - https://bugs.llvm.org/show_bug.cgi?id=33914			; PR33914 - https://bugs.llvm.org/show_bug.cgi?id=33914

	define i32 @length24(i8* %X, i8* %Y) nounwind {			define i32 @length24(i8* %X, i8* %Y) nounwind {
	; X86-LABEL: length24:			; X86-LABEL: length24:
	(67 unchanged lines not shown)
	; X64-SSE2-NEXT: pmovmskb %xmm2, %eax			; X64-SSE2-NEXT: pmovmskb %xmm2, %eax
	; X64-SSE2-NEXT: cmpl $65535, %eax # imm = 0xFFFF			; X64-SSE2-NEXT: cmpl $65535, %eax # imm = 0xFFFF
	; X64-SSE2-NEXT: sete %al			; X64-SSE2-NEXT: sete %al
	; X64-SSE2-NEXT: retq			; X64-SSE2-NEXT: retq
	;			;
	; X64-AVX-LABEL: length24_eq:			; X64-AVX-LABEL: length24_eq:
	; X64-AVX: # %bb.0:			; X64-AVX: # %bb.0:
	; X64-AVX-NEXT: vmovdqu (%rdi), %xmm0			; X64-AVX-NEXT: vmovdqu (%rdi), %xmm0
	; X64-AVX-NEXT: vmovq 16(%rdi), %xmm1			; X64-AVX-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
	; X64-AVX-NEXT: vmovq 16(%rsi), %xmm2			; X64-AVX-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
	; X64-AVX-NEXT: vpxor %xmm2, %xmm1, %xmm1			; X64-AVX-NEXT: vpxor %xmm2, %xmm1, %xmm1
	; X64-AVX-NEXT: vpxor (%rsi), %xmm0, %xmm0			; X64-AVX-NEXT: vpxor (%rsi), %xmm0, %xmm0
	; X64-AVX-NEXT: vpor %xmm1, %xmm0, %xmm0			; X64-AVX-NEXT: vpor %xmm1, %xmm0, %xmm0
	; X64-AVX-NEXT: vptest %xmm0, %xmm0			; X64-AVX-NEXT: vptest %xmm0, %xmm0
	; X64-AVX-NEXT: sete %al			; X64-AVX-NEXT: sete %al
	; X64-AVX-NEXT: retq			; X64-AVX-NEXT: retq
				;
				; X64-AVX512F-LABEL: length24_eq:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX512F-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
				; X64-AVX512F-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
				; X64-AVX512F-NEXT: vpxor %xmm2, %xmm1, %xmm1
				; X64-AVX512F-NEXT: vpxor (%rsi), %xmm0, %xmm0
				; X64-AVX512F-NEXT: vpor %xmm1, %xmm0, %xmm0
				; X64-AVX512F-NEXT: vptest %xmm0, %xmm0
				; X64-AVX512F-NEXT: sete %al
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length24_eq:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX512BW-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
				; X64-AVX512BW-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
				; X64-AVX512BW-NEXT: vpxor %xmm2, %xmm1, %xmm1
				; X64-AVX512BW-NEXT: vpxor (%rsi), %xmm0, %xmm0
				; X64-AVX512BW-NEXT: vpor %xmm1, %xmm0, %xmm0
				; X64-AVX512BW-NEXT: vptest %xmm0, %xmm0
				; X64-AVX512BW-NEXT: sete %al
				; X64-AVX512BW-NEXT: retq
	%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 24) nounwind			%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 24) nounwind
	%cmp = icmp eq i32 %call, 0			%cmp = icmp eq i32 %call, 0
	ret i1 %cmp			ret i1 %cmp
	}			}
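The length24_eq checks split 24 bytes into one 16-byte load and one 8-byte `vmovq` load per side. They do not use two overlapping 16-byte loads. The C model below is hedged and emulates an xmm register as two 64-bit halves. The `u128x` type and all helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit value emulated as two 64-bit halves (stands in for an xmm register). */
typedef struct { uint64_t lo, hi; } u128x;

static u128x load16(const unsigned char *p) {   /* like vmovdqu */
    u128x v;
    memcpy(&v.lo, p, 8);
    memcpy(&v.hi, p + 8, 8);
    return v;
}

static u128x load8_zext(const unsigned char *p) { /* like vmovq: xmm = mem[0],zero */
    u128x v = {0, 0};
    memcpy(&v.lo, p, 8);
    return v;
}

static u128x xor128(u128x a, u128x b) { u128x r = {a.lo ^ b.lo, a.hi ^ b.hi}; return r; }
static u128x or128(u128x a, u128x b)  { u128x r = {a.lo | b.lo, a.hi | b.hi}; return r; }

/* Models length24_eq: two loads per side, vpxor, vpor, then vptest + sete. */
int length24_eq_model(const void *x, const void *y) {
    const unsigned char *a = x, *b = y;
    u128x head = xor128(load16(a), load16(b));
    u128x tail = xor128(load8_zext(a + 16), load8_zext(b + 16));
    u128x acc = or128(head, tail);
    return (acc.lo | acc.hi) == 0;   /* vptest sets ZF when every bit is zero */
}
```

Because `vmovq` zeroes the upper half on both sides, the upper halves of the tail XOR to zero. The tail therefore contributes only its 8 real bytes to the final test.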

	define i1 @length24_eq_const(i8* %X) nounwind {			define i1 @length24_eq_const(i8* %X) nounwind {
	; X86-NOSSE-LABEL: length24_eq_const:			; X86-NOSSE-LABEL: length24_eq_const:
	; X86-NOSSE: # %bb.0:			; X86-NOSSE: # %bb.0:
	(42 unchanged lines not shown)
	; X64-SSE2-NEXT: pmovmskb %xmm0, %eax			; X64-SSE2-NEXT: pmovmskb %xmm0, %eax
	; X64-SSE2-NEXT: cmpl $65535, %eax # imm = 0xFFFF			; X64-SSE2-NEXT: cmpl $65535, %eax # imm = 0xFFFF
	; X64-SSE2-NEXT: setne %al			; X64-SSE2-NEXT: setne %al
	; X64-SSE2-NEXT: retq			; X64-SSE2-NEXT: retq
	;			;
	; X64-AVX-LABEL: length24_eq_const:			; X64-AVX-LABEL: length24_eq_const:
	; X64-AVX: # %bb.0:			; X64-AVX: # %bb.0:
	; X64-AVX-NEXT: vmovdqu (%rdi), %xmm0			; X64-AVX-NEXT: vmovdqu (%rdi), %xmm0
	; X64-AVX-NEXT: vmovq 16(%rdi), %xmm1			; X64-AVX-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
	; X64-AVX-NEXT: vpxor {{.*}}(%rip), %xmm1, %xmm1			; X64-AVX-NEXT: vpxor {{.*}}(%rip), %xmm1, %xmm1
	; X64-AVX-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0			; X64-AVX-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0
	; X64-AVX-NEXT: vpor %xmm1, %xmm0, %xmm0			; X64-AVX-NEXT: vpor %xmm1, %xmm0, %xmm0
	; X64-AVX-NEXT: vptest %xmm0, %xmm0			; X64-AVX-NEXT: vptest %xmm0, %xmm0
	; X64-AVX-NEXT: setne %al			; X64-AVX-NEXT: setne %al
	; X64-AVX-NEXT: retq			; X64-AVX-NEXT: retq
	%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([65 x i8], [65 x i8]* @.str, i32 0, i32 0), i64 24) nounwind			;
				; X64-AVX512F-LABEL: length24_eq_const:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX512F-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
				; X64-AVX512F-NEXT: vpxor {{.*}}(%rip), %xmm1, %xmm1
				; X64-AVX512F-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0
				; X64-AVX512F-NEXT: vpor %xmm1, %xmm0, %xmm0
				; X64-AVX512F-NEXT: vptest %xmm0, %xmm0
				; X64-AVX512F-NEXT: setne %al
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length24_eq_const:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX512BW-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
				; X64-AVX512BW-NEXT: vpxor {{.*}}(%rip), %xmm1, %xmm1
				; X64-AVX512BW-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0
				; X64-AVX512BW-NEXT: vpor %xmm1, %xmm0, %xmm0
				; X64-AVX512BW-NEXT: vptest %xmm0, %xmm0
				; X64-AVX512BW-NEXT: setne %al
				; X64-AVX512BW-NEXT: retq
				%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([513 x i8], [513 x i8]* @.str, i32 0, i32 0), i64 24) nounwind
	%c = icmp ne i32 %m, 0			%c = icmp ne i32 %m, 0
	ret i1 %c			ret i1 %c
	}			}

	define i32 @length32(i8* %X, i8* %Y) nounwind {			define i32 @length32(i8* %X, i8* %Y) nounwind {
	; X86-LABEL: length32:			; X86-LABEL: length32:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: pushl $0			; X86-NEXT: pushl $0
	(78 unchanged lines not shown)
	; X64-AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0			; X64-AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0
	; X64-AVX1-NEXT: vptest %xmm0, %xmm0			; X64-AVX1-NEXT: vptest %xmm0, %xmm0
	; X64-AVX1-NEXT: sete %al			; X64-AVX1-NEXT: sete %al
	; X64-AVX1-NEXT: retq			; X64-AVX1-NEXT: retq
	;			;
	; X64-AVX2-LABEL: length32_eq:			; X64-AVX2-LABEL: length32_eq:
	; X64-AVX2: # %bb.0:			; X64-AVX2: # %bb.0:
	; X64-AVX2-NEXT: vmovdqu (%rdi), %ymm0			; X64-AVX2-NEXT: vmovdqu (%rdi), %ymm0
	; X64-AVX2-NEXT: vpxor (%rsi), %ymm0, %ymm0			; X64-AVX2-NEXT: vpxor (%rsi), %ymm0, %ymm0
	; X64-AVX2-NEXT: vptest %ymm0, %ymm0			; X64-AVX2-NEXT: vptest %ymm0, %ymm0
	; X64-AVX2-NEXT: sete %al			; X64-AVX2-NEXT: sete %al
	; X64-AVX2-NEXT: vzeroupper			; X64-AVX2-NEXT: vzeroupper
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
				;
				; X64-AVX512F-LABEL: length32_eq:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX512F-NEXT: vpxor (%rsi), %ymm0, %ymm0
				; X64-AVX512F-NEXT: vptest %ymm0, %ymm0
				; X64-AVX512F-NEXT: sete %al
				; X64-AVX512F-NEXT: vzeroupper
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length32_eq:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX512BW-NEXT: vpxor (%rsi), %ymm0, %ymm0
				; X64-AVX512BW-NEXT: vptest %ymm0, %ymm0
				; X64-AVX512BW-NEXT: sete %al
				; X64-AVX512BW-NEXT: vzeroupper
				; X64-AVX512BW-NEXT: retq
	%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 32) nounwind			%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 32) nounwind
	%cmp = icmp eq i32 %call, 0			%cmp = icmp eq i32 %call, 0
	ret i1 %cmp			ret i1 %cmp
	}			}

	define i1 @length32_eq_prefer128(i8* %x, i8* %y) nounwind "prefer-vector-width"="128" {			define i1 @length32_eq_prefer128(i8* %x, i8* %y) nounwind "prefer-vector-width"="128" {
	; X86-NOSSE-LABEL: length32_eq_prefer128:			; X86-NOSSE-LABEL: length32_eq_prefer128:
	; X86-NOSSE: # %bb.0:			; X86-NOSSE: # %bb.0:
	(54 unchanged lines not shown)
	; X64-AVX-NEXT: vmovdqu (%rdi), %xmm0			; X64-AVX-NEXT: vmovdqu (%rdi), %xmm0
	; X64-AVX-NEXT: vmovdqu 16(%rdi), %xmm1			; X64-AVX-NEXT: vmovdqu 16(%rdi), %xmm1
	; X64-AVX-NEXT: vpxor 16(%rsi), %xmm1, %xmm1			; X64-AVX-NEXT: vpxor 16(%rsi), %xmm1, %xmm1
	; X64-AVX-NEXT: vpxor (%rsi), %xmm0, %xmm0			; X64-AVX-NEXT: vpxor (%rsi), %xmm0, %xmm0
	; X64-AVX-NEXT: vpor %xmm1, %xmm0, %xmm0			; X64-AVX-NEXT: vpor %xmm1, %xmm0, %xmm0
	; X64-AVX-NEXT: vptest %xmm0, %xmm0			; X64-AVX-NEXT: vptest %xmm0, %xmm0
	; X64-AVX-NEXT: sete %al			; X64-AVX-NEXT: sete %al
	; X64-AVX-NEXT: retq			; X64-AVX-NEXT: retq
				;
				; X64-AVX512F-LABEL: length32_eq_prefer128:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX512F-NEXT: vmovdqu 16(%rdi), %xmm1
				; X64-AVX512F-NEXT: vpxor 16(%rsi), %xmm1, %xmm1
				; X64-AVX512F-NEXT: vpxor (%rsi), %xmm0, %xmm0
				; X64-AVX512F-NEXT: vpor %xmm1, %xmm0, %xmm0
				; X64-AVX512F-NEXT: vptest %xmm0, %xmm0
				; X64-AVX512F-NEXT: sete %al
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length32_eq_prefer128:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX512BW-NEXT: vmovdqu 16(%rdi), %xmm1
				; X64-AVX512BW-NEXT: vpxor 16(%rsi), %xmm1, %xmm1
				; X64-AVX512BW-NEXT: vpxor (%rsi), %xmm0, %xmm0
				; X64-AVX512BW-NEXT: vpor %xmm1, %xmm0, %xmm0
				; X64-AVX512BW-NEXT: vptest %xmm0, %xmm0
				; X64-AVX512BW-NEXT: sete %al
				; X64-AVX512BW-NEXT: retq
	%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 32) nounwind			%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 32) nounwind
	%cmp = icmp eq i32 %call, 0			%cmp = icmp eq i32 %call, 0
	ret i1 %cmp			ret i1 %cmp
	}			}

	define i1 @length32_eq_const(i8* %X) nounwind {			define i1 @length32_eq_const(i8* %X) nounwind {
	; X86-NOSSE-LABEL: length32_eq_const:			; X86-NOSSE-LABEL: length32_eq_const:
	; X86-NOSSE: # %bb.0:			; X86-NOSSE: # %bb.0:
	(58 unchanged lines not shown)
	; X64-AVX2-LABEL: length32_eq_const:			; X64-AVX2-LABEL: length32_eq_const:
	; X64-AVX2: # %bb.0:			; X64-AVX2: # %bb.0:
	; X64-AVX2-NEXT: vmovdqu (%rdi), %ymm0			; X64-AVX2-NEXT: vmovdqu (%rdi), %ymm0
	; X64-AVX2-NEXT: vpxor {{.*}}(%rip), %ymm0, %ymm0			; X64-AVX2-NEXT: vpxor {{.*}}(%rip), %ymm0, %ymm0
	; X64-AVX2-NEXT: vptest %ymm0, %ymm0			; X64-AVX2-NEXT: vptest %ymm0, %ymm0
	; X64-AVX2-NEXT: setne %al			; X64-AVX2-NEXT: setne %al
	; X64-AVX2-NEXT: vzeroupper			; X64-AVX2-NEXT: vzeroupper
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([65 x i8], [65 x i8]* @.str, i32 0, i32 0), i64 32) nounwind			;
				; X64-AVX512F-LABEL: length32_eq_const:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX512F-NEXT: vpxor {{.*}}(%rip), %ymm0, %ymm0
				; X64-AVX512F-NEXT: vptest %ymm0, %ymm0
				; X64-AVX512F-NEXT: setne %al
				; X64-AVX512F-NEXT: vzeroupper
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length32_eq_const:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX512BW-NEXT: vpxor {{.*}}(%rip), %ymm0, %ymm0
				; X64-AVX512BW-NEXT: vptest %ymm0, %ymm0
				; X64-AVX512BW-NEXT: setne %al
				; X64-AVX512BW-NEXT: vzeroupper
				; X64-AVX512BW-NEXT: retq
				%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([513 x i8], [513 x i8]* @.str, i32 0, i32 0), i64 32) nounwind
	%c = icmp ne i32 %m, 0			%c = icmp ne i32 %m, 0
	ret i1 %c			ret i1 %c
	}			}

				define i32 @length63(i8* %X, i8* %Y) nounwind {
				; X86-LABEL: length63:
				; X86: # %bb.0:
				; X86-NEXT: pushl $0
				; X86-NEXT: pushl $63
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: calll memcmp
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: retl
				;
				; X64-LABEL: length63:
				; X64: # %bb.0:
				; X64-NEXT: movl $63, %edx
				; X64-NEXT: jmp memcmp # TAILCALL
				%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 63) nounwind
				ret i32 %m
				}

				define i1 @length63_eq(i8* %x, i8* %y) nounwind {
				; X86-NOSSE-LABEL: length63_eq:
				; X86-NOSSE: # %bb.0:
				; X86-NOSSE-NEXT: pushl $0
				; X86-NOSSE-NEXT: pushl $63
				; X86-NOSSE-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NOSSE-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NOSSE-NEXT: calll memcmp
				; X86-NOSSE-NEXT: addl $16, %esp
				; X86-NOSSE-NEXT: testl %eax, %eax
				; X86-NOSSE-NEXT: setne %al
				; X86-NOSSE-NEXT: retl
				;
				; X86-SSE1-LABEL: length63_eq:
				; X86-SSE1: # %bb.0:
				; X86-SSE1-NEXT: pushl $0
				; X86-SSE1-NEXT: pushl $63
				; X86-SSE1-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-SSE1-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-SSE1-NEXT: calll memcmp
				; X86-SSE1-NEXT: addl $16, %esp
				; X86-SSE1-NEXT: testl %eax, %eax
				; X86-SSE1-NEXT: setne %al
				; X86-SSE1-NEXT: retl
				;
				; X86-SSE2-LABEL: length63_eq:
				; X86-SSE2: # %bb.0:
				; X86-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-SSE2-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-SSE2-NEXT: movdqu (%ecx), %xmm0
				; X86-SSE2-NEXT: movdqu 16(%ecx), %xmm1
				; X86-SSE2-NEXT: movdqu (%eax), %xmm2
				; X86-SSE2-NEXT: pcmpeqb %xmm0, %xmm2
				; X86-SSE2-NEXT: movdqu 16(%eax), %xmm0
				; X86-SSE2-NEXT: pcmpeqb %xmm1, %xmm0
				; X86-SSE2-NEXT: movdqu 32(%ecx), %xmm1
				; X86-SSE2-NEXT: movdqu 32(%eax), %xmm3
				; X86-SSE2-NEXT: pcmpeqb %xmm1, %xmm3
				; X86-SSE2-NEXT: movdqu 47(%ecx), %xmm1
				; X86-SSE2-NEXT: movdqu 47(%eax), %xmm4
				; X86-SSE2-NEXT: pcmpeqb %xmm1, %xmm4
				; X86-SSE2-NEXT: pand %xmm3, %xmm4
				; X86-SSE2-NEXT: pand %xmm0, %xmm4
				; X86-SSE2-NEXT: pand %xmm2, %xmm4
				; X86-SSE2-NEXT: pmovmskb %xmm4, %eax
				; X86-SSE2-NEXT: cmpl $65535, %eax # imm = 0xFFFF
				; X86-SSE2-NEXT: setne %al
				; X86-SSE2-NEXT: retl
				;
				; X64-SSE2-LABEL: length63_eq:
				; X64-SSE2: # %bb.0:
				; X64-SSE2-NEXT: movdqu (%rdi), %xmm0
				; X64-SSE2-NEXT: movdqu 16(%rdi), %xmm1
				; X64-SSE2-NEXT: movdqu 32(%rdi), %xmm2
				; X64-SSE2-NEXT: movdqu 47(%rdi), %xmm3
				; X64-SSE2-NEXT: movdqu (%rsi), %xmm4
				; X64-SSE2-NEXT: pcmpeqb %xmm0, %xmm4
				; X64-SSE2-NEXT: movdqu 16(%rsi), %xmm0
				; X64-SSE2-NEXT: pcmpeqb %xmm1, %xmm0
				; X64-SSE2-NEXT: movdqu 32(%rsi), %xmm1
				; X64-SSE2-NEXT: pcmpeqb %xmm2, %xmm1
				; X64-SSE2-NEXT: movdqu 47(%rsi), %xmm2
				; X64-SSE2-NEXT: pcmpeqb %xmm3, %xmm2
				; X64-SSE2-NEXT: pand %xmm1, %xmm2
				; X64-SSE2-NEXT: pand %xmm0, %xmm2
				; X64-SSE2-NEXT: pand %xmm4, %xmm2
				; X64-SSE2-NEXT: pmovmskb %xmm2, %eax
				; X64-SSE2-NEXT: cmpl $65535, %eax # imm = 0xFFFF
				; X64-SSE2-NEXT: setne %al
				; X64-SSE2-NEXT: retq
				;
				; X64-AVX1-LABEL: length63_eq:
				; X64-AVX1: # %bb.0:
				; X64-AVX1-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX1-NEXT: vmovdqu 16(%rdi), %xmm1
				; X64-AVX1-NEXT: vmovdqu 32(%rdi), %xmm2
				; X64-AVX1-NEXT: vmovdqu 47(%rdi), %xmm3
				; X64-AVX1-NEXT: vpxor 47(%rsi), %xmm3, %xmm3
				; X64-AVX1-NEXT: vpxor 32(%rsi), %xmm2, %xmm2
				; X64-AVX1-NEXT: vpor %xmm3, %xmm2, %xmm2
				; X64-AVX1-NEXT: vpxor 16(%rsi), %xmm1, %xmm1
				; X64-AVX1-NEXT: vpor %xmm2, %xmm1, %xmm1
				; X64-AVX1-NEXT: vpxor (%rsi), %xmm0, %xmm0
				; X64-AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0
				; X64-AVX1-NEXT: vptest %xmm0, %xmm0
				; X64-AVX1-NEXT: setne %al
				; X64-AVX1-NEXT: retq
				;
				; X64-AVX2-LABEL: length63_eq:
				; X64-AVX2: # %bb.0:
				; X64-AVX2-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX2-NEXT: vmovdqu 31(%rdi), %ymm1
				; X64-AVX2-NEXT: vpxor 31(%rsi), %ymm1, %ymm1
				; X64-AVX2-NEXT: vpxor (%rsi), %ymm0, %ymm0
				; X64-AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
				; X64-AVX2-NEXT: vptest %ymm0, %ymm0
				; X64-AVX2-NEXT: setne %al
				; X64-AVX2-NEXT: vzeroupper
				; X64-AVX2-NEXT: retq
				;
				; X64-AVX512F-LABEL: length63_eq:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX512F-NEXT: vmovdqu 31(%rdi), %ymm1
				; X64-AVX512F-NEXT: vpxor 31(%rsi), %ymm1, %ymm1
				; X64-AVX512F-NEXT: vpxor (%rsi), %ymm0, %ymm0
				; X64-AVX512F-NEXT: vpor %ymm1, %ymm0, %ymm0
				; X64-AVX512F-NEXT: vptest %ymm0, %ymm0
				; X64-AVX512F-NEXT: setne %al
				; X64-AVX512F-NEXT: vzeroupper
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length63_eq:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX512BW-NEXT: vmovdqu 31(%rdi), %ymm1
				; X64-AVX512BW-NEXT: vpxor 31(%rsi), %ymm1, %ymm1
				; X64-AVX512BW-NEXT: vpxor (%rsi), %ymm0, %ymm0
				; X64-AVX512BW-NEXT: vpor %ymm1, %ymm0, %ymm0
				; X64-AVX512BW-NEXT: vptest %ymm0, %ymm0
				; X64-AVX512BW-NEXT: setne %al
				; X64-AVX512BW-NEXT: vzeroupper
				; X64-AVX512BW-NEXT: retq
				%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 63) nounwind
				%cmp = icmp ne i32 %call, 0
				ret i1 %cmp
				}
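When a length is not a multiple of the vector width, the new length63_eq checks use overlapping loads. SSE2 and AVX1 load 16 bytes at offsets 0, 16, 32 and 47. AVX2 and AVX-512 load 32 bytes at offsets 0 and 31. The portable sketch below applies the same idea to 16 to 64 bytes, emulating each 16-byte chunk as two 64-bit halves. `diff16` and `memeq_upto64` are hypothetical names, and the loop is a simplification, not the expansion pass itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compares one 16-byte chunk (emulated as two 64-bit halves); nonzero iff some byte differs. */
static uint64_t diff16(const unsigned char *a, const unsigned char *b) {
    uint64_t a0, a1, b0, b1;
    memcpy(&a0, a, 8); memcpy(&a1, a + 8, 8);
    memcpy(&b0, b, 8); memcpy(&b1, b + 8, 8);
    return (a0 ^ b0) | (a1 ^ b1);
}

/* Equality-only compare for 16 <= n <= 64: chunks at 0, 16, 32, ... plus a final
   chunk at n - 16 that may overlap the previous one. For n = 63 the offsets are
   0, 16, 32, 47, i.e. at most four loads per side. */
int memeq_upto64(const void *x, const void *y, size_t n) {
    const unsigned char *a = x, *b = y;
    uint64_t acc = 0;
    assert(n >= 16 && n <= 64);
    for (size_t off = 0; off + 16 < n; off += 16)
        acc |= diff16(a + off, b + off);
    acc |= diff16(a + n - 16, b + n - 16);   /* final, possibly overlapping, chunk */
    return acc == 0;
}
```

With n = 63, byte 47 is compared twice. This is harmless when only equality matters, because a differing byte is caught by whichever load covers it. The same reasoning would not give the ordering result a three-way memcmp needs.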

				define i1 @length63_eq_const(i8* %X) nounwind {
				; X86-NOSSE-LABEL: length63_eq_const:
				; X86-NOSSE: # %bb.0:
				; X86-NOSSE-NEXT: pushl $0
				; X86-NOSSE-NEXT: pushl $63
				; X86-NOSSE-NEXT: pushl $.L.str
				; X86-NOSSE-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NOSSE-NEXT: calll memcmp
				; X86-NOSSE-NEXT: addl $16, %esp
				; X86-NOSSE-NEXT: testl %eax, %eax
				; X86-NOSSE-NEXT: sete %al
				; X86-NOSSE-NEXT: retl
				;
				; X86-SSE1-LABEL: length63_eq_const:
				; X86-SSE1: # %bb.0:
				; X86-SSE1-NEXT: pushl $0
				; X86-SSE1-NEXT: pushl $63
				; X86-SSE1-NEXT: pushl $.L.str
				; X86-SSE1-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-SSE1-NEXT: calll memcmp
				; X86-SSE1-NEXT: addl $16, %esp
				; X86-SSE1-NEXT: testl %eax, %eax
				; X86-SSE1-NEXT: sete %al
				; X86-SSE1-NEXT: retl
				;
				; X86-SSE2-LABEL: length63_eq_const:
				; X86-SSE2: # %bb.0:
				; X86-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-SSE2-NEXT: movdqu (%eax), %xmm0
				; X86-SSE2-NEXT: movdqu 16(%eax), %xmm1
				; X86-SSE2-NEXT: movdqu 32(%eax), %xmm2
				; X86-SSE2-NEXT: movdqu 47(%eax), %xmm3
				; X86-SSE2-NEXT: pcmpeqb {{\.LCPI.*}}, %xmm3
				; X86-SSE2-NEXT: pcmpeqb {{\.LCPI.*}}, %xmm2
				; X86-SSE2-NEXT: pand %xmm3, %xmm2
				; X86-SSE2-NEXT: pcmpeqb {{\.LCPI.*}}, %xmm1
				; X86-SSE2-NEXT: pand %xmm2, %xmm1
				; X86-SSE2-NEXT: pcmpeqb {{\.LCPI.*}}, %xmm0
				; X86-SSE2-NEXT: pand %xmm1, %xmm0
				; X86-SSE2-NEXT: pmovmskb %xmm0, %eax
				; X86-SSE2-NEXT: cmpl $65535, %eax # imm = 0xFFFF
				; X86-SSE2-NEXT: sete %al
				; X86-SSE2-NEXT: retl
				;
				; X64-SSE2-LABEL: length63_eq_const:
				; X64-SSE2: # %bb.0:
				; X64-SSE2-NEXT: movdqu (%rdi), %xmm0
				; X64-SSE2-NEXT: movdqu 16(%rdi), %xmm1
				; X64-SSE2-NEXT: movdqu 32(%rdi), %xmm2
				; X64-SSE2-NEXT: movdqu 47(%rdi), %xmm3
				; X64-SSE2-NEXT: pcmpeqb {{.*}}(%rip), %xmm3
				; X64-SSE2-NEXT: pcmpeqb {{.*}}(%rip), %xmm2
				; X64-SSE2-NEXT: pand %xmm3, %xmm2
				; X64-SSE2-NEXT: pcmpeqb {{.*}}(%rip), %xmm1
				; X64-SSE2-NEXT: pand %xmm2, %xmm1
				; X64-SSE2-NEXT: pcmpeqb {{.*}}(%rip), %xmm0
				; X64-SSE2-NEXT: pand %xmm1, %xmm0
				; X64-SSE2-NEXT: pmovmskb %xmm0, %eax
				; X64-SSE2-NEXT: cmpl $65535, %eax # imm = 0xFFFF
				; X64-SSE2-NEXT: sete %al
				; X64-SSE2-NEXT: retq
				;
				; X64-AVX1-LABEL: length63_eq_const:
				; X64-AVX1: # %bb.0:
				; X64-AVX1-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX1-NEXT: vmovdqu 16(%rdi), %xmm1
				; X64-AVX1-NEXT: vmovdqu 32(%rdi), %xmm2
				; X64-AVX1-NEXT: vmovdqu 47(%rdi), %xmm3
				; X64-AVX1-NEXT: vpxor {{.*}}(%rip), %xmm3, %xmm3
				; X64-AVX1-NEXT: vpxor {{.*}}(%rip), %xmm2, %xmm2
				; X64-AVX1-NEXT: vpor %xmm3, %xmm2, %xmm2
				; X64-AVX1-NEXT: vpxor {{.*}}(%rip), %xmm1, %xmm1
				; X64-AVX1-NEXT: vpor %xmm2, %xmm1, %xmm1
				; X64-AVX1-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0
				; X64-AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0
				; X64-AVX1-NEXT: vptest %xmm0, %xmm0
				; X64-AVX1-NEXT: sete %al
				; X64-AVX1-NEXT: retq
				;
				; X64-AVX2-LABEL: length63_eq_const:
				; X64-AVX2: # %bb.0:
				; X64-AVX2-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX2-NEXT: vmovdqu 31(%rdi), %ymm1
				; X64-AVX2-NEXT: vpxor {{.*}}(%rip), %ymm1, %ymm1
				; X64-AVX2-NEXT: vpxor {{.*}}(%rip), %ymm0, %ymm0
				; X64-AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
				; X64-AVX2-NEXT: vptest %ymm0, %ymm0
				; X64-AVX2-NEXT: sete %al
				; X64-AVX2-NEXT: vzeroupper
				; X64-AVX2-NEXT: retq
				;
				; X64-AVX512F-LABEL: length63_eq_const:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX512F-NEXT: vmovdqu 31(%rdi), %ymm1
				; X64-AVX512F-NEXT: vpxor {{.*}}(%rip), %ymm1, %ymm1
				; X64-AVX512F-NEXT: vpxor {{.*}}(%rip), %ymm0, %ymm0
				; X64-AVX512F-NEXT: vpor %ymm1, %ymm0, %ymm0
				; X64-AVX512F-NEXT: vptest %ymm0, %ymm0
				; X64-AVX512F-NEXT: sete %al
				; X64-AVX512F-NEXT: vzeroupper
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length63_eq_const:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX512BW-NEXT: vmovdqu 31(%rdi), %ymm1
				; X64-AVX512BW-NEXT: vpxor {{.*}}(%rip), %ymm1, %ymm1
				; X64-AVX512BW-NEXT: vpxor {{.*}}(%rip), %ymm0, %ymm0
				; X64-AVX512BW-NEXT: vpor %ymm1, %ymm0, %ymm0
				; X64-AVX512BW-NEXT: vptest %ymm0, %ymm0
				; X64-AVX512BW-NEXT: sete %al
				; X64-AVX512BW-NEXT: vzeroupper
				; X64-AVX512BW-NEXT: retq
				%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([513 x i8], [513 x i8]* @.str, i32 0, i32 0), i64 63) nounwind
				%c = icmp eq i32 %m, 0
				ret i1 %c
				}

	define i32 @length64(i8* %X, i8* %Y) nounwind {			define i32 @length64(i8* %X, i8* %Y) nounwind {
	; X86-LABEL: length64:			; X86-LABEL: length64:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: pushl $0			; X86-NEXT: pushl $0
	; X86-NEXT: pushl $64			; X86-NEXT: pushl $64
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: pushl {{[0-9]+}}(%esp)
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: pushl {{[0-9]+}}(%esp)
	; X86-NEXT: calll memcmp			; X86-NEXT: calll memcmp
	; X86-NEXT: addl $16, %esp			; X86-NEXT: addl $16, %esp
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: length64:			; X64-LABEL: length64:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movl $64, %edx			; X64-NEXT: movl $64, %edx
	; X64-NEXT: jmp memcmp # TAILCALL			; X64-NEXT: jmp memcmp # TAILCALL
	%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 64) nounwind			%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 64) nounwind
	ret i32 %m			ret i32 %m
	}			}

	define i1 @length64_eq(i8* %x, i8* %y) nounwind {			define i1 @length64_eq(i8* %x, i8* %y) nounwind {
	; X86-LABEL: length64_eq:			; X86-NOSSE-LABEL: length64_eq:
				; X86-NOSSE: # %bb.0:
				; X86-NOSSE-NEXT: pushl $0
				; X86-NOSSE-NEXT: pushl $64
				; X86-NOSSE-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NOSSE-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NOSSE-NEXT: calll memcmp
				; X86-NOSSE-NEXT: addl $16, %esp
				; X86-NOSSE-NEXT: testl %eax, %eax
				; X86-NOSSE-NEXT: setne %al
				; X86-NOSSE-NEXT: retl
				;
				; X86-SSE1-LABEL: length64_eq:
				; X86-SSE1: # %bb.0:
				; X86-SSE1-NEXT: pushl $0
				; X86-SSE1-NEXT: pushl $64
				; X86-SSE1-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-SSE1-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-SSE1-NEXT: calll memcmp
				; X86-SSE1-NEXT: addl $16, %esp
				; X86-SSE1-NEXT: testl %eax, %eax
				; X86-SSE1-NEXT: setne %al
				; X86-SSE1-NEXT: retl
				;
				; X86-SSE2-LABEL: length64_eq:
				; X86-SSE2: # %bb.0:
				; X86-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-SSE2-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X86-SSE2-NEXT: movdqu (%ecx), %xmm0
				; X86-SSE2-NEXT: movdqu 16(%ecx), %xmm1
				; X86-SSE2-NEXT: movdqu (%eax), %xmm2
				; X86-SSE2-NEXT: pcmpeqb %xmm0, %xmm2
				; X86-SSE2-NEXT: movdqu 16(%eax), %xmm0
				; X86-SSE2-NEXT: pcmpeqb %xmm1, %xmm0
				; X86-SSE2-NEXT: movdqu 32(%ecx), %xmm1
				; X86-SSE2-NEXT: movdqu 32(%eax), %xmm3
				; X86-SSE2-NEXT: pcmpeqb %xmm1, %xmm3
				; X86-SSE2-NEXT: movdqu 48(%ecx), %xmm1
				; X86-SSE2-NEXT: movdqu 48(%eax), %xmm4
				; X86-SSE2-NEXT: pcmpeqb %xmm1, %xmm4
				; X86-SSE2-NEXT: pand %xmm3, %xmm4
				; X86-SSE2-NEXT: pand %xmm0, %xmm4
				; X86-SSE2-NEXT: pand %xmm2, %xmm4
				; X86-SSE2-NEXT: pmovmskb %xmm4, %eax
				; X86-SSE2-NEXT: cmpl $65535, %eax # imm = 0xFFFF
				; X86-SSE2-NEXT: setne %al
				; X86-SSE2-NEXT: retl
				;
				; X64-SSE2-LABEL: length64_eq:
				; X64-SSE2: # %bb.0:
				; X64-SSE2-NEXT: movdqu (%rdi), %xmm0
				; X64-SSE2-NEXT: movdqu 16(%rdi), %xmm1
				; X64-SSE2-NEXT: movdqu 32(%rdi), %xmm2
				; X64-SSE2-NEXT: movdqu 48(%rdi), %xmm3
				; X64-SSE2-NEXT: movdqu (%rsi), %xmm4
				; X64-SSE2-NEXT: pcmpeqb %xmm0, %xmm4
				; X64-SSE2-NEXT: movdqu 16(%rsi), %xmm0
				; X64-SSE2-NEXT: pcmpeqb %xmm1, %xmm0
				; X64-SSE2-NEXT: movdqu 32(%rsi), %xmm1
				; X64-SSE2-NEXT: pcmpeqb %xmm2, %xmm1
				; X64-SSE2-NEXT: movdqu 48(%rsi), %xmm2
				; X64-SSE2-NEXT: pcmpeqb %xmm3, %xmm2
				; X64-SSE2-NEXT: pand %xmm1, %xmm2
				; X64-SSE2-NEXT: pand %xmm0, %xmm2
				; X64-SSE2-NEXT: pand %xmm4, %xmm2
				; X64-SSE2-NEXT: pmovmskb %xmm2, %eax
				; X64-SSE2-NEXT: cmpl $65535, %eax # imm = 0xFFFF
				; X64-SSE2-NEXT: setne %al
				; X64-SSE2-NEXT: retq
				;
				; X64-AVX1-LABEL: length64_eq:
				; X64-AVX1: # %bb.0:
				; X64-AVX1-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX1-NEXT: vmovdqu 16(%rdi), %xmm1
				; X64-AVX1-NEXT: vmovdqu 32(%rdi), %xmm2
				; X64-AVX1-NEXT: vmovdqu 48(%rdi), %xmm3
				; X64-AVX1-NEXT: vpxor 48(%rsi), %xmm3, %xmm3
				; X64-AVX1-NEXT: vpxor 32(%rsi), %xmm2, %xmm2
				; X64-AVX1-NEXT: vpor %xmm3, %xmm2, %xmm2
				; X64-AVX1-NEXT: vpxor 16(%rsi), %xmm1, %xmm1
				; X64-AVX1-NEXT: vpor %xmm2, %xmm1, %xmm1
				; X64-AVX1-NEXT: vpxor (%rsi), %xmm0, %xmm0
				; X64-AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0
				; X64-AVX1-NEXT: vptest %xmm0, %xmm0
				; X64-AVX1-NEXT: setne %al
				; X64-AVX1-NEXT: retq
				;
				; X64-AVX2-LABEL: length64_eq:
				; X64-AVX2: # %bb.0:
				; X64-AVX2-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX2-NEXT: vmovdqu 32(%rdi), %ymm1
				; X64-AVX2-NEXT: vpxor 32(%rsi), %ymm1, %ymm1
				; X64-AVX2-NEXT: vpxor (%rsi), %ymm0, %ymm0
				; X64-AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
				; X64-AVX2-NEXT: vptest %ymm0, %ymm0
				; X64-AVX2-NEXT: setne %al
				; X64-AVX2-NEXT: vzeroupper
				; X64-AVX2-NEXT: retq
				;
				; X64-AVX512F-LABEL: length64_eq:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu64 (%rdi), %zmm0
				; X64-AVX512F-NEXT: vpcmpeqd (%rsi), %zmm0, %k0
				; X64-AVX512F-NEXT: kortestw %k0, %k0
				; X64-AVX512F-NEXT: setae %al
				; X64-AVX512F-NEXT: vzeroupper
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length64_eq:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm0
				; X64-AVX512BW-NEXT: vpcmpeqb (%rsi), %zmm0, %k0
				; X64-AVX512BW-NEXT: kortestq %k0, %k0
				; X64-AVX512BW-NEXT: setae %al
				; X64-AVX512BW-NEXT: vzeroupper
				; X64-AVX512BW-NEXT: retq
				%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 64) nounwind
				%cmp = icmp ne i32 %call, 0
				ret i1 %cmp
				}
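With AVX-512, the 64-byte case collapses to a single compare into a mask register. In the AVX512BW checks, `vpcmpeqb` sets one mask bit per equal byte, and `kortestq` sets CF only when all 64 bits are set. `setae` (CF == 0) therefore yields the `icmp ne` result. AVX512F without BW does the same with `vpcmpeqd` over 16 dword lanes and `kortestw`. The scalar C model below uses hypothetical function names.

```c
#include <stdint.h>

/* Emulates `vpcmpeqb (%rsi), %zmm0, %k0`: bit i is set when byte i is equal. */
static uint64_t cmpeq_mask64(const unsigned char *a, const unsigned char *b) {
    uint64_t k = 0;
    for (int i = 0; i < 64; i++)
        if (a[i] == b[i])
            k |= (uint64_t)1 << i;
    return k;
}

/* Models length64_eq (icmp ne) on X64-AVX512BW. */
int length64_ne_model(const void *x, const void *y) {
    uint64_t k = cmpeq_mask64(x, y);
    int cf = (k == UINT64_MAX);   /* kortestq %k0, %k0: CF = 1 iff the OR is all ones */
    return !cf;                   /* setae: 1 when CF == 0, i.e. some byte differs */
}
```

For the `icmp eq` variant (length64_eq_const), the checks use `setb`, which returns CF directly.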

				define i1 @length64_eq_const(i8* %X) nounwind {
				; X86-NOSSE-LABEL: length64_eq_const:
				; X86-NOSSE: # %bb.0:
				; X86-NOSSE-NEXT: pushl $0
				; X86-NOSSE-NEXT: pushl $64
				; X86-NOSSE-NEXT: pushl $.L.str
				; X86-NOSSE-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NOSSE-NEXT: calll memcmp
				; X86-NOSSE-NEXT: addl $16, %esp
				; X86-NOSSE-NEXT: testl %eax, %eax
				; X86-NOSSE-NEXT: sete %al
				; X86-NOSSE-NEXT: retl
				;
				; X86-SSE1-LABEL: length64_eq_const:
				; X86-SSE1: # %bb.0:
				; X86-SSE1-NEXT: pushl $0
				; X86-SSE1-NEXT: pushl $64
				; X86-SSE1-NEXT: pushl $.L.str
				; X86-SSE1-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-SSE1-NEXT: calll memcmp
				; X86-SSE1-NEXT: addl $16, %esp
				; X86-SSE1-NEXT: testl %eax, %eax
				; X86-SSE1-NEXT: sete %al
				; X86-SSE1-NEXT: retl
				;
				; X86-SSE2-LABEL: length64_eq_const:
				; X86-SSE2: # %bb.0:
				; X86-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-SSE2-NEXT: movdqu (%eax), %xmm0
				; X86-SSE2-NEXT: movdqu 16(%eax), %xmm1
				; X86-SSE2-NEXT: movdqu 32(%eax), %xmm2
				; X86-SSE2-NEXT: movdqu 48(%eax), %xmm3
				; X86-SSE2-NEXT: pcmpeqb {{\.LCPI.*}}, %xmm3
				; X86-SSE2-NEXT: pcmpeqb {{\.LCPI.*}}, %xmm2
				; X86-SSE2-NEXT: pand %xmm3, %xmm2
				; X86-SSE2-NEXT: pcmpeqb {{\.LCPI.*}}, %xmm1
				; X86-SSE2-NEXT: pand %xmm2, %xmm1
				; X86-SSE2-NEXT: pcmpeqb {{\.LCPI.*}}, %xmm0
				; X86-SSE2-NEXT: pand %xmm1, %xmm0
				; X86-SSE2-NEXT: pmovmskb %xmm0, %eax
				; X86-SSE2-NEXT: cmpl $65535, %eax # imm = 0xFFFF
				; X86-SSE2-NEXT: sete %al
				; X86-SSE2-NEXT: retl
				;
				; X64-SSE2-LABEL: length64_eq_const:
				; X64-SSE2: # %bb.0:
				; X64-SSE2-NEXT: movdqu (%rdi), %xmm0
				; X64-SSE2-NEXT: movdqu 16(%rdi), %xmm1
				; X64-SSE2-NEXT: movdqu 32(%rdi), %xmm2
				; X64-SSE2-NEXT: movdqu 48(%rdi), %xmm3
				; X64-SSE2-NEXT: pcmpeqb {{.*}}(%rip), %xmm3
				; X64-SSE2-NEXT: pcmpeqb {{.*}}(%rip), %xmm2
				; X64-SSE2-NEXT: pand %xmm3, %xmm2
				; X64-SSE2-NEXT: pcmpeqb {{.*}}(%rip), %xmm1
				; X64-SSE2-NEXT: pand %xmm2, %xmm1
				; X64-SSE2-NEXT: pcmpeqb {{.*}}(%rip), %xmm0
				; X64-SSE2-NEXT: pand %xmm1, %xmm0
				; X64-SSE2-NEXT: pmovmskb %xmm0, %eax
				; X64-SSE2-NEXT: cmpl $65535, %eax # imm = 0xFFFF
				; X64-SSE2-NEXT: sete %al
				; X64-SSE2-NEXT: retq
				;
				; X64-AVX1-LABEL: length64_eq_const:
				; X64-AVX1: # %bb.0:
				; X64-AVX1-NEXT: vmovdqu (%rdi), %xmm0
				; X64-AVX1-NEXT: vmovdqu 16(%rdi), %xmm1
				; X64-AVX1-NEXT: vmovdqu 32(%rdi), %xmm2
				; X64-AVX1-NEXT: vmovdqu 48(%rdi), %xmm3
				; X64-AVX1-NEXT: vpxor {{.*}}(%rip), %xmm3, %xmm3
				; X64-AVX1-NEXT: vpxor {{.*}}(%rip), %xmm2, %xmm2
				; X64-AVX1-NEXT: vpor %xmm3, %xmm2, %xmm2
				; X64-AVX1-NEXT: vpxor {{.*}}(%rip), %xmm1, %xmm1
				; X64-AVX1-NEXT: vpor %xmm2, %xmm1, %xmm1
				; X64-AVX1-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0
				; X64-AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0
				; X64-AVX1-NEXT: vptest %xmm0, %xmm0
				; X64-AVX1-NEXT: sete %al
				; X64-AVX1-NEXT: retq
				;
				; X64-AVX2-LABEL: length64_eq_const:
				; X64-AVX2: # %bb.0:
				; X64-AVX2-NEXT: vmovdqu (%rdi), %ymm0
				; X64-AVX2-NEXT: vmovdqu 32(%rdi), %ymm1
				; X64-AVX2-NEXT: vpxor {{.*}}(%rip), %ymm1, %ymm1
				; X64-AVX2-NEXT: vpxor {{.*}}(%rip), %ymm0, %ymm0
				; X64-AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
				; X64-AVX2-NEXT: vptest %ymm0, %ymm0
				; X64-AVX2-NEXT: sete %al
				; X64-AVX2-NEXT: vzeroupper
				; X64-AVX2-NEXT: retq
				;
				; X64-AVX512F-LABEL: length64_eq_const:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu64 (%rdi), %zmm0
				; X64-AVX512F-NEXT: vpcmpeqd {{.*}}(%rip), %zmm0, %k0
				; X64-AVX512F-NEXT: kortestw %k0, %k0
				; X64-AVX512F-NEXT: setb %al
				; X64-AVX512F-NEXT: vzeroupper
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length64_eq_const:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm0
				; X64-AVX512BW-NEXT: vpcmpeqb {{.*}}(%rip), %zmm0, %k0
				; X64-AVX512BW-NEXT: kortestq %k0, %k0
				; X64-AVX512BW-NEXT: setb %al
				; X64-AVX512BW-NEXT: vzeroupper
				; X64-AVX512BW-NEXT: retq
				%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([513 x i8], [513 x i8]* @.str, i32 0, i32 0), i64 64) nounwind
				%c = icmp eq i32 %m, 0
				ret i1 %c
				}

				define i32 @length128(i8* %X, i8* %Y) nounwind {
				; X86-LABEL: length128:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: pushl $0			; X86-NEXT: pushl $0
	; X86-NEXT: pushl $64			; X86-NEXT: pushl $128
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: calll memcmp
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: retl
				;
				; X64-LABEL: length128:
				; X64: # %bb.0:
				; X64-NEXT: movl $128, %edx
				; X64-NEXT: jmp memcmp # TAILCALL
				%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 128) nounwind
				ret i32 %m
				}

				define i1 @length128_eq(i8* %x, i8* %y) nounwind {
				; X86-LABEL: length128_eq:
				; X86: # %bb.0:
				; X86-NEXT: pushl $0
				; X86-NEXT: pushl $128
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: pushl {{[0-9]+}}(%esp)
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: pushl {{[0-9]+}}(%esp)
	; X86-NEXT: calll memcmp			; X86-NEXT: calll memcmp
	; X86-NEXT: addl $16, %esp			; X86-NEXT: addl $16, %esp
	; X86-NEXT: testl %eax, %eax			; X86-NEXT: testl %eax, %eax
	; X86-NEXT: setne %al			; X86-NEXT: setne %al
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-SSE2-LABEL: length64_eq:			; X64-SSE2-LABEL: length128_eq:
	; X64-SSE2: # %bb.0:			; X64-SSE2: # %bb.0:
	; X64-SSE2-NEXT: pushq %rax			; X64-SSE2-NEXT: pushq %rax
	; X64-SSE2-NEXT: movl $64, %edx			; X64-SSE2-NEXT: movl $128, %edx
	; X64-SSE2-NEXT: callq memcmp			; X64-SSE2-NEXT: callq memcmp
	; X64-SSE2-NEXT: testl %eax, %eax			; X64-SSE2-NEXT: testl %eax, %eax
	; X64-SSE2-NEXT: setne %al			; X64-SSE2-NEXT: setne %al
	; X64-SSE2-NEXT: popq %rcx			; X64-SSE2-NEXT: popq %rcx
	; X64-SSE2-NEXT: retq			; X64-SSE2-NEXT: retq
	;			;
	; X64-AVX1-LABEL: length64_eq:			; X64-AVX1-LABEL: length128_eq:
	; X64-AVX1: # %bb.0:			; X64-AVX1: # %bb.0:
	; X64-AVX1-NEXT: pushq %rax			; X64-AVX1-NEXT: pushq %rax
	; X64-AVX1-NEXT: movl $64, %edx			; X64-AVX1-NEXT: movl $128, %edx
	; X64-AVX1-NEXT: callq memcmp			; X64-AVX1-NEXT: callq memcmp
	; X64-AVX1-NEXT: testl %eax, %eax			; X64-AVX1-NEXT: testl %eax, %eax
	; X64-AVX1-NEXT: setne %al			; X64-AVX1-NEXT: setne %al
	; X64-AVX1-NEXT: popq %rcx			; X64-AVX1-NEXT: popq %rcx
	; X64-AVX1-NEXT: retq			; X64-AVX1-NEXT: retq
	;			;
	; X64-AVX2-LABEL: length64_eq:			; X64-AVX2-LABEL: length128_eq:
	; X64-AVX2: # %bb.0:			; X64-AVX2: # %bb.0:
	; X64-AVX2-NEXT: vmovdqu (%rdi), %ymm0			; X64-AVX2-NEXT: vmovdqu (%rdi), %ymm0
	; X64-AVX2-NEXT: vmovdqu 32(%rdi), %ymm1			; X64-AVX2-NEXT: vmovdqu 32(%rdi), %ymm1
				; X64-AVX2-NEXT: vmovdqu 64(%rdi), %ymm2
				; X64-AVX2-NEXT: vmovdqu 96(%rdi), %ymm3
				; X64-AVX2-NEXT: vpxor 96(%rsi), %ymm3, %ymm3
				; X64-AVX2-NEXT: vpxor 64(%rsi), %ymm2, %ymm2
				; X64-AVX2-NEXT: vpor %ymm3, %ymm2, %ymm2
	; X64-AVX2-NEXT: vpxor 32(%rsi), %ymm1, %ymm1			; X64-AVX2-NEXT: vpxor 32(%rsi), %ymm1, %ymm1
				; X64-AVX2-NEXT: vpor %ymm2, %ymm1, %ymm1
	; X64-AVX2-NEXT: vpxor (%rsi), %ymm0, %ymm0			; X64-AVX2-NEXT: vpxor (%rsi), %ymm0, %ymm0
	; X64-AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0			; X64-AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
	; X64-AVX2-NEXT: vptest %ymm0, %ymm0			; X64-AVX2-NEXT: vptest %ymm0, %ymm0
	; X64-AVX2-NEXT: setne %al			; X64-AVX2-NEXT: setne %al
	; X64-AVX2-NEXT: vzeroupper			; X64-AVX2-NEXT: vzeroupper
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512F-LABEL: length64_eq:			; X64-AVX512F-LABEL: length128_eq:
	; X64-AVX512F: # %bb.0:			; X64-AVX512F: # %bb.0:
	; X64-AVX512F-NEXT: vmovdqu64 (%rdi), %zmm0			; X64-AVX512F-NEXT: vmovdqu64 (%rdi), %zmm0
	; X64-AVX512F-NEXT: vpcmpeqd (%rsi), %zmm0, %k0			; X64-AVX512F-NEXT: vmovdqu64 64(%rdi), %zmm1
				; X64-AVX512F-NEXT: vpcmpeqd (%rsi), %zmm0, %k1
				; X64-AVX512F-NEXT: vpcmpeqd 64(%rsi), %zmm1, %k0 {%k1}
	; X64-AVX512F-NEXT: kortestw %k0, %k0			; X64-AVX512F-NEXT: kortestw %k0, %k0
	; X64-AVX512F-NEXT: setae %al			; X64-AVX512F-NEXT: setae %al
	; X64-AVX512F-NEXT: vzeroupper			; X64-AVX512F-NEXT: vzeroupper
	; X64-AVX512F-NEXT: retq			; X64-AVX512F-NEXT: retq
	;			;
	; X64-AVX512BW-LABEL: length64_eq:			; X64-AVX512BW-LABEL: length128_eq:
	; X64-AVX512BW: # %bb.0:			; X64-AVX512BW: # %bb.0:
	; X64-AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm0			; X64-AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm0
	; X64-AVX512BW-NEXT: vpcmpeqb (%rsi), %zmm0, %k0			; X64-AVX512BW-NEXT: vmovdqu64 64(%rdi), %zmm1
				; X64-AVX512BW-NEXT: vpcmpeqb (%rsi), %zmm0, %k1
				; X64-AVX512BW-NEXT: vpcmpeqb 64(%rsi), %zmm1, %k0 {%k1}
	; X64-AVX512BW-NEXT: kortestq %k0, %k0			; X64-AVX512BW-NEXT: kortestq %k0, %k0
	; X64-AVX512BW-NEXT: setae %al			; X64-AVX512BW-NEXT: setae %al
	; X64-AVX512BW-NEXT: vzeroupper			; X64-AVX512BW-NEXT: vzeroupper
	; X64-AVX512BW-NEXT: retq			; X64-AVX512BW-NEXT: retq
	%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 64) nounwind			%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 128) nounwind
	%cmp = icmp ne i32 %call, 0			%cmp = icmp ne i32 %call, 0
	ret i1 %cmp			ret i1 %cmp
	}			}
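The AVX2 checks for length128_eq show the equality-only expansion that this patch enables. Each operand gets four 32-byte loads, each pair of loads gets one vpxor, a vpor tree combines the results, and a single vptest finishes the comparison. Below is a minimal scalar sketch of the same XOR/OR reduction in C. It assumes a hypothetical `block32` type (four `uint64_t` words standing in for a ymm register) and a hypothetical `eq128_sketch` helper. The sketch returns 1 on equality, whereas the test IR computes `icmp ne` and therefore ends in setne.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 256-bit ymm register: four 64-bit words. */
typedef struct { uint64_t w[4]; } block32;

/* Unaligned-safe 32-byte load (plays the role of vmovdqu). */
static block32 load_block(const unsigned char *p) {
    block32 b;
    memcpy(b.w, p, sizeof b.w);
    return b;
}

/* Word-wise XOR (plays the role of vpxor). */
static block32 xor_block(block32 a, block32 b) {
    for (int i = 0; i < 4; i++) a.w[i] ^= b.w[i];
    return a;
}

/* Word-wise OR (plays the role of vpor). */
static block32 or_block(block32 a, block32 b) {
    for (int i = 0; i < 4; i++) a.w[i] |= b.w[i];
    return a;
}

/* All-zero test (plays the role of vptest followed by sete). */
static int block_is_zero(block32 a) {
    return (a.w[0] | a.w[1] | a.w[2] | a.w[3]) == 0;
}

/* Returns 1 if the 128 bytes at x and y are equal, 0 otherwise. */
int eq128_sketch(const void *x, const void *y) {
    const unsigned char *a = x, *b = y;
    block32 d0 = xor_block(load_block(a), load_block(b));
    block32 d1 = xor_block(load_block(a + 32), load_block(b + 32));
    block32 d2 = xor_block(load_block(a + 64), load_block(b + 64));
    block32 d3 = xor_block(load_block(a + 96), load_block(b + 96));
    /* Same OR tree as the AVX2 output: (d2 | d3), then | d1, then | d0. */
    return block_is_zero(or_block(d0, or_block(d1, or_block(d2, d3))));
}
```

The sketch tests for zero only once at the end. This matches the branch-free shape of the vector code: no early exit is taken after the first differing block.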

	define i1 @length64_eq_const(i8* %X) nounwind {			define i1 @length128_eq_const(i8* %X) nounwind {
	; X86-LABEL: length64_eq_const:			; X86-LABEL: length128_eq_const:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: pushl $0			; X86-NEXT: pushl $0
	; X86-NEXT: pushl $64			; X86-NEXT: pushl $128
	; X86-NEXT: pushl $.L.str			; X86-NEXT: pushl $.L.str
	; X86-NEXT: pushl {{[0-9]+}}(%esp)			; X86-NEXT: pushl {{[0-9]+}}(%esp)
	; X86-NEXT: calll memcmp			; X86-NEXT: calll memcmp
	; X86-NEXT: addl $16, %esp			; X86-NEXT: addl $16, %esp
	; X86-NEXT: testl %eax, %eax			; X86-NEXT: testl %eax, %eax
	; X86-NEXT: sete %al			; X86-NEXT: sete %al
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-SSE2-LABEL: length64_eq_const:			; X64-SSE2-LABEL: length128_eq_const:
	; X64-SSE2: # %bb.0:			; X64-SSE2: # %bb.0:
	; X64-SSE2-NEXT: pushq %rax			; X64-SSE2-NEXT: pushq %rax
	; X64-SSE2-NEXT: movl $.L.str, %esi			; X64-SSE2-NEXT: movl $.L.str, %esi
	; X64-SSE2-NEXT: movl $64, %edx			; X64-SSE2-NEXT: movl $128, %edx
	; X64-SSE2-NEXT: callq memcmp			; X64-SSE2-NEXT: callq memcmp
	; X64-SSE2-NEXT: testl %eax, %eax			; X64-SSE2-NEXT: testl %eax, %eax
	; X64-SSE2-NEXT: sete %al			; X64-SSE2-NEXT: sete %al
	; X64-SSE2-NEXT: popq %rcx			; X64-SSE2-NEXT: popq %rcx
	; X64-SSE2-NEXT: retq			; X64-SSE2-NEXT: retq
	;			;
	; X64-AVX1-LABEL: length64_eq_const:			; X64-AVX1-LABEL: length128_eq_const:
	; X64-AVX1: # %bb.0:			; X64-AVX1: # %bb.0:
	; X64-AVX1-NEXT: pushq %rax			; X64-AVX1-NEXT: pushq %rax
	; X64-AVX1-NEXT: movl $.L.str, %esi			; X64-AVX1-NEXT: movl $.L.str, %esi
	; X64-AVX1-NEXT: movl $64, %edx			; X64-AVX1-NEXT: movl $128, %edx
	; X64-AVX1-NEXT: callq memcmp			; X64-AVX1-NEXT: callq memcmp
	; X64-AVX1-NEXT: testl %eax, %eax			; X64-AVX1-NEXT: testl %eax, %eax
	; X64-AVX1-NEXT: sete %al			; X64-AVX1-NEXT: sete %al
	; X64-AVX1-NEXT: popq %rcx			; X64-AVX1-NEXT: popq %rcx
	; X64-AVX1-NEXT: retq			; X64-AVX1-NEXT: retq
	;			;
	; X64-AVX2-LABEL: length64_eq_const:			; X64-AVX2-LABEL: length128_eq_const:
	; X64-AVX2: # %bb.0:			; X64-AVX2: # %bb.0:
	; X64-AVX2-NEXT: vmovdqu (%rdi), %ymm0			; X64-AVX2-NEXT: vmovdqu (%rdi), %ymm0
	; X64-AVX2-NEXT: vmovdqu 32(%rdi), %ymm1			; X64-AVX2-NEXT: vmovdqu 32(%rdi), %ymm1
				; X64-AVX2-NEXT: vmovdqu 64(%rdi), %ymm2
				; X64-AVX2-NEXT: vmovdqu 96(%rdi), %ymm3
				; X64-AVX2-NEXT: vpxor {{.*}}(%rip), %ymm3, %ymm3
				; X64-AVX2-NEXT: vpxor {{.*}}(%rip), %ymm2, %ymm2
				; X64-AVX2-NEXT: vpor %ymm3, %ymm2, %ymm2
	; X64-AVX2-NEXT: vpxor {{.*}}(%rip), %ymm1, %ymm1			; X64-AVX2-NEXT: vpxor {{.*}}(%rip), %ymm1, %ymm1
				; X64-AVX2-NEXT: vpor %ymm2, %ymm1, %ymm1
	; X64-AVX2-NEXT: vpxor {{.*}}(%rip), %ymm0, %ymm0			; X64-AVX2-NEXT: vpxor {{.*}}(%rip), %ymm0, %ymm0
	; X64-AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0			; X64-AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
	; X64-AVX2-NEXT: vptest %ymm0, %ymm0			; X64-AVX2-NEXT: vptest %ymm0, %ymm0
	; X64-AVX2-NEXT: sete %al			; X64-AVX2-NEXT: sete %al
	; X64-AVX2-NEXT: vzeroupper			; X64-AVX2-NEXT: vzeroupper
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	;			;
	; X64-AVX512F-LABEL: length64_eq_const:			; X64-AVX512F-LABEL: length128_eq_const:
	; X64-AVX512F: # %bb.0:			; X64-AVX512F: # %bb.0:
	; X64-AVX512F-NEXT: vmovdqu64 (%rdi), %zmm0			; X64-AVX512F-NEXT: vmovdqu64 (%rdi), %zmm0
	; X64-AVX512F-NEXT: vpcmpeqd {{.*}}(%rip), %zmm0, %k0			; X64-AVX512F-NEXT: vmovdqu64 64(%rdi), %zmm1
				; X64-AVX512F-NEXT: vpcmpeqd {{.*}}(%rip), %zmm0, %k1
				; X64-AVX512F-NEXT: vpcmpeqd .L.str+{{.*}}(%rip), %zmm1, %k0 {%k1}
	; X64-AVX512F-NEXT: kortestw %k0, %k0			; X64-AVX512F-NEXT: kortestw %k0, %k0
	; X64-AVX512F-NEXT: setb %al			; X64-AVX512F-NEXT: setb %al
	; X64-AVX512F-NEXT: vzeroupper			; X64-AVX512F-NEXT: vzeroupper
	; X64-AVX512F-NEXT: retq			; X64-AVX512F-NEXT: retq
	;			;
	; X64-AVX512BW-LABEL: length64_eq_const:			; X64-AVX512BW-LABEL: length128_eq_const:
	; X64-AVX512BW: # %bb.0:			; X64-AVX512BW: # %bb.0:
	; X64-AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm0			; X64-AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm0
	; X64-AVX512BW-NEXT: vpcmpeqb {{.*}}(%rip), %zmm0, %k0			; X64-AVX512BW-NEXT: vmovdqu64 64(%rdi), %zmm1
				; X64-AVX512BW-NEXT: vpcmpeqb {{.*}}(%rip), %zmm0, %k1
				; X64-AVX512BW-NEXT: vpcmpeqb .L.str+{{.*}}(%rip), %zmm1, %k0 {%k1}
				; X64-AVX512BW-NEXT: kortestq %k0, %k0
				; X64-AVX512BW-NEXT: setb %al
				; X64-AVX512BW-NEXT: vzeroupper
				; X64-AVX512BW-NEXT: retq
				%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([513 x i8], [513 x i8]* @.str, i32 0, i32 0), i64 128) nounwind
				%c = icmp eq i32 %m, 0
				ret i1 %c
				}
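With AVX-512, the 128-byte comparison needs only two 64-byte loads per operand. The second vpcmpeqb is masked by %k1, so %k0 ends up holding the AND of both per-byte equality masks. kortestq sets CF only when every mask bit is 1, which is why the checks use setb for `icmp eq` and setae for `icmp ne`. Below is a scalar sketch of that mask chain in C. The helpers `eq_mask64` and `eq128_masked_sketch` are hypothetical. AVX512F-only targets do the same thing with vpcmpeqd, 16-bit masks, and kortestw.

```c
#include <stdint.h>

/* Per-byte equality mask of one 64-byte block: bit i is set when a[i] == b[i]
   and bit i of write_mask is set (models the {%k1} masking of vpcmpeqb). */
static uint64_t eq_mask64(const unsigned char *a, const unsigned char *b,
                          uint64_t write_mask) {
    uint64_t m = 0;
    for (int i = 0; i < 64; i++)
        if (a[i] == b[i]) m |= (uint64_t)1 << i;
    return m & write_mask;
}

/* Returns 1 if the 128 bytes at x and y are equal, 0 otherwise. */
int eq128_masked_sketch(const unsigned char *x, const unsigned char *y) {
    uint64_t k1 = eq_mask64(x, y, UINT64_MAX);   /* vpcmpeqb -> %k1 */
    uint64_t k0 = eq_mask64(x + 64, y + 64, k1); /* vpcmpeqb -> %k0 {%k1} */
    return k0 == UINT64_MAX; /* kortestq sets CF iff all 64 bits are 1 */
}
```

In the 255- and 256-byte cases, two such chains are built and merged with kandq/kandw before the final kortest.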

				define i32 @length255(i8* %X, i8* %Y) nounwind {
				; X86-LABEL: length255:
				; X86: # %bb.0:
				; X86-NEXT: pushl $0
				; X86-NEXT: pushl $255
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: calll memcmp
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: retl
				;
				; X64-LABEL: length255:
				; X64: # %bb.0:
				; X64-NEXT: movl $255, %edx
				; X64-NEXT: jmp memcmp # TAILCALL
				%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 255) nounwind
				ret i32 %m
				}

				define i1 @length255_eq(i8* %x, i8* %y) nounwind {
				; X86-LABEL: length255_eq:
				; X86: # %bb.0:
				; X86-NEXT: pushl $0
				; X86-NEXT: pushl $255
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: calll memcmp
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: testl %eax, %eax
				; X86-NEXT: setne %al
				; X86-NEXT: retl
				;
				; X64-SSE2-LABEL: length255_eq:
				; X64-SSE2: # %bb.0:
				; X64-SSE2-NEXT: pushq %rax
				; X64-SSE2-NEXT: movl $255, %edx
				; X64-SSE2-NEXT: callq memcmp
				; X64-SSE2-NEXT: testl %eax, %eax
				; X64-SSE2-NEXT: setne %al
				; X64-SSE2-NEXT: popq %rcx
				; X64-SSE2-NEXT: retq
				;
				; X64-AVX-LABEL: length255_eq:
				; X64-AVX: # %bb.0:
				; X64-AVX-NEXT: pushq %rax
				; X64-AVX-NEXT: movl $255, %edx
				; X64-AVX-NEXT: callq memcmp
				; X64-AVX-NEXT: testl %eax, %eax
				; X64-AVX-NEXT: setne %al
				; X64-AVX-NEXT: popq %rcx
				; X64-AVX-NEXT: retq
				;
				; X64-AVX512F-LABEL: length255_eq:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu64 -128(%rdi), %zmm0
				; X64-AVX512F-NEXT: vmovdqu64 -65(%rdi), %zmm1
				; X64-AVX512F-NEXT: vmovdqu64 (%rdi), %zmm2
				; X64-AVX512F-NEXT: vmovdqu64 64(%rdi), %zmm3
				; X64-AVX512F-NEXT: vpcmpeqd -128(%rsi), %zmm0, %k1
				; X64-AVX512F-NEXT: vpcmpeqd -65(%rsi), %zmm1, %k0 {%k1}
				; X64-AVX512F-NEXT: vpcmpeqd (%rsi), %zmm2, %k1
				; X64-AVX512F-NEXT: vpcmpeqd 64(%rsi), %zmm3, %k1 {%k1}
				; X64-AVX512F-NEXT: kandw %k0, %k1, %k0
				; X64-AVX512F-NEXT: kortestw %k0, %k0
				; X64-AVX512F-NEXT: setae %al
				; X64-AVX512F-NEXT: vzeroupper
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length255_eq:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu64 -128(%rdi), %zmm0
				; X64-AVX512BW-NEXT: vmovdqu64 -65(%rdi), %zmm1
				; X64-AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm2
				; X64-AVX512BW-NEXT: vmovdqu64 64(%rdi), %zmm3
				; X64-AVX512BW-NEXT: vpcmpeqb -128(%rsi), %zmm0, %k1
				; X64-AVX512BW-NEXT: vpcmpeqb -65(%rsi), %zmm1, %k0 {%k1}
				; X64-AVX512BW-NEXT: vpcmpeqb (%rsi), %zmm2, %k1
				; X64-AVX512BW-NEXT: vpcmpeqb 64(%rsi), %zmm3, %k1 {%k1}
				; X64-AVX512BW-NEXT: kandq %k0, %k1, %k0
				; X64-AVX512BW-NEXT: kortestq %k0, %k0
				; X64-AVX512BW-NEXT: setae %al
				; X64-AVX512BW-NEXT: vzeroupper
				; X64-AVX512BW-NEXT: retq
				%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 255) nounwind
				%cmp = icmp ne i32 %call, 0
				ret i1 %cmp
				}

				define i1 @length255_eq_const(i8* %X) nounwind {
				; X86-LABEL: length255_eq_const:
				; X86: # %bb.0:
				; X86-NEXT: pushl $0
				; X86-NEXT: pushl $255
				; X86-NEXT: pushl $.L.str
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: calll memcmp
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: testl %eax, %eax
				; X86-NEXT: sete %al
				; X86-NEXT: retl
				;
				; X64-SSE2-LABEL: length255_eq_const:
				; X64-SSE2: # %bb.0:
				; X64-SSE2-NEXT: pushq %rax
				; X64-SSE2-NEXT: movl $.L.str, %esi
				; X64-SSE2-NEXT: movl $255, %edx
				; X64-SSE2-NEXT: callq memcmp
				; X64-SSE2-NEXT: testl %eax, %eax
				; X64-SSE2-NEXT: sete %al
				; X64-SSE2-NEXT: popq %rcx
				; X64-SSE2-NEXT: retq
				;
				; X64-AVX-LABEL: length255_eq_const:
				; X64-AVX: # %bb.0:
				; X64-AVX-NEXT: pushq %rax
				; X64-AVX-NEXT: movl $.L.str, %esi
				; X64-AVX-NEXT: movl $255, %edx
				; X64-AVX-NEXT: callq memcmp
				; X64-AVX-NEXT: testl %eax, %eax
				; X64-AVX-NEXT: sete %al
				; X64-AVX-NEXT: popq %rcx
				; X64-AVX-NEXT: retq
				;
				; X64-AVX512F-LABEL: length255_eq_const:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu64 -128(%rdi), %zmm0
				; X64-AVX512F-NEXT: vmovdqu64 -65(%rdi), %zmm1
				; X64-AVX512F-NEXT: vmovdqu64 (%rdi), %zmm2
				; X64-AVX512F-NEXT: vmovdqu64 64(%rdi), %zmm3
				; X64-AVX512F-NEXT: vpcmpeqd .L.str-{{.*}}(%rip), %zmm0, %k1
				; X64-AVX512F-NEXT: vpcmpeqd .L.str-{{.*}}(%rip), %zmm1, %k0 {%k1}
				; X64-AVX512F-NEXT: vpcmpeqd {{.*}}(%rip), %zmm2, %k1
				; X64-AVX512F-NEXT: vpcmpeqd .L.str+{{.*}}(%rip), %zmm3, %k1 {%k1}
				; X64-AVX512F-NEXT: kandw %k0, %k1, %k0
				; X64-AVX512F-NEXT: kortestw %k0, %k0
				; X64-AVX512F-NEXT: setb %al
				; X64-AVX512F-NEXT: vzeroupper
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length255_eq_const:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu64 -128(%rdi), %zmm0
				; X64-AVX512BW-NEXT: vmovdqu64 -65(%rdi), %zmm1
				; X64-AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm2
				; X64-AVX512BW-NEXT: vmovdqu64 64(%rdi), %zmm3
				; X64-AVX512BW-NEXT: vpcmpeqb .L.str-{{.*}}(%rip), %zmm0, %k1
				; X64-AVX512BW-NEXT: vpcmpeqb .L.str-{{.*}}(%rip), %zmm1, %k0 {%k1}
				; X64-AVX512BW-NEXT: vpcmpeqb {{.*}}(%rip), %zmm2, %k1
				; X64-AVX512BW-NEXT: vpcmpeqb .L.str+{{.*}}(%rip), %zmm3, %k1 {%k1}
				; X64-AVX512BW-NEXT: kandq %k0, %k1, %k0
	; X64-AVX512BW-NEXT: kortestq %k0, %k0			; X64-AVX512BW-NEXT: kortestq %k0, %k0
	; X64-AVX512BW-NEXT: setb %al			; X64-AVX512BW-NEXT: setb %al
	; X64-AVX512BW-NEXT: vzeroupper			; X64-AVX512BW-NEXT: vzeroupper
	; X64-AVX512BW-NEXT: retq			; X64-AVX512BW-NEXT: retq
	%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([65 x i8], [65 x i8]* @.str, i32 0, i32 0), i64 64) nounwind			%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([513 x i8], [513 x i8]* @.str, i32 0, i32 0), i64 255) nounwind
				%c = icmp eq i32 %m, 0
				ret i1 %c
				}

				define i32 @length256(i8* %X, i8* %Y) nounwind {
				; X86-LABEL: length256:
				; X86: # %bb.0:
				; X86-NEXT: pushl $0
				; X86-NEXT: pushl $256 # imm = 0x100
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: calll memcmp
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: retl
				;
				; X64-LABEL: length256:
				; X64: # %bb.0:
				; X64-NEXT: movl $256, %edx # imm = 0x100
				; X64-NEXT: jmp memcmp # TAILCALL
				%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 256) nounwind
				ret i32 %m
				}

				define i1 @length256_eq(i8* %x, i8* %y) nounwind {
				; X86-LABEL: length256_eq:
				; X86: # %bb.0:
				; X86-NEXT: pushl $0
				; X86-NEXT: pushl $256 # imm = 0x100
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: calll memcmp
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: testl %eax, %eax
				; X86-NEXT: setne %al
				; X86-NEXT: retl
				;
				; X64-SSE2-LABEL: length256_eq:
				; X64-SSE2: # %bb.0:
				; X64-SSE2-NEXT: pushq %rax
				; X64-SSE2-NEXT: movl $256, %edx # imm = 0x100
				; X64-SSE2-NEXT: callq memcmp
				; X64-SSE2-NEXT: testl %eax, %eax
				; X64-SSE2-NEXT: setne %al
				; X64-SSE2-NEXT: popq %rcx
				; X64-SSE2-NEXT: retq
				;
				; X64-AVX-LABEL: length256_eq:
				; X64-AVX: # %bb.0:
				; X64-AVX-NEXT: pushq %rax
				; X64-AVX-NEXT: movl $256, %edx # imm = 0x100
				; X64-AVX-NEXT: callq memcmp
				; X64-AVX-NEXT: testl %eax, %eax
				; X64-AVX-NEXT: setne %al
				; X64-AVX-NEXT: popq %rcx
				; X64-AVX-NEXT: retq
				;
				; X64-AVX512F-LABEL: length256_eq:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu64 -128(%rdi), %zmm0
				; X64-AVX512F-NEXT: vmovdqu64 -64(%rdi), %zmm1
				; X64-AVX512F-NEXT: vmovdqu64 (%rdi), %zmm2
				; X64-AVX512F-NEXT: vmovdqu64 64(%rdi), %zmm3
				; X64-AVX512F-NEXT: vpcmpeqd -128(%rsi), %zmm0, %k1
				; X64-AVX512F-NEXT: vpcmpeqd -64(%rsi), %zmm1, %k0 {%k1}
				; X64-AVX512F-NEXT: vpcmpeqd (%rsi), %zmm2, %k1
				; X64-AVX512F-NEXT: vpcmpeqd 64(%rsi), %zmm3, %k1 {%k1}
				; X64-AVX512F-NEXT: kandw %k0, %k1, %k0
				; X64-AVX512F-NEXT: kortestw %k0, %k0
				; X64-AVX512F-NEXT: setae %al
				; X64-AVX512F-NEXT: vzeroupper
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length256_eq:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu64 -128(%rdi), %zmm0
				; X64-AVX512BW-NEXT: vmovdqu64 -64(%rdi), %zmm1
				; X64-AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm2
				; X64-AVX512BW-NEXT: vmovdqu64 64(%rdi), %zmm3
				; X64-AVX512BW-NEXT: vpcmpeqb -128(%rsi), %zmm0, %k1
				; X64-AVX512BW-NEXT: vpcmpeqb -64(%rsi), %zmm1, %k0 {%k1}
				; X64-AVX512BW-NEXT: vpcmpeqb (%rsi), %zmm2, %k1
				; X64-AVX512BW-NEXT: vpcmpeqb 64(%rsi), %zmm3, %k1 {%k1}
				; X64-AVX512BW-NEXT: kandq %k0, %k1, %k0
				; X64-AVX512BW-NEXT: kortestq %k0, %k0
				; X64-AVX512BW-NEXT: setae %al
				; X64-AVX512BW-NEXT: vzeroupper
				; X64-AVX512BW-NEXT: retq
				%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 256) nounwind
				%cmp = icmp ne i32 %call, 0
				ret i1 %cmp
				}

				define i1 @length256_eq_const(i8* %X) nounwind {
				; X86-LABEL: length256_eq_const:
				; X86: # %bb.0:
				; X86-NEXT: pushl $0
				; X86-NEXT: pushl $256 # imm = 0x100
				; X86-NEXT: pushl $.L.str
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: calll memcmp
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: testl %eax, %eax
				; X86-NEXT: sete %al
				; X86-NEXT: retl
				;
				; X64-SSE2-LABEL: length256_eq_const:
				; X64-SSE2: # %bb.0:
				; X64-SSE2-NEXT: pushq %rax
				; X64-SSE2-NEXT: movl $.L.str, %esi
				; X64-SSE2-NEXT: movl $256, %edx # imm = 0x100
				; X64-SSE2-NEXT: callq memcmp
				; X64-SSE2-NEXT: testl %eax, %eax
				; X64-SSE2-NEXT: sete %al
				; X64-SSE2-NEXT: popq %rcx
				; X64-SSE2-NEXT: retq
				;
				; X64-AVX-LABEL: length256_eq_const:
				; X64-AVX: # %bb.0:
				; X64-AVX-NEXT: pushq %rax
				; X64-AVX-NEXT: movl $.L.str, %esi
				; X64-AVX-NEXT: movl $256, %edx # imm = 0x100
				; X64-AVX-NEXT: callq memcmp
				; X64-AVX-NEXT: testl %eax, %eax
				; X64-AVX-NEXT: sete %al
				; X64-AVX-NEXT: popq %rcx
				; X64-AVX-NEXT: retq
				;
				; X64-AVX512F-LABEL: length256_eq_const:
				; X64-AVX512F: # %bb.0:
				; X64-AVX512F-NEXT: vmovdqu64 -128(%rdi), %zmm0
				; X64-AVX512F-NEXT: vmovdqu64 -64(%rdi), %zmm1
				; X64-AVX512F-NEXT: vmovdqu64 (%rdi), %zmm2
				; X64-AVX512F-NEXT: vmovdqu64 64(%rdi), %zmm3
				; X64-AVX512F-NEXT: vpcmpeqd .L.str-{{.*}}(%rip), %zmm0, %k1
				; X64-AVX512F-NEXT: vpcmpeqd .L.str-{{.*}}(%rip), %zmm1, %k0 {%k1}
				; X64-AVX512F-NEXT: vpcmpeqd {{.*}}(%rip), %zmm2, %k1
				; X64-AVX512F-NEXT: vpcmpeqd .L.str+{{.*}}(%rip), %zmm3, %k1 {%k1}
				; X64-AVX512F-NEXT: kandw %k0, %k1, %k0
				; X64-AVX512F-NEXT: kortestw %k0, %k0
				; X64-AVX512F-NEXT: setb %al
				; X64-AVX512F-NEXT: vzeroupper
				; X64-AVX512F-NEXT: retq
				;
				; X64-AVX512BW-LABEL: length256_eq_const:
				; X64-AVX512BW: # %bb.0:
				; X64-AVX512BW-NEXT: vmovdqu64 -128(%rdi), %zmm0
				; X64-AVX512BW-NEXT: vmovdqu64 -64(%rdi), %zmm1
				; X64-AVX512BW-NEXT: vmovdqu64 (%rdi), %zmm2
				; X64-AVX512BW-NEXT: vmovdqu64 64(%rdi), %zmm3
				; X64-AVX512BW-NEXT: vpcmpeqb .L.str-{{.*}}(%rip), %zmm0, %k1
				; X64-AVX512BW-NEXT: vpcmpeqb .L.str-{{.*}}(%rip), %zmm1, %k0 {%k1}
				; X64-AVX512BW-NEXT: vpcmpeqb {{.*}}(%rip), %zmm2, %k1
				; X64-AVX512BW-NEXT: vpcmpeqb .L.str+{{.*}}(%rip), %zmm3, %k1 {%k1}
				; X64-AVX512BW-NEXT: kandq %k0, %k1, %k0
				; X64-AVX512BW-NEXT: kortestq %k0, %k0
				; X64-AVX512BW-NEXT: setb %al
				; X64-AVX512BW-NEXT: vzeroupper
				; X64-AVX512BW-NEXT: retq
				%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([513 x i8], [513 x i8]* @.str, i32 0, i32 0), i64 256) nounwind
				%c = icmp eq i32 %m, 0
				ret i1 %c
				}

				define i32 @length512(i8* %X, i8* %Y) nounwind {
				; X86-LABEL: length512:
				; X86: # %bb.0:
				; X86-NEXT: pushl $0
				; X86-NEXT: pushl $512 # imm = 0x200
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: calll memcmp
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: retl
				;
				; X64-LABEL: length512:
				; X64: # %bb.0:
				; X64-NEXT: movl $512, %edx # imm = 0x200
				; X64-NEXT: jmp memcmp # TAILCALL
				%m = tail call i32 @memcmp(i8* %X, i8* %Y, i64 512) nounwind
				ret i32 %m
				}

				define i1 @length512_eq(i8* %x, i8* %y) nounwind {
				; X86-LABEL: length512_eq:
				; X86: # %bb.0:
				; X86-NEXT: pushl $0
				; X86-NEXT: pushl $512 # imm = 0x200
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: calll memcmp
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: testl %eax, %eax
				; X86-NEXT: setne %al
				; X86-NEXT: retl
				;
				; X64-LABEL: length512_eq:
				; X64: # %bb.0:
				; X64-NEXT: pushq %rax
				; X64-NEXT: movl $512, %edx # imm = 0x200
				; X64-NEXT: callq memcmp
				; X64-NEXT: testl %eax, %eax
				; X64-NEXT: setne %al
				; X64-NEXT: popq %rcx
				; X64-NEXT: retq
				%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 512) nounwind
				%cmp = icmp ne i32 %call, 0
				ret i1 %cmp
				}

				define i1 @length512_eq_const(i8* %X) nounwind {
				; X86-LABEL: length512_eq_const:
				; X86: # %bb.0:
				; X86-NEXT: pushl $0
				; X86-NEXT: pushl $512 # imm = 0x200
				; X86-NEXT: pushl $.L.str
				; X86-NEXT: pushl {{[0-9]+}}(%esp)
				; X86-NEXT: calll memcmp
				; X86-NEXT: addl $16, %esp
				; X86-NEXT: testl %eax, %eax
				; X86-NEXT: sete %al
				; X86-NEXT: retl
				;
				; X64-LABEL: length512_eq_const:
				; X64: # %bb.0:
				; X64-NEXT: pushq %rax
				; X64-NEXT: movl $.L.str, %esi
				; X64-NEXT: movl $512, %edx # imm = 0x200
				; X64-NEXT: callq memcmp
				; X64-NEXT: testl %eax, %eax
				; X64-NEXT: sete %al
				; X64-NEXT: popq %rcx
				; X64-NEXT: retq
				%m = tail call i32 @memcmp(i8* %X, i8* getelementptr inbounds ([513 x i8], [513 x i8]* @.str, i32 0, i32 0), i64 512) nounwind
	%c = icmp eq i32 %m, 0			%c = icmp eq i32 %m, 0
	ret i1 %c			ret i1 %c
	}			}

	; This checks that we do not do stupid things with huge sizes.			; This checks that we do not do stupid things with huge sizes.
	define i32 @huge_length(i8* %X, i8* %Y) nounwind {			define i32 @huge_length(i8* %X, i8* %Y) nounwind {
	; X86-LABEL: huge_length:			; X86-LABEL: huge_length:
	; X86: # %bb.0:			; X86: # %bb.0:

test/Transforms/ExpandMemCmp/X86/memcmp.ll

;		;
%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 8)		%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 8)
%cmp = icmp eq i32 %call, 0		%cmp = icmp eq i32 %call, 0
%conv = zext i1 %cmp to i32		%conv = zext i1 %cmp to i32
ret i32 %conv		ret i32 %conv
}		}

define i32 @cmp_eq9(i8* nocapture readonly %x, i8* nocapture readonly %y) {		define i32 @cmp_eq9(i8* nocapture readonly %x, i8* nocapture readonly %y) {
; X32-LABEL: @cmp_eq9(		; X32-LABEL: @cmp_eq9(
; X32-NEXT: [[CALL:%.*]] = tail call i32 @memcmp(i8* [[X:%.*]], i8* [[Y:%.*]], i64 9)		; X32-NEXT: [[TMP1:%.*]] = bitcast i8* [[X:%.*]] to i32*
; X32-NEXT: [[CMP:%.*]] = icmp eq i32 [[CALL]], 0		; X32-NEXT: [[TMP2:%.*]] = bitcast i8* [[Y:%.*]] to i32*
		; X32-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]]
		; X32-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]]
		; X32-NEXT: [[TMP5:%.*]] = xor i32 [[TMP3]], [[TMP4]]
		; X32-NEXT: [[TMP6:%.*]] = getelementptr i8, i8* [[X]], i8 4
		; X32-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP6]] to i32*
		; X32-NEXT: [[TMP8:%.*]] = getelementptr i8, i8* [[Y]], i8 4
		; X32-NEXT: [[TMP9:%.*]] = bitcast i8* [[TMP8]] to i32*
		; X32-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP7]]
		; X32-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP9]]
		; X32-NEXT: [[TMP12:%.*]] = xor i32 [[TMP10]], [[TMP11]]
		; X32-NEXT: [[TMP13:%.*]] = getelementptr i8, i8* [[X]], i8 8
		; X32-NEXT: [[TMP14:%.*]] = getelementptr i8, i8* [[Y]], i8 8
		; X32-NEXT: [[TMP15:%.*]] = load i8, i8* [[TMP13]]
		; X32-NEXT: [[TMP16:%.*]] = load i8, i8* [[TMP14]]
		; X32-NEXT: [[TMP17:%.*]] = zext i8 [[TMP15]] to i32
		; X32-NEXT: [[TMP18:%.*]] = zext i8 [[TMP16]] to i32
		; X32-NEXT: [[TMP19:%.*]] = xor i32 [[TMP17]], [[TMP18]]
		; X32-NEXT: [[TMP20:%.*]] = or i32 [[TMP5]], [[TMP12]]
		; X32-NEXT: [[TMP21:%.*]] = or i32 [[TMP20]], [[TMP19]]
		; X32-NEXT: [[TMP22:%.*]] = icmp ne i32 [[TMP21]], 0
		; X32-NEXT: [[TMP23:%.*]] = zext i1 [[TMP22]] to i32
		; X32-NEXT: [[CMP:%.*]] = icmp eq i32 [[TMP23]], 0
; X32-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32		; X32-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32
; X32-NEXT: ret i32 [[CONV]]		; X32-NEXT: ret i32 [[CONV]]
;		;
; X64_1LD-LABEL: @cmp_eq9(		; X64_1LD-LABEL: @cmp_eq9(
; X64_1LD-NEXT: br label [[LOADBB:%.*]]		; X64_1LD-NEXT: br label [[LOADBB:%.*]]
; X64_1LD: res_block:		; X64_1LD: res_block:
; X64_1LD-NEXT: br label [[ENDBLOCK:%.*]]		; X64_1LD-NEXT: br label [[ENDBLOCK:%.*]]
; X64_1LD: loadbb:		; X64_1LD: loadbb:
;		;
%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 9)		%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 9)
%cmp = icmp eq i32 %call, 0		%cmp = icmp eq i32 %call, 0
%conv = zext i1 %cmp to i32		%conv = zext i1 %cmp to i32
ret i32 %conv		ret i32 %conv
}		}
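On X32, the new checks for cmp_eq9 replace the libcall with three loads per operand: two 4-byte loads and a 1-byte load. Each pair is XORed, the results are ORed together, and the OR is compared against zero. Below is a C rendering of that IR, written as a sketch. `eq9_sketch` is a hypothetical name, and `memcpy` stands in for the unaligned `load i32`.

```c
#include <stdint.h>
#include <string.h>

/* Mirrors the X32 expansion of memcmp(x, y, 9) == 0.
   Returns 1 if the 9 bytes at x and y are equal, 0 otherwise. */
int eq9_sketch(const void *x, const void *y) {
    const unsigned char *a = x, *b = y;
    uint32_t a0, b0, a1, b1;
    memcpy(&a0, a, 4);     memcpy(&b0, b, 4);     /* bytes 0..3 */
    memcpy(&a1, a + 4, 4); memcpy(&b1, b + 4, 4); /* bytes 4..7 */
    uint32_t d0 = a0 ^ b0;
    uint32_t d1 = a1 ^ b1;
    uint32_t d2 = (uint32_t)a[8] ^ (uint32_t)b[8]; /* zext i8 to i32, then xor */
    return (d0 | d1 | d2) == 0;
}
```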

define i32 @cmp_eq10(i8* nocapture readonly %x, i8* nocapture readonly %y) {		define i32 @cmp_eq10(i8* nocapture readonly %x, i8* nocapture readonly %y) {
; X32-LABEL: @cmp_eq10(		; X32-LABEL: @cmp_eq10(
; X32-NEXT: [[CALL:%.*]] = tail call i32 @memcmp(i8* [[X:%.*]], i8* [[Y:%.*]], i64 10)		; X32-NEXT: [[TMP1:%.*]] = bitcast i8* [[X:%.*]] to i32*
; X32-NEXT: [[CMP:%.*]] = icmp eq i32 [[CALL]], 0		; X32-NEXT: [[TMP2:%.*]] = bitcast i8* [[Y:%.*]] to i32*
		; X32-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]]
		; X32-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]]
		; X32-NEXT: [[TMP5:%.*]] = xor i32 [[TMP3]], [[TMP4]]
		; X32-NEXT: [[TMP6:%.*]] = getelementptr i8, i8* [[X]], i8 4
		; X32-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP6]] to i32*
		; X32-NEXT: [[TMP8:%.*]] = getelementptr i8, i8* [[Y]], i8 4
		; X32-NEXT: [[TMP9:%.*]] = bitcast i8* [[TMP8]] to i32*
		; X32-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP7]]
		; X32-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP9]]
		; X32-NEXT: [[TMP12:%.*]] = xor i32 [[TMP10]], [[TMP11]]
		; X32-NEXT: [[TMP13:%.*]] = getelementptr i8, i8* [[X]], i8 8
		; X32-NEXT: [[TMP14:%.*]] = bitcast i8* [[TMP13]] to i16*
		; X32-NEXT: [[TMP15:%.*]] = getelementptr i8, i8* [[Y]], i8 8
		; X32-NEXT: [[TMP16:%.*]] = bitcast i8* [[TMP15]] to i16*
		; X32-NEXT: [[TMP17:%.*]] = load i16, i16* [[TMP14]]
		; X32-NEXT: [[TMP18:%.*]] = load i16, i16* [[TMP16]]
		; X32-NEXT: [[TMP19:%.*]] = zext i16 [[TMP17]] to i32
		; X32-NEXT: [[TMP20:%.*]] = zext i16 [[TMP18]] to i32
		; X32-NEXT: [[TMP21:%.*]] = xor i32 [[TMP19]], [[TMP20]]
		; X32-NEXT: [[TMP22:%.*]] = or i32 [[TMP5]], [[TMP12]]
		; X32-NEXT: [[TMP23:%.*]] = or i32 [[TMP22]], [[TMP21]]
		; X32-NEXT: [[TMP24:%.*]] = icmp ne i32 [[TMP23]], 0
		; X32-NEXT: [[TMP25:%.*]] = zext i1 [[TMP24]] to i32
		; X32-NEXT: [[CMP:%.*]] = icmp eq i32 [[TMP25]], 0
; X32-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32		; X32-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32
; X32-NEXT: ret i32 [[CONV]]		; X32-NEXT: ret i32 [[CONV]]
;		;
; X64_1LD-LABEL: @cmp_eq10(		; X64_1LD-LABEL: @cmp_eq10(
; X64_1LD-NEXT: br label [[LOADBB:%.*]]		; X64_1LD-NEXT: br label [[LOADBB:%.*]]
; X64_1LD: res_block:		; X64_1LD: res_block:
; X64_1LD-NEXT: br label [[ENDBLOCK:%.*]]		; X64_1LD-NEXT: br label [[ENDBLOCK:%.*]]
; X64_1LD: loadbb:		; X64_1LD: loadbb:
;		;
%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 10)		%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 10)
%cmp = icmp eq i32 %call, 0		%cmp = icmp eq i32 %call, 0
%conv = zext i1 %cmp to i32		%conv = zext i1 %cmp to i32
ret i32 %conv		ret i32 %conv
}		}

define i32 @cmp_eq11(i8* nocapture readonly %x, i8* nocapture readonly %y) {		define i32 @cmp_eq11(i8* nocapture readonly %x, i8* nocapture readonly %y) {
; X32-LABEL: @cmp_eq11(		; X32-LABEL: @cmp_eq11(
; X32-NEXT: [[CALL:%.*]] = tail call i32 @memcmp(i8* [[X:%.*]], i8* [[Y:%.*]], i64 11)		; X32-NEXT: [[TMP1:%.*]] = bitcast i8* [[X:%.*]] to i32*
; X32-NEXT: [[CMP:%.*]] = icmp eq i32 [[CALL]], 0		; X32-NEXT: [[TMP2:%.*]] = bitcast i8* [[Y:%.*]] to i32*
		; X32-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]]
		; X32-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]]
		; X32-NEXT: [[TMP5:%.*]] = xor i32 [[TMP3]], [[TMP4]]
		; X32-NEXT: [[TMP6:%.*]] = getelementptr i8, i8* [[X]], i8 4
		; X32-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP6]] to i32*
		; X32-NEXT: [[TMP8:%.*]] = getelementptr i8, i8* [[Y]], i8 4
		; X32-NEXT: [[TMP9:%.*]] = bitcast i8* [[TMP8]] to i32*
		; X32-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP7]]
		; X32-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP9]]
		; X32-NEXT: [[TMP12:%.*]] = xor i32 [[TMP10]], [[TMP11]]
		; X32-NEXT: [[TMP13:%.*]] = getelementptr i8, i8* [[X]], i8 7
		; X32-NEXT: [[TMP14:%.*]] = bitcast i8* [[TMP13]] to i32*
		; X32-NEXT: [[TMP15:%.*]] = getelementptr i8, i8* [[Y]], i8 7
		; X32-NEXT: [[TMP16:%.*]] = bitcast i8* [[TMP15]] to i32*
		; X32-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP14]]
		; X32-NEXT: [[TMP18:%.*]] = load i32, i32* [[TMP16]]
		; X32-NEXT: [[TMP19:%.*]] = xor i32 [[TMP17]], [[TMP18]]
		; X32-NEXT: [[TMP20:%.*]] = or i32 [[TMP5]], [[TMP12]]
		; X32-NEXT: [[TMP21:%.*]] = or i32 [[TMP20]], [[TMP19]]
		; X32-NEXT: [[TMP22:%.*]] = icmp ne i32 [[TMP21]], 0
		; X32-NEXT: [[TMP23:%.*]] = zext i1 [[TMP22]] to i32
		; X32-NEXT: [[CMP:%.*]] = icmp eq i32 [[TMP23]], 0
; X32-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32		; X32-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32
; X32-NEXT: ret i32 [[CONV]]		; X32-NEXT: ret i32 [[CONV]]
;		;
; X64_1LD-LABEL: @cmp_eq11(		; X64_1LD-LABEL: @cmp_eq11(
; X64_1LD-NEXT: br label [[LOADBB:%.*]]		; X64_1LD-NEXT: br label [[LOADBB:%.*]]
; X64_1LD: res_block:		; X64_1LD: res_block:
; X64_1LD-NEXT: br label [[ENDBLOCK:%.*]]		; X64_1LD-NEXT: br label [[ENDBLOCK:%.*]]
; X64_1LD: loadbb:		; X64_1LD: loadbb:
;		;
%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 11)		%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 11)
%cmp = icmp eq i32 %call, 0		%cmp = icmp eq i32 %call, 0
%conv = zext i1 %cmp to i32		%conv = zext i1 %cmp to i32
ret i32 %conv		ret i32 %conv
}		}
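Taken together, the X32 checks for cmp_eq9 through cmp_eq12 imply one load plan per size:

- 9 bytes: 4+4+1.
- 10 bytes: 4+4+2.
- 12 bytes: three 4-byte loads.
- 11 bytes: three 4-byte loads at offsets 0, 4, and 7, where the last load overlaps the second.

Below is a hypothetical planner in C that reproduces those four plans. It is only a sketch, and it assumes that a greedy plan is kept unless overlapping loads need strictly fewer loads. It is not ExpandMemCmp's code, and `load_plan`, `plan_loads`, and `PLAN_CAP` are invented names. The usage caps plans at 4 loads, mirroring this patch's title.

```c
#include <stddef.h>

#define PLAN_CAP 16 /* hypothetical capacity of a plan */

typedef struct {
    size_t off[PLAN_CAP];  /* byte offset of each load */
    size_t size[PLAN_CAP]; /* width in bytes of each load */
    int count;             /* number of loads, or -1 if none fits */
} load_plan;

/* Greedy: widest power-of-two loads first, no overlap.
   Assumes max_load is a power of two. */
static load_plan greedy_plan(size_t n, size_t max_load, int max_loads) {
    load_plan p = {{0}, {0}, 0};
    size_t pos = 0;
    for (size_t s = max_load; s > 0; s /= 2) {
        while (n - pos >= s) {
            if (p.count == max_loads || p.count == PLAN_CAP) {
                p.count = -1;
                return p;
            }
            p.off[p.count] = pos;
            p.size[p.count] = s;
            p.count++;
            pos += s;
        }
    }
    return p;
}

/* Overlapping: only max_load-wide loads; the last one is pulled back so it
   ends exactly at n (n = 11, max_load = 4 gives offsets 0, 4, 7). */
static load_plan overlapping_plan(size_t n, size_t max_load, int max_loads) {
    load_plan p = {{0}, {0}, -1};
    if (n < max_load)
        return p;
    size_t count = (n + max_load - 1) / max_load;
    if (count > (size_t)max_loads || count > (size_t)PLAN_CAP)
        return p;
    for (size_t i = 0; i < count; i++) {
        p.off[i] = (i + 1 == count) ? n - max_load : i * max_load;
        p.size[i] = max_load;
    }
    p.count = (int)count;
    return p;
}

/* Prefer the greedy plan; switch to overlapping loads only if strictly fewer. */
load_plan plan_loads(size_t n, size_t max_load, int max_loads) {
    load_plan g = greedy_plan(n, max_load, max_loads);
    load_plan o = overlapping_plan(n, max_load, max_loads);
    if (o.count > 0 && (g.count < 0 || o.count < g.count))
        return o;
    return g;
}
```

For example, `plan_loads(11, 4, 4)` yields three 4-byte loads at offsets 0, 4, and 7. `plan_loads(17, 4, 4)` reports no plan (count -1), which corresponds to keeping the libcall.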

define i32 @cmp_eq12(i8* nocapture readonly %x, i8* nocapture readonly %y) {		define i32 @cmp_eq12(i8* nocapture readonly %x, i8* nocapture readonly %y) {
; X32-LABEL: @cmp_eq12(		; X32-LABEL: @cmp_eq12(
; X32-NEXT: [[CALL:%.*]] = tail call i32 @memcmp(i8* [[X:%.*]], i8* [[Y:%.*]], i64 12)		; X32-NEXT: [[TMP1:%.*]] = bitcast i8* [[X:%.*]] to i32*
; X32-NEXT: [[CMP:%.*]] = icmp eq i32 [[CALL]], 0		; X32-NEXT: [[TMP2:%.*]] = bitcast i8* [[Y:%.*]] to i32*
		; X32-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]]
		; X32-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]]
		; X32-NEXT: [[TMP5:%.*]] = xor i32 [[TMP3]], [[TMP4]]
		; X32-NEXT: [[TMP6:%.*]] = getelementptr i8, i8* [[X]], i8 4
		; X32-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP6]] to i32*
		; X32-NEXT: [[TMP8:%.*]] = getelementptr i8, i8* [[Y]], i8 4
		; X32-NEXT: [[TMP9:%.*]] = bitcast i8* [[TMP8]] to i32*
		; X32-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP7]]
		; X32-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP9]]
		; X32-NEXT: [[TMP12:%.*]] = xor i32 [[TMP10]], [[TMP11]]
		; X32-NEXT: [[TMP13:%.*]] = getelementptr i8, i8* [[X]], i8 8
		; X32-NEXT: [[TMP14:%.*]] = bitcast i8* [[TMP13]] to i32*
		; X32-NEXT: [[TMP15:%.*]] = getelementptr i8, i8* [[Y]], i8 8
		; X32-NEXT: [[TMP16:%.*]] = bitcast i8* [[TMP15]] to i32*
		; X32-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP14]]
		; X32-NEXT: [[TMP18:%.*]] = load i32, i32* [[TMP16]]
		; X32-NEXT: [[TMP19:%.*]] = xor i32 [[TMP17]], [[TMP18]]
		; X32-NEXT: [[TMP20:%.*]] = or i32 [[TMP5]], [[TMP12]]
		; X32-NEXT: [[TMP21:%.*]] = or i32 [[TMP20]], [[TMP19]]
		; X32-NEXT: [[TMP22:%.*]] = icmp ne i32 [[TMP21]], 0
		; X32-NEXT: [[TMP23:%.*]] = zext i1 [[TMP22]] to i32
		; X32-NEXT: [[CMP:%.*]] = icmp eq i32 [[TMP23]], 0
; X32-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32		; X32-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32
; X32-NEXT: ret i32 [[CONV]]		; X32-NEXT: ret i32 [[CONV]]
;		;
; X64_1LD-LABEL: @cmp_eq12(		; X64_1LD-LABEL: @cmp_eq12(
; X64_1LD-NEXT: br label [[LOADBB:%.*]]		; X64_1LD-NEXT: br label [[LOADBB:%.*]]
; X64_1LD: res_block:		; X64_1LD: res_block:
; X64_1LD-NEXT: br label [[ENDBLOCK:%.*]]		; X64_1LD-NEXT: br label [[ENDBLOCK:%.*]]
; X64_1LD: loadbb:		; X64_1LD: loadbb:
(43 unchanged lines not shown)	;
%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 12)		%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 12)
%cmp = icmp eq i32 %call, 0		%cmp = icmp eq i32 %call, 0
%conv = zext i1 %cmp to i32		%conv = zext i1 %cmp to i32
ret i32 %conv		ret i32 %conv
}		}

define i32 @cmp_eq13(i8* nocapture readonly %x, i8* nocapture readonly %y) {		define i32 @cmp_eq13(i8* nocapture readonly %x, i8* nocapture readonly %y) {
; X32-LABEL: @cmp_eq13(		; X32-LABEL: @cmp_eq13(
; X32-NEXT: [[CALL:%.*]] = tail call i32 @memcmp(i8* [[X:%.*]], i8* [[Y:%.*]], i64 13)		; X32-NEXT: [[TMP1:%.*]] = bitcast i8* [[X:%.*]] to i32*
; X32-NEXT: [[CMP:%.*]] = icmp eq i32 [[CALL]], 0		; X32-NEXT: [[TMP2:%.*]] = bitcast i8* [[Y:%.*]] to i32*
		; X32-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]]
		; X32-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]]
		; X32-NEXT: [[TMP5:%.*]] = xor i32 [[TMP3]], [[TMP4]]
		; X32-NEXT: [[TMP6:%.*]] = getelementptr i8, i8* [[X]], i8 4
		; X32-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP6]] to i32*
		; X32-NEXT: [[TMP8:%.*]] = getelementptr i8, i8* [[Y]], i8 4
		; X32-NEXT: [[TMP9:%.*]] = bitcast i8* [[TMP8]] to i32*
		; X32-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP7]]
		; X32-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP9]]
		; X32-NEXT: [[TMP12:%.*]] = xor i32 [[TMP10]], [[TMP11]]
		; X32-NEXT: [[TMP13:%.*]] = getelementptr i8, i8* [[X]], i8 8
		; X32-NEXT: [[TMP14:%.*]] = bitcast i8* [[TMP13]] to i32*
		; X32-NEXT: [[TMP15:%.*]] = getelementptr i8, i8* [[Y]], i8 8
		; X32-NEXT: [[TMP16:%.*]] = bitcast i8* [[TMP15]] to i32*
		; X32-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP14]]
		; X32-NEXT: [[TMP18:%.*]] = load i32, i32* [[TMP16]]
		; X32-NEXT: [[TMP19:%.*]] = xor i32 [[TMP17]], [[TMP18]]
		; X32-NEXT: [[TMP20:%.*]] = getelementptr i8, i8* [[X]], i8 12
		; X32-NEXT: [[TMP21:%.*]] = getelementptr i8, i8* [[Y]], i8 12
		; X32-NEXT: [[TMP22:%.*]] = load i8, i8* [[TMP20]]
		; X32-NEXT: [[TMP23:%.*]] = load i8, i8* [[TMP21]]
		; X32-NEXT: [[TMP24:%.*]] = zext i8 [[TMP22]] to i32
		; X32-NEXT: [[TMP25:%.*]] = zext i8 [[TMP23]] to i32
		; X32-NEXT: [[TMP26:%.*]] = xor i32 [[TMP24]], [[TMP25]]
		; X32-NEXT: [[TMP27:%.*]] = or i32 [[TMP5]], [[TMP12]]
		; X32-NEXT: [[TMP28:%.*]] = or i32 [[TMP19]], [[TMP26]]
		; X32-NEXT: [[TMP29:%.*]] = or i32 [[TMP27]], [[TMP28]]
		; X32-NEXT: [[TMP30:%.*]] = icmp ne i32 [[TMP29]], 0
		; X32-NEXT: [[TMP31:%.*]] = zext i1 [[TMP30]] to i32
		; X32-NEXT: [[CMP:%.*]] = icmp eq i32 [[TMP31]], 0
; X32-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32		; X32-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32
; X32-NEXT: ret i32 [[CONV]]		; X32-NEXT: ret i32 [[CONV]]
;		;
; X64_1LD-LABEL: @cmp_eq13(		; X64_1LD-LABEL: @cmp_eq13(
; X64_1LD-NEXT: br label [[LOADBB:%.*]]		; X64_1LD-NEXT: br label [[LOADBB:%.*]]
; X64_1LD: res_block:		; X64_1LD: res_block:
; X64_1LD-NEXT: br label [[ENDBLOCK:%.*]]		; X64_1LD-NEXT: br label [[ENDBLOCK:%.*]]
; X64_1LD: loadbb:		; X64_1LD: loadbb:
(41 unchanged lines not shown)	;
%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 13)		%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 13)
%cmp = icmp eq i32 %call, 0		%cmp = icmp eq i32 %call, 0
%conv = zext i1 %cmp to i32		%conv = zext i1 %cmp to i32
ret i32 %conv		ret i32 %conv
}		}

define i32 @cmp_eq14(i8* nocapture readonly %x, i8* nocapture readonly %y) {		define i32 @cmp_eq14(i8* nocapture readonly %x, i8* nocapture readonly %y) {
; X32-LABEL: @cmp_eq14(		; X32-LABEL: @cmp_eq14(
; X32-NEXT: [[CALL:%.*]] = tail call i32 @memcmp(i8* [[X:%.*]], i8* [[Y:%.*]], i64 14)		; X32-NEXT: [[TMP1:%.*]] = bitcast i8* [[X:%.*]] to i32*
; X32-NEXT: [[CMP:%.*]] = icmp eq i32 [[CALL]], 0		; X32-NEXT: [[TMP2:%.*]] = bitcast i8* [[Y:%.*]] to i32*
		; X32-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]]
		; X32-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]]
		; X32-NEXT: [[TMP5:%.*]] = xor i32 [[TMP3]], [[TMP4]]
		; X32-NEXT: [[TMP6:%.*]] = getelementptr i8, i8* [[X]], i8 4
		; X32-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP6]] to i32*
		; X32-NEXT: [[TMP8:%.*]] = getelementptr i8, i8* [[Y]], i8 4
		; X32-NEXT: [[TMP9:%.*]] = bitcast i8* [[TMP8]] to i32*
		; X32-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP7]]
		; X32-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP9]]
		; X32-NEXT: [[TMP12:%.*]] = xor i32 [[TMP10]], [[TMP11]]
		; X32-NEXT: [[TMP13:%.*]] = getelementptr i8, i8* [[X]], i8 8
		; X32-NEXT: [[TMP14:%.*]] = bitcast i8* [[TMP13]] to i32*
		; X32-NEXT: [[TMP15:%.*]] = getelementptr i8, i8* [[Y]], i8 8
		; X32-NEXT: [[TMP16:%.*]] = bitcast i8* [[TMP15]] to i32*
		; X32-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP14]]
		; X32-NEXT: [[TMP18:%.*]] = load i32, i32* [[TMP16]]
		; X32-NEXT: [[TMP19:%.*]] = xor i32 [[TMP17]], [[TMP18]]
		; X32-NEXT: [[TMP20:%.*]] = getelementptr i8, i8* [[X]], i8 12
		; X32-NEXT: [[TMP21:%.*]] = bitcast i8* [[TMP20]] to i16*
		; X32-NEXT: [[TMP22:%.*]] = getelementptr i8, i8* [[Y]], i8 12
		; X32-NEXT: [[TMP23:%.*]] = bitcast i8* [[TMP22]] to i16*
		; X32-NEXT: [[TMP24:%.*]] = load i16, i16* [[TMP21]]
		; X32-NEXT: [[TMP25:%.*]] = load i16, i16* [[TMP23]]
		; X32-NEXT: [[TMP26:%.*]] = zext i16 [[TMP24]] to i32
		; X32-NEXT: [[TMP27:%.*]] = zext i16 [[TMP25]] to i32
		; X32-NEXT: [[TMP28:%.*]] = xor i32 [[TMP26]], [[TMP27]]
		; X32-NEXT: [[TMP29:%.*]] = or i32 [[TMP5]], [[TMP12]]
		; X32-NEXT: [[TMP30:%.*]] = or i32 [[TMP19]], [[TMP28]]
		; X32-NEXT: [[TMP31:%.*]] = or i32 [[TMP29]], [[TMP30]]
		; X32-NEXT: [[TMP32:%.*]] = icmp ne i32 [[TMP31]], 0
		; X32-NEXT: [[TMP33:%.*]] = zext i1 [[TMP32]] to i32
		; X32-NEXT: [[CMP:%.*]] = icmp eq i32 [[TMP33]], 0
; X32-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32		; X32-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32
; X32-NEXT: ret i32 [[CONV]]		; X32-NEXT: ret i32 [[CONV]]
;		;
; X64_1LD-LABEL: @cmp_eq14(		; X64_1LD-LABEL: @cmp_eq14(
; X64_1LD-NEXT: br label [[LOADBB:%.*]]		; X64_1LD-NEXT: br label [[LOADBB:%.*]]
; X64_1LD: res_block:		; X64_1LD: res_block:
; X64_1LD-NEXT: br label [[ENDBLOCK:%.*]]		; X64_1LD-NEXT: br label [[ENDBLOCK:%.*]]
; X64_1LD: loadbb:		; X64_1LD: loadbb:
(41 unchanged lines not shown)	;
%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 14)		%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 14)
%cmp = icmp eq i32 %call, 0		%cmp = icmp eq i32 %call, 0
%conv = zext i1 %cmp to i32		%conv = zext i1 %cmp to i32
ret i32 %conv		ret i32 %conv
}		}

define i32 @cmp_eq15(i8* nocapture readonly %x, i8* nocapture readonly %y) {		define i32 @cmp_eq15(i8* nocapture readonly %x, i8* nocapture readonly %y) {
; X32-LABEL: @cmp_eq15(		; X32-LABEL: @cmp_eq15(
; X32-NEXT: [[CALL:%.*]] = tail call i32 @memcmp(i8* [[X:%.*]], i8* [[Y:%.*]], i64 15)		; X32-NEXT: [[TMP1:%.*]] = bitcast i8* [[X:%.*]] to i32*
; X32-NEXT: [[CMP:%.*]] = icmp eq i32 [[CALL]], 0		; X32-NEXT: [[TMP2:%.*]] = bitcast i8* [[Y:%.*]] to i32*
		; X32-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]]
		; X32-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]]
		; X32-NEXT: [[TMP5:%.*]] = xor i32 [[TMP3]], [[TMP4]]
		; X32-NEXT: [[TMP6:%.*]] = getelementptr i8, i8* [[X]], i8 4
		; X32-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP6]] to i32*
		; X32-NEXT: [[TMP8:%.*]] = getelementptr i8, i8* [[Y]], i8 4
		; X32-NEXT: [[TMP9:%.*]] = bitcast i8* [[TMP8]] to i32*
		; X32-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP7]]
		; X32-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP9]]
		; X32-NEXT: [[TMP12:%.*]] = xor i32 [[TMP10]], [[TMP11]]
		; X32-NEXT: [[TMP13:%.*]] = getelementptr i8, i8* [[X]], i8 8
		; X32-NEXT: [[TMP14:%.*]] = bitcast i8* [[TMP13]] to i32*
		; X32-NEXT: [[TMP15:%.*]] = getelementptr i8, i8* [[Y]], i8 8
		; X32-NEXT: [[TMP16:%.*]] = bitcast i8* [[TMP15]] to i32*
		; X32-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP14]]
		; X32-NEXT: [[TMP18:%.*]] = load i32, i32* [[TMP16]]
		; X32-NEXT: [[TMP19:%.*]] = xor i32 [[TMP17]], [[TMP18]]
		; X32-NEXT: [[TMP20:%.*]] = getelementptr i8, i8* [[X]], i8 11
		; X32-NEXT: [[TMP21:%.*]] = bitcast i8* [[TMP20]] to i32*
		; X32-NEXT: [[TMP22:%.*]] = getelementptr i8, i8* [[Y]], i8 11
		; X32-NEXT: [[TMP23:%.*]] = bitcast i8* [[TMP22]] to i32*
		; X32-NEXT: [[TMP24:%.*]] = load i32, i32* [[TMP21]]
		; X32-NEXT: [[TMP25:%.*]] = load i32, i32* [[TMP23]]
		; X32-NEXT: [[TMP26:%.*]] = xor i32 [[TMP24]], [[TMP25]]
		; X32-NEXT: [[TMP27:%.*]] = or i32 [[TMP5]], [[TMP12]]
		; X32-NEXT: [[TMP28:%.*]] = or i32 [[TMP19]], [[TMP26]]
		; X32-NEXT: [[TMP29:%.*]] = or i32 [[TMP27]], [[TMP28]]
		; X32-NEXT: [[TMP30:%.*]] = icmp ne i32 [[TMP29]], 0
		; X32-NEXT: [[TMP31:%.*]] = zext i1 [[TMP30]] to i32
		; X32-NEXT: [[CMP:%.*]] = icmp eq i32 [[TMP31]], 0
; X32-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32		; X32-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32
; X32-NEXT: ret i32 [[CONV]]		; X32-NEXT: ret i32 [[CONV]]
;		;
; X64_1LD-LABEL: @cmp_eq15(		; X64_1LD-LABEL: @cmp_eq15(
; X64_1LD-NEXT: br label [[LOADBB:%.*]]		; X64_1LD-NEXT: br label [[LOADBB:%.*]]
; X64_1LD: res_block:		; X64_1LD: res_block:
; X64_1LD-NEXT: br label [[ENDBLOCK:%.*]]		; X64_1LD-NEXT: br label [[ENDBLOCK:%.*]]
; X64_1LD: loadbb:		; X64_1LD: loadbb:
(41 unchanged lines not shown)	;
%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 15)		%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 15)
%cmp = icmp eq i32 %call, 0		%cmp = icmp eq i32 %call, 0
%conv = zext i1 %cmp to i32		%conv = zext i1 %cmp to i32
ret i32 %conv		ret i32 %conv
}		}

define i32 @cmp_eq16(i8* nocapture readonly %x, i8* nocapture readonly %y) {		define i32 @cmp_eq16(i8* nocapture readonly %x, i8* nocapture readonly %y) {
; X32-LABEL: @cmp_eq16(		; X32-LABEL: @cmp_eq16(
; X32-NEXT: [[CALL:%.*]] = tail call i32 @memcmp(i8* [[X:%.*]], i8* [[Y:%.*]], i64 16)		; X32-NEXT: [[TMP1:%.*]] = bitcast i8* [[X:%.*]] to i32*
; X32-NEXT: [[CMP:%.*]] = icmp eq i32 [[CALL]], 0		; X32-NEXT: [[TMP2:%.*]] = bitcast i8* [[Y:%.*]] to i32*
		; X32-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]]
		; X32-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]]
		; X32-NEXT: [[TMP5:%.*]] = xor i32 [[TMP3]], [[TMP4]]
		; X32-NEXT: [[TMP6:%.*]] = getelementptr i8, i8* [[X]], i8 4
		; X32-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP6]] to i32*
		; X32-NEXT: [[TMP8:%.*]] = getelementptr i8, i8* [[Y]], i8 4
		; X32-NEXT: [[TMP9:%.*]] = bitcast i8* [[TMP8]] to i32*
		; X32-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP7]]
		; X32-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP9]]
		; X32-NEXT: [[TMP12:%.*]] = xor i32 [[TMP10]], [[TMP11]]
		; X32-NEXT: [[TMP13:%.*]] = getelementptr i8, i8* [[X]], i8 8
		; X32-NEXT: [[TMP14:%.*]] = bitcast i8* [[TMP13]] to i32*
		; X32-NEXT: [[TMP15:%.*]] = getelementptr i8, i8* [[Y]], i8 8
		; X32-NEXT: [[TMP16:%.*]] = bitcast i8* [[TMP15]] to i32*
		; X32-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP14]]
		; X32-NEXT: [[TMP18:%.*]] = load i32, i32* [[TMP16]]
		; X32-NEXT: [[TMP19:%.*]] = xor i32 [[TMP17]], [[TMP18]]
		; X32-NEXT: [[TMP20:%.*]] = getelementptr i8, i8* [[X]], i8 12
		; X32-NEXT: [[TMP21:%.*]] = bitcast i8* [[TMP20]] to i32*
		; X32-NEXT: [[TMP22:%.*]] = getelementptr i8, i8* [[Y]], i8 12
		; X32-NEXT: [[TMP23:%.*]] = bitcast i8* [[TMP22]] to i32*
		; X32-NEXT: [[TMP24:%.*]] = load i32, i32* [[TMP21]]
		; X32-NEXT: [[TMP25:%.*]] = load i32, i32* [[TMP23]]
		; X32-NEXT: [[TMP26:%.*]] = xor i32 [[TMP24]], [[TMP25]]
		; X32-NEXT: [[TMP27:%.*]] = or i32 [[TMP5]], [[TMP12]]
		; X32-NEXT: [[TMP28:%.*]] = or i32 [[TMP19]], [[TMP26]]
		; X32-NEXT: [[TMP29:%.*]] = or i32 [[TMP27]], [[TMP28]]
		; X32-NEXT: [[TMP30:%.*]] = icmp ne i32 [[TMP29]], 0
		; X32-NEXT: [[TMP31:%.*]] = zext i1 [[TMP30]] to i32
		; X32-NEXT: [[CMP:%.*]] = icmp eq i32 [[TMP31]], 0
; X32-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32		; X32-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32
; X32-NEXT: ret i32 [[CONV]]		; X32-NEXT: ret i32 [[CONV]]
;		;
; X64-LABEL: @cmp_eq16(		; X64-LABEL: @cmp_eq16(
; X64-NEXT: [[TMP1:%.*]] = bitcast i8* [[X:%.*]] to i128*		; X64-NEXT: [[TMP1:%.*]] = bitcast i8* [[X:%.*]] to i128*
; X64-NEXT: [[TMP2:%.*]] = bitcast i8* [[Y:%.*]] to i128*		; X64-NEXT: [[TMP2:%.*]] = bitcast i8* [[Y:%.*]] to i128*
; X64-NEXT: [[TMP3:%.*]] = load i128, i128* [[TMP1]]		; X64-NEXT: [[TMP3:%.*]] = load i128, i128* [[TMP1]]
; X64-NEXT: [[TMP4:%.*]] = load i128, i128* [[TMP2]]		; X64-NEXT: [[TMP4:%.*]] = load i128, i128* [[TMP2]]
(12 unchanged lines not shown)
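In the new X32 check lines, `memcmp(x, y, n) == 0` for 12 to 16 bytes becomes three 4-byte XORs at offsets 0, 4 and 8, plus one tail chunk: none for 12, an `i8` at offset 12 for 13, an `i16` at 12 for 14, an overlapping `i32` at 11 for 15, and an `i32` at 12 for 16. The partial XORs are OR-reduced and tested against zero. The C sketch below is a hand-written model of what that IR computes, not code emitted by or taken from LLVM. `eq_expanded`, `load32` and `load16` are hypothetical names, and `memcpy` stands in for the unaligned loads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned 32-bit load; memcpy avoids alignment and aliasing problems. */
static uint32_t load32(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Unaligned 16-bit load, zero-extended like `zext i16 ... to i32`. */
static uint32_t load16(const unsigned char *p) {
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical model of the X32 expansion of memcmp(x, y, n) == 0
   for 12 <= n <= 16. Returns 1 if the first n bytes are equal, else 0. */
static int eq_expanded(const void *xv, const void *yv, size_t n) {
    const unsigned char *x = xv, *y = yv;
    if (n < 12 || n > 16)
        return memcmp(x, y, n) == 0; /* outside the modeled sizes */

    /* Three full 4-byte chunks at offsets 0, 4 and 8. */
    uint32_t diff = (load32(x) ^ load32(y))
                  | (load32(x + 4) ^ load32(y + 4))
                  | (load32(x + 8) ^ load32(y + 8));

    /* Tail chunk, matching the check lines for each size. */
    switch (n) {
    case 13: diff |= (uint32_t)(x[12] ^ y[12]); break;        /* i8 at 12 */
    case 14: diff |= load16(x + 12) ^ load16(y + 12); break;   /* i16 at 12 */
    case 15: diff |= load32(x + 11) ^ load32(y + 11); break;   /* i32 at 11, re-reads byte 11 */
    case 16: diff |= load32(x + 12) ^ load32(y + 12); break;   /* i32 at 12 */
    default: break;                                            /* n == 12: no tail */
    }
    /* icmp ne 0, zext, then icmp eq 0 collapses to a single zero test. */
    return diff == 0;
}
```

Because equality only asks whether any bit differs, the chunks can be combined with OR in any order and no per-chunk branch is needed. The three-way result of `memcmp` would instead have to locate the first differing chunk, which is what the branchy `loadbb`/`res_block` structure in the `X64_1LD` checks does.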

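Which loads appear for each size can be explained by a simple load plan. The plan first tries a greedy decomposition into the widest legal loads, assumed here to be 4, 2 and 1 bytes on X32. Where it needs strictly fewer loads, it instead uses full-width loads with the last one shifted back to overlap. The model below is a simplified sketch under those assumptions, not LLVM's `ExpandMemCmp` implementation. `plan`, `greedy_plan`, `overlap_plan` and `Load` are hypothetical, and the limit of 4 is taken from the revision title. The model reproduces the tails in the check lines: 13 bytes gives an `i8` at 12, 14 gives an `i16` at 12, and 15 gives an overlapping `i32` at 11.

```c
#include <stddef.h>

/* One planned load: byte offset and width (hypothetical type). */
typedef struct {
    size_t offset, size;
} Load;

/* Greedy plan: widest size first, e.g. 13 -> 4,4,4,1. */
static size_t greedy_plan(size_t n, const size_t *sizes, size_t nsizes, Load *out) {
    size_t count = 0, off = 0;
    for (size_t i = 0; i < nsizes; i++) {
        while (n - off >= sizes[i]) {
            out[count].offset = off;
            out[count].size = sizes[i];
            count++;
            off += sizes[i];
        }
    }
    return count;
}

/* Overlapping plan: only full-width loads, the last one shifted back to end
   exactly at n, e.g. 15 -> offsets 0,4,8,11. Returns 0 if n < width. */
static size_t overlap_plan(size_t n, size_t width, Load *out) {
    if (n < width)
        return 0;
    size_t count = (n + width - 1) / width;
    for (size_t i = 0; i + 1 < count; i++) {
        out[i].offset = i * width;
        out[i].size = width;
    }
    out[count - 1].offset = n - width;
    out[count - 1].size = width;
    return count;
}

/* Hypothetical planner: returns loads per operand, or 0 if the call is kept. */
static size_t plan(size_t n, size_t max_loads, Load *out) {
    static const size_t sizes[] = {4, 2, 1}; /* assumed X32 integer load widths */
    Load greedy[64], overlap[64];
    if (n == 0 || n > 64)
        return 0;
    size_t ng = greedy_plan(n, sizes, sizeof sizes / sizeof sizes[0], greedy);
    size_t no = overlap_plan(n, 4, overlap);
    const Load *best = greedy;
    size_t nb = ng;
    if (no != 0 && no < ng) { /* overlap only when strictly fewer loads */
        best = overlap;
        nb = no;
    }
    if (nb > max_loads)
        return 0;
    for (size_t i = 0; i < nb; i++)
        out[i] = best[i];
    return nb;
}
```

With a limit of 4, `plan(15, 4, p)` picks the overlapping plan (4 loads instead of the greedy 5). A 17-byte compare would still need 5 loads and would remain a library call. Because the old column keeps the call even for 12 bytes, which needs only 3 loads, the previous X32 limit for this case must have been below 3.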
Revision Contents

Diff 225232

include/llvm/CodeGen/TargetLowering.h

lib/CodeGen/TargetLoweringBase.cpp

lib/Target/AArch64/AArch64ISelLowering.cpp

lib/Target/AArch64/AArch64TargetTransformInfo.cpp

lib/Target/PowerPC/PPCISelLowering.cpp

lib/Target/PowerPC/PPCTargetTransformInfo.cpp

lib/Target/X86/X86ISelLowering.cpp

lib/Target/X86/X86TargetTransformInfo.cpp

test/CodeGen/X86/memcmp.ll

test/Transforms/ExpandMemCmp/X86/memcmp.ll
