The current code uses an xor+or pattern to expand memcmp, which is not efficient on AArch64.
This patch adds a new option, PreferCmpToExpand, in MemCmpExpansionOptions and uses a cmp+or pattern for the expansion instead (a sketch of the two shapes is included below).
Fix: #56543
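
For illustration, here is a minimal standalone sketch of the two expansion shapes for bcmp(a, b, 3) == 0, assuming the expansion uses one 2-byte and one 1-byte load; the load16/load8 helpers and the exact load split are assumptions of this sketch, not code from the patch.

  #include <cstdint>
  #include <cstring>

  // Hypothetical helpers for this sketch: unaligned 2-byte and 1-byte loads.
  static uint32_t load16(const unsigned char *P) { uint16_t V; std::memcpy(&V, P, 2); return V; }
  static uint32_t load8(const unsigned char *P) { return *P; }

  // Current xor+or shape: differences are collected with xor, merged with or,
  // and tested against zero once at the end.
  bool eqXorOr(const unsigned char *A, const unsigned char *B) {
    uint32_t Diff = (load16(A) ^ load16(B)) | (load8(A + 2) ^ load8(B + 2));
    return Diff == 0;
  }

  // cmp+or shape requested by PreferCmpToExpand: each chunk is compared
  // directly and the boolean results are or'ed together.
  bool eqCmpOr(const unsigned char *A, const unsigned char *B) {
    bool Diff0 = load16(A) != load16(B);
    bool Diff1 = load8(A + 2) != load8(B + 2);
    return !(Diff0 | Diff1);
  }

The intent, per the discussion below, is that the second shape is easier for the AArch64 backend to turn into a cmp/ccmp sequence than the or-of-xors.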
Differential D136672
[ExpandMemCmp][AArch64] Add a new option PreferCmpToExpand in MemCmpExpansionOptions and enable on AArch64
Abandoned · Public · Authored by bcl5980 on Oct 25 2022, 2:53 AM
Event Timeline
Is this the same as https://reviews.llvm.org/D136244? Looks like there are two tickets for the same issue. They both solve it differently. D136244 looks like it is more general - handling more cases than just bcmp. But this might optimize larger bcmps? I think I would expect them to become a series of cmp; ccmp; ccmp; ccmp when optimized properly, though.
I have rebased onto code that already includes D136244. These two tickets look the same, but it seems D136244 can't fix the original 3-byte bcmp case:
(gdb) p LHS.getOperand(0)->getOpcode() == ISD::XOR && LHS.getOperand(1)->getOpcode() == ISD::XOR
$16 = false
(gdb) p LHS.dump()
t20: i32 = or t46, t50
$17 = void
(gdb) p LHS.getOperand(0).dump()
t46: i32 = and t44, Constant:i32<65535>
$18 = void
(gdb) p LHS.getOperand(1).dump()
t50: i32 = and t48, Constant:i32<255>
Of course, we can do it on AArch64. But I guess the pattern-matching code will be more complicated and longer; you need to detect three more patterns (see the sketch after these comments):

or (and (xor a, b), C1), (xor c, d)
or (xor a, b), (and (xor c, d), C2)
or (and (xor a, b), C1), (and (xor c, d), C2)

And you should only consider eq 0 when an and is involved, and the C1 and C2 values also lead to somewhat different results. For now, what I want to discuss is: do we need this patch?

From the looks of the tests, the AND should be optimized out. My guess would be that we don't revisit the setcc after the ands have been simplified, though. It might work better if it was run during lowering.
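
For reference, a rough sketch (under the assumptions above, not code from this patch or from D136244) of what accepting the masked forms listed a couple of comments up might look like; stripMaskedXor and matchMaskedXorPair are hypothetical helper names.

  #include "llvm/CodeGen/SelectionDAGNodes.h"
  using namespace llvm;

  // Accept either (xor a, b) or (and (xor a, b), C); returns the xor value and
  // reports any mask constant through MaskC, or a null SDValue on mismatch.
  static SDValue stripMaskedXor(SDValue V, ConstantSDNode *&MaskC) {
    MaskC = nullptr;
    if (V.getOpcode() == ISD::AND)
      if (auto *C = dyn_cast<ConstantSDNode>(V.getOperand(1))) {
        MaskC = C;
        V = V.getOperand(0);
      }
    return V.getOpcode() == ISD::XOR ? V : SDValue();
  }

  // Intended for a combine on (setcc (or X, Y), 0, eq); as noted above, the
  // masked forms are only valid for the equality-with-zero case, and the
  // caller still has to account for the C1/C2 mask values.
  static bool matchMaskedXorPair(SDValue Or, SDValue &LHSXor, SDValue &RHSXor) {
    if (Or.getOpcode() != ISD::OR)
      return false;
    ConstantSDNode *C1 = nullptr, *C2 = nullptr;
    LHSXor = stripMaskedXor(Or.getOperand(0), C1);
    RHSXor = stripMaskedXor(Or.getOperand(1), C2);
    return LHSXor && RHSXor;
  }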
I think, at the moment, we are heading towards ISel folds to turn the eor patterns into ccmp. If we don't think that method will work (or is too complicated), then this patch would be useful. But it would be more general to fix it in ISel, as that should capture more cases than just bcmp.

I happen to have been looking into https://github.com/llvm/llvm-project/issues/56543 (which I believe is what this is trying to solve), and came across a possible fix from a slightly different angle. It seems LLVM already knows how to optimize to the desired (handwritten) code for that issue, but the patterns (some zero-extend folds) needed to do so are being prevented by this XOR simplification (in DAGCombiner.cpp, visitXOR):

  // Simplify: xor (op x...), (op y...) -> (op (xor x, y))
  if (N0Opcode == N1.getOpcode())
    if (SDValue V = hoistLogicOpWithSameOpcodeHands(N))
      return V;

If you comment out that fold, LLVM will optimize the memcmp as desired. So I think it's just a matter of adding or adjusting some of these zero-extend optimizations to handle xor being simplified like this (a small before/after illustration of that simplification follows below).
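
For readers not familiar with that combine: hoistLogicOpWithSameOpcodeHands rewrites a logic op whose two operands share the same opcode, so here an xor of two zero-extended loads becomes a zero-extend of a narrower xor. A minimal scalar illustration of the before/after shapes (the i8-to-i32 widths are assumed, just to show the identity the hoist relies on):

  #include <cstdint>

  // Shape before the hoist: xor (zext a), (zext b) -- two zext nodes feed the xor.
  uint32_t beforeHoist(uint8_t A, uint8_t B) {
    return uint32_t(A) ^ uint32_t(B);
  }

  // Shape after the hoist: zext (xor a, b) -- a single zext of a narrow xor.
  // Both functions compute the same value; the point is that after the rewrite
  // the zero-extend folds mentioned in the comment above no longer see two
  // separate zext operands, so they don't fire.
  uint32_t afterHoist(uint8_t A, uint8_t B) {
    return uint32_t(uint8_t(A ^ B));
  }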
Revision Contents
Diff 470425
llvm/include/llvm/Analysis/TargetTransformInfo.h
llvm/lib/CodeGen/ExpandMemCmp.cpp
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
llvm/test/CodeGen/AArch64/bcmp-inline-small.ll
llvm/test/CodeGen/AArch64/bcmp.ll