When vectorising loops using tail-folding and interleaving, we end up
with two back-to-back active.lane.mask intrinsic calls. Unfortunately,
this leads to poor codegen like this:
.LBB0_1:
  ...
  whilelo p1.b, x11, x1
  cset    ..., mi
  whilelo p0.b, x12, x1
  tbnz    ..., #0, .LBB0_1
This is because in AArch64InstrInfo::optimizeCondBranch we bail out if
we find a flag-setting operation between a CSINC and a TBNZW machine
node. However, in these cases nothing depends upon the flags set by
the second whilelo, so it is safe to move it above the first whilelo.
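Once the second whilelo is hoisted, the CSINC/TBNZW pair can fold into
a direct conditional branch on the first whilelo's flags. A sketch of
the expected shape (the exact instruction order and operands are my
assumption, not copied from actual output):

.LBB0_1:
  ...
  whilelo p0.b, x12, x1
  whilelo p1.b, x11, x1
  b.mi    .LBB0_1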
I've changed AArch64InstrInfo::optimizeCondBranch to allow a single
flag-setting operation between the CSINC and the TBNZW, provided we
can prove it is safe to move that operation above the first
flag-setting op.
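For illustration, here is a minimal sketch of the kind of legality
check this implies, written against the generic MachineInstr query
API; canHoistAboveFlagSetter is a hypothetical helper name and
FlagsReg an assumed parameter (AArch64::NZCV in practice), not code
taken from the patch:

  #include "llvm/CodeGen/MachineInstr.h"
  #include "llvm/CodeGen/TargetRegisterInfo.h"

  using namespace llvm;

  // Hypothetical helper (not the patch itself): can the single
  // flag-setting instruction found between the CSINC and the TBNZW
  // be moved above the first flag-setting op?
  static bool canHoistAboveFlagSetter(const MachineInstr &Between,
                                      const MachineInstr &FlagSetter,
                                      const TargetRegisterInfo *TRI,
                                      unsigned FlagsReg) {
    // The hoisted instruction must not read the flags, since hoisting
    // would change which flags definition it sees.
    if (Between.readsRegister(FlagsReg, TRI))
      return false;

    // It must not define a register the first flag-setting op reads,
    // or the hoist would change that op's inputs.
    for (const MachineOperand &MO : FlagSetter.uses())
      if (MO.isReg() && Between.modifiesRegister(MO.getReg(), TRI))
        return false;

    // Nor may it read a register (other than the flags) that the
    // first op defines, since that value is only produced afterwards.
    for (const MachineOperand &MO : FlagSetter.defs())
      if (MO.isReg() && MO.getReg() != FlagsReg &&
          Between.readsRegister(MO.getReg(), TRI))
        return false;

    return true;
  }

In the whilelo case above, the hoisted whilelo also clobbers NZCV,
but that is fine: the first whilelo immediately redefines the flags
before the branch reads them.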