This is an archive of the discontinued LLVM Phabricator instance.


Differential D76847

[Target][ARM] Replace re-uses of old VPR values with VPNOTs
Closed · Public

Authored by Pierre-vh on Mar 26 2020, 7:21 AM.


Details

Reviewers

dmgreen
SjoerdMeijer
samparker
simon_tatham
olista01

Commits

rGbf2183374a67: [Target][ARM] Replace re-uses of old VPR values with VPNOTs

Summary

This patch adds another optimisation to the MVE VPT Optimisations pass introduced in the previous patch.
This optimisation replaces usages of old VPR (VCCR) values (within predicated instruction blocks) with VPNOTs in order to avoid spill/reloads.
Those VPNOTs can then be removed by the MVE VPT Block Insertion pass, resulting in clean/compact VPT blocks such as TEET, TETE, etc. instead of lots of small blocks with spill/reloads in-between.
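To see why fewer spills/reloads lead to larger blocks, here is a rough, self-contained sketch in plain C++ (it is not LLVM code). The `Kind` enum and the `blockShapes` function are hypothetical names used only for illustration. The model assumes that a block holds at most four predicated instructions, that a VPNOT inside a block flips between 'T' and 'E', and that a reload of VPR ends the current block; the real Block Insertion pass has more rules than this.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy instruction kinds; an illustrative model, not LLVM's MachineInstr.
enum class Kind { PredicatedOp, VPNOT, Reload };

// Groups a straight-line sequence into VPT-block shapes such as "TTET".
// Assumptions of this sketch: at most four predicated instructions per block,
// a VPNOT inside a block toggles then/else, and a VPR reload ends the block.
std::vector<std::string> blockShapes(const std::vector<Kind> &Seq) {
  std::vector<std::string> Blocks;
  std::string Cur;       // shape of the block being built
  bool Inverted = false; // predicate polarity relative to the block start

  auto Flush = [&] {
    if (!Cur.empty())
      Blocks.push_back(Cur);
    Cur.clear();
    Inverted = false;
  };

  for (Kind K : Seq) {
    switch (K) {
    case Kind::Reload:
      Flush();
      break;
    case Kind::VPNOT:
      // A VPNOT outside a block stays a standalone instruction in this model.
      if (!Cur.empty())
        Inverted = !Inverted;
      break;
    case Kind::PredicatedOp:
      if (Cur.size() == 4)
        Flush();
      Cur.push_back(Inverted ? 'E' : 'T');
      break;
    }
  }
  Flush();
  return Blocks;
}
```

Under these assumptions, the sequence op, op, reload, VPNOT, op, reload, op groups into three blocks (TT, T, T), while op, op, VPNOT, op, VPNOT, op groups into a single TTET block.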

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Pierre-vh created this revision. Mar 26 2020, 7:21 AM

Herald added a project: Restricted Project. Mar 26 2020, 7:21 AM

Herald added subscribers: llvm-commits, hiraditya, kristof.beyls.

Pierre-vh added a parent revision: D76709: [Target][ARM] Adding MVE VPT Optimisation Pass. Mar 26 2020, 7:21 AM

Harbormaster failed remote builds in B50537: Diff 252838! Mar 26 2020, 8:06 AM

I'm surprised not to see more test changes, does this really have no effect on any existing tests?

In D76847#1945904, @samparker wrote:

I'm surprised not to see more test changes, does this really have no effect on any existing tests?

I don't think so, I ran check-llvm-codegen and it was green (with the changes in this patch and the parent patch).

Ok. Well, how about a couple more tests with differently ordered instructions too, then? Like inserting the unpredicated VORR in between the predicated ones? And maybe some larger tests that would generate blocks with multiple instructions on a predicate before performing the inversion?

In D76847#1945904, @samparker wrote:

I'm surprised not to see more test changes, does this really have no effect on any existing tests?

This kind of code can come up from intrinsics a lot, but not necessarily from the relatively simple codegen tests we have. Someone can write intrinsics knowing that VPNOTs should be folded into a single VPT block, only for LLVM to come along and nicely optimize them all away, producing IR that we end up with worse codegen for. This should help us get back to better assembly.

More testing does sound good.

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
160–161	addUnpredicatedMveVpredNOp
437	Should this not be replacing _all_ uses of the old condition with the value from the VPNOT? As in, if we see input code like:
    a = VCMP
    ..
    use(a)
    b = VPNOT a
    ..
    use(b)
    ..
    use(a)
Could it not be generally beneficial to insert a VPNOT between use of b and the second use of a? We could also probably have any amount of code between the predicated uses and it might still be beneficial, considering the costs of spills/reloads vs a single VPNOT.
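The rewrite being asked about can be sketched on a toy instruction list in plain C++ (this is not the pass itself). `Instr` and `reuseViaNot` are hypothetical names; the sketch assumes a single defining instruction and that every NOT already present in the input operates on the most recently written value.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy instruction: Def writes a fresh predicate, Not inverts one, Use reads one.
struct Instr {
  enum Op { Def, Not, Use } Kind;
  int Dst; // register written by Def/Not, -1 for Use
  int Src; // register read by Not/Use, -1 for Def
};

// Rewrites every Use so that it reads the most recently written predicate
// register, inserting a new Not whenever the polarity it needs differs.
// NextReg is the first unused register number.
std::vector<Instr> reuseViaNot(const std::vector<Instr> &In, int NextReg) {
  std::vector<Instr> Out;
  std::map<int, bool> Inverted; // polarity of each register relative to the Def
  int Last = -1;                // register written most recently

  for (Instr I : In) {
    if (I.Kind == Instr::Def) {
      Inverted[I.Dst] = false;
      Last = I.Dst;
    } else if (I.Kind == Instr::Not) {
      assert(I.Src == Last && "sketch only handles NOTs of the newest value");
      Inverted[I.Dst] = !Inverted.at(I.Src);
      Last = I.Dst;
    } else {
      // The use wants the other polarity: recompute it from the newest value.
      if (Inverted.at(I.Src) != Inverted.at(Last)) {
        Out.push_back({Instr::Not, NextReg, Last});
        Inverted[NextReg] = !Inverted.at(Last);
        Last = NextReg++;
      }
      I.Src = Last;
    }
    Out.push_back(I);
  }
  return Out;
}
```

With the input from the comment above (a = 0, b = 1), the final use(a) ends up reading a new register defined by a NOT of b, so only one predicate value needs to stay live at any point.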

Pierre-vh marked 3 inline comments as done. Mar 29 2020, 11:51 PM

Pierre-vh added inline comments.

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
437	I believe it's already doing that. I'll add more tests to show it.
I can certainly do the optimisation even outside blocks of predicated instructions; I just need to remove these lines and add a few tests:
    // Stop as soon as we leave the block of predicated instructions
    if (getVPTInstrPredicate(*Iter) == ARMVCC::None)
      break;

Fixed issues found during review (comments marked as done)
Improved the pass:
- It can now move VPNOTs before their first user when needed. This currently only happens when another VCCR value is used between the VPNOT and its first user.
- Instead of inserting VPNOTs before VPNOTs, it now replaces the existing VPNOT with a COPY.
Added more tests, and they now use virtual registers instead.

Pierre-vh added a child revision: D77201: [CodeGen][SelectionDAG] Flip Booleans More Often. Apr 1 2020, 2:13 AM

Fixed bugs related to isKill flags in multiple places. I now correctly change the isKill flags when needed, and I added tests for that.
Rebased the patch - Added the mve-vpt-blocks.ll test.

Pierre-vh added a parent revision: D77798: [Target][ARM] Fix VPT Block Pass miscompilation. Apr 14 2020, 3:25 AM

Pierre-vh removed a parent revision: D76709: [Target][ARM] Adding MVE VPT Optimisation Pass.

Rebasing the patch because I inserted D77798 between this and D76709
- D77798 also added a new test in mve-pred-not and this pass improves it as well.

Pierre-vh added a child revision: D77712: [Target][ARM] Add PerformVSELECTCombine for MVE Integer Ops. Apr 14 2020, 3:47 AM

Pierre-vh removed a child revision: D77201: [CodeGen][SelectionDAG] Flip Booleans More Often.

Sorry for the delay.

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
175	if at there -> if there
240	Does this only find the first pair in a basic block? What would happen if there are multiple (potential) vpt blocks in a larger basic block, and several of them can be optimized? Can this be done with a linear scan through the block that attempts to find and hold VPNOT's and the original value, converting any uses of the original to the VPNOT. Or does that not work very well for some reason? Maybe using something like MRI->use_instructions would also help.
256	Can you explain the advantage of moving the VPNOT?

Herald added a subscriber: danielkiss. Apr 16 2020, 6:41 AM

Pierre-vh marked 3 inline comments as done. Apr 16 2020, 7:11 AM

Pierre-vh added inline comments.

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
240	> What would happen if there are multiple (potential) vpt blocks in a larger basic block, and several of them can be optimized?
Unfortunately it'd only optimize the first one as it'd only pick up the first pair of VCMP/VPNOT and not the second. I can try to rewrite the function so it handles those cases, I'll do that tomorrow.
> Can this be done with a linear scan through the block that attempts to find and hold VPNOT's and the original value, converting any uses of the original to the VPNOT. Or does that not work very well for some reason?
I don't think it'd work well, but I can certainly try it and see if it works. I'll also look at `use_instructions`, but I'm not sure it'll help.
256	Sometimes, you can have code like this (taken from the bottom test):
    %2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
    %3:vccr = MVE_VPNOT %2:vccr, 0, $noreg
    %4:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, %2:vccr, undef %4:mqpr
    %5:mqpr = MVE_VORR %1:mqpr, %0:mqpr, 1, %2:vccr, undef %5:mqpr
    %6:mqpr = MVE_VORR %4:mqpr, %5:mqpr, 1, %3:vccr, undef %6:mqpr
`%3` is not used immediately: the original VCCR value, `%2`, is used before it, so their lifetimes overlap. If I didn't move the VPNOT further down in such situations, the pass would insert a double VPNOT, like this:
    %2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
    %3:vccr = MVE_VPNOT %2:vccr, 0, $noreg
    %foo:vccr = MVE_VPNOT %3:vccr
    %4:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, %foo:vccr, undef %4:mqpr
    %5:mqpr = MVE_VORR %1:mqpr, %0:mqpr, 1, %foo:vccr, undef %5:mqpr
    %bar:vccr = MVE_VPNOT %foo:vccr
    %6:mqpr = MVE_VORR %4:mqpr, %5:mqpr, 1, %bar:vccr, undef %6:mqpr
But, since I now move the VPNOT further down, we get this instead, which is of course much better.
    // VPNOT moved down, no more overlapping VCCR lifetimes, no double VPNOTs.
    %2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
    %4:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, %2:vccr, undef %4:mqpr
    %5:mqpr = MVE_VORR %1:mqpr, %0:mqpr, 1, %2:vccr, undef %5:mqpr
    %3:vccr = MVE_VPNOT %2:vccr, 0, $noreg
    %6:mqpr = MVE_VORR %4:mqpr, %5:mqpr, 1, %3:vccr, undef %6:mqpr
That transformation is of course only done if a use of the original VCCR value is found before the first use of the VPNOT's result.
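The move described in this reply can be approximated on a toy instruction list in plain C++. This is only a sketch: `Instr` and `sinkNotToFirstUser` are hypothetical names, and the kill-flag bookkeeping that the real pass performs is left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction: Def writes a fresh predicate, Not inverts one, Use reads one.
struct Instr {
  enum Op { Def, Not, Use } Kind;
  int Dst; // register written by Def/Not, -1 for Use
  int Src; // register read by Not/Use, -1 for Def
};

// If the NOT's operand is used between the NOT and the first user of the
// NOT's result, moves the NOT to just before that first user.
// Returns true if the NOT's result has at least one user.
bool sinkNotToFirstUser(std::vector<Instr> &Seq, std::size_t NotIdx) {
  const int Result = Seq[NotIdx].Dst;
  const int Operand = Seq[NotIdx].Src;
  bool MustMove = false;

  for (std::size_t I = NotIdx + 1; I < Seq.size(); ++I) {
    if (Seq[I].Kind != Instr::Use)
      continue;
    if (Seq[I].Src == Operand) {
      MustMove = true; // the original value is still needed here
      continue;
    }
    if (Seq[I].Src != Result)
      continue;
    if (MustMove) {
      // Rotate the NOT to the slot right before its first user.
      auto NotPos = Seq.begin() + static_cast<std::ptrdiff_t>(NotIdx);
      auto UserPos = Seq.begin() + static_cast<std::ptrdiff_t>(I);
      std::rotate(NotPos, NotPos + 1, UserPos);
    }
    return true;
  }
  return false;
}
```

Applied to the first listing above (modelled as Def 2, Not 3, Use 2, Use 2, Use 3), the NOT ends up directly before the use of register 3, which matches the shape of the last listing.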

Added support for optimising multiple VPT blocks in the same Basic Block, and added a test to show it
Now VCCRValue is only set for VCMPs. (It doesn't really make a difference, but makes it clear that only VCMPs are supported by this optimisation)

Pierre-vh marked 2 inline comments as done. Apr 17 2020, 4:42 AM

Pierre-vh edited child revisions, added: D78201: [Target][ARM] Replace outdated getARMVPTBlockMask function; removed: D77712: [Target][ARM] Add PerformVSELECTCombine for MVE Integer Ops. May 5 2020, 1:12 AM

dmgreen added inline comments. May 11 2020, 1:24 AM

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
250	Do we not do this for predicated VCMPs?
257	findRegisterUseOperandIdx != -1
269	What makes us not do this for other opcodes? (I guess vcmp is the vast majority of cases?)
286	The second loop -> This second loop, just to be a little clearer.
302	Do we need to add the copy? We could just use LastVPNOTResult directly? Or would that be more complex?
305	Set Modified = true here too I think.

Minor refactoring of the patch
The pass is no longer limited to VCMPs for VCCRValue, it can now use any instruction that writes to VPR (e.g. VMSR)
The pass no longer replaces VPNOT with copies - it just removes the VPNOT and replaces all of its uses.
Other minor fixes

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
250	Currently, no, to keep things simple. I think that allowing predicated vcmps would require a bit more analysis to ensure that we don't make things worse by accident, and I'm not even sure it'd be worth it (are there any cases where it could be beneficial?)
    %0:vccr = vpt ...
    %1:vccr = vcmp /* predicated on %0 */
    %2:vccr = vcmp /* unpredicated, opposite of %1 */
If we replace the second vcmp with `vpnot %1:vccr`, what happens if the first vcmp isn't executed?
269	It's just that vcmp is the vast majority of cases. There's no issue with enabling it for other instructions that write to VPR. I've removed this restriction and added a test with vmsr.

LGTM. Thanks.

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
250	It's not about "not executing", a predicated vcmp acts as a "and %0, <newcond>". Overlapping ranges of different predicates certainly do sound more complex though. Let's leave that for other patches if we find cases where it's needed.
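The "and" behaviour can be illustrated with plain bit masks. This is a minimal sketch that models the predicate as a 16-bit lane mask; `vpnot` and `predicatedCmp` are hypothetical helper names, not LLVM or ACLE functions.

```cpp
#include <cassert>
#include <cstdint>

// Inverts every lane of a 16-bit predicate mask.
constexpr std::uint16_t vpnot(std::uint16_t P) {
  return static_cast<std::uint16_t>(~P);
}

// Models a predicated compare as "and Pred, Cond": lanes that are inactive
// in Pred come out false regardless of the comparison result.
constexpr std::uint16_t predicatedCmp(std::uint16_t Pred, std::uint16_t Cond) {
  return static_cast<std::uint16_t>(Pred & Cond);
}
```

For example, with a predicate of 0x00FF and a comparison result of 0x0F0F, the predicated compare gives 0x000F and its VPNOT gives 0xFFF0, whereas the unpredicated opposite comparison would give 0xF0F0. The two differ in the lanes that were inactive, which is why the replacement is not safe without extra analysis.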

This revision is now accepted and ready to land. May 12 2020, 2:16 AM

Closed by commit rGbf2183374a67: [Target][ARM] Replace re-uses of old VPR values with VPNOTs (authored by Pierre-vh). May 12 2020, 4:48 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp	240 lines
llvm/test/CodeGen/Thumb2/mve-pred-not.ll	14 lines
llvm/test/CodeGen/Thumb2/mve-vpt-blocks.ll	82 lines
llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir	418 lines

Diff 263399

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp

//===-- MVEVPTOptimisationsPass.cpp ---------------------------------------===//		//===-- MVEVPTOptimisationsPass.cpp ---------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
/// \file This pass does a few optimisations related to MVE VPT blocks before		/// \file This pass does a few optimisations related to MVE VPT blocks before
/// register allocation is performed. The goal is to maximize the sizes of the		/// register allocation is performed. The goal is to maximize the sizes of the
/// blocks that will be created by the MVE VPT Block Insertion pass (which runs		/// blocks that will be created by the MVE VPT Block Insertion pass (which runs
/// after register allocation). Currently, this pass replaces VCMPs with VPNOTs		/// after register allocation). The first optimisation done by this pass is the
/// when possible, so the Block Insertion pass can delete them later to create		/// replacement of "opposite" VCMPs with VPNOTs, so the Block Insertion pass
/// larger VPT blocks.		/// can delete them later to create larger VPT blocks.
		/// The second optimisation replaces re-uses of old VCCR values with VPNOTs when
		/// inside a block of predicated instructions. This is done to avoid
		/// spill/reloads of VPR in the middle of a block, which prevents the Block
		/// Insertion pass from creating large blocks.
		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "ARM.h"		#include "ARM.h"
#include "ARMSubtarget.h"		#include "ARMSubtarget.h"
#include "MCTargetDesc/ARMBaseInfo.h"		#include "MCTargetDesc/ARMBaseInfo.h"
#include "Thumb2InstrInfo.h"		#include "Thumb2InstrInfo.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
[... 18 lines not shown ...]	public:

bool runOnMachineFunction(MachineFunction &Fn) override;		bool runOnMachineFunction(MachineFunction &Fn) override;

StringRef getPassName() const override {		StringRef getPassName() const override {
return "ARM MVE VPT Optimisation Pass";		return "ARM MVE VPT Optimisation Pass";
}		}

private:		private:
		MachineInstr &ReplaceRegisterUseWithVPNOT(MachineBasicBlock &MBB,
		MachineInstr &Instr,
		MachineOperand &User,
		Register Target);
		bool ReduceOldVCCRValueUses(MachineBasicBlock &MBB);
bool ReplaceVCMPsByVPNOTs(MachineBasicBlock &MBB);		bool ReplaceVCMPsByVPNOTs(MachineBasicBlock &MBB);
};		};

char MVEVPTOptimisations::ID = 0;		char MVEVPTOptimisations::ID = 0;

} // end anonymous namespace		} // end anonymous namespace

INITIALIZE_PASS(MVEVPTOptimisations, DEBUG_TYPE,		INITIALIZE_PASS(MVEVPTOptimisations, DEBUG_TYPE,
[... 69 lines not shown ...]	static bool IsWritingToVCCR(MachineInstr &Instr) {
Register DstReg = Dst.getReg();		Register DstReg = Dst.getReg();
if (!DstReg.isVirtual())		if (!DstReg.isVirtual())
return false;		return false;
MachineRegisterInfo &RegInfo = Instr.getMF()->getRegInfo();		MachineRegisterInfo &RegInfo = Instr.getMF()->getRegInfo();
const TargetRegisterClass *RegClass = RegInfo.getRegClassOrNull(DstReg);		const TargetRegisterClass *RegClass = RegInfo.getRegClassOrNull(DstReg);
return RegClass && (RegClass->getID() == ARM::VCCRRegClassID);		return RegClass && (RegClass->getID() == ARM::VCCRRegClassID);
}		}

		// Transforms
		// <Instr that uses %A ('User' Operand)>
		// Into
		// %K = VPNOT %Target
		// <Instr that uses %K ('User' Operand)>
		// And returns the newly inserted VPNOT.
		// This optimization is done in the hopes of preventing spills/reloads of VPR by
		// reducing the number of VCCR values with overlapping lifetimes.
		MachineInstr &MVEVPTOptimisations::ReplaceRegisterUseWithVPNOT(
		MachineBasicBlock &MBB, MachineInstr &Instr, MachineOperand &User,
		Register Target) {
		Register NewResult = MRI->createVirtualRegister(MRI->getRegClass(Target));

		MachineInstrBuilder MIBuilder =
		BuildMI(MBB, &Instr, Instr.getDebugLoc(), TII->get(ARM::MVE_VPNOT))
		.addDef(NewResult)
		.addReg(Target);
		addUnpredicatedMveVpredNOp(MIBuilder);

		// Make the user use NewResult instead, and clear its kill flag.
		User.setReg(NewResult);
		User.setIsKill(false);

		LLVM_DEBUG(dbgs() << " Inserting VPNOT (for spill prevention): ";
		MIBuilder.getInstr()->dump());

		return *MIBuilder.getInstr();
		}

		// Moves a VPNOT before its first user if an instruction that uses Reg is found
		// in-between the VPNOT and its user.
		// Returns true if there is at least one user of the VPNOT in the block.
		static bool MoveVPNOTBeforeFirstUser(MachineBasicBlock &MBB,
		MachineBasicBlock::iterator Iter,
		Register Reg) {
		assert(Iter->getOpcode() == ARM::MVE_VPNOT && "Not a VPNOT!");
		assert(getVPTInstrPredicate(*Iter) == ARMVCC::None &&
		"The VPNOT cannot be predicated");

		MachineInstr &VPNOT = *Iter;
		Register VPNOTResult = VPNOT.getOperand(0).getReg();
		Register VPNOTOperand = VPNOT.getOperand(1).getReg();

		// Whether the VPNOT will need to be moved, and whether we found a user of the
		// VPNOT.
		bool MustMove = false, HasUser = false;
		MachineOperand *VPNOTOperandKiller = nullptr;
		for (; Iter != MBB.end(); ++Iter) {
		if (MachineOperand *MO =
		Iter->findRegisterUseOperand(VPNOTOperand, /*isKill*/ true)) {
		// If we find the operand that kills the VPNOTOperand's result, save it.
		VPNOTOperandKiller = MO;
		}

		if (Iter->findRegisterUseOperandIdx(Reg) != -1) {
		MustMove = true;
		continue;
		}

		if (Iter->findRegisterUseOperandIdx(VPNOTResult) == -1)
		continue;

		HasUser = true;
		if (!MustMove)
		break;

		// Move the VPNOT right before Iter
		LLVM_DEBUG(dbgs() << "Moving: "; VPNOT.dump(); dbgs() << " Before: ";
		Iter->dump());
		MBB.splice(Iter, &MBB, VPNOT.getIterator());
		// If we move the instr, and its operand was killed earlier, remove the kill
		// flag.
		if (VPNOTOperandKiller)
		VPNOTOperandKiller->setIsKill(false);

		break;
		}
		return HasUser;
		}

		// This optimisation attempts to reduce the number of overlapping lifetimes of
		// VCCR values by replacing uses of old VCCR values with VPNOTs. For example,
		// this replaces
		// %A:vccr = (something)
		// %B:vccr = VPNOT %A
		// %Foo = (some op that uses %B)
		// %Bar = (some op that uses %A)
		// With
		// %A:vccr = (something)
		// %B:vccr = VPNOT %A
		// %Foo = (some op that uses %B)
		// %TMP2:vccr = VPNOT %B
		// %Bar = (some op that uses %TMP2)
		bool MVEVPTOptimisations::ReduceOldVCCRValueUses(MachineBasicBlock &MBB) {
		MachineBasicBlock::iterator Iter = MBB.begin(), End = MBB.end();
		SmallVector<MachineInstr *, 4> DeadInstructions;
		bool Modified = false;

		while (Iter != End) {
		Register VCCRValue, OppositeVCCRValue;
		// The first loop looks for 2 unpredicated instructions:
		// %A:vccr = (instr) ; A is stored in VCCRValue
		// %B:vccr = VPNOT %A ; B is stored in OppositeVCCRValue
		for (; Iter != End; ++Iter) {
		// We're only interested in unpredicated instructions that write to VCCR.
		if (!IsWritingToVCCR(*Iter) ||
		getVPTInstrPredicate(*Iter) != ARMVCC::None)
		continue;
		Register Dst = Iter->getOperand(0).getReg();

		// If we already have a VCCRValue, and this is a VPNOT on VCCRValue, we've
		// found what we were looking for.
		if (VCCRValue && Iter->getOpcode() == ARM::MVE_VPNOT &&
		Iter->findRegisterUseOperandIdx(VCCRValue) != -1) {
		// Move the VPNOT closer to its first user if needed, and ignore if it
		// has no users.
		if (!MoveVPNOTBeforeFirstUser(MBB, Iter, VCCRValue))
		continue;

		OppositeVCCRValue = Dst;
		++Iter;
		break;
		}

		// Else, just set VCCRValue.
		VCCRValue = Dst;
		}

		// If the first inner loop didn't find anything, stop here.
		if (Iter == End)
		break;

		assert(VCCRValue && OppositeVCCRValue &&
		"VCCRValue and OppositeVCCRValue shouldn't be empty if the loop "
		"stopped before the end of the block!");
		assert(VCCRValue != OppositeVCCRValue &&
		"VCCRValue should not be equal to OppositeVCCRValue!");

		// LastVPNOTResult always contains the same value as OppositeVCCRValue.
		Register LastVPNOTResult = OppositeVCCRValue;

		// This second loop tries to optimize the remaining instructions.
		for (; Iter != End; ++Iter) {
		bool IsInteresting = false;

		if (MachineOperand *MO = Iter->findRegisterUseOperand(VCCRValue)) {
		IsInteresting = true;

		// - If the instruction is a VPNOT, it can be removed, and we can just
		// replace its uses with LastVPNOTResult.
		// - Else, insert a new VPNOT on LastVPNOTResult to recompute VCCRValue.
		if (Iter->getOpcode() == ARM::MVE_VPNOT) {
		Register Result = Iter->getOperand(0).getReg();

		MRI->replaceRegWith(Result, LastVPNOTResult);
		DeadInstructions.push_back(&*Iter);
		Modified = true;

		LLVM_DEBUG(dbgs()
		<< "Replacing all uses of '" << printReg(Result)
		<< "' with '" << printReg(LastVPNOTResult) << "'\n");
		} else {
		MachineInstr &VPNOT =
		ReplaceRegisterUseWithVPNOT(MBB, *Iter, *MO, LastVPNOTResult);
		Modified = true;

		LastVPNOTResult = VPNOT.getOperand(0).getReg();
		std::swap(VCCRValue, OppositeVCCRValue);

		LLVM_DEBUG(dbgs() << "Replacing use of '" << printReg(VCCRValue)
		<< "' with '" << printReg(LastVPNOTResult)
		<< "' in instr: " << *Iter);
		}
		} else {
		// If the instr uses OppositeVCCRValue, make it use LastVPNOTResult
		// instead as they contain the same value.
		if (MachineOperand *MO =
		Iter->findRegisterUseOperand(OppositeVCCRValue)) {
		IsInteresting = true;

		// This is pointless if LastVPNOTResult == OppositeVCCRValue.
		if (LastVPNOTResult != OppositeVCCRValue) {
		LLVM_DEBUG(dbgs() << "Replacing usage of '"
		<< printReg(OppositeVCCRValue) << "' with '"
		<< printReg(LastVPNOTResult) << " for instr: ";
		Iter->dump());
		MO->setReg(LastVPNOTResult);
		Modified = true;
		}

		MO->setIsKill(false);
		}

		// If this is an unpredicated VPNOT on
		// LastVPNOTResult/OppositeVCCRValue, we can act like we inserted it.
		if (Iter->getOpcode() == ARM::MVE_VPNOT &&
		getVPTInstrPredicate(*Iter) == ARMVCC::None) {
		Register VPNOTOperand = Iter->getOperand(1).getReg();
		if (VPNOTOperand == LastVPNOTResult ||
		VPNOTOperand == OppositeVCCRValue) {
		IsInteresting = true;

		std::swap(VCCRValue, OppositeVCCRValue);
		LastVPNOTResult = Iter->getOperand(0).getReg();
		}
		}
		}

		// If this instruction was not interesting, and it writes to VCCR, stop.
		if (!IsInteresting && IsWritingToVCCR(*Iter))
		break;
		}
		}

		for (MachineInstr *DeadInstruction : DeadInstructions)
		DeadInstruction->removeFromParent();

		return Modified;
		}

// This optimisation replaces VCMPs with VPNOTs when they are equivalent.		// This optimisation replaces VCMPs with VPNOTs when they are equivalent.
bool MVEVPTOptimisations::ReplaceVCMPsByVPNOTs(MachineBasicBlock &MBB) {		bool MVEVPTOptimisations::ReplaceVCMPsByVPNOTs(MachineBasicBlock &MBB) {
SmallVector<MachineInstr *, 4> DeadInstructions;		SmallVector<MachineInstr *, 4> DeadInstructions;

// The last VCMP that we have seen and that couldn't be replaced.		// The last VCMP that we have seen and that couldn't be replaced.
// This is reset when an instruction that writes to VCCR/VPR is found, or when		// This is reset when an instruction that writes to VCCR/VPR is found, or when
// a VCMP is replaced with a VPNOT.		// a VCMP is replaced with a VPNOT.
// We'll only replace VCMPs with VPNOTs when this is not null, and when the		// We'll only replace VCMPs with VPNOTs when this is not null, and when the
[... 57 lines not shown ...]	for (MachineInstr &Instr : MBB.instrs()) {
PrevVCMPResultKiller = nullptr;		PrevVCMPResultKiller = nullptr;
}		}

for (MachineInstr *DeadInstruction : DeadInstructions)		for (MachineInstr *DeadInstruction : DeadInstructions)
DeadInstruction->removeFromParent();		DeadInstruction->removeFromParent();

return !DeadInstructions.empty();		return !DeadInstructions.empty();
}		}

bool MVEVPTOptimisations::runOnMachineFunction(MachineFunction &Fn) {		bool MVEVPTOptimisations::runOnMachineFunction(MachineFunction &Fn) {
const ARMSubtarget &STI =		const ARMSubtarget &STI =
static_cast<const ARMSubtarget &>(Fn.getSubtarget());		static_cast<const ARMSubtarget &>(Fn.getSubtarget());

if (!STI.isThumb2() || !STI.hasMVEIntegerOps())		if (!STI.isThumb2() || !STI.hasMVEIntegerOps())
return false;		return false;

TII = static_cast<const Thumb2InstrInfo *>(STI.getInstrInfo());		TII = static_cast<const Thumb2InstrInfo *>(STI.getInstrInfo());
MRI = &Fn.getRegInfo();		MRI = &Fn.getRegInfo();

LLVM_DEBUG(dbgs() << "******** ARM MVE VPT Optimisations ********\n"		LLVM_DEBUG(dbgs() << "******** ARM MVE VPT Optimisations ********\n"
<< "********** Function: " << Fn.getName() << '\n');		<< "********** Function: " << Fn.getName() << '\n');

bool Modified = false;		bool Modified = false;
for (MachineBasicBlock &MBB : Fn)		for (MachineBasicBlock &MBB : Fn) {
Modified |= ReplaceVCMPsByVPNOTs(MBB);		Modified |= ReplaceVCMPsByVPNOTs(MBB);
		Modified |= ReduceOldVCCRValueUses(MBB);
		}

LLVM_DEBUG(dbgs() << "**************************************\n");		LLVM_DEBUG(dbgs() << "**************************************\n");
return Modified;		return Modified;
}		}

/// createMVEVPTOptimisationsPass		/// createMVEVPTOptimisationsPass
FunctionPass *llvm::createMVEVPTOptimisationsPass() {		FunctionPass *llvm::createMVEVPTOptimisationsPass() {
return new MVEVPTOptimisations();		return new MVEVPTOptimisations();
}		}

llvm/test/CodeGen/Thumb2/mve-pred-not.ll

	[... 399 lines not shown ...]
	}			}

	declare <4 x i32> @llvm.arm.mve.max.predicated.v4i32.v4i1(<4 x i32>, <4 x i32>, i32, <4 x i1>, <4 x i32>)			declare <4 x i32> @llvm.arm.mve.max.predicated.v4i32.v4i1(<4 x i32>, <4 x i32>, i32, <4 x i1>, <4 x i32>)
	declare <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32>, <4 x i32>, <4 x i1>, <4 x i32>)			declare <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32>, <4 x i32>, <4 x i1>, <4 x i32>)

	define arm_aapcs_vfpcc <4 x i32> @vpttet_v4i1(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z) {			define arm_aapcs_vfpcc <4 x i32> @vpttet_v4i1(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z) {
	; CHECK-LABEL: vpttet_v4i1:			; CHECK-LABEL: vpttet_v4i1:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: .pad #4			; CHECK-NEXT: vpttet.s32 ge, q0, q2
	; CHECK-NEXT: sub sp, #4
	; CHECK-NEXT: vcmp.s32 ge, q0, q2
	; CHECK-NEXT: vstr p0, [sp] @ 4-byte Spill
	; CHECK-NEXT: vpstt
	; CHECK-NEXT: vmovt q0, q2			; CHECK-NEXT: vmovt q0, q2
	; CHECK-NEXT: vmovt q0, q2			; CHECK-NEXT: vmovt q0, q2
	; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload			; CHECK-NEXT: vmove q0, q2
	; CHECK-NEXT: vpnot
	; CHECK-NEXT: vpst
	; CHECK-NEXT: vmovt q0, q2			; CHECK-NEXT: vmovt q0, q2
	; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
	; CHECK-NEXT: vpst
	; CHECK-NEXT: vmovt q0, q2
	; CHECK-NEXT: add sp, #4
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	entry:			entry:
	%0 = icmp sge <4 x i32> %x, %z			%0 = icmp sge <4 x i32> %x, %z
	%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %z, <4 x i32> %z, <4 x i1> %0, <4 x i32> %x)			%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %z, <4 x i32> %z, <4 x i1> %0, <4 x i32> %x)
	%2 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %z, <4 x i32> %z, <4 x i1> %0, <4 x i32> %1)			%2 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %z, <4 x i32> %z, <4 x i1> %0, <4 x i32> %1)
	%3 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>			%3 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>
	%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %z, <4 x i32> %z, <4 x i1> %3, <4 x i32> %2)			%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %z, <4 x i32> %z, <4 x i1> %3, <4 x i32> %2)
	%5 = xor <4 x i1> %3, <i1 true, i1 true, i1 true, i1 true>			%5 = xor <4 x i1> %3, <i1 true, i1 true, i1 true, i1 true>

llvm/test/CodeGen/Thumb2/mve-vpt-blocks.ll

%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %1, <4 x i32> %2)		%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %1, <4 x i32> %2)
%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %1, <4 x i32> %3)		%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %1, <4 x i32> %3)
ret <4 x i32> %4		ret <4 x i32> %4
}		}

define arm_aapcs_vfpcc <4 x i32> @vptet_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {		define arm_aapcs_vfpcc <4 x i32> @vptet_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
; CHECK-LABEL: vptet_block:		; CHECK-LABEL: vptet_block:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .pad #4		; CHECK-NEXT: vptet.s32 ge, q0, q2
; CHECK-NEXT: sub sp, #4		; CHECK-NEXT: vorrt q0, q1, q2
; CHECK-NEXT: vcmp.s32 ge, q0, q2		; CHECK-NEXT: vmove q0, q2
; CHECK-NEXT: vstr p0, [sp] @ 4-byte Spill
; CHECK-NEXT: vpst
; CHECK-NEXT: vorrt q0, q1, q2
; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
; CHECK-NEXT: vpnot
; CHECK-NEXT: vpst
; CHECK-NEXT: vmovt q0, q2
; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
; CHECK-NEXT: vpst
; CHECK-NEXT: vmovt q0, q2		; CHECK-NEXT: vmovt q0, q2
; CHECK-NEXT: add sp, #4
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%0 = icmp sge <4 x i32> %a, %c		%0 = icmp sge <4 x i32> %a, %c
%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)		%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>		%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>
%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %1)		%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %1)
%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %3)		%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %3)
ret <4 x i32> %4		ret <4 x i32> %4
}		}

define arm_aapcs_vfpcc <4 x i32> @vpttet_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {		define arm_aapcs_vfpcc <4 x i32> @vpttet_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
; CHECK-LABEL: vpttet_block:		; CHECK-LABEL: vpttet_block:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .pad #4		; CHECK-NEXT: vpttet.s32 ge, q0, q2
; CHECK-NEXT: sub sp, #4
; CHECK-NEXT: vcmp.s32 ge, q0, q2
; CHECK-NEXT: vstr p0, [sp] @ 4-byte Spill
; CHECK-NEXT: vpstt
; CHECK-NEXT: vorrt q0, q1, q2		; CHECK-NEXT: vorrt q0, q1, q2
; CHECK-NEXT: vmovt q0, q2		; CHECK-NEXT: vmovt q0, q2
; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload		; CHECK-NEXT: vmove q0, q2
; CHECK-NEXT: vpnot
; CHECK-NEXT: vpst
; CHECK-NEXT: vmovt q0, q2
; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
; CHECK-NEXT: vpst
; CHECK-NEXT: vmovt q0, q2		; CHECK-NEXT: vmovt q0, q2
; CHECK-NEXT: add sp, #4
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%0 = icmp sge <4 x i32> %a, %c		%0 = icmp sge <4 x i32> %a, %c
%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)		%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>		%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>
%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %1)		%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %1)
%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %3)		%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %3)
%5 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %4)		%5 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %4)
ret <4 x i32> %5		ret <4 x i32> %5
}		}

define arm_aapcs_vfpcc <4 x i32> @vptett_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {		define arm_aapcs_vfpcc <4 x i32> @vptett_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
; CHECK-LABEL: vptett_block:		; CHECK-LABEL: vptett_block:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .pad #4		; CHECK-NEXT: vptett.s32 ge, q0, q2
; CHECK-NEXT: sub sp, #4		; CHECK-NEXT: vorrt q0, q1, q2
; CHECK-NEXT: vcmp.s32 ge, q0, q2		; CHECK-NEXT: vmove q0, q2
; CHECK-NEXT: vstr p0, [sp] @ 4-byte Spill
; CHECK-NEXT: vpst
; CHECK-NEXT: vorrt q0, q1, q2
; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
; CHECK-NEXT: vpnot
; CHECK-NEXT: vpst
; CHECK-NEXT: vmovt q0, q2
; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
; CHECK-NEXT: vpstt
; CHECK-NEXT: vmovt q0, q2		; CHECK-NEXT: vmovt q0, q2
; CHECK-NEXT: vmovt q0, q2		; CHECK-NEXT: vmovt q0, q2
; CHECK-NEXT: add sp, #4
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%0 = icmp sge <4 x i32> %a, %c		%0 = icmp sge <4 x i32> %a, %c
%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)		%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>		%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>
%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %1)		%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %1)
%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %3)		%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %3)
%5 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %4)		%5 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %4)
ret <4 x i32> %5		ret <4 x i32> %5
}		}

define arm_aapcs_vfpcc <4 x i32> @vpteet_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {		define arm_aapcs_vfpcc <4 x i32> @vpteet_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
; CHECK-LABEL: vpteet_block:		; CHECK-LABEL: vpteet_block:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .pad #8		; CHECK-NEXT: vpteet.s32 ge, q0, q2
; CHECK-NEXT: sub sp, #8		; CHECK-NEXT: vorrt q0, q1, q2
; CHECK-NEXT: vcmp.s32 ge, q0, q2		; CHECK-NEXT: vmove q0, q2
; CHECK-NEXT: vstr p0, [sp] @ 4-byte Spill		; CHECK-NEXT: vmove q0, q2
; CHECK-NEXT: vpst
; CHECK-NEXT: vorrt q0, q1, q2
; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
; CHECK-NEXT: vpnot
; CHECK-NEXT: vstr p0, [sp, #4] @ 4-byte Spill
; CHECK-NEXT: vldr p0, [sp, #4] @ 4-byte Reload
; CHECK-NEXT: vpst
; CHECK-NEXT: vmovt q0, q2
; CHECK-NEXT: vldr p0, [sp, #4] @ 4-byte Reload
; CHECK-NEXT: vpst
; CHECK-NEXT: vmovt q0, q2
; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
; CHECK-NEXT: vpst
; CHECK-NEXT: vmovt q0, q2		; CHECK-NEXT: vmovt q0, q2
; CHECK-NEXT: add sp, #8
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%0 = icmp sge <4 x i32> %a, %c		%0 = icmp sge <4 x i32> %a, %c
%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)		%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>		%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>
%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %1)		%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %1)
%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %3)		%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %3)
%5 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %4)		%5 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %4)
%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %3)		%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %3)
%5 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %4)		%5 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %4)
ret <4 x i32> %5		ret <4 x i32> %5
}		}

define arm_aapcs_vfpcc <4 x i32> @vptete_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {		define arm_aapcs_vfpcc <4 x i32> @vptete_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
; CHECK-LABEL: vptete_block:		; CHECK-LABEL: vptete_block:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .pad #8		; CHECK-NEXT: vptete.s32 ge, q0, q2
; CHECK-NEXT: sub sp, #8		; CHECK-NEXT: vorrt q0, q1, q2
; CHECK-NEXT: vcmp.s32 ge, q0, q2		; CHECK-NEXT: vmove q0, q2
; CHECK-NEXT: vstr p0, [sp] @ 4-byte Spill
; CHECK-NEXT: vpst
; CHECK-NEXT: vorrt q0, q1, q2
; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
; CHECK-NEXT: vpnot
; CHECK-NEXT: vstr p0, [sp, #4] @ 4-byte Spill
; CHECK-NEXT: vldr p0, [sp, #4] @ 4-byte Reload
; CHECK-NEXT: vpst
; CHECK-NEXT: vmovt q0, q2
; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
; CHECK-NEXT: vpst
; CHECK-NEXT: vmovt q0, q2
; CHECK-NEXT: vldr p0, [sp, #4] @ 4-byte Reload
; CHECK-NEXT: vpst
; CHECK-NEXT: vmovt q0, q2		; CHECK-NEXT: vmovt q0, q2
; CHECK-NEXT: add sp, #8		; CHECK-NEXT: vmove q0, q2
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%0 = icmp sge <4 x i32> %a, %c		%0 = icmp sge <4 x i32> %a, %c
%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)		%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>		%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>
%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %1)		%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %1)
%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %3)		%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %3)
%5 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %4)		%5 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %4)

llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir

ret <4 x float> %inactive1		ret <4 x float> %inactive1
}		}

define arm_aapcs_vfpcc <4 x float> @vpr_or_vccr_write_between_vcmps(<4 x float> %inactive1) #0 {		define arm_aapcs_vfpcc <4 x float> @vpr_or_vccr_write_between_vcmps(<4 x float> %inactive1) #0 {
entry:		entry:
ret <4 x float> %inactive1		ret <4 x float> %inactive1
}		}

		define arm_aapcs_vfpcc <4 x float> @spill_prevention(<4 x float> %inactive1) #0 {
		entry:
		ret <4 x float> %inactive1
		}

		define arm_aapcs_vfpcc <4 x float> @spill_prevention_multi(<4 x float> %inactive1) #0 {
		entry:
		ret <4 x float> %inactive1
		}

		define arm_aapcs_vfpcc <4 x float> @spill_prevention_predicated_vpnots(<4 x float> %inactive1) #0 {
		entry:
		ret <4 x float> %inactive1
		}

		define arm_aapcs_vfpcc <4 x float> @spill_prevention_copies(<4 x float> %inactive1) #0 {
		entry:
		ret <4 x float> %inactive1
		}

		define arm_aapcs_vfpcc <4 x float> @spill_prevention_vpnot_reordering(<4 x float> %inactive1) #0 {
		entry:
		ret <4 x float> %inactive1
		}

		define arm_aapcs_vfpcc <4 x float> @spill_prevention_stop_after_write(<4 x float> %inactive1) #0 {
		entry:
		ret <4 x float> %inactive1
		}

attributes #0 = { "target-features"="+armv8.1-m.main,+hwdiv,+mve.fp,+ras,+thumb-mode" }		attributes #0 = { "target-features"="+armv8.1-m.main,+hwdiv,+mve.fp,+ras,+thumb-mode" }
...		...
---		---
name: vcmp_with_opposite_cond		name: vcmp_with_opposite_cond
alignment: 4		alignment: 4
body: |		body: |
; CHECK-LABEL: name: vcmp_with_opposite_cond		; CHECK-LABEL: name: vcmp_with_opposite_cond
; CHECK: bb.0:		; CHECK: bb.0:
%3:vccr = MVE_VCMPs32 %1:mqpr, %0:mqpr, 12, 0, $noreg		%3:vccr = MVE_VCMPs32 %1:mqpr, %0:mqpr, 12, 0, $noreg
%4:vccr = MVE_VCMPs32 %1:mqpr, %0:mqpr, 12, 0, $noreg		%4:vccr = MVE_VCMPs32 %1:mqpr, %0:mqpr, 12, 0, $noreg
tBX_RET 14, $noreg, implicit %0:mqpr		tBX_RET 14, $noreg, implicit %0:mqpr
...		...
---		---
name: killed_vccr_values		name: killed_vccr_values
alignment: 4		alignment: 4
body: |		body: |
		; CHECK-LABEL: name: killed_vccr_values
		; CHECK: bb.0:
		; CHECK: successors: %bb.1(0x80000000)
		; CHECK: [[MVE_VCMPf16_:%[0-9]+]]:vccr = MVE_VCMPf16 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VORR:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %2:mqpr, 1, [[MVE_VCMPf16_]], undef [[MVE_VORR]]
		; CHECK: [[MVE_VPNOT:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPf16_]], 0, $noreg
		; CHECK: bb.1:
		; CHECK: successors: %bb.2(0x80000000)
		; CHECK: [[MVE_VCMPs32_:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VPNOT1:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_]], 0, $noreg
		; CHECK: [[MVE_VORR1:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VPNOT1]], undef [[MVE_VORR1]]
		; CHECK: [[MVE_VPNOT2:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT1]], 0, $noreg
		; CHECK: [[MVE_VORR2:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR1]], [[MVE_VORR1]], 1, [[MVE_VPNOT2]], undef [[MVE_VORR2]]
		; CHECK: bb.2:
		; CHECK: successors: %bb.3(0x80000000)
		; CHECK: [[MVE_VCMPs32_1:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VPNOT3:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_1]], 0, $noreg
		; CHECK: [[MVE_VORR3:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VPNOT3]], undef [[MVE_VORR3]]
		; CHECK: [[MVE_VPNOT4:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT3]], 0, $noreg
		; CHECK: [[MVE_VORR4:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR3]], [[MVE_VORR3]], 1, [[MVE_VPNOT4]], undef [[MVE_VORR4]]
		; CHECK: bb.3:
		; CHECK: [[MVE_VCMPs32_2:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VPNOT5:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_2]], 0, $noreg
		; CHECK: [[MVE_VORR5:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VPNOT5]], undef [[MVE_VORR5]]
		; CHECK: [[MVE_VPNOT6:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT5]], 0, $noreg
		; CHECK: [[MVE_VORR6:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR5]], [[MVE_VORR5]], 1, [[MVE_VPNOT6]], undef [[MVE_VORR6]]
		; CHECK: [[MVE_VORR7:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR6]], [[MVE_VORR6]], 1, [[MVE_VPNOT6]], undef [[MVE_VORR7]]
		; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr
bb.0:		bb.0:
;		;
; Tests that, if the result of the VCMP is killed before the		; Tests that, if the result of the VCMP is killed before the
; second VCMP (that will be converted into a VPNOT) is found,		; second VCMP (that will be converted into a VPNOT) is found,
; the kill flag is removed.		; the kill flag is removed.
;		;
; CHECK-LABEL: name: killed_vccr_values
; CHECK: [[MVE_VCMPf16_:%[0-9]+]]:vccr = MVE_VCMPf16 %1:mqpr, %2:mqpr, 10, 0, $noreg
; CHECK: [[MVE_VORR:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %2:mqpr, 1, [[MVE_VCMPf16_]], undef [[MVE_VORR]]
; CHECK: [[MVE_VPNOT:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPf16_]], 0, $noreg
; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr
%2:vccr = MVE_VCMPf16 %0:mqpr, %1:mqpr, 10, 0, $noreg		%2:vccr = MVE_VCMPf16 %0:mqpr, %1:mqpr, 10, 0, $noreg
%3:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, killed %2:vccr, undef %3:mqpr		%3:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, killed %2:vccr, undef %3:mqpr
%4:vccr = MVE_VCMPf16 %0:mqpr, %1:mqpr, 11, 0, $noreg		%4:vccr = MVE_VCMPf16 %0:mqpr, %1:mqpr, 11, 0, $noreg
		bb.1:
		;
		; Tests that, if the result of the VCMP that has been replaced with a
		; VPNOT is killed (before the insertion of the second VPNOT),
		; the kill flag is removed.
		;
		%5:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
		%6:vccr = MVE_VCMPs32 %1:mqpr, %0:mqpr, 12, 0, $noreg
		%7:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, killed %6:vccr, undef %7:mqpr
		%8:mqpr = MVE_VORR %7:mqpr, %7:mqpr, 1, %5:vccr, undef %8:mqpr
		bb.2:
		;
		; Tests that the kill flag is removed when inserting a VPNOT for
		; an instruction.
		;
		%9:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
		%10:vccr = MVE_VCMPs32 %1:mqpr, %0:mqpr, 12, 0, $noreg
		%11:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %10:vccr, undef %11:mqpr
		%12:mqpr = MVE_VORR %11:mqpr, %11:mqpr, 1, killed %9:vccr, undef %12:mqpr
		bb.3:
		;
		; Tests that the kill flag is correctly removed when replacing a use
		; of the opposite VCCR value with the last VPNOT's result
		;
		%13:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
		%14:vccr = MVE_VCMPs32 %1:mqpr, %0:mqpr, 12, 0, $noreg
		%15:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %14:vccr, undef %15:mqpr
		%16:mqpr = MVE_VORR %15:mqpr, %15:mqpr, 1, %13:vccr, undef %16:mqpr
		%17:mqpr = MVE_VORR %16:mqpr, %16:mqpr, 1, killed %13:vccr, undef %17:mqpr
tBX_RET 14, $noreg, implicit %0:mqpr		tBX_RET 14, $noreg, implicit %0:mqpr
...		...
---		---
name: predicated_vcmps		name: predicated_vcmps
alignment: 4		alignment: 4
body: |		body: |
; CHECK-LABEL: name: predicated_vcmps		; CHECK-LABEL: name: predicated_vcmps
; CHECK: bb.0:		; CHECK: bb.0:
; CHECK: [[MVE_VPNOT:%[0-9]+]]:vccr = MVE_VPNOT killed [[MVE_VCMPs32_]], 0, $noreg		; CHECK: [[MVE_VPNOT:%[0-9]+]]:vccr = MVE_VPNOT killed [[MVE_VCMPs32_]], 0, $noreg
; CHECK: [[MVE_VCMPs32_1:%[0-9]+]]:vccr = MVE_VCMPs32 %2:mqpr, %1:mqpr, 10, 0, $noreg		; CHECK: [[MVE_VCMPs32_1:%[0-9]+]]:vccr = MVE_VCMPs32 %2:mqpr, %1:mqpr, 10, 0, $noreg
; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr		; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr
%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 12, 0, $noreg		%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 12, 0, $noreg
%3:vccr = MVE_VPNOT killed %2:vccr, 0, $noreg		%3:vccr = MVE_VPNOT killed %2:vccr, 0, $noreg
%4:vccr = MVE_VCMPs32 %1:mqpr, %0:mqpr, 10, 0, $noreg		%4:vccr = MVE_VCMPs32 %1:mqpr, %0:mqpr, 10, 0, $noreg
tBX_RET 14, $noreg, implicit %0:mqpr		tBX_RET 14, $noreg, implicit %0:mqpr
...		...
		---
		name: spill_prevention
		alignment: 4
		body: |
		; CHECK-LABEL: name: spill_prevention
		; CHECK: bb.0:
		; CHECK: successors: %bb.1(0x80000000)
		; CHECK: [[MVE_VCMPs32_:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VPNOT:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_]], 0, $noreg
		; CHECK: [[MVE_VORR:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VPNOT]], undef [[MVE_VORR]]
		; CHECK: [[MVE_VPNOT1:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT]], 0, $noreg
		; CHECK: [[MVE_VORR1:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR]], [[MVE_VORR]], 1, [[MVE_VPNOT1]], undef [[MVE_VORR1]]
		; CHECK: [[MVE_VPNOT2:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT1]], 0, $noreg
		; CHECK: [[MVE_VORR2:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR1]], [[MVE_VORR1]], 1, [[MVE_VPNOT2]], undef [[MVE_VORR2]]
		; CHECK: [[MVE_VPNOT3:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT2]], 0, $noreg
		; CHECK: [[MVE_VORR3:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR2]], [[MVE_VORR2]], 1, [[MVE_VPNOT3]], undef [[MVE_VORR3]]
		; CHECK: [[MVE_VPNOT4:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT3]], 0, $noreg
		; CHECK: [[MVE_VORR4:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR3]], [[MVE_VORR3]], 1, [[MVE_VPNOT4]], undef [[MVE_VORR4]]
		; CHECK: bb.1:
		; CHECK: successors: %bb.2(0x80000000)
		; CHECK: [[MVE_VCMPs32_1:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VPNOT5:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_1]], 0, $noreg
		; CHECK: [[MVE_VORR5:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VPNOT5]], undef [[MVE_VORR5]]
		; CHECK: [[MVE_VORR6:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR5]], [[MVE_VORR5]], 0, $noreg, undef [[MVE_VORR6]]
		; CHECK: [[MVE_VPNOT6:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT5]], 0, $noreg
		; CHECK: [[MVE_VORR7:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR6]], [[MVE_VORR6]], 1, [[MVE_VPNOT6]], undef [[MVE_VORR7]]
		; CHECK: [[MVE_VORR8:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR7]], [[MVE_VORR7]], 0, $noreg, undef [[MVE_VORR8]]
		; CHECK: [[MVE_VPNOT7:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT6]], 0, $noreg
		; CHECK: [[MVE_VORR9:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR8]], [[MVE_VORR8]], 1, [[MVE_VPNOT7]], undef [[MVE_VORR9]]
		; CHECK: bb.2:
		; CHECK: successors: %bb.3(0x80000000)
		; CHECK: [[MVE_VCMPs32_2:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VPNOT8:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_2]], 0, $noreg
		; CHECK: [[MVE_VORR10:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VPNOT8]], undef [[MVE_VORR10]]
		; CHECK: [[MVE_VORR11:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR10]], [[MVE_VORR10]], 1, [[MVE_VPNOT8]], undef [[MVE_VORR11]]
		; CHECK: [[MVE_VPNOT9:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT8]], 0, $noreg
		; CHECK: [[MVE_VORR12:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR11]], [[MVE_VORR11]], 1, [[MVE_VPNOT9]], undef [[MVE_VORR12]]
		; CHECK: [[MVE_VORR13:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR12]], [[MVE_VORR12]], 1, [[MVE_VPNOT9]], undef [[MVE_VORR13]]
		; CHECK: [[MVE_VPNOT10:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT9]], 0, $noreg
		; CHECK: [[MVE_VORR14:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR13]], [[MVE_VORR13]], 1, [[MVE_VPNOT10]], undef [[MVE_VORR14]]
		; CHECK: [[MVE_VORR15:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR14]], [[MVE_VORR14]], 1, [[MVE_VPNOT10]], undef [[MVE_VORR15]]
		; CHECK: [[MVE_VPNOT11:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT10]], 0, $noreg
		; CHECK: [[MVE_VORR16:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR15]], [[MVE_VORR15]], 1, [[MVE_VPNOT11]], undef [[MVE_VORR16]]
		; CHECK: [[MVE_VORR17:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR16]], [[MVE_VORR16]], 1, [[MVE_VPNOT11]], undef [[MVE_VORR17]]
		; CHECK: bb.3:
		; CHECK: successors: %bb.4(0x80000000)
		; CHECK: [[MVE_VCMPs32_3:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VPNOT12:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_3]], 0, $noreg
		; CHECK: [[MVE_VORR18:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VPNOT12]], undef [[MVE_VORR11]]
		; CHECK: [[MVE_VPNOT13:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT12]], 0, $noreg
		; CHECK: [[MVE_VORR19:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VPNOT13]], undef [[MVE_VORR19]]
		; CHECK: bb.4:
		; CHECK: [[VMSR_P0_:%[0-9]+]]:vccr = VMSR_P0 killed %32:gpr, 14 /* CC::al */, $noreg
		; CHECK: [[MVE_VPNOT14:%[0-9]+]]:vccr = MVE_VPNOT [[VMSR_P0_]], 0, $noreg
		; CHECK: [[MVE_VORR20:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR19]], [[MVE_VORR19]], 1, [[MVE_VPNOT14]], undef [[MVE_VORR20]]
		; CHECK: [[MVE_VPNOT15:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT14]], 0, $noreg
		; CHECK: [[MVE_VORR21:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR20]], [[MVE_VORR20]], 1, [[MVE_VPNOT15]], undef [[MVE_VORR21]]
		; CHECK: [[MVE_VPNOT16:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT15]], 0, $noreg
		; CHECK: [[MVE_VORR22:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR21]], [[MVE_VORR21]], 1, [[MVE_VPNOT16]], undef [[MVE_VORR22]]
		; CHECK: [[MVE_VPNOT17:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT16]], 0, $noreg
		; CHECK: [[MVE_VORR23:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR22]], [[MVE_VORR22]], 1, [[MVE_VPNOT17]], undef [[MVE_VORR23]]
		; CHECK: [[MVE_VPNOT18:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT17]], 0, $noreg
		; CHECK: [[MVE_VORR24:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR23]], [[MVE_VORR23]], 1, [[MVE_VPNOT18]], undef [[MVE_VORR24]]
		; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr
		bb.0:
		;
		; Basic test case
		;
		%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
		%3:vccr = MVE_VPNOT %2:vccr, 0, $noreg
		%4:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %3:vccr, undef %4:mqpr
		%5:mqpr = MVE_VORR %4:mqpr, %4:mqpr, 1, %2:vccr, undef %5:mqpr
		%6:mqpr = MVE_VORR %5:mqpr, %5:mqpr, 1, %3:vccr, undef %6:mqpr
		%7:mqpr = MVE_VORR %6:mqpr, %6:mqpr, 1, %2:vccr, undef %7:mqpr
		%8:mqpr = MVE_VORR %7:mqpr, %7:mqpr, 1, %3:vccr, undef %8:mqpr
		bb.1:
		;
		; Tests that unpredicated instructions in the middle of the block
		; don't interfere with the replacement.
		;
		%9:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
		%10:vccr = MVE_VPNOT %9:vccr, 0, $noreg
		%11:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %10:vccr, undef %11:mqpr
		%12:mqpr = MVE_VORR %11:mqpr, %11:mqpr, 0, $noreg, undef %12:mqpr
		%13:mqpr = MVE_VORR %12:mqpr, %12:mqpr, 1, %9:vccr, undef %13:mqpr
		%14:mqpr = MVE_VORR %13:mqpr, %13:mqpr, 0, $noreg, undef %14:mqpr
		%15:mqpr = MVE_VORR %14:mqpr, %14:mqpr, 1, %10:vccr, undef %15:mqpr
		bb.2:
		;
		; Tests that all uses of the register are replaced, even when it's used
		; multiple times in a row.
		;
		%16:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
		%17:vccr = MVE_VPNOT %16:vccr, 0, $noreg
		%18:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %17:vccr, undef %18:mqpr
		%19:mqpr = MVE_VORR %18:mqpr, %18:mqpr, 1, %17:vccr, undef %19:mqpr
		%20:mqpr = MVE_VORR %19:mqpr, %19:mqpr, 1, %16:vccr, undef %20:mqpr
		%21:mqpr = MVE_VORR %20:mqpr, %20:mqpr, 1, %16:vccr, undef %21:mqpr
		%22:mqpr = MVE_VORR %21:mqpr, %21:mqpr, 1, %17:vccr, undef %22:mqpr
		%23:mqpr = MVE_VORR %22:mqpr, %22:mqpr, 1, %17:vccr, undef %23:mqpr
		%24:mqpr = MVE_VORR %23:mqpr, %23:mqpr, 1, %16:vccr, undef %24:mqpr
		%25:mqpr = MVE_VORR %24:mqpr, %24:mqpr, 1, %16:vccr, undef %25:mqpr
		bb.3:
		;
		; Tests that already present VPNOTs are "registered" by the pass so
		; it does not insert a useless VPNOT.
		;
		%26:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
		%27:vccr = MVE_VPNOT %26:vccr, 0, $noreg
		%28:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %27:vccr, undef %19:mqpr
		%29:vccr = MVE_VPNOT %27:vccr, 0, $noreg
		%30:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %26:vccr, undef %30:mqpr
		bb.4:
		;
		; Tests that the pass works with instructions other than vcmp.
		;
		%32:vccr = VMSR_P0 killed %31:gpr, 14, $noreg
		%33:vccr = MVE_VPNOT %32:vccr, 0, $noreg
		%34:mqpr = MVE_VORR %30:mqpr, %30:mqpr, 1, %33:vccr, undef %34:mqpr
		%35:mqpr = MVE_VORR %34:mqpr, %34:mqpr, 1, %32:vccr, undef %35:mqpr
		%36:mqpr = MVE_VORR %35:mqpr, %35:mqpr, 1, %33:vccr, undef %36:mqpr
		%37:mqpr = MVE_VORR %36:mqpr, %36:mqpr, 1, %32:vccr, undef %37:mqpr
		%38:mqpr = MVE_VORR %37:mqpr, %37:mqpr, 1, %33:vccr, undef %38:mqpr
		tBX_RET 14, $noreg, implicit %0:mqpr
		...
		---
		name: spill_prevention_multi
		alignment: 4
		body: |
		bb.0:
		;
		; Tests that multiple groups of predicated instructions in the same basic block are optimized.
		;
		; CHECK-LABEL: name: spill_prevention_multi
		; CHECK: [[MVE_VCMPs32_:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VPNOT:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_]], 0, $noreg
		; CHECK: [[MVE_VORR:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VPNOT]], undef [[MVE_VORR]]
		; CHECK: [[MVE_VPNOT1:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT]], 0, $noreg
		; CHECK: [[MVE_VORR1:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR]], [[MVE_VORR]], 1, [[MVE_VPNOT1]], undef [[MVE_VORR1]]
		; CHECK: [[MVE_VPNOT2:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT1]], 0, $noreg
		; CHECK: [[MVE_VORR2:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR1]], [[MVE_VORR1]], 1, [[MVE_VPNOT2]], undef [[MVE_VORR2]]
		; CHECK: [[MVE_VPNOT3:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT2]], 0, $noreg
		; CHECK: [[MVE_VORR3:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR2]], [[MVE_VORR2]], 1, [[MVE_VPNOT3]], undef [[MVE_VORR3]]
		; CHECK: [[MVE_VCMPs32_1:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VORR4:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VCMPs32_1]], undef [[MVE_VORR4]]
		; CHECK: [[MVE_VPNOT4:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_1]], 0, $noreg
		; CHECK: [[MVE_VORR5:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR4]], [[MVE_VORR4]], 1, [[MVE_VPNOT4]], undef [[MVE_VORR5]]
		; CHECK: [[MVE_VPNOT5:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT4]], 0, $noreg
		; CHECK: [[MVE_VORR6:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR5]], [[MVE_VORR5]], 1, [[MVE_VPNOT5]], undef [[MVE_VORR6]]
		; CHECK: [[MVE_VPNOT6:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT5]], 0, $noreg
		; CHECK: [[MVE_VORR7:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR6]], [[MVE_VORR6]], 1, [[MVE_VPNOT6]], undef [[MVE_VORR7]]
		; CHECK: [[MVE_VCMPs32_2:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VPNOT7:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_2]], 0, $noreg
		; CHECK: [[MVE_VORR8:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VPNOT7]], undef [[MVE_VORR8]]
		; CHECK: [[MVE_VPNOT8:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT7]], 0, $noreg
		; CHECK: [[MVE_VORR9:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR8]], [[MVE_VORR8]], 1, [[MVE_VPNOT8]], undef [[MVE_VORR9]]
		; CHECK: [[MVE_VPNOT9:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT8]], 0, $noreg
		; CHECK: [[MVE_VORR10:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR9]], [[MVE_VORR9]], 1, [[MVE_VPNOT9]], undef [[MVE_VORR10]]
		; CHECK: [[MVE_VPNOT10:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT9]], 0, $noreg
		; CHECK: [[MVE_VORR11:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR10]], [[MVE_VORR10]], 1, [[MVE_VPNOT10]], undef [[MVE_VORR11]]
		; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr
		%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
		%3:vccr = MVE_VPNOT %2:vccr, 0, $noreg
		%4:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %3:vccr, undef %4:mqpr
		%5:mqpr = MVE_VORR %4:mqpr, %4:mqpr, 1, %2:vccr, undef %5:mqpr
		%6:mqpr = MVE_VORR %5:mqpr, %5:mqpr, 1, %3:vccr, undef %6:mqpr
		%7:mqpr = MVE_VORR %6:mqpr, %6:mqpr, 1, %2:vccr, undef %7:mqpr
		%8:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
		%9:vccr = MVE_VPNOT %8:vccr, 0, $noreg
		%10:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %8:vccr, undef %10:mqpr
		%11:mqpr = MVE_VORR %10:mqpr, %10:mqpr, 1, %9:vccr, undef %11:mqpr
		%12:mqpr = MVE_VORR %11:mqpr, %11:mqpr, 1, %8:vccr, undef %12:mqpr
		%13:mqpr = MVE_VORR %12:mqpr, %12:mqpr, 1, %9:vccr, undef %13:mqpr
		%14:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
		%15:vccr = MVE_VPNOT %14:vccr, 0, $noreg
		%16:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %15:vccr, undef %16:mqpr
		%17:mqpr = MVE_VORR %16:mqpr, %16:mqpr, 1, %14:vccr, undef %17:mqpr
		%18:mqpr = MVE_VORR %17:mqpr, %17:mqpr, 1, %15:vccr, undef %18:mqpr
		%19:mqpr = MVE_VORR %18:mqpr, %18:mqpr, 1, %14:vccr, undef %19:mqpr
		tBX_RET 14, $noreg, implicit %0:mqpr
		...
		---
		name: spill_prevention_predicated_vpnots
		alignment: 4
		body: |
		; CHECK-LABEL: name: spill_prevention_predicated_vpnots
		; CHECK: bb.0:
		; CHECK: successors: %bb.1(0x80000000)
		; CHECK: [[MVE_VCMPs32_:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VPNOT:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_]], 1, [[MVE_VCMPs32_]]
		; CHECK: [[MVE_VORR:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VCMPs32_]], undef [[MVE_VORR]]
		; CHECK: [[MVE_VORR1:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR]], [[MVE_VORR]], 1, [[MVE_VPNOT]], undef [[MVE_VORR1]]
		; CHECK: bb.1:
		; CHECK: [[MVE_VCMPs32_1:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VPNOT1:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_1]], 1, [[MVE_VCMPs32_1]]
		; CHECK: [[MVE_VORR2:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %2:mqpr, 1, [[MVE_VPNOT1]], undef [[MVE_VORR2]]
		; CHECK: [[MVE_VORR2:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VCMPs32_1]], undef [[MVE_VORR2]]
		; CHECK: [[MVE_VORR2:%[0-9]+]]:mqpr = MVE_VORR %2:mqpr, %1:mqpr, 1, [[MVE_VPNOT1]], undef [[MVE_VORR2]]
		; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr
		;
		; Tests that predicated VPNOTs are not considered by this pass.
		; (This means that these examples should not be optimized.)
		;
		bb.0:
		%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
		%3:vccr = MVE_VPNOT %2:vccr, 1, %2:vccr
		%4:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %2:vccr, undef %4:mqpr
		%5:mqpr = MVE_VORR %4:mqpr, %4:mqpr, 1, %3:vccr, undef %5:mqpr
		bb.1:
		%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
		%3:vccr = MVE_VPNOT %2:vccr, 1, %2:vccr
		%4:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, %3:vccr, undef %4:mqpr
		%5:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %2:vccr, undef %5:mqpr
		%6:mqpr = MVE_VORR %1:mqpr, %0:mqpr, 1, %3:vccr, undef %6:mqpr
		tBX_RET 14, $noreg, implicit %0:mqpr
		...
		---
		name: spill_prevention_copies
		alignment: 4
		body: |
		;
		; Tests that VPNOTs are replaced by a COPY instead of inserting a VPNOT
		; (which would result in a double VPNOT).
		;
		bb.0:
		; CHECK-LABEL: name: spill_prevention_copies
		; CHECK: [[MVE_VCMPs32_:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VPNOT:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_]], 0, $noreg
		; CHECK: [[MVE_VORR:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VPNOT]], undef [[MVE_VORR]]
		; CHECK: [[MVE_VORR1:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VPNOT]], undef [[MVE_VORR1]]
		; CHECK: [[MVE_VORR2:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VPNOT]], undef [[MVE_VORR2]]
		; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr
		%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
		%3:vccr = MVE_VPNOT %2:vccr, 0, $noreg
		%4:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %3:vccr, undef %4:mqpr
		%5:vccr = MVE_VPNOT %2:vccr, 0, $noreg
		%6:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %5:vccr, undef %6:mqpr
		%7:vccr = MVE_VPNOT %2:vccr, 0, $noreg
		%8:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %7:vccr, undef %8:mqpr
		tBX_RET 14, $noreg, implicit %0:mqpr
		...
		---
		name: spill_prevention_vpnot_reordering
		alignment: 4
		body: |
		; CHECK-LABEL: name: spill_prevention_vpnot_reordering
		; CHECK: bb.0:
		; CHECK: successors: %bb.1(0x80000000)
		; CHECK: [[MVE_VCMPs32_:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VORR:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %2:mqpr, 1, [[MVE_VCMPs32_]], undef [[MVE_VORR]]
		; CHECK: [[MVE_VORR1:%[0-9]+]]:mqpr = MVE_VORR %2:mqpr, %1:mqpr, 1, [[MVE_VCMPs32_]], undef [[MVE_VORR1]]
		; CHECK: [[MVE_VPNOT:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_]], 0, $noreg
		; CHECK: [[MVE_VORR2:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR]], [[MVE_VORR1]], 1, [[MVE_VPNOT]], undef [[MVE_VORR2]]
		; CHECK: bb.1:
		; CHECK: [[MVE_VCMPs32_1:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VORR3:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %2:mqpr, 1, [[MVE_VCMPs32_1]], undef [[MVE_VORR3]]
		; CHECK: [[MVE_VORR4:%[0-9]+]]:mqpr = MVE_VORR %2:mqpr, %1:mqpr, 1, [[MVE_VCMPs32_1]], undef [[MVE_VORR4]]
		; CHECK: [[MVE_VPNOT1:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_1]], 0, $noreg
		; CHECK: [[MVE_VORR5:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR3]], [[MVE_VORR4]], 1, [[MVE_VPNOT1]], undef [[MVE_VORR5]]
		; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr
		;
		; Tests that the first VPNOT is moved down when the result of the VCMP is used
		; before the first usage of the VPNOT's result.
		;
		bb.0:
		%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
		%3:vccr = MVE_VPNOT %2:vccr, 0, $noreg
		%4:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, %2:vccr, undef %4:mqpr
		%5:mqpr = MVE_VORR %1:mqpr, %0:mqpr, 1, %2:vccr, undef %5:mqpr
		%6:mqpr = MVE_VORR %4:mqpr, %5:mqpr, 1, %3:vccr, undef %6:mqpr
		bb.1:
		; Test again with a "killed" flag to check if it's properly removed.
		%7:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
		%8:vccr = MVE_VPNOT %7:vccr, 0, $noreg
		%9:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, %7:vccr, undef %9:mqpr
		%10:mqpr = MVE_VORR %1:mqpr, %0:mqpr, 1, killed %7:vccr, undef %10:mqpr
		%11:mqpr = MVE_VORR %9:mqpr, %10:mqpr, 1, %8:vccr, undef %11:mqpr
		tBX_RET 14, $noreg, implicit %0:mqpr
		...
		---
		name: spill_prevention_stop_after_write
		alignment: 4
		body: |
		; CHECK-LABEL: name: spill_prevention_stop_after_write
		; CHECK: bb.0:
		; CHECK: successors: %bb.1(0x80000000)
		; CHECK: [[MVE_VCMPs32_:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VPNOT:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_]], 0, $noreg
		; CHECK: [[MVE_VORR:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VPNOT]], undef [[MVE_VORR]]
		; CHECK: [[MVE_VPNOT1:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT]], 0, $noreg
		; CHECK: [[MVE_VORR1:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR]], [[MVE_VORR]], 1, [[MVE_VPNOT1]], undef [[MVE_VORR1]]
		; CHECK: [[VMSR_P0_:%[0-9]+]]:vccr = VMSR_P0 killed %7:gpr, 14 /* CC::al */, $noreg
		; CHECK: [[MVE_VORR2:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR1]], [[MVE_VORR1]], 1, [[MVE_VCMPs32_]], undef [[MVE_VORR2]]
		; CHECK: [[MVE_VORR3:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR2]], [[MVE_VORR2]], 1, [[MVE_VPNOT]], undef [[MVE_VORR3]]
		; CHECK: bb.1:
		; CHECK: [[MVE_VCMPs32_1:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VPNOT2:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_1]], 0, $noreg
		; CHECK: [[MVE_VORR4:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %1:mqpr, 1, [[MVE_VPNOT2]], undef [[MVE_VORR]]
		; CHECK: [[MVE_VPNOT3:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT2]], 0, $noreg
		; CHECK: [[MVE_VORR5:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR4]], [[MVE_VORR4]], 1, [[MVE_VPNOT3]], undef [[MVE_VORR5]]
		; CHECK: [[MVE_VCMPs32_2:%[0-9]+]]:vccr = MVE_VCMPs32 %2:mqpr, %1:mqpr, 10, 0, $noreg
		; CHECK: [[MVE_VORR6:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR5]], [[MVE_VORR5]], 1, [[MVE_VPNOT2]], undef [[MVE_VORR6]]
		; CHECK: [[MVE_VORR7:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR6]], [[MVE_VORR6]], 1, [[MVE_VCMPs32_1]], undef [[MVE_VORR7]]
		; CHECK: [[MVE_VORR8:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR7]], [[MVE_VORR7]], 1, [[MVE_VPNOT2]], undef [[MVE_VORR8]]
		;
		; Tests that the optimisation stops when it sees an instruction
		; that writes to VPR, and that doesn't use any of the registers we care about.
		;
		bb.0:
		%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
		%3:vccr = MVE_VPNOT %2:vccr, 0, $noreg
		%4:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %3:vccr, undef %4:mqpr
		%5:mqpr = MVE_VORR %4:mqpr, %4:mqpr, 1, %2:vccr, undef %5:mqpr
		%6:vccr = VMSR_P0 killed %20:gpr, 14, $noreg
		%7:mqpr = MVE_VORR %5:mqpr, %5:mqpr, 1, %2:vccr, undef %7:mqpr
		%8:mqpr = MVE_VORR %7:mqpr, %7:mqpr, 1, %3:vccr, undef %8:mqpr
		bb.1:
		%9:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
		%10:vccr = MVE_VPNOT %9:vccr, 0, $noreg
		%11:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %10:vccr, undef %4:mqpr
		%12:mqpr = MVE_VORR %11:mqpr, %11:mqpr, 1, %9:vccr, undef %12:mqpr
		%13:vccr = MVE_VCMPs32 %1:mqpr, %0:mqpr, 10, 0, $noreg
		%14:mqpr = MVE_VORR %12:mqpr, %12:mqpr, 1, %10:vccr, undef %14:mqpr
		%15:mqpr = MVE_VORR %14:mqpr, %14:mqpr, 1, %9:vccr, undef %15:mqpr
		%16:mqpr = MVE_VORR %15:mqpr, %15:mqpr, 1, %10:vccr, undef %16:mqpr
		...