This patch adds another optimisation to the MVE VPT Optimisations pass introduced in the previous patch.
This optimisation replaces uses of old VPR (VCCR) values within predicated instruction blocks with VPNOTs in order to avoid spills/reloads.
Those VPNOTs can then be removed by the MVE VPT Block Insertion pass, resulting in clean, compact VPT blocks such as TEET or TETE instead of lots of small blocks with spills/reloads in between.
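To make the intent concrete, here is a minimal before/after sketch of the kind of rewrite described above. It is illustrative only (register numbers, operands, and the exact position of the inserted VPNOT are made up, not taken from the patch's tests):

```
; Before: %2 is still needed after %3 = VPNOT %2 has been used, so the two
; VCCR values are live at the same time. VCCR only contains VPR, so keeping
; both alive means spilling one of them.
%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
%3:vccr = MVE_VPNOT %2:vccr, 0, $noreg
%4:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, %3:vccr, undef %4:mqpr
%5:mqpr = MVE_VORR %1:mqpr, %0:mqpr, 1, %2:vccr, undef %5:mqpr

; After: the later use of %2 is rewritten to use a fresh VPNOT of %3 instead,
; so the live ranges no longer overlap and the VPT Block Insertion pass can
; fold everything into a single TE-style VPT block.
%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
%3:vccr = MVE_VPNOT %2:vccr, 0, $noreg
%4:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, %3:vccr, undef %4:mqpr
%6:vccr = MVE_VPNOT %3:vccr, 0, $noreg
%5:mqpr = MVE_VORR %1:mqpr, %0:mqpr, 1, %6:vccr, undef %5:mqpr
```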
I'm surprised not to see more test changes. Does this really have no effect on any existing tests?
I don't think so, I ran check-llvm-codegen and it was green (with the changes in this patch and the parent patch).
Ok. Well, how about a couple more tests with differently ordered instructions too, then? Like inserting the unpredicated VORR in between the predicated ones? And maybe some larger tests that would generate blocks with multiple instructions on a predicate before performing the inversion?
This kind of code can come up from intrinsics a lot, but not necessarily from the relatively simple codegen tests we have. Someone can write intrinsics knowing that VPNOTs should be folded into a single VPT block, only for LLVM to come along and nicely optimize them all away, producing IR that ends up with worse codegen. This should help us get back to better assembly.
More testing does sound good.
llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp

- Line 160–161: addUnpredicatedMveVpredNOp
- Line 398: Should this not be replacing _all_ uses of the old condition with the value from the VPNOT? As in, if we see input code like: ... Could it not be generally beneficial to insert a VPNOT between the use of b and the second use of a? We could also probably have any amount of code between the predicated uses and it might still be beneficial, considering the costs of spills/reloads vs a single VPNOT.
llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp

- Line 398 (reply): I believe it's already doing that. I'll add more tests to show it.

      // Stop as soon as we leave the block of predicated instructions
      if (getVPTInstrPredicate(*Iter) == ARMVCC::None)
        break;
- Fixed issues found during review (comments marked as done)
- Improved the pass:
  - It can now move VPNOTs before their first user when needed. This currently only happens when another VCCR value is used between the VPNOT and its first user.
  - Instead of inserting VPNOTs before VPNOTs, it now replaces the existing VPNOT with a COPY.
- Added more tests, and they now use virtual registers instead.
- Fixed bugs related to isKill flags in multiple places. I now correctly change the isKill flags when needed, and I added tests for that.
- Rebased the patch
- Added the mve-vpt-blocks.ll test
Sorry for the delay.
llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp

- Line 175: "if at there" -> "if there"
- Line 240: Does this only find the first pair in a basic block? What would happen if there are multiple (potential) VPT blocks in a larger basic block, and several of them can be optimized? (See the sketch after these comments for the kind of case I mean.) Can this be done with a linear scan through the block that attempts to find and hold VPNOTs and the original value, converting any uses of the original to the VPNOT? Or does that not work very well for some reason? Maybe using something like MRI->use_instructions would also help.
- Line 256: Can you explain the advantage of moving the VPNOT?
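To make the question on line 240 concrete, here is a hypothetical basic block with two independent VCMP/VPNOT pairs (all registers and operands are made up for illustration); at this point in the review, only the first pair would be picked up:

```
; First candidate VPT block
%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
%3:vccr = MVE_VPNOT %2:vccr, 0, $noreg
%4:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, %3:vccr, undef %4:mqpr
%5:mqpr = MVE_VORR %1:mqpr, %0:mqpr, 1, %2:vccr, undef %5:mqpr

; Second candidate VPT block in the same basic block, missed at this revision
%6:vccr = MVE_VCMPs32 %4:mqpr, %5:mqpr, 10, 0, $noreg
%7:vccr = MVE_VPNOT %6:vccr, 0, $noreg
%8:mqpr = MVE_VORR %4:mqpr, %5:mqpr, 1, %7:vccr, undef %8:mqpr
%9:mqpr = MVE_VORR %5:mqpr, %4:mqpr, 1, %6:vccr, undef %9:mqpr
```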
llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp

- Line 240 (reply): Unfortunately it'd only optimize the first one, as it'd only pick up the first pair of VCMP/VPNOT and not the second. I don't think it'd work well, but I can certainly try it and see if it works. I'll also look at use_instructions, but I'm not sure it'll help.
- Line 256 (reply): Sometimes, you can have code like this (taken from the bottom test):

      %2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
      %3:vccr = MVE_VPNOT %2:vccr, 0, $noreg
      %4:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, %2:vccr, undef %4:mqpr
      %5:mqpr = MVE_VORR %1:mqpr, %0:mqpr, 1, %2:vccr, undef %5:mqpr
      %6:mqpr = MVE_VORR %4:mqpr, %5:mqpr, 1, %3:vccr, undef %6:mqpr

  %3 is not used directly: the original VCCR value, %2, is used before it, so their lifetimes overlap. If I didn't move the VPNOT further down in such situations, the pass would insert a double VPNOT, like this:

      %2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
      %3:vccr = MVE_VPNOT %2:vccr, 0, $noreg
      %foo:vccr = MVE_VPNOT %3:vccr
      %4:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, %foo:vccr, undef %4:mqpr
      %5:mqpr = MVE_VORR %1:mqpr, %0:mqpr, 1, %foo:vccr, undef %5:mqpr
      %bar:vccr = MVE_VPNOT %foo:vccr
      %6:mqpr = MVE_VORR %4:mqpr, %5:mqpr, 1, %bar:vccr, undef %6:mqpr

  But since I now move the VPNOT further down, we get this instead, which is of course much better:

      // VPNOT moved down, no more overlapping VCCR lifetimes, no double VPNOTs.
      %2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
      %4:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, %2:vccr, undef %4:mqpr
      %5:mqpr = MVE_VORR %1:mqpr, %0:mqpr, 1, %2:vccr, undef %5:mqpr
      %3:vccr = MVE_VPNOT %2:vccr, 0, $noreg
      %6:mqpr = MVE_VORR %4:mqpr, %5:mqpr, 1, %3:vccr, undef %6:mqpr

  That transformation is of course only done if a use of the original VCCR value is found before the first use of the VPNOT's result.
- Added support for optimising multiple VPT blocks in the same Basic Block, and added a test to show it
- Now VCCRValue is only set for VCMPs. (It doesn't really make a difference, but makes it clear that only VCMPs are supported by this optimisation)
llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp

- Line 250: Do we not do this for predicated VCMPs?
- Line 257: findRegisterUseOperandIdx != -1
- Line 269: What makes us not do this for other opcodes? (I guess vcmp is the vast majority of cases?)
- Line 286: "The second loop" -> "This second loop", just to be a little clearer.
- Line 302: Do we need to add the copy? We could just use LastVPNOTResult directly? Or would that be more complex? (A sketch of the situation follows these comments.)
- Line 305: Set Modified = true here too, I think.
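For the comment on line 302, here is a hypothetical sketch of the situation as I read it from the update notes (names, registers, and operands are made up): at this revision, a VPNOT that recomputes an inverse the pass already has on hand was rewritten as a COPY of LastVPNOTResult, and the suggestion is to skip the extra instruction and predicate on that result directly.

```
%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
%3:vccr = MVE_VPNOT %2:vccr, 0, $noreg          ; LastVPNOTResult
%4:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, %3:vccr, undef %4:mqpr
%5:vccr = MVE_VPNOT %2:vccr, 0, $noreg          ; recomputes what %3 already holds
%6:mqpr = MVE_VORR %1:mqpr, %0:mqpr, 1, %5:vccr, undef %6:mqpr

; At this revision, the second VPNOT was rewritten as
;   %5:vccr = COPY %3:vccr
; The suggestion (adopted in a later update) is to delete it entirely and make
; the last VORR use %3 directly.
```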
- Minor refactoring of the patch
- The pass is no longer limited to VCMPs for VCCRValue; it can now use any instruction that writes to VPR (e.g. VMSR)
- The pass no longer replaces VPNOT with copies - it just removes the VPNOT and replaces all of its uses.
- Other minor fixes
llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp

- Line 250 (reply): Currently, no, to keep things simple.

      %0:vccr = vpt ...
      %1:vccr = vcmp /* predicated on %0 */
      %2:vccr = vcmp /* unpredicated, opposite of %1 */

  If we replace the second vcmp with vpnot %1:vccr, what happens if the first vcmp isn't executed?
- Line 269 (reply): It's just that vcmp is the vast majority of cases. There's no issue with enabling it for other instructions that write to VPR.
LGTM. Thanks.
llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp

- Line 250 (reply): It's not about "not executing": a predicated vcmp acts as an "and %0, <newcond>". Overlapping ranges of different predicates certainly do sound more complex, though. Let's leave that for other patches if we find cases where it's needed.
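Taking the "and %0, <newcond>" description at face value, a lane-wise way to see why the predicated case is not a simple inversion (informal notation, one boolean per lane):

```
A VCMP predicated on %0 folds the old predicate in:
    %1[i] = %0[i] & cond(a[i], b[i])
so inverting it gives
    ~%1[i] = ~%0[i] | ~cond(a[i], b[i])
while the unpredicated opposite compare is simply
    %2[i] = ~cond(a[i], b[i])
The two disagree in every lane where %0[i] is false, which is why the pass
currently only rewrites unpredicated VCMPs.
```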