This is an archive of the discontinued LLVM Phabricator instance.

Differential D76709

[Target][ARM] Adding MVE VPT Optimisation Pass
Closed, Public

Authored by Pierre-vh on Mar 24 2020, 8:26 AM.

Details

Reviewers

dmgreen
SjoerdMeijer
samparker
simon_tatham
olista01

Commits

rG456302435625: [Target][ARM] Adding MVE VPT Optimisation Pass

Summary

This patch adds a pass called "MVE VPT Optimisations", which does a few optimisations before register allocation.
The goal of this pass is to maximize the size of the VPT blocks created by the MVE VPT Block Insertion pass.

Currently, this pass:

- Replaces VCMPs with VPNOTs when possible.
  - The instruction selector in its current state doesn't generate VPNOTs very often. Instead, it generates a VCMP with the operands swapped and the condition reversed. This pass spots those VCMPs and transforms them into VPNOTs.
  - Why generate more VPNOTs? So the MVE VPT Block Insertion pass can use them (and remove them) to create larger/more complex VPT blocks (e.g. TEET, TETE, etc.)
- Replaces uses of old VPR values with VPNOTs when inside a block of predicated instructions.
  - This is done to avoid overlapping lifetimes of different VPR values, reducing the chance that a spill/reload occurs.
  - Why? Spills/reloads of VPR are particularly harmful to the MVE VPT Block Insertion pass: they prevent it from creating large VPT blocks.
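The first optimisation relies on a simple identity for integer compares: a second compare with the opposite condition (or with swapped operands and the correspondingly swapped condition) yields exactly the inverse of the first predicate. The sketch below is a toy model of that identity, not LLVM code. It assumes a 4-lane signed vector and a 4-bit mask, and the names `Vec4`, `Cond`, `vcmp` and `vpnot` are hypothetical stand-ins for the real instructions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model only: a 4-lane signed integer vector and a 4-bit predicate mask
// (one bit per lane). Names and types here are hypothetical, not LLVM's.
using Vec4 = std::array<std::int32_t, 4>;

enum class Cond { EQ, NE, LT, GE, GT, LE };

// Toy "VCMP": lane-wise signed compare, one result bit per lane.
inline std::uint8_t vcmp(Cond cc, const Vec4 &a, const Vec4 &b) {
  std::uint8_t mask = 0;
  for (int i = 0; i < 4; ++i) {
    bool r = false;
    switch (cc) {
    case Cond::EQ: r = a[i] == b[i]; break;
    case Cond::NE: r = a[i] != b[i]; break;
    case Cond::LT: r = a[i] <  b[i]; break;
    case Cond::GE: r = a[i] >= b[i]; break;
    case Cond::GT: r = a[i] >  b[i]; break;
    case Cond::LE: r = a[i] <= b[i]; break;
    }
    if (r)
      mask = static_cast<std::uint8_t>(mask | (1u << i));
  }
  return mask;
}

// Toy "VPNOT": flip every lane of an existing predicate.
inline std::uint8_t vpnot(std::uint8_t mask) {
  assert((mask & 0xF0) == 0 && "only 4 lanes are modelled");
  return static_cast<std::uint8_t>(~mask & 0x0F);
}
```

In this model, `vcmp(Cond::GE, a, b)` and `vcmp(Cond::LE, b, a)` both equal `vpnot(vcmp(Cond::LT, a, b))` for any integer inputs, which is why the second compare can be replaced by an inversion of the first result.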

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Pierre-vh created this revision. Mar 24 2020, 8:26 AM

Herald added a project: Restricted Project. Mar 24 2020, 8:26 AM

Herald added subscribers: llvm-commits, danielkiss, hiraditya and 2 others.

Pierre-vh added a parent revision: D75993: [Target][ARM] Improvements to the VPT Block Insertion Pass. Mar 24 2020, 8:31 AM

Harbormaster completed remote builds in B50260: Diff 252329. Mar 24 2020, 9:07 AM

The goal of this pass is to maximize the size of the VPT blocks created by the MVE VPT Block Insertion pass.

Just a general question first, why a separate pass? Why e.g. not just doing this in the MVE VPT Block Insertion pass?

In D76709#1939901, @SjoerdMeijer wrote:

The goal of this pass is to maximize the size of the VPT blocks created by the MVE VPT Block Insertion pass.

Just a general question first, why a separate pass? Why e.g. not just doing this in the MVE VPT Block Insertion pass?

The first optimisation (VCMPs into VPNOTs) could technically be done in the MVE VPT Block insertion pass, but I think it's better to do it in this new pass instead of overloading the block Insertion pass too much.
The second optimisation (VPNOT Insertion for "spill prevention") can't be done in the Block Insertion pass as it needs to be done before register allocation (= before spill/reload instructions are emitted).
Overall, since both optimisations deal with VPNOTs and aren't directly related to VPT Block insertion/creation, I felt that it was a good idea to put them in a separate pass.

Is it possible to split this into two patches? The pass and "replaces VCMPs with VPNOTs when possible" part, then the second part to replace the re-use with the not. I think that would make each part easier to review, more manageable.

The first optimisation (VCMPs into VPNOTs) could technically be done in the MVE VPT Block insertion pass, but I think it's better to do it in this new pass instead of overloading the block Insertion pass too much.
The second optimisation (VPNOT Insertion for "spill prevention") can't be done in the Block Insertion pass as it needs to be done before register allocation (= before spill/reload instructions are emitted).
Overall, since both optimisations deal with VPNOTs and aren't directly related to VPT Block insertion/creation, I felt that it was a good idea to put them in a separate pass.

Plus (I think) in order to do #2, it's easier if you have already done #1.

llvm/lib/Target/ARM/ARMTargetMachine.cpp
490	Perhaps put this into the below getOptLevel() != CodeGenOpt::None block? As it is an optimisation
llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
8	Can you add a file comment here like in other files, explaining what the pass does, like you have in the commit message.
55	Could you just use VCMPOpcodeToVPT, and check the return value isn't 0? To save this extra list being needed here.
llvm/test/CodeGen/Thumb2/mve-vcmpf.ll
700 ↗	(On Diff #252329)	What's going on here? I think it needs to be checking q1 <= q0 && q0 < q1. ord means ordered, which means no NaNs. (Floating point compares can be tricky like that).
llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir
2	I think this test would be simpler if there was a number of test functions inside it, each testing one thing (or maybe a small collection of things). Otherwise they run into each other a bit, and it will be hard to tell what's wrong if one of them starts to fail.

In D76709#1940297, @dmgreen wrote:

Is it possible to split this into two patches? The pass and "replaces VCMPs with VPNOTs when possible" part, then the second part to replace the re-use with the not. I think that would make each part easier to review, more manageable.

The first optimisation (VCMPs into VPNOTs) could technically be done in the MVE VPT Block insertion pass, but I think it's better to do it in this new pass instead of overloading the block Insertion pass too much.
The second optimisation (VPNOT Insertion for "spill prevention") can't be done in the Block Insertion pass as it needs to be done before register allocation (= before spill/reload instructions are emitted).
Overall, since both optimisations deal with VPNOTs and aren't directly related to VPT Block insertion/creation, I felt that it was a good idea to put them in a separate pass.

Plus (I think) in order to do #2, it's easier if you have already done #1.

Unfortunately, I've written this pass in a single commit, so there is no easy way for me to split this patch in 2.
I can do it if you want, it's not impossible, but it's going to take me a while to get right. Also, the second optimisation is a relatively small part of this patch, so patch #1 would still be a large patch.

llvm/test/CodeGen/Thumb2/mve-vcmpf.ll
700 ↗	(On Diff #252329)	I didn't know that. In which case is it not "safe" to replace a float VCMP even when the conditions are met? What would be the correct behaviour in this case? Should I disable this optimisation for all float VCMPs?
llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir
2	Would putting them all in different basic blocks be enough, or do you prefer functions? The optimisation is done per basic-block, so it'd behave the same as function (but the test would be shorter).

Unfortunately, I've written this pass in a single commit, so there is no easy way for me to split this patch in 2.
I can do it if you want, it's not impossible, but it's going to take me a while to get right. Also, the second optimisation is a relatively small part of this patch, so patch #1 would still be a large patch.

Sure. If it's relatively small it should hopefully be simple enough to pull out into a separate commit.

Also, can you make sure that the "register" and "zero" variants of the VCMPs are tested? The inverting logic around those might be different to a normal compare.

llvm/test/CodeGen/Thumb2/mve-vcmpf.ll
700 ↗	(On Diff #252329)	Umm. I think for floats you cannot swap the operands. That only works for integers. You can still use the opposite condition code (lt <> ge, for example). Have a look at getOppositeCondition and isValidMVECond in ISelLowering for how it's done there.
llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir
2	Separate functions would be more canonical. You can hopefully simplify the test quite a bit to make it so that's not too verbose.
10–16	Functions don't actually need bodies. They can just contain unreachable. And if the test below doesn't bear any resemblance to the code here, that is probably for the better.
21–23	I think a lot of this can probably be removed if you replace the +mve.fp attribute with command line options.
28–65	I think you can remove a lot of this.
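The float concern raised in the inline comments above can be illustrated with ordinary IEEE 754 comparisons. This is a minimal sketch in plain C++, not MVE code. It assumes, following the reviewer's remark that the opposite condition code is still usable, that an inverted `ge` behaves as "less than or unordered". The helper names are hypothetical.

```cpp
#include <limits>

// Hypothetical helper: "less than or unordered", i.e. the exact complement
// of an ordered a >= b. It is true whenever either input is NaN.
inline bool notGE(float a, float b) { return !(a >= b); }

// Hypothetical helper: the same test written with the operands swapped,
// as an ordered b > a. It is false whenever either input is NaN.
inline bool swappedGT(float a, float b) { return b > a; }

// True when rewriting !(a >= b) as (b > a) would preserve the result.
inline bool swapIsSafe(float a, float b) {
  return notGE(a, b) == swappedGT(a, b);
}
```

For ordered inputs the two forms agree, but with a NaN operand `notGE` is true while `swappedGT` is false. That is the case the swapped-operand rewrite would get wrong for floats, while inverting the condition alone stays an exact complement.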

samparker added inline comments. Mar 25 2020, 7:36 AM

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
128	Maybe you could use the MachineInstr method 'definesRegister' instead?

Refactoring the patch as requested + fixing a few issues.

Pierre-vh added a child revision: D76847: [Target][ARM] Replace re-uses of old VPR values with VPNOTs. Mar 26 2020, 7:21 AM

Adding newline at the end of the file and refactoring canHaveOperandsSwapped: It has been renamed to CanHaveSwappedOperands and it now returns true for everything except for all VCMPr and VCMPf16/f32 instructions.

dmgreen added inline comments. Mar 29 2020, 3:08 PM

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
142	How much does this function add? It doesn't seem to do a huge amount.
159–160	Do you have any tests for this bit?
180–181	You can do BuildMI(...).add(Instr.getOperand(0)).addReg(PrevVCMPResultReg)...
183–184	Operand 4 and 5 are always None and noreg? If so you can use addUnpredicatedMveVpredNOp, which makes it more obvious what the operands are expected to be.
llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir
155–156	I was expecting these registers to be virtual, given where this is in the pipeline. Will they be physical instead?

Pierre-vh marked 7 inline comments as done. Mar 29 2020, 11:31 PM

Pierre-vh added inline comments.

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
142	Sure, it doesn't add much, I'll remove it.
159–160	Not yet, I'll add some.
llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir
155–156	In the pipeline, they'd be virtual. Should I replace all $vpr/$q0/$q1 here with virtual registers ? It won't make much of a difference testing wise but I can understand that virtual registers would be preferred.

Fixed issues found in review (comments marked as done)
Added "ARM" in front of the pass's name.
Changed the test so it uses virtual registers everywhere.

Looking good.

Can you add some extra ll tests, made from intrinsics, that show different blocks being created while testing the entire backend? All kinds of things from simple to complicated if you can: a good collection of VCMPs, conditional VCMPs, other conditional instructions and VPNOTs in different positions. Then add --verify-machineinstrs to it. It doesn't matter if they are not all optimal yet; they will show what is working well and what isn't yet.

- "Rebased" the patch: I renamed IsWritingToVCCRorVPR to IsWritingToVCCR in the child revision, so I renamed it here as well. I also removed the line that checked if ARM::VPR was used, as it was useless outside of tests.
- I added a new test, mve-vpt-blocks.ll, that uses some intrinsics to attempt to generate all possible VPT blocks from LLVM IR. In this patch, it doesn't generate every block successfully due to spill/reloads, but that's fixed in the child revision.
- Finally, I fixed a bug related to isKill flags on registers and added a test for it.

dmgreen added inline comments. Apr 2 2020, 11:53 PM

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
193	Should we be clearing this value at some points? Setting it back to nullptr?

Pierre-vh marked 2 inline comments as done. Apr 3 2020, 12:01 AM

Pierre-vh added inline comments.

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
193	Sure, it should be cleared after. I'll see if I can add a test for that as well.

PrevVCMPResultKiller is now correctly reset back to nullptr, but I didn't add a test for it as it was not useful (there was nothing to test).
It's pretty much an NFC, the behaviour is the exact same as before, but it's indeed more correct to reset it once a VCMP has been replaced by a VPNOT.

The reason why this change doesn't really impact the behaviour of the pass is that PrevVCMPResultKiller either:

- Is nullptr, so nothing happens.
- Contains the "VCMP Result Killer" for the current PrevVCMP (which is what we want).
- Contains the "VCMP Result Killer" for *a previous* PrevVCMP, which is not correct but doesn't cause issues: it's just going to call setIsKill(false) twice on the same operand.
  - This case would only happen if 2 VCMPs are replaced with VPNOTs in the same basic block, but the first one set PrevVCMPResultKiller while the second one didn't.

Correctly resetting it to nullptr after use just removes the third possibility.
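The bookkeeping discussed here can be modelled outside LLVM. The following is a rough sketch under stated assumptions, not the pass itself: `ToyInstr` and `replaceCompares` are hypothetical, each instruction reads and writes at most one predicate register, and the `invertsPrev` flag stands in for the real equivalence check. It mirrors the idea of clearing the kill flag on the last reader of the previous compare result and then resetting both tracked pointers once a replacement has been made.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy instruction, loosely modelled on the discussion above.
struct ToyInstr {
  bool isCompare = false;   // stand-in for "this is a VCMP"
  bool invertsPrev = false; // stand-in for "equivalent to NOT of previous VCMP"
  int usesReg = -1;         // predicate register read, or -1 for none
  bool kill = false;        // kill flag on that read
  int defReg = -1;          // predicate register written, or -1 for none
};

// Rewrites compares that merely invert the previous compare so that they
// read the previous result instead. Returns the number of rewrites.
inline int replaceCompares(std::vector<ToyInstr> &block) {
  ToyInstr *prev = nullptr;   // last compare that was not replaced
  ToyInstr *killer = nullptr; // instruction that kills prev's result
  int replaced = 0;
  for (ToyInstr &instr : block) {
    // Remember a killing read of the previous compare's result.
    if (prev && instr.kill && instr.usesReg == prev->defReg)
      killer = &instr;
    if (!instr.isCompare)
      continue;
    if (!prev || !instr.invertsPrev) {
      prev = &instr;
      continue;
    }
    assert(prev->defReg >= 0 && "a compare must define a predicate");
    // The previous result is now read later, so it must stay live.
    if (killer)
      killer->kill = false;
    instr.isCompare = false;
    instr.usesReg = prev->defReg;
    ++replaced;
    // Reset both pointers so stale state cannot leak into later rewrites.
    prev = nullptr;
    killer = nullptr;
  }
  return replaced;
}
```

With a block made of a compare defining register 1, a killing read of register 1, and a second compare flagged as the inverse of the first, the function performs one rewrite, clears the kill flag on the middle instruction, and makes the last instruction read register 1.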

Nice one. LGTM

This revision is now accepted and ready to land. Apr 3 2020, 2:10 AM

Pierre-vh added a child revision: D77798: [Target][ARM] Fix VPT Block Pass miscompilation. Apr 14 2020, 3:25 AM

Pierre-vh removed a child revision: D76847: [Target][ARM] Replace re-uses of old VPR values with VPNOTs.

Pierre-vh mentioned this in D76847: [Target][ARM] Replace re-uses of old VPR values with VPNOTs. Apr 14 2020, 3:43 AM

Closed by commit rG456302435625: [Target][ARM] Adding MVE VPT Optimisation Pass (authored by Pierre-vh). Apr 14 2020, 7:28 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                  Size
llvm/lib/Target/ARM/ARM.h                             2 lines
llvm/lib/Target/ARM/ARMTargetMachine.cpp              3 lines
llvm/lib/Target/ARM/CMakeLists.txt                    1 line
llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp       232 lines
llvm/test/CodeGen/ARM/O3-pipeline.ll                  1 line
llvm/test/CodeGen/Thumb2/mve-vpt-blocks.ll            323 lines
llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir    547 lines

Diff 257326

llvm/lib/Target/ARM/ARM.h

[... 41 lines hidden ...] FunctionPass *createARMISelDag(ARMBaseTargetMachine &TM,
                                 CodeGenOpt::Level OptLevel);
  FunctionPass *createA15SDOptimizerPass();
  FunctionPass *createARMLoadStoreOptimizationPass(bool PreAlloc = false);
  FunctionPass *createARMExpandPseudoPass();
  FunctionPass *createARMConstantIslandPass();
  FunctionPass *createMLxExpansionPass();
  FunctionPass *createThumb2ITBlockPass();
  FunctionPass *createMVEVPTBlockPass();
+ FunctionPass *createMVEVPTOptimisationsPass();
  FunctionPass *createARMOptimizeBarriersPass();
  FunctionPass *createThumb2SizeReductionPass(
      std::function<bool(const Function &)> Ftor = nullptr);
  InstructionSelector *
  createARMInstructionSelector(const ARMBaseTargetMachine &TM, const ARMSubtarget &STI,
                               const ARMRegisterBankInfo &RBI);
  Pass *createMVEGatherScatterLoweringPass();

  void LowerARMMachineInstrToMCInst(const MachineInstr *MI, MCInst &OutMI,
                                    ARMAsmPrinter &AP);

  void initializeARMParallelDSPPass(PassRegistry &);
  void initializeARMLoadStoreOptPass(PassRegistry &);
  void initializeARMPreAllocLoadStoreOptPass(PassRegistry &);
  void initializeARMConstantIslandsPass(PassRegistry &);
  void initializeARMExpandPseudoPass(PassRegistry &);
  void initializeThumb2SizeReducePass(PassRegistry &);
  void initializeThumb2ITBlockPass(PassRegistry &);
  void initializeMVEVPTBlockPass(PassRegistry &);
+ void initializeMVEVPTOptimisationsPass(PassRegistry &);
  void initializeARMLowOverheadLoopsPass(PassRegistry &);
  void initializeMVETailPredicationPass(PassRegistry &);
  void initializeMVEGatherScatterLoweringPass(PassRegistry &);

  } // end namespace llvm

  #endif // LLVM_LIB_TARGET_ARM_ARM_H

llvm/lib/Target/ARM/ARMTargetMachine.cpp

[... 90 lines hidden ...] extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeARMTarget() {
    initializeARMLoadStoreOptPass(Registry);
    initializeARMPreAllocLoadStoreOptPass(Registry);
    initializeARMParallelDSPPass(Registry);
    initializeARMConstantIslandsPass(Registry);
    initializeARMExecutionDomainFixPass(Registry);
    initializeARMExpandPseudoPass(Registry);
    initializeThumb2SizeReducePass(Registry);
    initializeMVEVPTBlockPass(Registry);
+   initializeMVEVPTOptimisationsPass(Registry);
    initializeMVETailPredicationPass(Registry);
    initializeARMLowOverheadLoopsPass(Registry);
    initializeMVEGatherScatterLoweringPass(Registry);
  }

  static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {
    if (TT.isOSBinFormatMachO())
      return std::make_unique<TargetLoweringObjectFileMachO>();
[... 374 lines hidden ...]
  }

  bool ARMPassConfig::addGlobalInstructionSelect() {
    addPass(new InstructionSelect());
    return false;
  }

  void ARMPassConfig::addPreRegAlloc() {
    if (getOptLevel() != CodeGenOpt::None) {
+     addPass(createMVEVPTOptimisationsPass());
+
      addPass(createMLxExpansionPass());

      if (EnableARMLoadStoreOpt)
        addPass(createARMLoadStoreOptimizationPass(/* pre-register alloc */ true));

      if (!DisableA15SDOptimization)
        addPass(createA15SDOptimizerPass());
    }
[... 57 lines hidden ...]

llvm/lib/Target/ARM/CMakeLists.txt

[... 48 lines hidden ...] add_llvm_target(ARMCodeGen
    ARMSubtarget.cpp
    ARMTargetMachine.cpp
    ARMTargetObjectFile.cpp
    ARMTargetTransformInfo.cpp
    MLxExpansionPass.cpp
    MVEGatherScatterLowering.cpp
    MVETailPredication.cpp
    MVEVPTBlockPass.cpp
+   MVEVPTOptimisationsPass.cpp
    Thumb1FrameLowering.cpp
    Thumb1InstrInfo.cpp
    ThumbRegisterInfo.cpp
    Thumb2ITBlockPass.cpp
    Thumb2InstrInfo.cpp
    Thumb2SizeReduction.cpp
    )

  add_subdirectory(AsmParser)
  add_subdirectory(Disassembler)
  add_subdirectory(MCTargetDesc)
  add_subdirectory(TargetInfo)
  add_subdirectory(Utils)

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp

This file was added.

//===-- MVEVPTOptimisationsPass.cpp ---------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
/// \file This pass does a few optimisations related to MVE VPT blocks before
/// register allocation is performed. The goal is to maximize the sizes of the
/// blocks that will be created by the MVE VPT Block Insertion pass (which runs
/// after register allocation). Currently, this pass replaces VCMPs with VPNOTs
/// when possible, so the Block Insertion pass can delete them later to create
/// larger VPT blocks.
//===----------------------------------------------------------------------===//

#include "ARM.h"
#include "ARMSubtarget.h"
#include "MCTargetDesc/ARMBaseInfo.h"
#include "Thumb2InstrInfo.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/Support/Debug.h"
#include <cassert>

using namespace llvm;

#define DEBUG_TYPE "arm-mve-vpt-opts"

namespace {
class MVEVPTOptimisations : public MachineFunctionPass {
public:
  static char ID;
  const Thumb2InstrInfo *TII;
  MachineRegisterInfo *MRI;

  MVEVPTOptimisations() : MachineFunctionPass(ID) {}

  bool runOnMachineFunction(MachineFunction &Fn) override;

  StringRef getPassName() const override {
    return "ARM MVE VPT Optimisation Pass";
  }

private:
  bool ReplaceVCMPsByVPNOTs(MachineBasicBlock &MBB);
};

char MVEVPTOptimisations::ID = 0;

} // end anonymous namespace

INITIALIZE_PASS(MVEVPTOptimisations, DEBUG_TYPE,
                "ARM MVE VPT Optimisations pass", false, false)

// Returns true if Opcode is any VCMP Opcode.
static bool IsVCMP(unsigned Opcode) { return VCMPOpcodeToVPT(Opcode) != 0; }

// Returns true if a VCMP with this Opcode can have its operands swapped.
// There are 2 kinds of VCMP that can't have their operands swapped: Float
// VCMPs, and VCMPr instructions (since the r is always on the right).
static bool CanHaveSwappedOperands(unsigned Opcode) {
  switch (Opcode) {
  default:
    return true;
  case ARM::MVE_VCMPf32:
  case ARM::MVE_VCMPf16:
  case ARM::MVE_VCMPf32r:
  case ARM::MVE_VCMPf16r:
  case ARM::MVE_VCMPi8r:
  case ARM::MVE_VCMPi16r:
  case ARM::MVE_VCMPi32r:
  case ARM::MVE_VCMPu8r:
  case ARM::MVE_VCMPu16r:
  case ARM::MVE_VCMPu32r:
  case ARM::MVE_VCMPs8r:
  case ARM::MVE_VCMPs16r:
  case ARM::MVE_VCMPs32r:
    return false;
  }
}

// Returns the CondCode of a VCMP Instruction.
static ARMCC::CondCodes GetCondCode(MachineInstr &Instr) {
  assert(IsVCMP(Instr.getOpcode()) && "Inst must be a VCMP");
  return ARMCC::CondCodes(Instr.getOperand(3).getImm());
}

// Returns true if Cond is equivalent to a VPNOT instruction on the result of
// Prev. Cond and Prev must be VCMPs.
static bool IsVPNOTEquivalent(MachineInstr &Cond, MachineInstr &Prev) {
  assert(IsVCMP(Cond.getOpcode()) && IsVCMP(Prev.getOpcode()));

  // Opcodes must match.
  if (Cond.getOpcode() != Prev.getOpcode())
    return false;

  MachineOperand &CondOP1 = Cond.getOperand(1), &CondOP2 = Cond.getOperand(2);
  MachineOperand &PrevOP1 = Prev.getOperand(1), &PrevOP2 = Prev.getOperand(2);

  // If the VCMP has the opposite condition with the same operands, we can
  // replace it with a VPNOT
  ARMCC::CondCodes ExpectedCode = GetCondCode(Cond);
  ExpectedCode = ARMCC::getOppositeCondition(ExpectedCode);
  if (ExpectedCode == GetCondCode(Prev))
    if (CondOP1.isIdenticalTo(PrevOP1) && CondOP2.isIdenticalTo(PrevOP2))
      return true;
  // Check again with operands swapped if possible
  if (!CanHaveSwappedOperands(Cond.getOpcode()))
    return false;
  ExpectedCode = ARMCC::getSwappedCondition(ExpectedCode);
  return ExpectedCode == GetCondCode(Prev) && CondOP1.isIdenticalTo(PrevOP2) &&
         CondOP2.isIdenticalTo(PrevOP1);
}

// Returns true if Instr writes to VCCR.
static bool IsWritingToVCCR(MachineInstr &Instr) {
  if (Instr.getNumOperands() == 0)
    return false;
  MachineOperand &Dst = Instr.getOperand(0);
  if (!Dst.isReg())
    return false;
  Register DstReg = Dst.getReg();
  if (!DstReg.isVirtual())
    return false;
  MachineRegisterInfo &RegInfo = Instr.getMF()->getRegInfo();
  const TargetRegisterClass *RegClass = RegInfo.getRegClassOrNull(DstReg);
  return RegClass && (RegClass->getID() == ARM::VCCRRegClassID);
}

// This optimisation replaces VCMPs with VPNOTs when they are equivalent.
bool MVEVPTOptimisations::ReplaceVCMPsByVPNOTs(MachineBasicBlock &MBB) {
  SmallVector<MachineInstr *, 4> DeadInstructions;

  // The last VCMP that we have seen and that couldn't be replaced.
  // This is reset when an instruction that writes to VCCR/VPR is found, or when
  // a VCMP is replaced with a VPNOT.
  // We'll only replace VCMPs with VPNOTs when this is not null, and when the
  // current VCMP is the opposite of PrevVCMP.
  MachineInstr *PrevVCMP = nullptr;
  // If we find an instruction that kills the result of PrevVCMP, we save the
  // operand here to remove the kill flag in case we need to use PrevVCMP's
  // result.
  MachineOperand *PrevVCMPResultKiller = nullptr;

  for (MachineInstr &Instr : MBB.instrs()) {
    if (PrevVCMP) {
      if (MachineOperand *MO = Instr.findRegisterUseOperand(
              PrevVCMP->getOperand(0).getReg(), /*isKill*/ true)) {
        // If we come across the instr that kills PrevVCMP's result, record it
        // so we can remove the kill flag later if we need to.
        PrevVCMPResultKiller = MO;
      }
    }

    // Ignore predicated instructions.
    if (getVPTInstrPredicate(Instr) != ARMVCC::None)
      continue;

    // Only look at VCMPs
    if (!IsVCMP(Instr.getOpcode())) {
      // If the instruction writes to VCCR, forget the previous VCMP.
      if (IsWritingToVCCR(Instr))
        PrevVCMP = nullptr;
      continue;
    }

    if (!PrevVCMP || !IsVPNOTEquivalent(Instr, *PrevVCMP)) {
      PrevVCMP = &Instr;
      continue;
    }

    // The register containing the result of the VCMP that we're going to
    // replace.
    Register PrevVCMPResultReg = PrevVCMP->getOperand(0).getReg();

    // Build a VPNOT to replace the VCMP, reusing its operands.
    MachineInstrBuilder MIBuilder =
        BuildMI(MBB, &Instr, Instr.getDebugLoc(), TII->get(ARM::MVE_VPNOT))
            .add(Instr.getOperand(0))
            .addReg(PrevVCMPResultReg);
    addUnpredicatedMveVpredNOp(MIBuilder);
    LLVM_DEBUG(dbgs() << "Inserting VPNOT (to replace VCMP): ";
               MIBuilder.getInstr()->dump(); dbgs() << " Removed VCMP: ";
               Instr.dump());

    // If we found an instruction that uses, and kills PrevVCMP's result,
    // remove the kill flag.
    if (PrevVCMPResultKiller)
      PrevVCMPResultKiller->setIsKill(false);

    // Finally, mark the old VCMP for removal and reset
    // PrevVCMP/PrevVCMPResultKiller.
    DeadInstructions.push_back(&Instr);
    PrevVCMP = nullptr;
    PrevVCMPResultKiller = nullptr;
  }

  for (MachineInstr *DeadInstruction : DeadInstructions)
    DeadInstruction->removeFromParent();

  return !DeadInstructions.empty();
}

bool MVEVPTOptimisations::runOnMachineFunction(MachineFunction &Fn) {
  const ARMSubtarget &STI =
      static_cast<const ARMSubtarget &>(Fn.getSubtarget());

  if (!STI.isThumb2() || !STI.hasMVEIntegerOps())
    return false;

  TII = static_cast<const Thumb2InstrInfo *>(STI.getInstrInfo());
  MRI = &Fn.getRegInfo();

  LLVM_DEBUG(dbgs() << "******** ARM MVE VPT Optimisations ********\n"
                    << "********** Function: " << Fn.getName() << '\n');

  bool Modified = false;
  for (MachineBasicBlock &MBB : Fn)
    Modified |= ReplaceVCMPsByVPNOTs(MBB);

  LLVM_DEBUG(dbgs() << "**************************************\n");
  return Modified;
}

/// createMVEVPTOptimisationsPass
FunctionPass *llvm::createMVEVPTOptimisationsPass() {
  return new MVEVPTOptimisations();
}

llvm/test/CodeGen/ARM/O3-pipeline.ll

	Show First 20 Lines • Show All 86 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: Early Machine Loop Invariant Code Motion
	; CHECK-NEXT: MachineDominator Tree Construction
	; CHECK-NEXT: Machine Block Frequency Analysis
	; CHECK-NEXT: Machine Common Subexpression Elimination
	; CHECK-NEXT: MachinePostDominator Tree Construction
	; CHECK-NEXT: Machine code sinking
	; CHECK-NEXT: Peephole Optimizations
	; CHECK-NEXT: Remove dead machine instructions
				; CHECK-NEXT: MVE VPT Optimisation Pass
	; CHECK-NEXT: ARM MLA / MLS expansion pass
	; CHECK-NEXT: ARM pre- register allocation load / store optimization pass
	; CHECK-NEXT: ARM A15 S->D optimizer
	; CHECK-NEXT: Detect Dead Lanes
	; CHECK-NEXT: Process Implicit Definitions
	; CHECK-NEXT: Remove unreachable machine basic blocks
	; CHECK-NEXT: Live Variable Analysis
	; CHECK-NEXT: MachineDominator Tree Construction

llvm/test/CodeGen/Thumb2/mve-vpt-blocks.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -O3 -mtriple=thumbv8.1m.main-arm-none-eabi --verify-machineinstrs -mattr=+mve.fp %s -o - | FileCheck %s

				declare <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32>, <4 x i32>, <4 x i1>, <4 x i32>)

				define arm_aapcs_vfpcc <4 x i32> @vpt_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
				; CHECK-LABEL: vpt_block:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vpt.s32 ge, q0, q2
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: bx lr
				entry:
				%0 = icmp sge <4 x i32> %a, %c
				%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
				ret <4 x i32> %1
				}

				define arm_aapcs_vfpcc <4 x i32> @vptt_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
				; CHECK-LABEL: vptt_block:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vmov q3, q0
				; CHECK-NEXT: vptt.s32 ge, q0, q2
				; CHECK-NEXT: vorrt q3, q1, q2
				; CHECK-NEXT: vorrt q0, q3, q2
				; CHECK-NEXT: bx lr
				entry:
				%0 = icmp sge <4 x i32> %a, %c
				%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
				%2 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %1, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
				ret <4 x i32> %2
				}

				define arm_aapcs_vfpcc <4 x i32> @vpttt_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
				; CHECK-LABEL: vpttt_block:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vpttt.s32 ge, q0, q2
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: bx lr
				entry:
				%0 = icmp sge <4 x i32> %a, %c
				%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
				%2 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %1)
				%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %2)
				ret <4 x i32> %3
				}

				define arm_aapcs_vfpcc <4 x i32> @vptttt_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
				; CHECK-LABEL: vptttt_block:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vptttt.s32 ge, q0, q2
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: bx lr
				entry:
				%0 = icmp sge <4 x i32> %a, %c
				%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
				%2 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %1)
				%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %2)
				%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %3)
				ret <4 x i32> %4
				}


				define arm_aapcs_vfpcc <4 x i32> @vpte_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
				; CHECK-LABEL: vpte_block:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vpte.s32 ge, q0, q2
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: vmove q0, q2
				; CHECK-NEXT: bx lr
				entry:
				%0 = icmp sge <4 x i32> %a, %c
				%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
				%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>
				%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %1)
				ret <4 x i32> %3
				}

				define arm_aapcs_vfpcc <4 x i32> @vptte_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
				; CHECK-LABEL: vptte_block:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vptte.s32 ge, q0, q2
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: vorre q0, q1, q2
				; CHECK-NEXT: bx lr
				entry:
				%0 = icmp sge <4 x i32> %a, %c
				%1 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>
				%2 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
				%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %2)
				%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %1, <4 x i32> %3)
				ret <4 x i32> %4
				}

				define arm_aapcs_vfpcc <4 x i32> @vptee_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
				; CHECK-LABEL: vptee_block:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vptee.s32 ge, q0, q2
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: vorre q0, q1, q2
				; CHECK-NEXT: vorre q0, q1, q2
				; CHECK-NEXT: bx lr
				entry:
				%0 = icmp sge <4 x i32> %a, %c
				%1 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>
				%2 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
				%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %1, <4 x i32> %2)
				%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %1, <4 x i32> %3)
				ret <4 x i32> %4
				}

				define arm_aapcs_vfpcc <4 x i32> @vptet_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
				; CHECK-LABEL: vptet_block:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: .pad #4
				; CHECK-NEXT: sub sp, #4
				; CHECK-NEXT: vcmp.s32 ge, q0, q2
				; CHECK-NEXT: vstr p0, [sp] @ 4-byte Spill
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
				; CHECK-NEXT: vpnot
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vmovt q0, q2
				; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vmovt q0, q2
				; CHECK-NEXT: add sp, #4
				; CHECK-NEXT: bx lr
				entry:
				%0 = icmp sge <4 x i32> %a, %c
				%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
				%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>
				%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %1)
				%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %3)
				ret <4 x i32> %4
				}

				define arm_aapcs_vfpcc <4 x i32> @vpttet_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
				; CHECK-LABEL: vpttet_block:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: .pad #4
				; CHECK-NEXT: sub sp, #4
				; CHECK-NEXT: vcmp.s32 ge, q0, q2
				; CHECK-NEXT: vstr p0, [sp] @ 4-byte Spill
				; CHECK-NEXT: vpstt
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: vmovt q0, q2
				; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
				; CHECK-NEXT: vpnot
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vmovt q0, q2
				; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vmovt q0, q2
				; CHECK-NEXT: add sp, #4
				; CHECK-NEXT: bx lr
				entry:
				%0 = icmp sge <4 x i32> %a, %c
				%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
				%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>
				%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %1)
				%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %3)
				%5 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %4)
				ret <4 x i32> %5
				}

				define arm_aapcs_vfpcc <4 x i32> @vptett_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
				; CHECK-LABEL: vptett_block:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: .pad #4
				; CHECK-NEXT: sub sp, #4
				; CHECK-NEXT: vcmp.s32 ge, q0, q2
				; CHECK-NEXT: vstr p0, [sp] @ 4-byte Spill
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
				; CHECK-NEXT: vpnot
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vmovt q0, q2
				; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
				; CHECK-NEXT: vpstt
				; CHECK-NEXT: vmovt q0, q2
				; CHECK-NEXT: vmovt q0, q2
				; CHECK-NEXT: add sp, #4
				; CHECK-NEXT: bx lr
				entry:
				%0 = icmp sge <4 x i32> %a, %c
				%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
				%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>
				%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %1)
				%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %3)
				%5 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %4)
				ret <4 x i32> %5
				}

				define arm_aapcs_vfpcc <4 x i32> @vpteet_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
				; CHECK-LABEL: vpteet_block:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: .pad #8
				; CHECK-NEXT: sub sp, #8
				; CHECK-NEXT: vcmp.s32 ge, q0, q2
				; CHECK-NEXT: vstr p0, [sp] @ 4-byte Spill
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
				; CHECK-NEXT: vpnot
				; CHECK-NEXT: vstr p0, [sp, #4] @ 4-byte Spill
				; CHECK-NEXT: vldr p0, [sp, #4] @ 4-byte Reload
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vmovt q0, q2
				; CHECK-NEXT: vldr p0, [sp, #4] @ 4-byte Reload
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vmovt q0, q2
				; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vmovt q0, q2
				; CHECK-NEXT: add sp, #8
				; CHECK-NEXT: bx lr
				entry:
				%0 = icmp sge <4 x i32> %a, %c
				%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
				%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>
				%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %1)
				%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %3)
				%5 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %4)
				ret <4 x i32> %5
				}

				define arm_aapcs_vfpcc <4 x i32> @vpteee_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
				; CHECK-LABEL: vpteee_block:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vpteee.s32 ge, q0, q2
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: vmove q0, q2
				; CHECK-NEXT: vmove q0, q2
				; CHECK-NEXT: vmove q0, q2
				; CHECK-NEXT: bx lr
				entry:
				%0 = icmp sge <4 x i32> %a, %c
				%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
				%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>
				%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %1)
				%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %3)
				%5 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %4)
				ret <4 x i32> %5
				}

				define arm_aapcs_vfpcc <4 x i32> @vptete_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
				; CHECK-LABEL: vptete_block:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: .pad #8
				; CHECK-NEXT: sub sp, #8
				; CHECK-NEXT: vcmp.s32 ge, q0, q2
				; CHECK-NEXT: vstr p0, [sp] @ 4-byte Spill
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
				; CHECK-NEXT: vpnot
				; CHECK-NEXT: vstr p0, [sp, #4] @ 4-byte Spill
				; CHECK-NEXT: vldr p0, [sp, #4] @ 4-byte Reload
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vmovt q0, q2
				; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vmovt q0, q2
				; CHECK-NEXT: vldr p0, [sp, #4] @ 4-byte Reload
				; CHECK-NEXT: vpst
				; CHECK-NEXT: vmovt q0, q2
				; CHECK-NEXT: add sp, #8
				; CHECK-NEXT: bx lr
				entry:
				%0 = icmp sge <4 x i32> %a, %c
				%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
				%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>
				%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %1)
				%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %3)
				%5 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %4)
				ret <4 x i32> %5
				}

				define arm_aapcs_vfpcc <4 x i32> @vpttte_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
				; CHECK-LABEL: vpttte_block:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vpttte.s32 ge, q0, q2
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: vmovt q0, q2
				; CHECK-NEXT: vmovt q0, q2
				; CHECK-NEXT: vmove q0, q2
				; CHECK-NEXT: bx lr
				entry:
				%0 = icmp sge <4 x i32> %a, %c
				%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
				%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>
				%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %1)
				%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %3)
				%5 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %4)
				ret <4 x i32> %5
				}

				define arm_aapcs_vfpcc <4 x i32> @vpttee_block(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
				; CHECK-LABEL: vpttee_block:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: vpttee.s32 ge, q0, q2
				; CHECK-NEXT: vorrt q0, q1, q2
				; CHECK-NEXT: vmovt q0, q2
				; CHECK-NEXT: vmove q0, q2
				; CHECK-NEXT: vmove q0, q2
				; CHECK-NEXT: bx lr
				entry:
				%0 = icmp sge <4 x i32> %a, %c
				%1 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %b, <4 x i32> %c, <4 x i1> %0, <4 x i32> %a)
				%2 = xor <4 x i1> %0, <i1 true, i1 true, i1 true, i1 true>
				%3 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %0, <4 x i32> %1)
				%4 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %3)
				%5 = tail call <4 x i32> @llvm.arm.mve.orr.predicated.v4i32.v4i1(<4 x i32> %c, <4 x i32> %c, <4 x i1> %2, <4 x i32> %4)
				ret <4 x i32> %5
				}
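
The function names in the test above follow the VPT block mnemonics: after `vpt`, each further letter is `t` for an instruction predicated on the compare result ("then") and `e` for one predicated on its inverse ("else"), with at most four instructions per block. A small sketch of that naming rule is given below. `vptMnemonic` is a hypothetical helper written for illustration only; it models the mnemonic, not whether the block insertion pass actually manages to form the block (in the expected output above, patterns such as `vptet_block` still come out as separate `vpst` blocks with spills and reloads).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: Then[i] is true when instruction i is predicated on
// the compare result and false when it is predicated on the inverse.
// Returns the block mnemonic, or an empty string when the sequence cannot be
// a single block under the assumptions stated here (1 to 4 instructions,
// first instruction always "then").
std::string vptMnemonic(const std::vector<bool> &Then) {
  if (Then.empty() || Then.size() > 4 || !Then.front())
    return "";
  std::string Name = "vpt";
  for (std::size_t I = 1; I < Then.size(); ++I)
    Name += Then[I] ? 't' : 'e';
  return Name;
}
```

For example, a then/then/else sequence maps to `vptte`, which matches the `vptte.s32 ge, q0, q2` line checked in `vptte_block`.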

llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -run-pass arm-mve-vpt-opts %s -o - | FileCheck %s
				dmgreen (Not Done): I think this test would be simpler if there were a number of test functions inside it, each testing one thing (or maybe a small collection of things). Otherwise they run into each other a bit, and it will be hard to tell what's wrong if one of them starts to fail.
				Pierre-vh (Author, Done): Would putting them all in different basic blocks be enough, or do you prefer functions? The optimisation is done per basic block, so it would behave the same as with separate functions (but the test would be shorter).
				dmgreen (Done): Separate functions would be more canonical. You can hopefully simplify the test quite a bit so that it's not too verbose.

				--- |
				target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
				target triple = "thumbv8.1m.main-arm-none-eabi"

				; Functions are intentionally left blank - see the MIR sequences below.

				define arm_aapcs_vfpcc <4 x float> @vcmp_with_opposite_cond(<4 x float> %inactive1) #0 {
				entry:
				ret <4 x float> %inactive1
				}

				define arm_aapcs_vfpcc <4 x float> @vcmp_with_opposite_cond_and_swapped_operands(<4 x float> %inactive1) #0 {
				entry:
				dmgreen (Done): Functions don't actually need bodies. They can just contain unreachable. And if the test below doesn't bear any resemblance to the code here, that is probably for the better.
				ret <4 x float> %inactive1
				}

				define arm_aapcs_vfpcc <4 x float> @triple_vcmp(<4 x float> %inactive1) #0 {
				entry:
				ret <4 x float> %inactive1
				}
				dmgreen (Done): I think a lot of this can probably be removed if you replace +mve.fp with command-line options.

				define arm_aapcs_vfpcc <4 x float> @killed_vccr_values(<4 x float> %inactive1) #0 {
				entry:
				ret <4 x float> %inactive1
				}

				define arm_aapcs_vfpcc <4 x float> @predicated_vcmps(<4 x float> %inactive1) #0 {
				entry:
				ret <4 x float> %inactive1
				}

				define arm_aapcs_vfpcc <4 x float> @flt_with_swapped_operands(<4 x float> %inactive1) #0 {
				entry:
				ret <4 x float> %inactive1
				}

				define arm_aapcs_vfpcc <4 x float> @different_opcodes(<4 x float> %inactive1) #0 {
				entry:
				ret <4 x float> %inactive1
				}

				define arm_aapcs_vfpcc <4 x float> @incorrect_condcode(<4 x float> %inactive1) #0 {
				entry:
				ret <4 x float> %inactive1
				}

				define arm_aapcs_vfpcc <4 x float> @vpr_or_vccr_write_between_vcmps(<4 x float> %inactive1) #0 {
				entry:
				ret <4 x float> %inactive1
				}

				attributes #0 = { "target-features"="+armv8.1-m.main,+hwdiv,+mve.fp,+ras,+thumb-mode" }
				...
				---
				name: vcmp_with_opposite_cond
				alignment: 4
				body: |
				; CHECK-LABEL: name: vcmp_with_opposite_cond
				; CHECK: bb.0:
				; CHECK: successors: %bb.1(0x80000000)
				; CHECK: [[MVE_VCMPf16_:%[0-9]+]]:vccr = MVE_VCMPf16 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPf16_]], 0, $noreg
				dmgreen (Done): I think you can remove a lot of this.
				; CHECK: bb.1:
				; CHECK: successors: %bb.2(0x80000000)
				; CHECK: [[MVE_VCMPf32_:%[0-9]+]]:vccr = MVE_VCMPf32 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT1:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPf32_]], 0, $noreg
				; CHECK: bb.2:
				; CHECK: successors: %bb.3(0x80000000)
				; CHECK: [[MVE_VCMPi16_:%[0-9]+]]:vccr = MVE_VCMPi16 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT2:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPi16_]], 0, $noreg
				; CHECK: bb.3:
				; CHECK: successors: %bb.4(0x80000000)
				; CHECK: [[MVE_VCMPi32_:%[0-9]+]]:vccr = MVE_VCMPi32 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT3:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPi32_]], 0, $noreg
				; CHECK: bb.4:
				; CHECK: successors: %bb.5(0x80000000)
				; CHECK: [[MVE_VCMPi8_:%[0-9]+]]:vccr = MVE_VCMPi8 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT4:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPi8_]], 0, $noreg
				; CHECK: bb.5:
				; CHECK: successors: %bb.6(0x80000000)
				; CHECK: [[MVE_VCMPs16_:%[0-9]+]]:vccr = MVE_VCMPs16 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT5:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs16_]], 0, $noreg
				; CHECK: bb.6:
				; CHECK: successors: %bb.7(0x80000000)
				; CHECK: [[MVE_VCMPs32_:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT6:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_]], 0, $noreg
				; CHECK: bb.7:
				; CHECK: successors: %bb.8(0x80000000)
				; CHECK: [[MVE_VCMPs8_:%[0-9]+]]:vccr = MVE_VCMPs8 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT7:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs8_]], 0, $noreg
				; CHECK: bb.8:
				; CHECK: successors: %bb.9(0x80000000)
				; CHECK: [[MVE_VCMPu16_:%[0-9]+]]:vccr = MVE_VCMPu16 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT8:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPu16_]], 0, $noreg
				; CHECK: bb.9:
				; CHECK: successors: %bb.10(0x80000000)
				; CHECK: [[MVE_VCMPu32_:%[0-9]+]]:vccr = MVE_VCMPu32 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT9:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPu32_]], 0, $noreg
				; CHECK: bb.10:
				; CHECK: successors: %bb.11(0x80000000)
				; CHECK: [[MVE_VCMPu8_:%[0-9]+]]:vccr = MVE_VCMPu8 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT10:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPu8_]], 0, $noreg
				; CHECK: bb.11:
				; CHECK: successors: %bb.12(0x80000000)
				; CHECK: [[MVE_VCMPf16r:%[0-9]+]]:vccr = MVE_VCMPf16r %1:mqpr, %25:gprwithzr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT11:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPf16r]], 0, $noreg
				; CHECK: bb.12:
				; CHECK: successors: %bb.13(0x80000000)
				; CHECK: [[MVE_VCMPf32r:%[0-9]+]]:vccr = MVE_VCMPf32r %1:mqpr, %25:gprwithzr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT12:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPf32r]], 0, $noreg
				; CHECK: bb.13:
				; CHECK: successors: %bb.14(0x80000000)
				; CHECK: [[MVE_VCMPi16r:%[0-9]+]]:vccr = MVE_VCMPi16r %1:mqpr, %25:gprwithzr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT13:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPi16r]], 0, $noreg
				; CHECK: bb.14:
				; CHECK: successors: %bb.15(0x80000000)
				; CHECK: [[MVE_VCMPi32r:%[0-9]+]]:vccr = MVE_VCMPi32r %1:mqpr, %25:gprwithzr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT14:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPi32r]], 0, $noreg
				; CHECK: bb.15:
				; CHECK: successors: %bb.16(0x80000000)
				; CHECK: [[MVE_VCMPi8r:%[0-9]+]]:vccr = MVE_VCMPi8r %1:mqpr, %25:gprwithzr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT15:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPi8r]], 0, $noreg
				; CHECK: bb.16:
				; CHECK: successors: %bb.17(0x80000000)
				; CHECK: [[MVE_VCMPs16r:%[0-9]+]]:vccr = MVE_VCMPs16r %1:mqpr, %25:gprwithzr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT16:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs16r]], 0, $noreg
				; CHECK: bb.17:
				; CHECK: successors: %bb.18(0x80000000)
				; CHECK: [[MVE_VCMPs32r:%[0-9]+]]:vccr = MVE_VCMPs32r %1:mqpr, %25:gprwithzr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT17:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32r]], 0, $noreg
				; CHECK: bb.18:
				; CHECK: successors: %bb.19(0x80000000)
				; CHECK: [[MVE_VCMPs8r:%[0-9]+]]:vccr = MVE_VCMPs8r %1:mqpr, %25:gprwithzr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT18:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs8r]], 0, $noreg
				; CHECK: bb.19:
				; CHECK: successors: %bb.20(0x80000000)
				; CHECK: [[MVE_VCMPu16r:%[0-9]+]]:vccr = MVE_VCMPu16r %1:mqpr, %25:gprwithzr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT19:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPu16r]], 0, $noreg
				; CHECK: bb.20:
				; CHECK: successors: %bb.21(0x80000000)
				; CHECK: [[MVE_VCMPu32r:%[0-9]+]]:vccr = MVE_VCMPu32r %1:mqpr, %25:gprwithzr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT20:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPu32r]], 0, $noreg
				; CHECK: bb.21:
				; CHECK: successors: %bb.22(0x80000000)
				; CHECK: [[MVE_VCMPu8r:%[0-9]+]]:vccr = MVE_VCMPu8r %1:mqpr, %25:gprwithzr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT21:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPu8r]], 0, $noreg
				; CHECK: bb.22:
				; CHECK: [[MVE_VCMPu8r1:%[0-9]+]]:vccr = MVE_VCMPu8r %1:mqpr, $zr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT22:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPu8r1]], 0, $noreg
				; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr
				;
				; Tests that VCMPs with an opposite condition are correctly converted into VPNOTs.
				;
				dmgreen (Not Done): I was expecting these registers to be virtual, given where this is in the pipeline. Will they be physical instead?
				Pierre-vh (Author, Done): In the pipeline, they'd be virtual. Should I replace all $vpr/$q0/$q1 here with virtual registers? It won't make much of a difference testing-wise, but I can understand that virtual registers would be preferred.
				bb.0:
				%3:vccr = MVE_VCMPf16 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%4:vccr = MVE_VCMPf16 %0:mqpr, %1:mqpr, 11, 0, $noreg

				bb.1:
				%5:vccr = MVE_VCMPf32 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%6:vccr = MVE_VCMPf32 %0:mqpr, %1:mqpr, 11, 0, $noreg

				bb.2:
				%7:vccr = MVE_VCMPi16 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%8:vccr = MVE_VCMPi16 %0:mqpr, %1:mqpr, 11, 0, $noreg

				bb.3:
				%9:vccr = MVE_VCMPi32 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%10:vccr = MVE_VCMPi32 %0:mqpr, %1:mqpr, 11, 0, $noreg

				bb.4:
				%11:vccr = MVE_VCMPi8 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%12:vccr = MVE_VCMPi8 %0:mqpr, %1:mqpr, 11, 0, $noreg

				bb.5:
				%13:vccr = MVE_VCMPs16 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%14:vccr = MVE_VCMPs16 %0:mqpr, %1:mqpr, 11, 0, $noreg

				bb.6:
				%15:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%16:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 11, 0, $noreg

				bb.7:
				%17:vccr = MVE_VCMPs8 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%18:vccr = MVE_VCMPs8 %0:mqpr, %1:mqpr, 11, 0, $noreg

				bb.8:
				%19:vccr = MVE_VCMPu16 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%20:vccr = MVE_VCMPu16 %0:mqpr, %1:mqpr, 11, 0, $noreg

				bb.9:
				%21:vccr = MVE_VCMPu32 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%22:vccr = MVE_VCMPu32 %0:mqpr, %1:mqpr, 11, 0, $noreg

				bb.10:
				%23:vccr = MVE_VCMPu8 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%24:vccr = MVE_VCMPu8 %0:mqpr, %1:mqpr, 11, 0, $noreg

				bb.11:
				%25:vccr = MVE_VCMPf16r %0:mqpr, %2:gprwithzr, 10, 0, $noreg
				%26:vccr = MVE_VCMPf16r %0:mqpr, %2:gprwithzr, 11, 0, $noreg

				bb.12:
				%27:vccr = MVE_VCMPf32r %0:mqpr, %2:gprwithzr, 10, 0, $noreg
				%28:vccr = MVE_VCMPf32r %0:mqpr, %2:gprwithzr, 11, 0, $noreg

				bb.13:
				%29:vccr = MVE_VCMPi16r %0:mqpr, %2:gprwithzr, 10, 0, $noreg
				%30:vccr = MVE_VCMPi16r %0:mqpr, %2:gprwithzr, 11, 0, $noreg

				bb.14:
				%31:vccr = MVE_VCMPi32r %0:mqpr, %2:gprwithzr, 10, 0, $noreg
				%32:vccr = MVE_VCMPi32r %0:mqpr, %2:gprwithzr, 11, 0, $noreg

				bb.15:
				%33:vccr = MVE_VCMPi8r %0:mqpr, %2:gprwithzr, 10, 0, $noreg
				%34:vccr = MVE_VCMPi8r %0:mqpr, %2:gprwithzr, 11, 0, $noreg

				bb.16:
				%35:vccr = MVE_VCMPs16r %0:mqpr, %2:gprwithzr, 10, 0, $noreg
				%36:vccr = MVE_VCMPs16r %0:mqpr, %2:gprwithzr, 11, 0, $noreg

				bb.17:
				%37:vccr = MVE_VCMPs32r %0:mqpr, %2:gprwithzr, 10, 0, $noreg
				%38:vccr = MVE_VCMPs32r %0:mqpr, %2:gprwithzr, 11, 0, $noreg

				bb.18:
				%39:vccr = MVE_VCMPs8r %0:mqpr, %2:gprwithzr, 10, 0, $noreg
				%40:vccr = MVE_VCMPs8r %0:mqpr, %2:gprwithzr, 11, 0, $noreg

				bb.19:
				%41:vccr = MVE_VCMPu16r %0:mqpr, %2:gprwithzr, 10, 0, $noreg
				%42:vccr = MVE_VCMPu16r %0:mqpr, %2:gprwithzr, 11, 0, $noreg

				bb.20:
				%43:vccr = MVE_VCMPu32r %0:mqpr, %2:gprwithzr, 10, 0, $noreg
				%44:vccr = MVE_VCMPu32r %0:mqpr, %2:gprwithzr, 11, 0, $noreg

				bb.21:
				%45:vccr = MVE_VCMPu8r %0:mqpr, %2:gprwithzr, 10, 0, $noreg
				%46:vccr = MVE_VCMPu8r %0:mqpr, %2:gprwithzr, 11, 0, $noreg

				bb.22:
				; There shouldn't be any exception for $zr, so the second VCMP should
				; be transformed into a VPNOT.
				%47:vccr = MVE_VCMPu8r %0:mqpr, $zr, 10, 0, $noreg
				%48:vccr = MVE_VCMPu8r %0:mqpr, $zr, 11, 0, $noreg

				tBX_RET 14, $noreg, implicit %0:mqpr
				...
				---
				name: vcmp_with_opposite_cond_and_swapped_operands
				alignment: 4
				body: |
				; CHECK-LABEL: name: vcmp_with_opposite_cond_and_swapped_operands
				; CHECK: bb.0:
				; CHECK: successors: %bb.1(0x80000000)
				; CHECK: [[MVE_VCMPi16_:%[0-9]+]]:vccr = MVE_VCMPi16 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPi16_]], 0, $noreg
				; CHECK: bb.1:
				; CHECK: successors: %bb.2(0x80000000)
				; CHECK: [[MVE_VCMPi32_:%[0-9]+]]:vccr = MVE_VCMPi32 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT1:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPi32_]], 0, $noreg
				; CHECK: bb.2:
				; CHECK: successors: %bb.3(0x80000000)
				; CHECK: [[MVE_VCMPi8_:%[0-9]+]]:vccr = MVE_VCMPi8 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT2:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPi8_]], 0, $noreg
				; CHECK: bb.3:
				; CHECK: successors: %bb.4(0x80000000)
				; CHECK: [[MVE_VCMPs16_:%[0-9]+]]:vccr = MVE_VCMPs16 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT3:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs16_]], 0, $noreg
				; CHECK: bb.4:
				; CHECK: successors: %bb.5(0x80000000)
				; CHECK: [[MVE_VCMPs32_:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT4:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_]], 0, $noreg
				; CHECK: bb.5:
				; CHECK: successors: %bb.6(0x80000000)
				; CHECK: [[MVE_VCMPs8_:%[0-9]+]]:vccr = MVE_VCMPs8 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT5:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs8_]], 0, $noreg
				; CHECK: bb.6:
				; CHECK: successors: %bb.7(0x80000000)
				; CHECK: [[MVE_VCMPu16_:%[0-9]+]]:vccr = MVE_VCMPu16 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT6:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPu16_]], 0, $noreg
				; CHECK: bb.7:
				; CHECK: successors: %bb.8(0x80000000)
				; CHECK: [[MVE_VCMPu32_:%[0-9]+]]:vccr = MVE_VCMPu32 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT7:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPu32_]], 0, $noreg
				; CHECK: bb.8:
				; CHECK: [[MVE_VCMPu8_:%[0-9]+]]:vccr = MVE_VCMPu8 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT8:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPu8_]], 0, $noreg
				; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr
				;
				; Tests that VCMPs with an opposite condition and swapped operands are
				; correctly converted into VPNOTs.
				;
				bb.0:
				%2:vccr = MVE_VCMPi16 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%3:vccr = MVE_VCMPi16 %1:mqpr, %0:mqpr, 12, 0, $noreg

				bb.1:
				%4:vccr = MVE_VCMPi32 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%5:vccr = MVE_VCMPi32 %1:mqpr, %0:mqpr, 12, 0, $noreg

				bb.2:
				%6:vccr = MVE_VCMPi8 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%7:vccr = MVE_VCMPi8 %1:mqpr, %0:mqpr, 12, 0, $noreg

				bb.3:
				%8:vccr = MVE_VCMPs16 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%9:vccr = MVE_VCMPs16 %1:mqpr, %0:mqpr, 12, 0, $noreg

				bb.4:
				%10:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%11:vccr = MVE_VCMPs32 %1:mqpr, %0:mqpr, 12, 0, $noreg

				bb.5:
				%12:vccr = MVE_VCMPs8 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%13:vccr = MVE_VCMPs8 %1:mqpr, %0:mqpr, 12, 0, $noreg

				bb.6:
				%14:vccr = MVE_VCMPu16 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%15:vccr = MVE_VCMPu16 %1:mqpr, %0:mqpr, 12, 0, $noreg

				bb.7:
				%16:vccr = MVE_VCMPu32 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%17:vccr = MVE_VCMPu32 %1:mqpr, %0:mqpr, 12, 0, $noreg

				bb.8:
				%18:vccr = MVE_VCMPu8 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%19:vccr = MVE_VCMPu8 %1:mqpr, %0:mqpr, 12, 0, $noreg

				tBX_RET 14, $noreg, implicit %0:mqpr
				...
				---
				name: triple_vcmp
				alignment: 4
				body: |
				;
				; Tests that, when there are 2 "VPNOT-like VCMPs" in a row, only the first
				; becomes a VPNOT.
				;
				bb.0:
				; CHECK-LABEL: name: triple_vcmp
				; CHECK: [[MVE_VCMPs32_:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_]], 0, $noreg
				; CHECK: [[MVE_VCMPs32_1:%[0-9]+]]:vccr = MVE_VCMPs32 %2:mqpr, %1:mqpr, 12, 0, $noreg
				; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr
				%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%3:vccr = MVE_VCMPs32 %1:mqpr, %0:mqpr, 12, 0, $noreg
				%4:vccr = MVE_VCMPs32 %1:mqpr, %0:mqpr, 12, 0, $noreg
				tBX_RET 14, $noreg, implicit %0:mqpr
				...
				---
				name: killed_vccr_values
				alignment: 4
				body: |
				bb.0:
				;
				; Tests that, if the result of the VCMP is killed before the
				; second VCMP (that will be converted into a VPNOT) is found,
				; the kill flag is removed.
				;
				; CHECK-LABEL: name: killed_vccr_values
				; CHECK: [[MVE_VCMPf16_:%[0-9]+]]:vccr = MVE_VCMPf16 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VORR:%[0-9]+]]:mqpr = MVE_VORR %1:mqpr, %2:mqpr, 1, [[MVE_VCMPf16_]], undef [[MVE_VORR]]
				; CHECK: [[MVE_VPNOT:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPf16_]], 0, $noreg
				; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr
				%2:vccr = MVE_VCMPf16 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%3:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, killed %2:vccr, undef %3:mqpr
				%4:vccr = MVE_VCMPf16 %0:mqpr, %1:mqpr, 11, 0, $noreg
				tBX_RET 14, $noreg, implicit %0:mqpr
				...
				---
				name: predicated_vcmps
				alignment: 4
				body: |
				; CHECK-LABEL: name: predicated_vcmps
				; CHECK: bb.0:
				; CHECK: successors: %bb.1(0x80000000)
				; CHECK: [[MVE_VCMPi16_:%[0-9]+]]:vccr = MVE_VCMPi16 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VCMPi16_1:%[0-9]+]]:vccr = MVE_VCMPi16 %2:mqpr, %1:mqpr, 12, 1, [[MVE_VCMPi16_]]
				; CHECK: [[MVE_VCMPi16_2:%[0-9]+]]:vccr = MVE_VCMPi16 %1:mqpr, %2:mqpr, 10, 1, [[MVE_VCMPi16_]]
				; CHECK: bb.1:
				; CHECK: successors: %bb.2(0x80000000)
				; CHECK: [[MVE_VCMPi32_:%[0-9]+]]:vccr = MVE_VCMPi32 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VCMPi32_1:%[0-9]+]]:vccr = MVE_VCMPi32 %2:mqpr, %1:mqpr, 12, 1, [[MVE_VCMPi32_]]
				; CHECK: [[MVE_VCMPi32_2:%[0-9]+]]:vccr = MVE_VCMPi32 %1:mqpr, %2:mqpr, 10, 1, [[MVE_VCMPi32_]]
				; CHECK: bb.2:
				; CHECK: successors: %bb.3(0x80000000)
				; CHECK: [[MVE_VCMPf16_:%[0-9]+]]:vccr = MVE_VCMPf16 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VCMPf16_1:%[0-9]+]]:vccr = MVE_VCMPf16 %1:mqpr, %2:mqpr, 11, 1, [[MVE_VCMPf16_]]
				; CHECK: [[MVE_VCMPf16_2:%[0-9]+]]:vccr = MVE_VCMPf16 %1:mqpr, %2:mqpr, 10, 1, [[MVE_VCMPf16_]]
				; CHECK: bb.3:
				; CHECK: successors: %bb.4(0x80000000)
				; CHECK: [[MVE_VCMPf32_:%[0-9]+]]:vccr = MVE_VCMPf32 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VCMPf32_1:%[0-9]+]]:vccr = MVE_VCMPf32 %1:mqpr, %2:mqpr, 11, 1, [[MVE_VCMPf32_]]
				; CHECK: [[MVE_VCMPf32_2:%[0-9]+]]:vccr = MVE_VCMPf32 %1:mqpr, %2:mqpr, 10, 1, [[MVE_VCMPf32_]]
				; CHECK: bb.4:
				; CHECK: successors: %bb.5(0x80000000)
				; CHECK: [[MVE_VCMPi16_3:%[0-9]+]]:vccr = MVE_VCMPi16 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VCMPi16_4:%[0-9]+]]:vccr = MVE_VCMPi16 %1:mqpr, %2:mqpr, 11, 1, [[MVE_VCMPi16_3]]
				; CHECK: [[MVE_VCMPi16_5:%[0-9]+]]:vccr = MVE_VCMPi16 %1:mqpr, %2:mqpr, 10, 1, [[MVE_VCMPi16_3]]
				; CHECK: bb.5:
				; CHECK: [[MVE_VCMPi32_3:%[0-9]+]]:vccr = MVE_VCMPi32 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VCMPi32_4:%[0-9]+]]:vccr = MVE_VCMPi32 %1:mqpr, %2:mqpr, 11, 1, [[MVE_VCMPi32_3]]
				; CHECK: [[MVE_VCMPi32_5:%[0-9]+]]:vccr = MVE_VCMPi32 %1:mqpr, %2:mqpr, 10, 1, [[MVE_VCMPi32_3]]
				; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr
				;
				; Tests that predicated VCMPs are not replaced.
				;
				bb.0:
				%2:vccr = MVE_VCMPi16 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%3:vccr = MVE_VCMPi16 %1:mqpr, %0:mqpr, 12, 1, %2:vccr
				%4:vccr = MVE_VCMPi16 %0:mqpr, %1:mqpr, 10, 1, %2:vccr

				bb.1:
				%5:vccr = MVE_VCMPi32 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%6:vccr = MVE_VCMPi32 %1:mqpr, %0:mqpr, 12, 1, %5:vccr
				%7:vccr = MVE_VCMPi32 %0:mqpr, %1:mqpr, 10, 1, %5:vccr

				bb.2:
				%8:vccr = MVE_VCMPf16 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%9:vccr = MVE_VCMPf16 %0:mqpr, %1:mqpr, 11, 1, %8:vccr
				%10:vccr = MVE_VCMPf16 %0:mqpr, %1:mqpr, 10, 1, %8:vccr

				bb.3:
				%11:vccr = MVE_VCMPf32 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%12:vccr = MVE_VCMPf32 %0:mqpr, %1:mqpr, 11, 1, %11:vccr
				%13:vccr = MVE_VCMPf32 %0:mqpr, %1:mqpr, 10, 1, %11:vccr

				bb.4:
				%14:vccr = MVE_VCMPi16 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%15:vccr = MVE_VCMPi16 %0:mqpr, %1:mqpr, 11, 1, %14:vccr
				%16:vccr = MVE_VCMPi16 %0:mqpr, %1:mqpr, 10, 1, %14:vccr

				bb.5:
				%17:vccr = MVE_VCMPi32 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%18:vccr = MVE_VCMPi32 %0:mqpr, %1:mqpr, 11, 1, %17:vccr
				%19:vccr = MVE_VCMPi32 %0:mqpr, %1:mqpr, 10, 1, %17:vccr

				tBX_RET 14, $noreg, implicit %0:mqpr
				...
				---
				name: flt_with_swapped_operands
				alignment: 4
				body: |
				; CHECK-LABEL: name: flt_with_swapped_operands
				; CHECK: bb.0:
				; CHECK: successors: %bb.1(0x80000000)
				; CHECK: [[MVE_VCMPf16_:%[0-9]+]]:vccr = MVE_VCMPf16 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VCMPf16_1:%[0-9]+]]:vccr = MVE_VCMPf16 %2:mqpr, %1:mqpr, 12, 0, $noreg
				; CHECK: bb.1:
				; CHECK: successors: %bb.2(0x80000000)
				; CHECK: [[MVE_VCMPf32_:%[0-9]+]]:vccr = MVE_VCMPf32 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VCMPf32_1:%[0-9]+]]:vccr = MVE_VCMPf32 %2:mqpr, %1:mqpr, 12, 0, $noreg
				; CHECK: bb.2:
				; CHECK: successors: %bb.3(0x80000000)
				; CHECK: [[MVE_VCMPf16_2:%[0-9]+]]:vccr = MVE_VCMPf16 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VCMPf16_3:%[0-9]+]]:vccr = MVE_VCMPf16 %2:mqpr, %1:mqpr, 11, 0, $noreg
				; CHECK: bb.3:
				; CHECK: [[MVE_VCMPf32_2:%[0-9]+]]:vccr = MVE_VCMPf32 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VCMPf32_3:%[0-9]+]]:vccr = MVE_VCMPf32 %2:mqpr, %1:mqpr, 11, 0, $noreg
				; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr
				;
				; Tests that float VCMPs with an opposite condition and swapped operands
				; are not transformed into VPNOTs.
				;
				bb.0:
				%2:vccr = MVE_VCMPf16 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%3:vccr = MVE_VCMPf16 %1:mqpr, %0:mqpr, 12, 0, $noreg

				bb.1:
				%4:vccr = MVE_VCMPf32 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%5:vccr = MVE_VCMPf32 %1:mqpr, %0:mqpr, 12, 0, $noreg

				bb.2:
				%6:vccr = MVE_VCMPf16 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%7:vccr = MVE_VCMPf16 %1:mqpr, %0:mqpr, 11, 0, $noreg

				bb.3:
				%8:vccr = MVE_VCMPf32 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%9:vccr = MVE_VCMPf32 %1:mqpr, %0:mqpr, 11, 0, $noreg
				tBX_RET 14, $noreg, implicit %0:mqpr
				...
				---
				name: different_opcodes
				alignment: 4
				body: |
				;
				; Tests that a "VPNOT-like VCMP" with an opcode different from the previous VCMP
				; is not transformed into a VPNOT.
				;
				bb.0:
				; CHECK-LABEL: name: different_opcodes
				; CHECK: [[MVE_VCMPf16_:%[0-9]+]]:vccr = MVE_VCMPf16 %1:mqpr, %2:mqpr, 0, 0, $noreg
				; CHECK: [[MVE_VCMPs32_:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 1, 1, $noreg
				; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr
				%2:vccr = MVE_VCMPf16 %0:mqpr, %1:mqpr, 0, 0, $noreg
				%3:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 1, 1, $noreg
				tBX_RET 14, $noreg, implicit %0:mqpr
				...
				---
				name: incorrect_condcode
				alignment: 4
				body: |
				; CHECK-LABEL: name: incorrect_condcode
				; CHECK: bb.0:
				; CHECK: successors: %bb.1(0x80000000)
				; CHECK: [[MVE_VCMPs32_:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VCMPs32_1:%[0-9]+]]:vccr = MVE_VCMPs32 %2:mqpr, %1:mqpr, 11, 0, $noreg
				; CHECK: bb.1:
				; CHECK: [[MVE_VCMPs32_2:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 10, 0, $noreg
				; CHECK: [[MVE_VCMPs32_3:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 12, 0, $noreg
				; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr
				;
				; Tests that a VCMP is not transformed into a VPNOT if its CondCode is not
				; the opposite of the previous VCMP's CondCode.
				;
				bb.0:
				%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%3:vccr = MVE_VCMPs32 %1:mqpr, %0:mqpr, 11, 0, $noreg
				bb.1:
				%4:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
				%5:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 12, 0, $noreg
				tBX_RET 14, $noreg, implicit %0:mqpr
				...
				---
				name: vpr_or_vccr_write_between_vcmps
				alignment: 4
				body: |
				;
				; Tests that a "VPNOT-like VCMP" will not be transformed into a VPNOT if
				; VCCR/VPR is written to in-between.
				;
				bb.0:
				; CHECK-LABEL: name: vpr_or_vccr_write_between_vcmps
				; CHECK: [[MVE_VCMPs32_:%[0-9]+]]:vccr = MVE_VCMPs32 %1:mqpr, %2:mqpr, 12, 0, $noreg
				; CHECK: [[MVE_VPNOT:%[0-9]+]]:vccr = MVE_VPNOT killed [[MVE_VCMPs32_]], 0, $noreg
				; CHECK: [[MVE_VCMPs32_1:%[0-9]+]]:vccr = MVE_VCMPs32 %2:mqpr, %1:mqpr, 10, 0, $noreg
				; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %1:mqpr
				%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 12, 0, $noreg
				%3:vccr = MVE_VPNOT killed %2:vccr, 0, $noreg
				%4:vccr = MVE_VCMPs32 %1:mqpr, %0:mqpr, 10, 0, $noreg
				tBX_RET 14, $noreg, implicit %0:mqpr
				...