This is an archive of the discontinued LLVM Phabricator instance.

Differential D76709

[Target][ARM] Adding MVE VPT Optimisation Pass
ClosedPublic

Authored by Pierre-vh on Mar 24 2020, 8:26 AM.

Details

Reviewers

dmgreen
SjoerdMeijer
samparker
simon_tatham
olista01

Commits

rG456302435625: [Target][ARM] Adding MVE VPT Optimisation Pass

Summary

This patch adds a pass called "MVE VPT Optimisations", which does a few optimisations before register allocation.
The goal of this pass is to maximize the size of the VPT blocks created by the MVE VPT Block Insertion pass.

Currently, this pass:

- Replaces VCMPs with VPNOTs when possible.
  - The instruction selector in its current state doesn't generate VPNOTs very often. Instead, it generates a VCMP with the operands swapped and the condition reversed. This pass spots those VCMPs and transforms them into VPNOTs.
  - Why generate more VPNOTs? So that the MVE VPT Block Insertion pass can use them (and remove them) to create larger, more complex VPT blocks (e.g. TEET, TETE, etc.).
- Replaces usages of old VPR values with VPNOTs when inside a block of predicated instructions.
  - This is done to avoid overlapping lifetimes of different VPR values, reducing the chance that a spill/reload occurs.
  - Why? Spills/reloads of VPR are particularly harmful to the MVE VPT Block Insertion pass: they prevent it from creating large VPT blocks.
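The first optimisation can be illustrated with a small standalone model. This is only a sketch for integer compares, not the code from the patch: `Cond`, `Vcmp`, `opposite`, `swapped` and `isVpnotEquivalent` are hypothetical names that stand in for the ARM condition codes and for LLVM's `MachineInstr`. It also accepts an opposite condition with unchanged operand order, which is slightly more general than the check in the diff.

```cpp
#include <cassert>
#include <string>

// Simplified stand-ins for the condition codes a VCMP can carry.
enum class Cond { EQ, NE, GE, LT, GT, LE };

// Hypothetical model of a VCMP: opcode name, condition, two operand names.
// This is not LLVM's MachineInstr.
struct Vcmp {
  std::string opcode;
  Cond cond;
  std::string lhs;
  std::string rhs;
};

// Condition that holds exactly when `c` does not (integer semantics).
Cond opposite(Cond c) {
  switch (c) {
  case Cond::EQ: return Cond::NE;
  case Cond::NE: return Cond::EQ;
  case Cond::GE: return Cond::LT;
  case Cond::LT: return Cond::GE;
  case Cond::GT: return Cond::LE;
  case Cond::LE: return Cond::GT;
  }
  return c;
}

// Condition equivalent to `c` once the two operands are exchanged.
Cond swapped(Cond c) {
  switch (c) {
  case Cond::GE: return Cond::LE;
  case Cond::LE: return Cond::GE;
  case Cond::GT: return Cond::LT;
  case Cond::LT: return Cond::GT;
  default: return c; // EQ and NE are symmetric.
  }
}

// True if `cur` computes the logical NOT of `prev`, so that `cur` could be
// replaced by a VPNOT of prev's result.
bool isVpnotEquivalent(const Vcmp &cur, const Vcmp &prev) {
  if (cur.opcode != prev.opcode)
    return false;
  // Same operand order, opposite condition.
  if (cur.lhs == prev.lhs && cur.rhs == prev.rhs &&
      cur.cond == opposite(prev.cond))
    return true;
  // Swapped operand order, condition reversed and then swapped.
  return cur.lhs == prev.rhs && cur.rhs == prev.lhs &&
         cur.cond == swapped(opposite(prev.cond));
}
```

For example, with `prev` being a signed `LT` compare of `q0` and `q1`, a following `LE` compare of `q1` and `q0` is recognised as its negation, whereas a `GT` compare of `q1` and `q0` (which recomputes the same predicate) is not.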

Diff Detail

Event Timeline

Pierre-vh created this revision. Mar 24 2020, 8:26 AM

Herald added a project: Restricted Project. Mar 24 2020, 8:26 AM

Herald added subscribers: llvm-commits, danielkiss, hiraditya and 2 others.

Pierre-vh added a parent revision: D75993: [Target][ARM] Improvements to the VPT Block Insertion Pass. Mar 24 2020, 8:31 AM

Harbormaster completed remote builds in B50260: Diff 252329. Mar 24 2020, 9:07 AM

The goal of this pass is to maximize the size of the VPT blocks created by the MVE VPT Block Insertion pass.

Just a general question first, why a separate pass? Why e.g. not just doing this in the MVE VPT Block Insertion pass?

In D76709#1939901, @SjoerdMeijer wrote:

The goal of this pass is to maximize the size of the VPT blocks created by the MVE VPT Block Insertion pass.

Just a general question first, why a separate pass? Why e.g. not just doing this in the MVE VPT Block Insertion pass?

The first optimisation (VCMPs into VPNOTs) could technically be done in the MVE VPT Block insertion pass, but I think it's better to do it in this new pass instead of overloading the block Insertion pass too much.
The second optimisation (VPNOT Insertion for "spill prevention") can't be done in the Block Insertion pass as it needs to be done before register allocation (= before spill/reload instructions are emitted).
Overall, since both optimisations deal with VPNOTs and aren't directly related to VPT Block insertion/creation, I felt that it was a good idea to put them in a separate pass.

Is it possible to split this into two patches? The pass and "replaces VCMPs with VPNOTs when possible" part, then the second part to replace the re-use with the not. I think that would make each part easier to review, more manageable.

The first optimisation (VCMPs into VPNOTs) could technically be done in the MVE VPT Block insertion pass, but I think it's better to do it in this new pass instead of overloading the block Insertion pass too much.
The second optimisation (VPNOT Insertion for "spill prevention") can't be done in the Block Insertion pass as it needs to be done before register allocation (= before spill/reload instructions are emitted).
Overall, since both optimisations deal with VPNOTs and aren't directly related to VPT Block insertion/creation, I felt that it was a good idea to put them in a separate pass.

Plus (I think) in order to do #2, it's easier if you have already done #1.

llvm/lib/Target/ARM/ARMTargetMachine.cpp
489	Perhaps put this into the below getOptLevel() != CodeGenOpt::None block? As it is an optimisation
llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
7	Can you add a file comment here like in other files, explaining what the pass does, like you have in the commit message.
54	Could you just use VCMPOpcodeToVPT, and check the return value isn't 0? To save this extra list being needed here.
llvm/test/CodeGen/Thumb2/mve-vcmpf.ll
700	What's going on here? I think it needs to be checking q1 <= q0 && q0 < q1. ord means ordered, which means no NaN's. (Floating point compares can be tricky like that).
llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir
1	I think this test would be simpler if there was a number of test functions inside it, each testing one thing (or maybe a small collection of things). Otherwise they run into each other a bit, and it will be hard to tell what's wrong if one of them starts to fail.

In D76709#1940297, @dmgreen wrote:

Is it possible to split this into two patches? The pass and "replaces VCMPs with VPNOTs when possible" part, then the second part to replace the re-use with the not. I think that would make each part easier to review, more manageable.

The first optimisation (VCMPs into VPNOTs) could technically be done in the MVE VPT Block insertion pass, but I think it's better to do it in this new pass instead of overloading the block Insertion pass too much.
The second optimisation (VPNOT Insertion for "spill prevention") can't be done in the Block Insertion pass as it needs to be done before register allocation (= before spill/reload instructions are emitted).
Overall, since both optimisations deal with VPNOTs and aren't directly related to VPT Block insertion/creation, I felt that it was a good idea to put them in a separate pass.

Plus (I think) in order to do #2, it's easier if you have already done #1.

Unfortunately, I've written this pass in a single commit, so there is no easy way for me to split this patch in 2.
I can do it if you want, it's not impossible, but it's going to take me a while to get right. Also, the second optimisation is a relatively small part of this patch, so patch #1 would still be a large patch.

llvm/test/CodeGen/Thumb2/mve-vcmpf.ll
700	I didn't know that. In which case is it not "safe" to replace a float VCMP even when the conditions are met? What would be the correct behaviour in this case? Should I disable this optimisation for all float VCMPs?
llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir
1	Would putting them all in different basic blocks be enough, or do you prefer functions? The optimisation is done per basic-block, so it'd behave the same as function (but the test would be shorter).

Unfortunately, I've written this pass in a single commit, so there is no easy way for me to split this patch in 2.
I can do it if you want, it's not impossible, but it's going to take me a while to get right. Also, the second optimisation is a relatively small part of this patch, so patch #1 would still be a large patch.

Sure. If it's relatively small it should hopefully be simple enough to pull out into a separate commit.

Also, can you make sure that the "register" and "zero" variants of the VCMPs are tested? The inverting logic around those might be different from a normal compare.

llvm/test/CodeGen/Thumb2/mve-vcmpf.ll
700	Umm. I think for floats you cannot swap the operands. That only works for integers. You can still use the opposite condition code (lt <> ge, for example). Have a look at getOppositeCondition and isValidMVECond in ISelLowering for how it's done there.
llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir
1	Separate functions would be more canonical. You can hopefully simplify the test quite a bit to make it so that's not too verbose.
9–15	Functions don't actually need bodies. They can just contain unreachable. And if the test below doesn't bear any resemblance to the code here, that is probably for the better.
20–22	I think a lot of this can probably be removed if you replace it (e.g. +mve.fp) with command-line options.
27–64	I think you can remove a lot of this.
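The floating-point remark on mve-vcmpf.ll above can be checked with a small standalone sketch. It uses C++'s own IEEE comparison semantics as a stand-in and does not model the MVE condition codes; the function names are hypothetical. The point is only that "swap the operands and reverse the condition" yields the logical NOT for integers, but not for floats once a NaN is involved.

```cpp
#include <cassert>
#include <limits>

// Integer case: "b <= a" is always the logical NOT of "a < b".
bool swappedMatchesNotInt(int a, int b) {
  return (b <= a) == !(a < b);
}

// Floating-point case: the same identity, which fails when either input is
// NaN because every ordered comparison with NaN evaluates to false.
bool swappedMatchesNotFloat(float a, float b) {
  return (b <= a) == !(a < b);
}
```

With ordinary values both functions return true. With a NaN operand, `a < b` and `b <= a` are both false, so the swapped compare is no longer the negation of the original one.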

samparker added inline comments. Mar 25 2020, 7:36 AM

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
127	Maybe you could use the MachineInstr method 'definesRegister' instead?

Refactoring the patch as requested + fixing a few issues.

Pierre-vh added a child revision: D76847: [Target][ARM] Replace re-uses of old VPR values with VPNOTs. Mar 26 2020, 7:21 AM

Adding newline at the end of the file and refactoring canHaveOperandsSwapped: It has been renamed to CanHaveSwappedOperands and it now returns true for everything except for all VCMPr and VCMPf16/f32 instructions.

dmgreen added inline comments. Mar 29 2020, 3:08 PM

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
142	How much does this function add? It doesn't seem to do a huge amount.
159–160	Do you have any tests for this bit?
180–181	You can do BuildMI(...).add(Instr.getOperand(0)).addReg(PrevVCMPResultReg)...
183–184	Operand 4 and 5 are always None and noreg? If so you can use addUnpredicatedMveVpredNOp, which makes it more obvious what the operands are expected to be.
llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir
155–156	I was expecting these registers to be virtual, given where this is in the pipeline. Will they be physical instead?

Pierre-vh marked 7 inline comments as done. Mar 29 2020, 11:31 PM

Pierre-vh added inline comments.

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
142	Sure, it doesn't add much, I'll remove it.
159–160	Not yet, I'll add some.
llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir
155–156	In the pipeline, they'd be virtual. Should I replace all $vpr/$q0/$q1 here with virtual registers? It won't make much of a difference testing-wise, but I can understand that virtual registers would be preferred.

Fixed issues found in review (comments marked as done)
Added "ARM" in front of the pass's name.
Changed the test so it uses virtual registers everywhere.

Looking good.

Can you add some extra ll tests, made from intrinsics, that show different blocks being created while testing the entire backend? All kinds of things from simple to complicated if you can: a good collection of VCMPs, conditional VCMPs, other conditional instructions and VPNOTs in different positions. Then add -verify-machineinstrs to it. It doesn't matter if they are not all optimal yet; they will show what is working well and what isn't yet.

- "Rebased" the patch: I renamed IsWritingToVCCRorVPR to IsWritingToVCCR in the child revision, so I renamed it here as well. I also removed the line that checked if ARM::VPR was used, as it was useless outside of tests.
- I added a new test, mve-vpt-blocks.ll, that uses some intrinsics to attempt to generate all possible VPT blocks from LLVM IR. In this patch, it doesn't generate every block successfully due to spills/reloads, but that's fixed in the child revision.
- Finally, I fixed a bug related to isKill flags on registers and added a test for it.

dmgreen added inline comments. Apr 2 2020, 11:53 PM

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
193	Should we be clearing this value at some points? Setting it back to nullptr?

Pierre-vh marked 2 inline comments as done. Apr 3 2020, 12:01 AM

Pierre-vh added inline comments.

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp
193	Sure, it should be cleared after. I'll see if I can add a test for that as well.

PrevVCMPResultKiller is now correctly reset back to nullptr, but I didn't add a test for it as it was not useful (there was nothing to test).
It's pretty much an NFC, the behaviour is the exact same as before, but it's indeed more correct to reset it once a VCMP has been replaced by a VPNOT.

The reason why this change doesn't really impact the behaviour of the pass is that PrevVCMPResultKiller either:

- is nullptr, so nothing happens;
- contains the "VCMP Result Killer" for the current PrevVCMP (which is what we want); or
- contains the "VCMP Result Killer" for *a previous* PrevVCMP, which is not correct but doesn't cause issues: it's just going to call setIsKill(false) twice on the same operand.
  - This case would only happen if 2 VCMPs are replaced with VPNOTs in the same basic block, but the first one set PrevVCMPResultKiller while the second one didn't.

Correctly resetting it to nullptr after use just removes the third possibility.
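The reset described above can be sketched with a tiny standalone model. This is not the code from the patch: `Operand`, `KillerTracker`, `noteKiller` and `onReplacedByVpnot` are hypothetical names, and the reason given for clearing the flag (the new VPNOT reads the earlier VCMP result, so that result is no longer killed at its old last use) is an assumption drawn from the discussion.

```cpp
#include <cassert>

// Hypothetical stand-in for a register operand carrying a kill flag.
struct Operand {
  bool isKill = false;
  void setIsKill(bool value) { isKill = value; }
};

// Tracks the operand that kills the result of the most recent VCMP.
struct KillerTracker {
  Operand *prevResultKiller = nullptr;

  // Remember the operand that currently kills the VCMP result.
  void noteKiller(Operand &op) { prevResultKiller = &op; }

  // Called when a VCMP is replaced by a VPNOT of the previous result.
  // Clears the recorded kill flag, then resets the pointer so that a stale
  // operand is never touched again. Returns true if a flag was cleared.
  bool onReplacedByVpnot() {
    if (!prevResultKiller)
      return false;
    prevResultKiller->setIsKill(false);
    prevResultKiller = nullptr;
    return true;
  }
};
```

Without the reset, a second replacement in the same basic block would simply call `setIsKill(false)` again on the same operand, which is harmless; with it, the second call becomes a no-op.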

Nice one. LGTM

This revision is now accepted and ready to land. Apr 3 2020, 2:10 AM

Pierre-vh added a child revision: D77798: [Target][ARM] Fix VPT Block Pass miscompilation. Apr 14 2020, 3:25 AM

Pierre-vh removed a child revision: D76847: [Target][ARM] Replace re-uses of old VPR values with VPNOTs.

Pierre-vh mentioned this in D76847: [Target][ARM] Replace re-uses of old VPR values with VPNOTs. Apr 14 2020, 3:43 AM

Closed by commit rG456302435625: [Target][ARM] Adding MVE VPT Optimisation Pass (authored by Pierre-vh). Apr 14 2020, 7:28 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                  Size
llvm/lib/Target/ARM/ARM.h                             2 lines
llvm/lib/Target/ARM/ARMTargetMachine.cpp              3 lines
llvm/lib/Target/ARM/CMakeLists.txt                    1 line
llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp       282 lines
llvm/test/CodeGen/ARM/O3-pipeline.ll                  1 line
llvm/test/CodeGen/Thumb2/mve-vcmpf.ll                 8 lines
llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir    225 lines

Diff 252329

llvm/lib/Target/ARM/ARM.h

[41 lines not shown] FunctionPass *createARMISelDag(ARMBaseTargetMachine &TM,
                                CodeGenOpt::Level OptLevel);
 FunctionPass *createA15SDOptimizerPass();
 FunctionPass *createARMLoadStoreOptimizationPass(bool PreAlloc = false);
 FunctionPass *createARMExpandPseudoPass();
 FunctionPass *createARMConstantIslandPass();
 FunctionPass *createMLxExpansionPass();
 FunctionPass *createThumb2ITBlockPass();
 FunctionPass *createMVEVPTBlockPass();
+FunctionPass *createMVEVPTOptimisationsPass();
 FunctionPass *createARMOptimizeBarriersPass();
 FunctionPass *createThumb2SizeReductionPass(
     std::function<bool(const Function &)> Ftor = nullptr);
 InstructionSelector *
 createARMInstructionSelector(const ARMBaseTargetMachine &TM, const ARMSubtarget &STI,
                              const ARMRegisterBankInfo &RBI);
 Pass *createMVEGatherScatterLoweringPass();

 void LowerARMMachineInstrToMCInst(const MachineInstr *MI, MCInst &OutMI,
                                   ARMAsmPrinter &AP);

 void initializeARMParallelDSPPass(PassRegistry &);
 void initializeARMLoadStoreOptPass(PassRegistry &);
 void initializeARMPreAllocLoadStoreOptPass(PassRegistry &);
 void initializeARMConstantIslandsPass(PassRegistry &);
 void initializeARMExpandPseudoPass(PassRegistry &);
 void initializeThumb2SizeReducePass(PassRegistry &);
 void initializeThumb2ITBlockPass(PassRegistry &);
 void initializeMVEVPTBlockPass(PassRegistry &);
+void initializeMVEVPTOptimisationsPass(PassRegistry &);
 void initializeARMLowOverheadLoopsPass(PassRegistry &);
 void initializeMVETailPredicationPass(PassRegistry &);
 void initializeMVEGatherScatterLoweringPass(PassRegistry &);

 } // end namespace llvm

 #endif // LLVM_LIB_TARGET_ARM_ARM_H

llvm/lib/Target/ARM/ARMTargetMachine.cpp

[90 lines not shown] extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeARMTarget() {
   initializeARMLoadStoreOptPass(Registry);
   initializeARMPreAllocLoadStoreOptPass(Registry);
   initializeARMParallelDSPPass(Registry);
   initializeARMConstantIslandsPass(Registry);
   initializeARMExecutionDomainFixPass(Registry);
   initializeARMExpandPseudoPass(Registry);
   initializeThumb2SizeReducePass(Registry);
   initializeMVEVPTBlockPass(Registry);
+  initializeMVEVPTOptimisationsPass(Registry);
   initializeMVETailPredicationPass(Registry);
   initializeARMLowOverheadLoopsPass(Registry);
   initializeMVEGatherScatterLoweringPass(Registry);
 }

 static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {
   if (TT.isOSBinFormatMachO())
     return std::make_unique<TargetLoweringObjectFileMachO>();
[373 lines not shown]
 }

 bool ARMPassConfig::addGlobalInstructionSelect() {
   addPass(new InstructionSelect());
   return false;
 }

 void ARMPassConfig::addPreRegAlloc() {
+  addPass(createMVEVPTOptimisationsPass());
+
   if (getOptLevel() != CodeGenOpt::None) {
     addPass(createMLxExpansionPass());

     if (EnableARMLoadStoreOpt)
       addPass(createARMLoadStoreOptimizationPass(/* pre-register alloc */ true));

     if (!DisableA15SDOptimization)
       addPass(createA15SDOptimizerPass());
[56 lines not shown]

llvm/lib/Target/ARM/CMakeLists.txt

[48 lines not shown] add_llvm_target(ARMCodeGen
   ARMSubtarget.cpp
   ARMTargetMachine.cpp
   ARMTargetObjectFile.cpp
   ARMTargetTransformInfo.cpp
   MLxExpansionPass.cpp
   MVEGatherScatterLowering.cpp
   MVETailPredication.cpp
   MVEVPTBlockPass.cpp
+  MVEVPTOptimisationsPass.cpp
   Thumb1FrameLowering.cpp
   Thumb1InstrInfo.cpp
   ThumbRegisterInfo.cpp
   Thumb2ITBlockPass.cpp
   Thumb2InstrInfo.cpp
   Thumb2SizeReduction.cpp
   )

 add_subdirectory(AsmParser)
 add_subdirectory(Disassembler)
 add_subdirectory(MCTargetDesc)
 add_subdirectory(TargetInfo)
 add_subdirectory(Utils)

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp

This file was added.

				//===-- MVEVPTOptimisationsPass.cpp ---------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "ARM.h"
				#include "ARMSubtarget.h"
				#include "MCTargetDesc/ARMBaseInfo.h"
				#include "Thumb2InstrInfo.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/CodeGen/MachineBasicBlock.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstr.h"
				#include "llvm/Support/Debug.h"
				#include <cassert>

				using namespace llvm;

				#define DEBUG_TYPE "arm-mve-vpt-opts"

				namespace {
				class MVEVPTOptimisations : public MachineFunctionPass {
				public:
				static char ID;
				const Thumb2InstrInfo *TII;
				MachineRegisterInfo *MRI;

				MVEVPTOptimisations() : MachineFunctionPass(ID) {}

				bool runOnMachineFunction(MachineFunction &Fn) override;

				StringRef getPassName() const override { return "MVE VPT Optimisation Pass"; }

				private:
				MachineInstrBuilder BuildVPNOTBefore(MachineBasicBlock &MBB,
				MachineInstr &Instr);
				MachineInstr &ReplaceUsageOfRegisterByVPNOT(MachineBasicBlock &MBB,
				MachineInstr &Instr,
				unsigned OpIdx, Register Target);
				bool InsertVPNOTs(MachineBasicBlock &MBB);
				};

				char MVEVPTOptimisations::ID = 0;

				} // end anonymous namespace

				INITIALIZE_PASS(MVEVPTOptimisations, DEBUG_TYPE,
				"ARM MVE VPT Optimisations pass", false, false)

				static bool IsVCMP(unsigned Opcode) {
				switch (Opcode) {
				case ARM::MVE_VCMPf16:
				case ARM::MVE_VCMPf16r:
				case ARM::MVE_VCMPf32:
				case ARM::MVE_VCMPf32r:
				case ARM::MVE_VCMPi16:
				case ARM::MVE_VCMPi16r:
				case ARM::MVE_VCMPi32:
				case ARM::MVE_VCMPi32r:
				case ARM::MVE_VCMPi8:
				case ARM::MVE_VCMPi8r:
				case ARM::MVE_VCMPs16:
				case ARM::MVE_VCMPs16r:
				case ARM::MVE_VCMPs32:
				case ARM::MVE_VCMPs32r:
				case ARM::MVE_VCMPs8:
				case ARM::MVE_VCMPs8r:
				case ARM::MVE_VCMPu16:
				case ARM::MVE_VCMPu16r:
				case ARM::MVE_VCMPu32:
				case ARM::MVE_VCMPu32r:
				case ARM::MVE_VCMPu8:
				case ARM::MVE_VCMPu8r:
				return true;
				default:
				return false;
				}
				}

				// Returns the CondCode of a VCMP Instruction.
				static ARMCC::CondCodes GetCondCode(MachineInstr &Instr) {
				assert(IsVCMP(Instr.getOpcode()) && "Inst must be a VCMP");
				return ARMCC::CondCodes(Instr.getOperand(3).getImm());
				}

				// Returns true if Cond is equivalent to a VPNOT instruction on the result of
				// Prev. Cond and Prev must be VCMPs.
				static bool IsVPNOTEquivalent(MachineInstr &Cond, MachineInstr &Prev) {
				assert(IsVCMP(Cond.getOpcode()) && IsVCMP(Prev.getOpcode()));

				// Opcodes must match.
				if (Cond.getOpcode() != Prev.getOpcode())
				return false;

				// The condition code of Cond must be the opposite of Prev's, with
				// operands swapped.
				ARMCC::CondCodes ExpectedCode = GetCondCode(Cond);
				ExpectedCode = ARMCC::getSwappedCondition(ExpectedCode);
				ExpectedCode = ARMCC::getOppositeCondition(ExpectedCode);
				if (ExpectedCode != GetCondCode(Prev))
				return false;

				MachineOperand &CondOP1 = Cond.getOperand(1), &CondOP2 = Cond.getOperand(2);
				MachineOperand &PrevOP1 = Prev.getOperand(1), &PrevOP2 = Prev.getOperand(2);
				// If we have == and != (or the opposite), the operands can be identical.
				if ((GetCondCode(Cond) == ARMCC::NE && GetCondCode(Prev) == ARMCC::EQ) ||
				(GetCondCode(Cond) == ARMCC::EQ && GetCondCode(Prev) == ARMCC::NE))
				if (CondOP1.isIdenticalTo(PrevOP1) && CondOP2.isIdenticalTo(PrevOP2))
				return true;
				// Else, operands must be swapped.
				return CondOP1.isIdenticalTo(PrevOP2) && CondOP2.isIdenticalTo(PrevOP1);
				}

				// Returns true if Instr writes to VCCR or VPR.
				static bool IsWritingToVCCRorVPR(MachineInstr &Instr) {
				if (Instr.getNumOperands() == 0)
				return false;
				MachineOperand &Dst = Instr.getOperand(0);
				if (!Dst.isReg())
				return false;
				Register DstReg = Dst.getReg();
				if (!DstReg.isVirtual())
				return DstReg.id() == ARM::VPR;
				MachineRegisterInfo &RegInfo = Instr.getMF()->getRegInfo();
				const TargetRegisterClass *RegClass = RegInfo.getRegClassOrNull(DstReg);
				return RegClass && (RegClass->getID() == ARM::VCCRRegClassID);
				}

				// Creates a VPNOT before Instr.
				MachineInstrBuilder
				MVEVPTOptimisations::BuildVPNOTBefore(MachineBasicBlock &MBB,
				MachineInstr &Instr) {
				return BuildMI(MBB, &Instr, Instr.getDebugLoc(), TII->get(ARM::MVE_VPNOT));
				}

				// Transforms
				// <Instr that uses %A (at OpIdx)>
				// Into
				// %K = VPNOT %Target
				// <Instr that uses %K (at OpIdx)>
				// And returns %K.
				// This optimization is done in the hopes of preventing spills/reloads of VPR.
				MachineInstr &MVEVPTOptimisations::ReplaceUsageOfRegisterByVPNOT(
				MachineBasicBlock &MBB, MachineInstr &Instr, unsigned OpIdx,
				Register Target) {
				MachineOperand &InstrOperand = Instr.getOperand(OpIdx);

				Register NewResult = MRI->createVirtualRegister(MRI->getRegClass(Target));
				MachineInstrBuilder MIBuilder = BuildVPNOTBefore(MBB, Instr);
				MIBuilder.add(MachineOperand::CreateReg(NewResult, /*isDef*/ true));
				MIBuilder.add(MachineOperand::CreateReg(Target, /*isDef*/ false));
				MIBuilder.addImm(0);
				MIBuilder.addReg({});
				InstrOperand.setReg(NewResult);

				LLVM_DEBUG(dbgs() << " Inserting VPNOT (for spill prevention): ";
				MIBuilder.getInstr()->dump());

				return *MIBuilder.getInstr();
				}

				// Replaces VCMPs by VPNOTs when possible, and tries to reduce spills by
				// replacing uses of old VPR values with VPNOTs inside predicated instruction
				// blocks.
				bool MVEVPTOptimisations::InsertVPNOTs(MachineBasicBlock &MBB) {
				// The first instruction is the VCMP that will be replaced by a VPNOT.
				// The second instruction is the VCMP that defines the register that'll be the
				// VPNOT's operand.
				SmallVector<std::pair<MachineInstr *, MachineInstr *>, 4> WorkList;

				// The last VCMP that we have seen and that couldn't be replaced.
				// This is reset when an instruction that writes to VCCR/VPR is found, or when
				// an element is added to the WorkList.
				MachineInstr *PrevVCMP = nullptr;

				// Iterate over all VCMPs to create the worklist.
				for (MachineInstr &Instr : MBB.instrs()) {
				if (!IsVCMP(Instr.getOpcode())) {
				// If it's an unpredicated instruction that writes to VPR (VCCR), forget
				// about the previous VCMP.
				if ((getVPTInstrPredicate(Instr) == ARMVCC::None) &&
				IsWritingToVCCRorVPR(Instr))
				PrevVCMP = nullptr;
				continue;
				}

				// If we have seen a VCMP previously, and this is VCMP is equivalent to a
				// VPNOT, we can replace it, so add it to the worklist.
				if (PrevVCMP && IsVPNOTEquivalent(Instr, *PrevVCMP)) {
				LLVM_DEBUG(dbgs() << " Adding VCMP to WorkList:"; Instr.dump());
				WorkList.push_back({&Instr, PrevVCMP});
				PrevVCMP = nullptr;
				} else
				PrevVCMP = &Instr;
				}

				LLVM_DEBUG(dbgs() << (WorkList.empty() ? "No Work to do\n"
				: "Processing worklist\n"));
				for (std::pair<MachineInstr *, MachineInstr *> Item : WorkList) {
				MachineInstr *SwappedVCMP = Item.first;
				MachineInstr *OriginalVCMP = Item.second;
				Register Reg = OriginalVCMP->getOperand(0).getReg();

				MachineInstrBuilder MIBuilder = BuildVPNOTBefore(MBB, *SwappedVCMP);
				MIBuilder.add(SwappedVCMP->getOperand(0));
				MIBuilder.addReg(Reg);
				MIBuilder.add(SwappedVCMP->getOperand(4));
				MIBuilder.add(SwappedVCMP->getOperand(5));
				LLVM_DEBUG(dbgs() << " Inserting VPNOT (to replace VCMP): ";
				MIBuilder.getInstr()->dump());

				// While inside the block of predicated instructions, replace usages of old
				// VCCR values by VPNOTs. That way, we avoid overlapping lifetimes
				// of different VPR values (which always result in spill/reloads).
				// Those VPNOTs can then be removed by the MVE VPT Block Insertion pass,
				// and we should end up with clean blocks like "TETE", "TEET", etc.

				Register ValueReg = Reg;
				Register InverseValueReg = SwappedVCMP->getOperand(0).getReg();
				Register VPNOTOperand = InverseValueReg;

				// On each iteration, we try to replace an usage of "ValueReg" with a VPNOT
				// on "VPNOTOperand". When this transformation happens, ValueReg and
				// InverseValueReg are swapped, and VPNOTOperand is set to the result of the
				// latest VPNOT inserted.
				for (MachineBasicBlock::instr_iterator Iter = ++SwappedVCMP->getIterator();
				Iter != MBB.end(); ++Iter) {
				// Stop as soon as we leave the block of predicated instructions
				if (getVPTInstrPredicate(*Iter) == ARMVCC::None)
				break;

				// Keep going until we find an instruction that uses ValueReg.
				int Idx = Iter->findRegisterUseOperandIdx(ValueReg.id());
				if (Idx == -1)
				continue;

				// Replace the usage of said register by a VPNOT on VPNOTOperand
				MachineInstr &VPNOT =
				ReplaceUsageOfRegisterByVPNOT(MBB, *Iter, Idx, VPNOTOperand);

				// Continue: The result of the VPNOT we just inserted becomes the new
				// VPNOTOperand, and we swap ValueReg/InverseValueReg.
				VPNOTOperand = VPNOT.getOperand(0).getReg();
				std::swap(ValueReg, InverseValueReg);
				}

				// Finally, remove the old VCMP.
				SwappedVCMP->removeFromParent();
				}

				return !WorkList.empty();
				}

				bool MVEVPTOptimisations::runOnMachineFunction(MachineFunction &Fn) {
				const ARMSubtarget &STI =
				static_cast<const ARMSubtarget &>(Fn.getSubtarget());

				if (!STI.isThumb2() || !STI.hasMVEIntegerOps())
				return false;

				TII = static_cast<const Thumb2InstrInfo *>(STI.getInstrInfo());
				MRI = &Fn.getRegInfo();

				LLVM_DEBUG(dbgs() << "******** ARM MVE VPT Optimisations ********\n"
				<< "********** Function: " << Fn.getName() << '\n');

				bool Modified = false;
				for (MachineBasicBlock &MBB : Fn)
				Modified |= InsertVPNOTs(MBB);

				LLVM_DEBUG(dbgs() << "**************************************\n");
				return Modified;
				}

				/// createMVEVPTOptimisations
				FunctionPass *llvm::createMVEVPTOptimisationsPass() {
				return new MVEVPTOptimisations();
				}
				No newline at end of file
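
The inner loop of InsertVPNOTs above can be hard to follow from the comments alone, so here is a minimal standalone sketch of its register bookkeeping. It is not the patch's code: Reg is a plain int standing in for a virtual register, RewriteResult, rewritePredicateUses and NextFreeReg are hypothetical names introduced for illustration, and the check that stops the loop when it leaves the block of predicated instructions is left out.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a virtual register number.
using Reg = int;

struct RewriteResult {
  // Predicate register read by each instruction after rewriting.
  std::vector<Reg> Uses;
  // (result, operand) of every VPNOT the rewrite would insert, in order.
  std::vector<std::pair<Reg, Reg>> VPNOTs;
};

// Models the loop: every use of ValueReg is replaced by a fresh VPNOT of the
// most recent predicate value, then ValueReg and InverseValueReg are swapped.
RewriteResult rewritePredicateUses(Reg ValueReg, Reg InverseValueReg,
                                   const std::vector<Reg> &Uses,
                                   Reg NextFreeReg) {
  RewriteResult R;
  Reg VPNOTOperand = InverseValueReg;
  for (Reg Use : Uses) {
    if (Use != ValueReg) {
      // Not a use of ValueReg: the instruction is left untouched.
      R.Uses.push_back(Use);
      continue;
    }
    // Insert "NewReg = VPNOT VPNOTOperand" and make the instruction read it.
    Reg NewReg = NextFreeReg++;
    R.VPNOTs.push_back({NewReg, VPNOTOperand});
    R.Uses.push_back(NewReg);
    // The result of this VPNOT feeds the next one, and the roles swap.
    VPNOTOperand = NewReg;
    std::swap(ValueReg, InverseValueReg);
  }
  return R;
}
```

Called with ValueReg 0, InverseValueReg 1 and the use sequence {1, 0, 1}, the sketch leaves the first use alone and chains two VPNOTs for the next two, which is the same shape as the spill-prevention case at the end of mve-vpt-optimisations.mir.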

llvm/test/CodeGen/ARM/O3-pipeline.ll

	; CHECK-NEXT: Early Machine Loop Invariant Code Motion			; CHECK-NEXT: Early Machine Loop Invariant Code Motion
	; CHECK-NEXT: MachineDominator Tree Construction			; CHECK-NEXT: MachineDominator Tree Construction
	; CHECK-NEXT: Machine Block Frequency Analysis			; CHECK-NEXT: Machine Block Frequency Analysis
	; CHECK-NEXT: Machine Common Subexpression Elimination			; CHECK-NEXT: Machine Common Subexpression Elimination
	; CHECK-NEXT: MachinePostDominator Tree Construction			; CHECK-NEXT: MachinePostDominator Tree Construction
	; CHECK-NEXT: Machine code sinking			; CHECK-NEXT: Machine code sinking
	; CHECK-NEXT: Peephole Optimizations			; CHECK-NEXT: Peephole Optimizations
	; CHECK-NEXT: Remove dead machine instructions			; CHECK-NEXT: Remove dead machine instructions
				; CHECK-NEXT: MVE VPT Optimisation Pass
	; CHECK-NEXT: ARM MLA / MLS expansion pass			; CHECK-NEXT: ARM MLA / MLS expansion pass
	; CHECK-NEXT: ARM pre- register allocation load / store optimization pass			; CHECK-NEXT: ARM pre- register allocation load / store optimization pass
	; CHECK-NEXT: ARM A15 S->D optimizer			; CHECK-NEXT: ARM A15 S->D optimizer
	; CHECK-NEXT: Detect Dead Lanes			; CHECK-NEXT: Detect Dead Lanes
	; CHECK-NEXT: Process Implicit Definitions			; CHECK-NEXT: Process Implicit Definitions
	; CHECK-NEXT: Remove unreachable machine basic blocks			; CHECK-NEXT: Remove unreachable machine basic blocks
	; CHECK-NEXT: Live Variable Analysis			; CHECK-NEXT: Live Variable Analysis
	; CHECK-NEXT: MachineDominator Tree Construction			; CHECK-NEXT: MachineDominator Tree Construction

llvm/test/CodeGen/Thumb2/mve-vcmpf.ll

	; CHECK-MVE-NEXT: vseleq.f32 s1, s13, s9			; CHECK-MVE-NEXT: vseleq.f32 s1, s13, s9
	; CHECK-MVE-NEXT: lsls r0, r1, #31			; CHECK-MVE-NEXT: lsls r0, r1, #31
	; CHECK-MVE-NEXT: vseleq.f32 s0, s12, s8			; CHECK-MVE-NEXT: vseleq.f32 s0, s12, s8
	; CHECK-MVE-NEXT: bx lr			; CHECK-MVE-NEXT: bx lr
	;			;
	; CHECK-MVEFP-LABEL: vcmp_ord_v4f32:			; CHECK-MVEFP-LABEL: vcmp_ord_v4f32:
	; CHECK-MVEFP: @ %bb.0: @ %entry			; CHECK-MVEFP: @ %bb.0: @ %entry
	; CHECK-MVEFP-NEXT: vpt.f32 le, q1, q0			; CHECK-MVEFP-NEXT: vpt.f32 le, q1, q0
	; CHECK-MVEFP-NEXT: vcmpt.f32 lt, q0, q1			; CHECK-MVEFP-NEXT: vpnott
				dmgreen: What's going on here? I think it needs to be checking q1 <= q0 && q0 < q1. ord means ordered, which means no NaNs. (Floating point compares can be tricky like that.)
				Pierre-vh (author): I didn't know that. In which case is it not "safe" to replace a float VCMP even when the conditions are met? What would be the correct behaviour in this case? Should I disable this optimisation for all float VCMPs?
				dmgreen: Umm. I think for floats you cannot swap the operands. That only works for integers. You can still use the opposite condition code (lt <> ge, for example). Have a look at getOppositeCondition and isValidMVECond in ISelLowering for how it's done there.
	; CHECK-MVEFP-NEXT: vpnot			; CHECK-MVEFP-NEXT: vpnot
	; CHECK-MVEFP-NEXT: vpsel q0, q2, q3			; CHECK-MVEFP-NEXT: vpsel q0, q2, q3
	; CHECK-MVEFP-NEXT: bx lr			; CHECK-MVEFP-NEXT: bx lr
	entry:			entry:
	%c = fcmp ord <4 x float> %src, %src2			%c = fcmp ord <4 x float> %src, %src2
	%s = select <4 x i1> %c, <4 x float> %a, <4 x float> %b			%s = select <4 x i1> %c, <4 x float> %a, <4 x float> %b
	ret <4 x float> %s			ret <4 x float> %s
	}			}
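
The review point about floats can be checked in isolation. The sketch below is ordinary scalar C++, not MVE code, and the helper names (cmpGE, cmpGT, swappedCompareIsNot) are made up for illustration. It models one lane of "VCMP ge a, b" followed by "VCMP gt b, a" and asks whether the second compare equals the logical NOT of the first. It assumes IEEE comparison semantics (no fast-math flags).

```cpp
#include <cassert>
#include <limits>

// One lane of "VCMP ge a, b".
template <typename T> bool cmpGE(T a, T b) { return a >= b; }

// One lane of "VCMP gt a, b".
template <typename T> bool cmpGT(T a, T b) { return a > b; }

// True when the swapped-operand compare "gt b, a" gives the same result as
// NOT("ge a, b") for this pair of lane values.
template <typename T> bool swappedCompareIsNot(T a, T b) {
  return cmpGT(b, a) == !cmpGE(a, b);
}
```

For integers the two always agree, so replacing the second compare with a VPNOT is sound. With a NaN operand both compares are false, so NOT of the first is true while the swapped compare is false, and the replacement would change the result.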
	; CHECK-MVE-NEXT: vseleq.f32 s1, s13, s9			; CHECK-MVE-NEXT: vseleq.f32 s1, s13, s9
	; CHECK-MVE-NEXT: lsls r0, r1, #31			; CHECK-MVE-NEXT: lsls r0, r1, #31
	; CHECK-MVE-NEXT: vseleq.f32 s0, s12, s8			; CHECK-MVE-NEXT: vseleq.f32 s0, s12, s8
	; CHECK-MVE-NEXT: bx lr			; CHECK-MVE-NEXT: bx lr
	;			;
	; CHECK-MVEFP-LABEL: vcmp_uno_v4f32:			; CHECK-MVEFP-LABEL: vcmp_uno_v4f32:
	; CHECK-MVEFP: @ %bb.0: @ %entry			; CHECK-MVEFP: @ %bb.0: @ %entry
	; CHECK-MVEFP-NEXT: vpt.f32 le, q1, q0			; CHECK-MVEFP-NEXT: vpt.f32 le, q1, q0
	; CHECK-MVEFP-NEXT: vcmpt.f32 lt, q0, q1			; CHECK-MVEFP-NEXT: vpnott
	; CHECK-MVEFP-NEXT: vpsel q0, q2, q3			; CHECK-MVEFP-NEXT: vpsel q0, q2, q3
	; CHECK-MVEFP-NEXT: bx lr			; CHECK-MVEFP-NEXT: bx lr
	entry:			entry:
	%c = fcmp uno <4 x float> %src, %src2			%c = fcmp uno <4 x float> %src, %src2
	%s = select <4 x i1> %c, <4 x float> %a, <4 x float> %b			%s = select <4 x i1> %c, <4 x float> %a, <4 x float> %b
	ret <4 x float> %s			ret <4 x float> %s
	}			}

	; CHECK-MVE-NEXT: vmov.16 q4[7], r0			; CHECK-MVE-NEXT: vmov.16 q4[7], r0
	; CHECK-MVE-NEXT: vmov q0, q4			; CHECK-MVE-NEXT: vmov q0, q4
	; CHECK-MVE-NEXT: vpop {d8, d9, d10, d11}			; CHECK-MVE-NEXT: vpop {d8, d9, d10, d11}
	; CHECK-MVE-NEXT: bx lr			; CHECK-MVE-NEXT: bx lr
	;			;
	; CHECK-MVEFP-LABEL: vcmp_ord_v8f16:			; CHECK-MVEFP-LABEL: vcmp_ord_v8f16:
	; CHECK-MVEFP: @ %bb.0: @ %entry			; CHECK-MVEFP: @ %bb.0: @ %entry
	; CHECK-MVEFP-NEXT: vpt.f16 le, q1, q0			; CHECK-MVEFP-NEXT: vpt.f16 le, q1, q0
	; CHECK-MVEFP-NEXT: vcmpt.f16 lt, q0, q1			; CHECK-MVEFP-NEXT: vpnott
	; CHECK-MVEFP-NEXT: vpnot			; CHECK-MVEFP-NEXT: vpnot
	; CHECK-MVEFP-NEXT: vpsel q0, q2, q3			; CHECK-MVEFP-NEXT: vpsel q0, q2, q3
	; CHECK-MVEFP-NEXT: bx lr			; CHECK-MVEFP-NEXT: bx lr
	entry:			entry:
	%c = fcmp ord <8 x half> %src, %src2			%c = fcmp ord <8 x half> %src, %src2
	%s = select <8 x i1> %c, <8 x half> %a, <8 x half> %b			%s = select <8 x i1> %c, <8 x half> %a, <8 x half> %b
	ret <8 x half> %s			ret <8 x half> %s
	}			}
	; CHECK-MVE-NEXT: vmov.16 q4[7], r0			; CHECK-MVE-NEXT: vmov.16 q4[7], r0
	; CHECK-MVE-NEXT: vmov q0, q4			; CHECK-MVE-NEXT: vmov q0, q4
	; CHECK-MVE-NEXT: vpop {d8, d9, d10, d11}			; CHECK-MVE-NEXT: vpop {d8, d9, d10, d11}
	; CHECK-MVE-NEXT: bx lr			; CHECK-MVE-NEXT: bx lr
	;			;
	; CHECK-MVEFP-LABEL: vcmp_uno_v8f16:			; CHECK-MVEFP-LABEL: vcmp_uno_v8f16:
	; CHECK-MVEFP: @ %bb.0: @ %entry			; CHECK-MVEFP: @ %bb.0: @ %entry
	; CHECK-MVEFP-NEXT: vpt.f16 le, q1, q0			; CHECK-MVEFP-NEXT: vpt.f16 le, q1, q0
	; CHECK-MVEFP-NEXT: vcmpt.f16 lt, q0, q1			; CHECK-MVEFP-NEXT: vpnott
	; CHECK-MVEFP-NEXT: vpsel q0, q2, q3			; CHECK-MVEFP-NEXT: vpsel q0, q2, q3
	; CHECK-MVEFP-NEXT: bx lr			; CHECK-MVEFP-NEXT: bx lr
	entry:			entry:
	%c = fcmp uno <8 x half> %src, %src2			%c = fcmp uno <8 x half> %src, %src2
	%s = select <8 x i1> %c, <8 x half> %a, <8 x half> %b			%s = select <8 x i1> %c, <8 x half> %a, <8 x half> %b
	ret <8 x half> %s			ret <8 x half> %s
	}			}

llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				dmgreen: I think this test would be simpler if there was a number of test functions inside it, each testing one thing (or maybe a small collection of things). Otherwise they run into each other a bit, and it will be hard to tell what's wrong if one of them starts to fail.
				Pierre-vh (author): Would putting them all in different basic blocks be enough, or do you prefer functions? The optimisation is done per basic-block, so it'd behave the same as function (but the test would be shorter).
				dmgreen: Separate functions would be more canonical. You can hopefully simplify the test quite a bit to make it so that's not too verbose.
				# RUN: llc -run-pass arm-mve-vpt-opts %s -o - | FileCheck %s

				--- |
				target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
				target triple = "thumbv8.1m.main-arm-none-eabi"

				define hidden arm_aapcs_vfpcc <4 x float> @vpt_opts(<4 x float> %inactive1, <4 x float> %inactive2, <4 x float> %a, <4 x float> %b, i16 zeroext %p) local_unnamed_addr #0 {
				entry:
				%conv.i = zext i16 %p to i32
				%0 = tail call nnan ninf nsz <4 x float> @llvm.arm.mve.vminnm.m.v4f32.v4f32.v4f32.v4f32.i32(<4 x float> undef, <4 x float> %a, <4 x float> %b, i32 %conv.i) #2
				%1 = tail call nnan ninf nsz <4 x float> @llvm.arm.mve.vminnm.m.v4f32.v4f32.v4f32.v4f32.i32(<4 x float> undef, <4 x float> %0, <4 x float> %0, i32 %conv.i) #2
				%2 = tail call nnan ninf nsz <4 x float> @llvm.arm.mve.vminnm.m.v4f32.v4f32.v4f32.v4f32.i32(<4 x float> %inactive1, <4 x float> %1, <4 x float> %b, i32 %conv.i) #2
				%3 = tail call nnan ninf nsz <4 x float> @llvm.arm.mve.vminnm.m.v4f32.v4f32.v4f32.v4f32.i32(<4 x float> %inactive2, <4 x float> %2, <4 x float> %b, i32 %conv.i) #2
				ret <4 x float> %3
				dmgreen: Functions don't actually need bodies. They can just contain unreachable. And if the test below doesn't bear any resemblance to the code here, that is probably for the better.
				}

				declare <4 x float> @llvm.arm.mve.vminnm.m.v4f32.v4f32.v4f32.v4f32.i32(<4 x float>, <4 x float>, <4 x float>, i32) #1

				attributes #0 = { nounwind readnone "correctly-rounded-divide-sqrt-fp-math"="false" "denormal-fp-math"="preserve-sign" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="128" "frame-pointer"="none" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+armv8.1-m.main,+hwdiv,+mve.fp,+ras,+thumb-mode" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { nounwind readnone }
				attributes #2 = { nounwind }
				dmgreen: I think a lot of this can probably be removed, if you replace +mve.fp with command line options.

				...
				---
				name: vpt_opts
				alignment: 4
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers: []
				liveins:
				- { reg: '$q0', virtual-reg: '' }
				- { reg: '$q1', virtual-reg: '' }
				- { reg: '$q2', virtual-reg: '' }
				- { reg: '$q3', virtual-reg: '' }
				- { reg: '$r0', virtual-reg: '' }
				- { reg: '$r1', virtual-reg: '' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 0
				offsetAdjustment: 0
				maxAlignment: 0
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack: []
				constants: []
				dmgreen: I think you can remove a lot of this.
				body: |
				bb.0.entry:
				liveins: $q0, $q1, $q2, $r0, $r1

				; CHECK-LABEL: name: vpt_opts
				; CHECK: liveins: $q0, $q1, $q2, $r0, $r1
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 10, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPf16 renamable $q0, renamable $q2, 10, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPf32 renamable $q0, renamable $q2, 10, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPi16 renamable $q0, renamable $q2, 10, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPi32 renamable $q0, renamable $q2, 10, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPi8 renamable $q0, renamable $q2, 10, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs16 renamable $q0, renamable $q2, 10, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 10, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs8 renamable $q0, renamable $q2, 10, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPu16 renamable $q0, renamable $q2, 10, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPu32 renamable $q0, renamable $q2, 10, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPu8 renamable $q0, renamable $q2, 10, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 11, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 12, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 13, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 0, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 0, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 1, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 1, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 10, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q2, renamable $q0, 12, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPf16 renamable $q0, renamable $q2, 0, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 1, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 10, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q2, renamable $q0, 11, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 10, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 12, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 11, 0, $noreg
				; CHECK: renamable $vpr = MVE_VPNOT killed renamable $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q2, renamable $q0, 13, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 12, 0, $noreg
				; CHECK: [[MVE_VPNOT:%[0-9]+]]:vccr = MVE_VPNOT killed renamable $vpr, 0, $noreg
				; CHECK: renamable $vpr = MVE_VCMPs32 renamable $q2, renamable $q0, 10, 0, $noreg
				; CHECK: [[MVE_VCMPs32_:%[0-9]+]]:vccr = MVE_VCMPs32 renamable $q0, renamable $q2, 10, 0, $noreg
				; CHECK: [[MVE_VPNOT:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VCMPs32_]], 0, $noreg
				; CHECK: [[MVE_VORR:%[0-9]+]]:mqpr = MVE_VORR renamable $q2, renamable $q2, 1, [[MVE_VPNOT]], undef [[MVE_VORR]]
				; CHECK: [[MVE_VPNOT1:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT]], 0, $noreg
				; CHECK: [[MVE_VORR1:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR]], [[MVE_VORR]], 1, [[MVE_VPNOT1]], undef [[MVE_VORR1]]
				; CHECK: [[MVE_VPNOT2:%[0-9]+]]:vccr = MVE_VPNOT [[MVE_VPNOT1]], 0, $noreg
				; CHECK: [[MVE_VORR2:%[0-9]+]]:mqpr = MVE_VORR [[MVE_VORR1]], [[MVE_VORR1]], 1, [[MVE_VPNOT2]], undef [[MVE_VORR2]]
				; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit $q0
				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 10, 0, $noreg
				renamable $vpr = MVE_VCMPs32 renamable $q2, renamable $q0, 12, 0, $noreg

				renamable $vpr = MVE_VCMPf16 renamable $q0, renamable $q2, 10, 0, $noreg
				renamable $vpr = MVE_VCMPf16 renamable $q2, renamable $q0, 12, 0, $noreg

				renamable $vpr = MVE_VCMPf32 renamable $q0, renamable $q2, 10, 0, $noreg
				renamable $vpr = MVE_VCMPf32 renamable $q2, renamable $q0, 12, 0, $noreg

				renamable $vpr = MVE_VCMPi16 renamable $q0, renamable $q2, 10, 0, $noreg
				renamable $vpr = MVE_VCMPi16 renamable $q2, renamable $q0, 12, 0, $noreg

				renamable $vpr = MVE_VCMPi32 renamable $q0, renamable $q2, 10, 0, $noreg
				renamable $vpr = MVE_VCMPi32 renamable $q2, renamable $q0, 12, 0, $noreg

				renamable $vpr = MVE_VCMPi8 renamable $q0, renamable $q2, 10, 0, $noreg
				renamable $vpr = MVE_VCMPi8 renamable $q2, renamable $q0, 12, 0, $noreg

				renamable $vpr = MVE_VCMPs16 renamable $q0, renamable $q2, 10, 0, $noreg
				renamable $vpr = MVE_VCMPs16 renamable $q2, renamable $q0, 12, 0, $noreg

				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 10, 0, $noreg
				renamable $vpr = MVE_VCMPs32 renamable $q2, renamable $q0, 12, 0, $noreg

				renamable $vpr = MVE_VCMPs8 renamable $q0, renamable $q2, 10, 0, $noreg
				dmgreen: I was expecting these registers to be virtual, given where this is in the pipeline. Will they be physical instead?
				Pierre-vh (author): In the pipeline, they'd be virtual. Should I replace all $vpr/$q0/$q1 here with virtual registers? It won't make much of a difference testing wise but I can understand that virtual registers would be preferred.
				renamable $vpr = MVE_VCMPs8 renamable $q2, renamable $q0, 12, 0, $noreg

				renamable $vpr = MVE_VCMPu16 renamable $q0, renamable $q2, 10, 0, $noreg
				renamable $vpr = MVE_VCMPu16 renamable $q2, renamable $q0, 12, 0, $noreg

				renamable $vpr = MVE_VCMPu32 renamable $q0, renamable $q2, 10, 0, $noreg
				renamable $vpr = MVE_VCMPu32 renamable $q2, renamable $q0, 12, 0, $noreg

				renamable $vpr = MVE_VCMPu8 renamable $q0, renamable $q2, 10, 0, $noreg
				renamable $vpr = MVE_VCMPu8 renamable $q2, renamable $q0, 12, 0, $noreg

				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 11, 0, $noreg
				renamable $vpr = MVE_VCMPs32 renamable $q2, renamable $q0, 13, 0, $noreg

				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 12, 0, $noreg
				renamable $vpr = MVE_VCMPs32 renamable $q2, renamable $q0, 10, 0, $noreg

				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 13, 0, $noreg
				renamable $vpr = MVE_VCMPs32 renamable $q2, renamable $q0, 11, 0, $noreg

				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 0, 0, $noreg
				renamable $vpr = MVE_VCMPs32 renamable $q2, renamable $q0, 1, 0, $noreg

				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 0, 0, $noreg
				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 1, 0, $noreg

				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 1, 0, $noreg
				renamable $vpr = MVE_VCMPs32 renamable $q2, renamable $q0, 0, 0, $noreg

				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 1, 0, $noreg
				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 0, 0, $noreg

				; Shouldn't insert 2 VPNOTs
				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 10, 0, $noreg
				renamable $vpr = MVE_VCMPs32 renamable $q2, renamable $q0, 12, 0, $noreg
				renamable $vpr = MVE_VCMPs32 renamable $q2, renamable $q0, 12, 0, $noreg

				; Shouldn't replace by a VPNOT: Opcodes are different
				renamable $vpr = MVE_VCMPf16 renamable $q0, renamable $q2, 0, 0, $noreg
				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 1, 0, $noreg

				; Shouldn't replace by a VPNOT: Condition code is incorrect for second VCMP.
				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 10, 0, $noreg
				renamable $vpr = MVE_VCMPs32 renamable $q2, renamable $q0, 11, 0, $noreg

				; Shouldn't replace by a VPNOT: Operands are not swapped.
				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 10, 0, $noreg
				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 12, 0, $noreg

				; Shouldn't replace by a VPNOT: Something writes to VPR in-between.
				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 11, 0, $noreg
				renamable $vpr = MVE_VPNOT killed renamable $vpr, 0, $noreg
				renamable $vpr = MVE_VCMPs32 renamable $q2, renamable $q0, 13, 0, $noreg

				; Shouldn't replace by a VPNOT: Something writes to VCCR (=VPR) in-between.
				renamable $vpr = MVE_VCMPs32 renamable $q0, renamable $q2, 12, 0, $noreg
				%0:vccr = MVE_VPNOT killed renamable $vpr, 0, $noreg
				renamable $vpr = MVE_VCMPs32 renamable $q2, renamable $q0, 10, 0, $noreg

				; Spill-prevention: Prevent a spill/reload by inserting another VPNOT
				; instead of reusing %0/%1 after VPR has been written to.
				%0:vccr = MVE_VCMPs32 renamable $q0, renamable $q2, 10, 0, $noreg
				%1:vccr = MVE_VCMPs32 renamable $q2, renamable $q0, 12, 0, $noreg
				%2:mqpr = MVE_VORR renamable $q2, renamable $q2, 1, %1, undef %2
				%3:mqpr = MVE_VORR %2, %2, 1, %0:vccr, undef %3:mqpr
				%4:mqpr = MVE_VORR %3, %3, 1, %1:vccr, undef %4:mqpr

				tBX_RET 14, $noreg, implicit $q0
				...
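
The positive and negative cases in this test suggest a simple matching rule for when a second VCMP is the logical NOT of the one before it. The sketch below is a reading of the test inputs and CHECK lines, not the pass's actual implementation: VCmp, oppositeCond, swappedOperandsCond and isNotOfPrevious are hypothetical names, registers and opcodes are plain ints, and the numeric condition codes are the ARM ones that appear in the MIR (0 = EQ, 1 = NE, 10 = GE, 11 = LT, 12 = GT, 13 = LE). It covers integer compares only, since the review notes that operand swapping is not valid for floats, and it does not model the "something writes to VPR in-between" cases.

```cpp
#include <cassert>

// ARM condition codes as they appear in the MIR operands.
enum Cond { EQ = 0, NE = 1, GE = 10, LT = 11, GT = 12, LE = 13 };

// Hypothetical model of a VCMP: opcode id, two source registers, condition.
struct VCmp {
  int Opcode;
  int Lhs;
  int Rhs;
  Cond CC;
};

// Condition that is true exactly when CC is false (same operand order).
Cond oppositeCond(Cond CC) {
  switch (CC) {
  case EQ: return NE;
  case NE: return EQ;
  case GE: return LT;
  case LT: return GE;
  case GT: return LE;
  case LE: return GT;
  }
  return CC;
}

// Condition that gives the same result once the two operands are exchanged.
Cond swappedOperandsCond(Cond CC) {
  switch (CC) {
  case EQ: return EQ;
  case NE: return NE;
  case GE: return LE;
  case LE: return GE;
  case GT: return LT;
  case LT: return GT;
  }
  return CC;
}

// True when Cur computes NOT(Prev), assuming integer compare semantics.
bool isNotOfPrevious(const VCmp &Prev, const VCmp &Cur) {
  if (Prev.Opcode != Cur.Opcode)
    return false;
  Cond Wanted = oppositeCond(Prev.CC);
  if (Cur.Lhs == Prev.Lhs && Cur.Rhs == Prev.Rhs)
    return Cur.CC == Wanted;
  if (Cur.Lhs == Prev.Rhs && Cur.Rhs == Prev.Lhs)
    return Cur.CC == swappedOperandsCond(Wanted);
  return false;
}
```

Under this rule, "q0, q2, 10" followed by "q2, q0, 12" matches, while "q2, q0, 11" (wrong condition code) and "q0, q2, 12" (operands not swapped) do not, which lines up with the commented negative cases in the test body.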