This is an archive of the discontinued LLVM Phabricator instance.

Differential D71470

Recommit "[ARM][MVE] findVCMPToFoldIntoVPS"
ClosedPublic

Authored by SjoerdMeijer on Dec 13 2019, 8:19 AM.

Details

Reviewers

samparker
dmgreen

Commits

rGe34801c8e6df: [ARM][MVE] VPT Blocks: findVCMPToFoldIntoVPS

Summary

This is a recommit of D71330, but with a few things fixed/changed:

ReachingDefAnalysis: this was not running with optnone as it was checking skipFunction(), which other analysis passes don't do. I guess this is a copy-paste from a codegen pass.
VPTBlockPass: here I've added skipFunction(), because like most/all optimisations, we don't want to run this with optnone.

This fixes the issues with the initial/previous commit of this: the VPTBlockPass was running with optnone, but ReachingDefAnalysis wasn't, and so VPTBlockPass was crashing querying ReachingDefAnalysis.

I've added test case mve-vpt-block-optnone.mir to check that we don't run VPTBlock with optnone.
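To make the failure mode described above concrete, here is a minimal toy model of it. It is not LLVM code: `ToyFunction`, `ToyAnalysis`, `ToyTransform` and `runPipeline` are names made up for illustration, and the "crash" is modelled as an outcome value rather than an actual fault.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for illustration only; none of these are LLVM classes.
struct ToyFunction {
  std::string Name;
  bool OptNone;
};

enum class Outcome { Skipped, Ran, QueriedMissingAnalysis };

// Models an analysis pass. SkipOnOptNone == true mimics an analysis that
// bails out on optnone functions and therefore computes nothing for them.
struct ToyAnalysis {
  bool SkipOnOptNone;
  bool HasResult;

  void run(const ToyFunction &F) {
    HasResult = !(SkipOnOptNone && F.OptNone);
  }
};

// Models a transform that depends on the analysis result.
struct ToyTransform {
  bool SkipOnOptNone;

  Outcome run(const ToyFunction &F, const ToyAnalysis &A) const {
    if (SkipOnOptNone && F.OptNone)
      return Outcome::Skipped;
    // Querying an analysis that never ran is the situation that crashed.
    return A.HasResult ? Outcome::Ran : Outcome::QueriedMissingAnalysis;
  }
};

// Runs the analysis first, then the transform, as a pass manager would.
Outcome runPipeline(const ToyFunction &F, bool AnalysisSkips,
                    bool TransformSkips) {
  ToyAnalysis A{AnalysisSkips, false};
  A.run(F);
  ToyTransform T{TransformSkips};
  return T.run(F, A);
}
```

With `AnalysisSkips = true` and `TransformSkips = false` on an optnone function, the model reports `QueriedMissingAnalysis`, which corresponds to the first commit. Swapping the two flags, as this recommit does, yields `Skipped` instead.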

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

SjoerdMeijer created this revision. Dec 13 2019, 8:19 AM

Herald added a project: Restricted Project. Dec 13 2019, 8:19 AM

Herald added subscribers: hiraditya, kristof.beyls.

SjoerdMeijer added inline comments. Dec 13 2019, 9:30 AM

llvm/lib/Target/ARM/MVEVPTBlockPass.cpp
153–157	Actually, I was planning on making one more change: save all the instructions that we want to remove to a worklist, so that they can all be removed in one go at the end. This should avoid that RDA possibly has an inconsistent view on the block...

Delayed deleting the VCMP, in order not to invalidate the RDA analysis.
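The deferred-deletion idea can be sketched outside LLVM as follows. This is only an illustration under simplifying assumptions: `ToyInstr`, `ToyBlock` and `foldAndEraseLater` are hypothetical names, and a `std::list` stands in for the instruction list of a basic block.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Toy "instruction": just an opcode name. Not an LLVM type.
struct ToyInstr {
  std::string Opcode;
};
using ToyBlock = std::list<ToyInstr>;

// Walks the block once and records every "VCMP" on a worklist instead of
// erasing it immediately. Anything that inspects the block during the walk
// (RDA in the real pass) therefore still sees the unmodified instructions.
// Returns the number of instructions erased.
std::size_t foldAndEraseLater(ToyBlock &Block) {
  std::vector<ToyBlock::iterator> Removed;
  for (ToyBlock::iterator It = Block.begin(); It != Block.end(); ++It)
    if (It->Opcode == "VCMP")
      Removed.push_back(It); // defer the erase

  // All queries are done, so it is now safe to mutate the block.
  // std::list::erase only invalidates the iterator being erased.
  for (ToyBlock::iterator It : Removed)
    Block.erase(It);
  return Removed.size();
}
```

The trade-off is a small amount of extra bookkeeping in exchange for never querying the analysis against a block it no longer describes.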

I think this is okay... But do we have a test where we have two VPT blocks, predicated on the same value, where that value is generated by a foldable vcmp?

Thanks for taking a look, I have added test case mve-vpt-block-fold-vcmp.mir.

Cheers, my mind is at ease, LGTM.

This revision is now accepted and ready to land. Jan 6 2020, 9:13 AM

Closed by commit rGe34801c8e6df: [ARM][MVE] VPT Blocks: findVCMPToFoldIntoVPS (authored by SjoerdMeijer). Jan 7 2020, 6:00 AM

This revision was automatically updated to reflect the committed changes.

Hello. I had to revert this in https://reviews.llvm.org/rGd50e188a072deca9d48149e05a05756c474bf569 because it was causing some problems to do with checking for uses between the VCMP and the VPT block it is folded into. When looking I didn't think that some of the other details here looked right either.

I've tried to keep your tests as they still seemed useful, and will add a new one for the case I was looking at today.

Sorry for not really looking at it earlier. I wasn't paying enough attention after the first revision. I have put some extra comments below.

llvm/lib/Target/ARM/MVEVPTBlockPass.cpp
75	What happened to this modifies register check? This was the original error I was looking at.
176	I don't think it's valid to skip this at -O0. It is needed for valid assembly. You could argue that some of the optimisations this pass is doing shouldn't be done at O0, but at least VPST's need to be added.
llvm/test/CodeGen/ARM/O3-pipeline.ll
147	RDA looks like quite an expensive pass to me. It would scan through and store reaching defs info for every instruction in every block of a function? Do you think it's worth it for this case, where we are only doing some minor code cleanup?
llvm/test/CodeGen/Thumb2/mve-vpt-block-optnone.mir
73	If this instruction is predicated, then we need to generate a VPST, otherwise the final assembly will be invalid (the "Then" predicate will be silently dropped as this is not part of a valid VPT block).
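One possible reading of the comments on lines 176 and 73 is that only the VCMP folding should be gated on optnone, while the block itself is always created. A minimal sketch of that decision, with hypothetical names (`BlockStart`, `chooseBlockStart`) and no claim that this is how the pass was eventually written:

```cpp
#include <cassert>

// Which instruction opens the VPT block. Illustrative only.
enum class BlockStart { VPT, VPST };

// Hypothetical helper. OptNone models a function the optimiser must leave
// alone; HasFoldableVCMP models "a VCMP was found that may legally be folded".
BlockStart chooseBlockStart(bool OptNone, bool HasFoldableVCMP) {
  // Folding is an optimisation, so it is the only part gated on optnone.
  if (!OptNone && HasFoldableVCMP)
    return BlockStart::VPT;
  // A predicated instruction always needs a block, so VPST is the fallback.
  return BlockStart::VPST;
}
```

Under this scheme an optnone function still gets its VPST, so the predicate is not dropped from the final assembly.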

Can you provide the test case, or point me at it if it is there already?

Sorry for the delay. The original reproducer I had was with a predicated vdup intrinsic. But when they were added recently it turns out it needs the downstream version (or an odd combination of downstream and upstream intrinsics), and so the same problem didn't show up.

It turns out it was simpler to just write it by hand. Now in as rGf64b3466b6bb.

Thanks, I will look into that.

llvm/lib/Target/ARM/MVEVPTBlockPass.cpp
75	I think that is covered by `RDA->getReachingMIDef()`, which gets the first def of VPR, and looks to me to be equivalent to the `modifiesRegister()` here if I'm not wrong. But will take a look at the test case and how that behaves.
llvm/test/CodeGen/ARM/O3-pipeline.ll
147	There's clearly a trade-off here. The 3 calls to RDA are concise, easy to read, and reuse existing code. This is a big benefit, but indeed comes at the cost of running RDA. I guess the only way to answer your question is by measuring compile times.

dmgreen added inline comments. Feb 7 2020, 1:21 AM

llvm/lib/Target/ARM/MVEVPTBlockPass.cpp
75	Ah, yes. Sorry. The "readsRegister" uses check is what this should have pointed at.
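The difference between the two checks discussed on line 75 can be sketched with a toy backward scan. It follows the shape of the removed loop on the left-hand side of the diff (a def check followed by a use check), but it is a simplified model: `ToyInstr` and `findFoldableVCMP` are made-up names, and only a single register, VPR, is tracked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction: records only how it touches VPR. Not an LLVM type.
struct ToyInstr {
  bool DefsVPR; // writes VPR
  bool UsesVPR; // reads VPR (e.g. a predicated instruction)
  bool IsVCMP;  // is a compare that could be folded into a VPT
};

// Scans backwards from the instruction at index MI. Returns the index of a
// VCMP that may be folded into a VPT placed at MI, or -1 if there is none.
int findFoldableVCMP(const std::vector<ToyInstr> &Block, std::size_t MI) {
  assert(MI < Block.size() && "MI must index into the block");
  for (std::size_t I = MI; I-- > 0;) {
    // The nearest def of VPR decides whether folding is possible at all.
    if (Block[I].DefsVPR)
      return Block[I].IsVCMP ? static_cast<int>(I) : -1;
    // A use between the def and MI still needs the VCMP's result, so the
    // VCMP must stay where it is and cannot be folded away.
    if (Block[I].UsesVPR)
      return -1;
  }
  return -1;
}
```

A def-only query finds the right VCMP but, on its own, says nothing about readers sitting between that VCMP and the new block. In this model the second of two blocks predicated on the same value sees such a reader, so it falls back to a VPST.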

Revision Contents

Path	Size
llvm/lib/CodeGen/ReachingDefAnalysis.cpp	2 lines
llvm/lib/Target/ARM/MVEVPTBlockPass.cpp	72 lines
llvm/test/CodeGen/ARM/O3-pipeline.ll	1 line
llvm/test/CodeGen/Thumb2/mve-vpt-block-fold-vcmp.mir	128 lines
llvm/test/CodeGen/Thumb2/mve-vpt-block-optnone.mir	75 lines

Diff 236570

llvm/lib/CodeGen/ReachingDefAnalysis.cpp

[127 lines not shown]	void ReachingDefAnalysis::processBasicBlock(
for (MachineInstr &MI : *TraversedMBB.MBB) {		for (MachineInstr &MI : *TraversedMBB.MBB) {
if (!MI.isDebugInstr())		if (!MI.isDebugInstr())
processDefs(&MI);		processDefs(&MI);
}		}
leaveBasicBlock(TraversedMBB);		leaveBasicBlock(TraversedMBB);
}		}

bool ReachingDefAnalysis::runOnMachineFunction(MachineFunction &mf) {		bool ReachingDefAnalysis::runOnMachineFunction(MachineFunction &mf) {
if (skipFunction(mf.getFunction()))
return false;
MF = &mf;		MF = &mf;
TRI = MF->getSubtarget().getRegisterInfo();		TRI = MF->getSubtarget().getRegisterInfo();

LiveRegs.clear();		LiveRegs.clear();
NumRegUnits = TRI->getNumRegUnits();		NumRegUnits = TRI->getNumRegUnits();

MBBReachingDefs.resize(mf.getNumBlockIDs());		MBBReachingDefs.resize(mf.getNumBlockIDs());

[189 lines not shown]

llvm/lib/Target/ARM/MVEVPTBlockPass.cpp

[16 lines not shown]
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineInstrBundle.h"		#include "llvm/CodeGen/MachineInstrBundle.h"
#include "llvm/CodeGen/MachineOperand.h"		#include "llvm/CodeGen/MachineOperand.h"
		#include "llvm/CodeGen/ReachingDefAnalysis.h"
#include "llvm/IR/DebugLoc.h"		#include "llvm/IR/DebugLoc.h"
#include "llvm/MC/MCInstrDesc.h"		#include "llvm/MC/MCInstrDesc.h"
#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include <cassert>		#include <cassert>
#include <new>		#include <new>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "arm-mve-vpt"		#define DEBUG_TYPE "arm-mve-vpt"

namespace {		namespace {
class MVEVPTBlock : public MachineFunctionPass {		class MVEVPTBlock : public MachineFunctionPass {
public:		public:
static char ID;		static char ID;
const Thumb2InstrInfo *TII;
const TargetRegisterInfo *TRI;

MVEVPTBlock() : MachineFunctionPass(ID) {}		MVEVPTBlock() : MachineFunctionPass(ID) {}

bool runOnMachineFunction(MachineFunction &Fn) override;		bool runOnMachineFunction(MachineFunction &Fn) override;

		void getAnalysisUsage(AnalysisUsage &AU) const override {
		AU.setPreservesCFG();
		AU.addRequired<ReachingDefAnalysis>();
		MachineFunctionPass::getAnalysisUsage(AU);
		}

MachineFunctionProperties getRequiredProperties() const override {		MachineFunctionProperties getRequiredProperties() const override {
return MachineFunctionProperties().set(		return MachineFunctionProperties().set(
MachineFunctionProperties::Property::NoVRegs);		MachineFunctionProperties::Property::NoVRegs).set(
		MachineFunctionProperties::Property::TracksLiveness);
}		}

StringRef getPassName() const override {		StringRef getPassName() const override {
return "MVE VPT block insertion pass";		return "MVE VPT block insertion pass";
}		}

private:		private:
bool InsertVPTBlocks(MachineBasicBlock &MBB);		bool InsertVPTBlocks(MachineBasicBlock &MBB);

		const Thumb2InstrInfo *TII = nullptr;
		ReachingDefAnalysis *RDA = nullptr;
};		};

char MVEVPTBlock::ID = 0;		char MVEVPTBlock::ID = 0;

} // end anonymous namespace		} // end anonymous namespace

INITIALIZE_PASS(MVEVPTBlock, DEBUG_TYPE, "ARM MVE VPT block pass", false, false)		INITIALIZE_PASS(MVEVPTBlock, DEBUG_TYPE, "ARM MVE VPT block pass", false, false)

static MachineInstr *findVCMPToFoldIntoVPST(MachineBasicBlock::iterator MI,		static MachineInstr *findVCMPToFoldIntoVPST(MachineInstr *MI,
const TargetRegisterInfo *TRI,		ReachingDefAnalysis *RDA,
unsigned &NewOpcode) {		unsigned &NewOpcode) {
// Search backwards to the instruction that defines VPR. This may or not		// First, search backwards to the instruction that defines VPR
// be a VCMP, we check that after this loop. If we find another instruction		auto *Def = RDA->getReachingMIDef(MI, ARM::VPR);
// that reads cpsr, we return nullptr.		if (!Def)
MachineBasicBlock::iterator CmpMI = MI;
while (CmpMI != MI->getParent()->begin()) {
--CmpMI;
if (CmpMI->modifiesRegister(ARM::VPR, TRI))
break;
if (CmpMI->readsRegister(ARM::VPR, TRI))
break;
}

if (CmpMI == MI)
return nullptr;
NewOpcode = VCMPOpcodeToVPT(CmpMI->getOpcode());
if (NewOpcode == 0)
return nullptr;		return nullptr;

// Search forward from CmpMI to MI, checking if either register was def'd		// Now check that Def is a VCMP
if (registerDefinedBetween(CmpMI->getOperand(1).getReg(), std::next(CmpMI),		if (!(NewOpcode = VCMPOpcodeToVPT(Def->getOpcode())))
MI, TRI))
return nullptr;		return nullptr;
if (registerDefinedBetween(CmpMI->getOperand(2).getReg(), std::next(CmpMI),
MI, TRI))		// Check that Def's operands are not defined between the VCMP and MI, i.e.
		// check that they have the same reaching def.
		if (!RDA->hasSameReachingDef(Def, MI, Def->getOperand(1).getReg()) ||
		!RDA->hasSameReachingDef(Def, MI, Def->getOperand(2).getReg()))
return nullptr;		return nullptr;
return &*CmpMI;
		return Def;
}		}

bool MVEVPTBlock::InsertVPTBlocks(MachineBasicBlock &Block) {		bool MVEVPTBlock::InsertVPTBlocks(MachineBasicBlock &Block) {
bool Modified = false;		bool Modified = false;
MachineBasicBlock::instr_iterator MBIter = Block.instr_begin();		MachineBasicBlock::instr_iterator MBIter = Block.instr_begin();
MachineBasicBlock::instr_iterator EndIter = Block.instr_end();		MachineBasicBlock::instr_iterator EndIter = Block.instr_end();
		SmallVector<MachineInstr *, 4> RemovedVCMPs;

while (MBIter != EndIter) {		while (MBIter != EndIter) {
MachineInstr *MI = &*MBIter;		MachineInstr *MI = &*MBIter;
unsigned PredReg = 0;		unsigned PredReg = 0;
DebugLoc dl = MI->getDebugLoc();		DebugLoc dl = MI->getDebugLoc();

ARMVCC::VPTCodes Pred = getVPTInstrPredicate(*MI, PredReg);		ARMVCC::VPTCodes Pred = getVPTInstrPredicate(*MI, PredReg);

[29 lines not shown]	while (MBIter != EndIter) {
};		};

unsigned BlockMask = getARMVPTBlockMask(VPTInstCnt);		unsigned BlockMask = getARMVPTBlockMask(VPTInstCnt);

// Search back for a VCMP that can be folded to create a VPT, or else create		// Search back for a VCMP that can be folded to create a VPT, or else create
// a VPST directly		// a VPST directly
MachineInstrBuilder MIBuilder;		MachineInstrBuilder MIBuilder;
unsigned NewOpcode;		unsigned NewOpcode;
MachineInstr *VCMP = findVCMPToFoldIntoVPST(MI, TRI, NewOpcode);		MachineInstr *VCMP = findVCMPToFoldIntoVPST(MI, RDA, NewOpcode);
if (VCMP) {		if (VCMP) {
LLVM_DEBUG(dbgs() << " folding VCMP into VPST: "; VCMP->dump());		LLVM_DEBUG(dbgs() << " folding VCMP into VPST: "; VCMP->dump());
MIBuilder = BuildMI(Block, MI, dl, TII->get(NewOpcode));		MIBuilder = BuildMI(Block, MI, dl, TII->get(NewOpcode));
MIBuilder.addImm(BlockMask);		MIBuilder.addImm(BlockMask);
MIBuilder.add(VCMP->getOperand(1));		MIBuilder.add(VCMP->getOperand(1));
MIBuilder.add(VCMP->getOperand(2));		MIBuilder.add(VCMP->getOperand(2));
MIBuilder.add(VCMP->getOperand(3));		MIBuilder.add(VCMP->getOperand(3));
VCMP->eraseFromParent();		// We delay removing the actual VCMP instruction by saving it to a list
		// and deleting all instructions in this list in one go after we have
		// created the VPT blocks. We do this in order not to invalidate the
		// ReachingDefAnalysis that is queried by 'findVCMPToFoldIntoVPST'.
		RemovedVCMPs.push_back(VCMP);
} else {		} else {
MIBuilder = BuildMI(Block, MI, dl, TII->get(ARM::MVE_VPST));		MIBuilder = BuildMI(Block, MI, dl, TII->get(ARM::MVE_VPST));
MIBuilder.addImm(BlockMask);		MIBuilder.addImm(BlockMask);
}		}

finalizeBundle(		finalizeBundle(
Block, MachineBasicBlock::instr_iterator(MIBuilder.getInstr()), MBIter);		Block, MachineBasicBlock::instr_iterator(MIBuilder.getInstr()), MBIter);

Modified = true;		Modified = true;
}		}

		for (auto *I : RemovedVCMPs)
		I->eraseFromParent();

return Modified;		return Modified;
}		}

bool MVEVPTBlock::runOnMachineFunction(MachineFunction &Fn) {		bool MVEVPTBlock::runOnMachineFunction(MachineFunction &Fn) {
		if (skipFunction(Fn.getFunction()))
		return false;

const ARMSubtarget &STI =		const ARMSubtarget &STI =
static_cast<const ARMSubtarget &>(Fn.getSubtarget());		static_cast<const ARMSubtarget &>(Fn.getSubtarget());

if (!STI.isThumb2() || !STI.hasMVEIntegerOps())		if (!STI.isThumb2() || !STI.hasMVEIntegerOps())
return false;		return false;

TII = static_cast<const Thumb2InstrInfo *>(STI.getInstrInfo());		TII = static_cast<const Thumb2InstrInfo *>(STI.getInstrInfo());
TRI = STI.getRegisterInfo();		RDA = &getAnalysis<ReachingDefAnalysis>();

LLVM_DEBUG(dbgs() << "******** ARM MVE VPT BLOCKS ********\n"		LLVM_DEBUG(dbgs() << "******** ARM MVE VPT BLOCKS ********\n"
<< "********** Function: " << Fn.getName() << '\n');		<< "********** Function: " << Fn.getName() << '\n');

bool Modified = false;		bool Modified = false;
for (MachineBasicBlock &MBB : Fn)		for (MachineBasicBlock &MBB : Fn)
Modified |= InsertVPTBlocks(MBB);		Modified |= InsertVPTBlocks(MBB);

LLVM_DEBUG(dbgs() << "**************************************\n");		LLVM_DEBUG(dbgs() << "**************************************\n");
return Modified;		return Modified;
}		}

/// createMVEVPTBlock - Returns an instance of the MVE VPT block		/// createMVEVPTBlock - Returns an instance of the MVE VPT block
/// insertion pass.		/// insertion pass.
FunctionPass *llvm::createMVEVPTBlockPass() { return new MVEVPTBlock(); }		FunctionPass *llvm::createMVEVPTBlockPass() { return new MVEVPTBlock(); }

llvm/test/CodeGen/ARM/O3-pipeline.ll

	[138 lines not shown]
	; CHECK-NEXT: ARM Execution Domain Fix			; CHECK-NEXT: ARM Execution Domain Fix
	; CHECK-NEXT: BreakFalseDeps			; CHECK-NEXT: BreakFalseDeps
	; CHECK-NEXT: ARM pseudo instruction expansion pass			; CHECK-NEXT: ARM pseudo instruction expansion pass
	; CHECK-NEXT: Thumb2 instruction size reduce pass			; CHECK-NEXT: Thumb2 instruction size reduce pass
	; CHECK-NEXT: MachineDominator Tree Construction			; CHECK-NEXT: MachineDominator Tree Construction
	; CHECK-NEXT: Machine Natural Loop Construction			; CHECK-NEXT: Machine Natural Loop Construction
	; CHECK-NEXT: Machine Block Frequency Analysis			; CHECK-NEXT: Machine Block Frequency Analysis
	; CHECK-NEXT: If Converter			; CHECK-NEXT: If Converter
				; CHECK-NEXT: ReachingDefAnalysis
	; CHECK-NEXT: MVE VPT block insertion pass			; CHECK-NEXT: MVE VPT block insertion pass
	; CHECK-NEXT: Thumb IT blocks insertion pass			; CHECK-NEXT: Thumb IT blocks insertion pass
	; CHECK-NEXT: MachineDominator Tree Construction			; CHECK-NEXT: MachineDominator Tree Construction
	; CHECK-NEXT: Machine Natural Loop Construction			; CHECK-NEXT: Machine Natural Loop Construction
	; CHECK-NEXT: PostRA Machine Instruction Scheduler			; CHECK-NEXT: PostRA Machine Instruction Scheduler
	; CHECK-NEXT: Post RA top-down list latency scheduler			; CHECK-NEXT: Post RA top-down list latency scheduler
	; CHECK-NEXT: Analyze Machine Code For Garbage Collection			; CHECK-NEXT: Analyze Machine Code For Garbage Collection
	; CHECK-NEXT: Machine Block Frequency Analysis			; CHECK-NEXT: Machine Block Frequency Analysis
	[21 lines not shown]

llvm/test/CodeGen/Thumb2/mve-vpt-block-fold-vcmp.mir

This file was added.

				# RUN: llc -run-pass arm-mve-vpt %s -o - | FileCheck %s

				--- |
				target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
				target triple = "thumbv8.1m.main-arm-unknown-eabihf"
				define dso_local <4 x i32> @foo(<4 x i32>* %src, <4 x i32>* %src2, <4 x i32>* %src3, <4 x i32>* %dest, <4 x i32>* %dest2, <4 x i32>* %dest3, <4 x float> %a1) local_unnamed_addr #0 {
				entry:
				%c = fcmp one <4 x float> %a1, zeroinitializer
				%w = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %src, i32 4, <4 x i1> %c, <4 x i32> undef)
				tail call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %w, <4 x i32>* %dest, i32 4, <4 x i1> %c)
				%w2 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %src2, i32 4, <4 x i1> %c, <4 x i32> undef)
				tail call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %w2, <4 x i32>* %dest2, i32 4, <4 x i1> %c)
				%w3 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %src3, i32 4, <4 x i1> %c, <4 x i32> undef)
				tail call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %w3, <4 x i32>* %dest3, i32 4, <4 x i1> %c)
				ret <4 x i32> %w3
				}
				declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32 immarg, <4 x i1>, <4 x i32>) #2
				declare void @llvm.masked.store.v4i32.p0v4i32(<4 x i32>, <4 x i32>*, i32 immarg, <4 x i1>) #3

				attributes #0 = { nounwind readnone "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="128" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+armv8.1-m.main,+fp-armv8d16sp,+fp16,+fpregs,+fullfp16,+hwdiv,+lob,+mve.fp,+ras,+strict-align,+thumb-mode,+vfp2sp,+vfp3d16sp,+vfp4d16sp" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { nounwind readnone }
				attributes #2 = { argmemonly nounwind readonly willreturn }
				attributes #3 = { argmemonly nounwind willreturn }
				attributes #4 = { noduplicate nounwind }
				attributes #5 = { nounwind }

				!llvm.module.flags = !{!0, !1}
				!llvm.ident = !{!2}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{i32 1, !"min_enum_size", i32 4}
				!2 = !{!"clang version 10.0.0 (http://github.com/llvm/llvm-project 90450197deaf91160a22825e6746d998aad05704)"}

				...
				---
				name: foo
				alignment: 2
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers: []
				liveins:
				- { reg: '$r0', virtual-reg: '' }
				- { reg: '$r1', virtual-reg: '' }
				- { reg: '$r2', virtual-reg: '' }
				- { reg: '$q0', virtual-reg: '' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 8
				offsetAdjustment: 0
				maxAlignment: 4
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack:
				- { id: 0, type: default, offset: 12, size: 4, alignment: 4, stack-id: default,
				isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 1, type: default, offset: 8, size: 4, alignment: 8, stack-id: default,
				isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 2, type: default, offset: 4, size: 4, alignment: 4, stack-id: default,
				isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 3, type: default, offset: 0, size: 4, alignment: 8, stack-id: default,
				isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				stack:
				- { id: 0, name: '', type: spill-slot, offset: -4, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$lr', callee-saved-restored: false,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 1, name: '', type: spill-slot, offset: -8, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r7', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				callSites: []
				constants: []
				machineFunctionInfo: {}
				body: |
				bb.0.entry:
				liveins: $q0, $r0, $r1, $r2, $lr

				; CHECK: BUNDLE implicit-def $vpr, implicit-def dead $q0, implicit-def $d0, implicit-def $s0, implicit-def $s1, implicit-def $d1, implicit-def $s2, implicit-def $s3, implicit $q0, implicit $zr, implicit killed $r0, implicit killed $r3, implicit killed $r1, implicit killed $lr {
				; CHECK: MVE_VPTv4f32r 1, renamable $q0, $zr, 10, implicit-def $vpr
				; CHECK: renamable $q0 = MVE_VLDRWU32 killed renamable $r0, 0, 1, internal renamable $vpr :: (load 16 from %ir.src, align 4)
				; CHECK: MVE_VSTRWU32 internal killed renamable $q0, killed renamable $r3, 0, 1, internal renamable $vpr :: (store 16 into %ir.dest, align 4)
				; CHECK: renamable $q0 = MVE_VLDRWU32 killed renamable $r1, 0, 1, internal renamable $vpr :: (load 16 from %ir.src2, align 4)
				; CHECK: MVE_VSTRWU32 internal killed renamable $q0, killed renamable $lr, 0, 1, internal renamable $vpr :: (store 16 into %ir.dest2, align 4)
				; CHECK: }
				; CHECK: BUNDLE implicit-def $q0, implicit-def $d0, implicit-def $s0, implicit-def $s1, implicit-def $d1, implicit-def $s2, implicit-def $s3, implicit killed $vpr, implicit killed $r2, implicit killed $r12 {
				; CHECK: MVE_VPST 4, implicit $vpr
				; CHECK: renamable $q0 = MVE_VLDRWU32 killed renamable $r2, 0, 1, renamable $vpr :: (load 16 from %ir.src3, align 4)
				; CHECK: MVE_VSTRWU32 internal renamable $q0, killed renamable $r12, 0, 1, killed renamable $vpr :: (store 16 into %ir.dest3, align 4)
				; CHECK: }

				$sp = frame-setup t2STMDB_UPD $sp, 14, $noreg, killed $r7, killed $lr
				frame-setup CFI_INSTRUCTION def_cfa_offset 8
				frame-setup CFI_INSTRUCTION offset $lr, -4
				frame-setup CFI_INSTRUCTION offset $r7, -8
				$r7 = frame-setup tMOVr killed $sp, 14, $noreg
				frame-setup CFI_INSTRUCTION def_cfa_register $r7
				renamable $r12 = t2LDRi12 $r7, 16, 14, $noreg :: (load 4 from %fixed-stack.1)
				renamable $lr = t2LDRi12 $r7, 12, 14, $noreg :: (load 4 from %fixed-stack.2)
				renamable $r3 = t2LDRi12 $r7, 8, 14, $noreg :: (load 4 from %fixed-stack.3)
				renamable $vpr = MVE_VCMPf32r renamable $q0, $zr, 10, 0, $noreg
				renamable $q0 = MVE_VLDRWU32 killed renamable $r0, 0, 1, renamable $vpr :: (load 16 from %ir.src, align 4)
				MVE_VSTRWU32 killed renamable $q0, killed renamable $r3, 0, 1, renamable $vpr :: (store 16 into %ir.dest, align 4)
				renamable $q0 = MVE_VLDRWU32 killed renamable $r1, 0, 1, renamable $vpr :: (load 16 from %ir.src2, align 4)
				MVE_VSTRWU32 killed renamable $q0, killed renamable $lr, 0, 1, renamable $vpr :: (store 16 into %ir.dest2, align 4)
				renamable $q0 = MVE_VLDRWU32 killed renamable $r2, 0, 1, renamable $vpr :: (load 16 from %ir.src3, align 4)
				MVE_VSTRWU32 renamable $q0, killed renamable $r12, 0, 1, killed renamable $vpr :: (store 16 into %ir.dest3, align 4)
				$sp = t2LDMIA_RET $sp, 14, $noreg, def $r7, def $pc, implicit $q0

				...

llvm/test/CodeGen/Thumb2/mve-vpt-block-optnone.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -run-pass arm-mve-vpt %s -o - | FileCheck %s

				--- |
				target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
				target triple = "thumbv8.1m.main-arm-none-eabi"

				define hidden arm_aapcs_vfpcc <4 x float> @test_vminnmq_m_f32_v2(<4 x float> %inactive, <4 x float> %a, <4 x float> %b, i16 zeroext %p) local_unnamed_addr #0 {
				entry:
				%conv.i = zext i16 %p to i32
				%0 = tail call nnan ninf nsz <4 x float> @llvm.arm.mve.vminnm.m.v4f32.v4f32.v4f32.v4f32.i32(<4 x float> %inactive, <4 x float> %a, <4 x float> %b, i32 %conv.i) #2
				ret <4 x float> %0
				}

				declare <4 x float> @llvm.arm.mve.vminnm.m.v4f32.v4f32.v4f32.v4f32.i32(<4 x float>, <4 x float>, <4 x float>, i32) #1

				attributes #0 = { noinline optnone nounwind readnone "correctly-rounded-divide-sqrt-fp-math"="false" "denormal-fp-math"="preserve-sign" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="128" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+armv8.1-m.main,+hwdiv,+mve.fp,+ras,+thumb-mode" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { nounwind readnone }
				attributes #2 = { nounwind }


				...
				---
				name: test_vminnmq_m_f32_v2
				alignment: 4
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers: []
				liveins:
				- { reg: '$q0', virtual-reg: '' }
				- { reg: '$q1', virtual-reg: '' }
				- { reg: '$q2', virtual-reg: '' }
				- { reg: '$r0', virtual-reg: '' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 0
				offsetAdjustment: 0
				maxAlignment: 0
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack: []
				constants: []
				body: |
				bb.0.entry:
				liveins: $q0, $q1, $q2, $r0

				; CHECK-LABEL: name: test_vminnmq_m_f32_v2
				; CHECK: liveins: $q0, $q1, $q2, $r0
				; CHECK: $vpr = VMSR_P0 killed $r0, 14, $noreg
				; CHECK: renamable $q0 = nnan ninf nsz MVE_VMINNMf32 killed renamable $q1, killed renamable $q2, 1, killed renamable $vpr, killed renamable $q0
				; CHECK: tBX_RET 14, $noreg, implicit $q0

				$vpr = VMSR_P0 killed $r0, 14, $noreg
				renamable $q0 = nnan ninf nsz MVE_VMINNMf32 killed renamable $q1, killed renamable $q2, 1, killed renamable $vpr, killed renamable $q0
				tBX_RET 14, $noreg, implicit $q0

				...