This is an archive of the discontinued LLVM Phabricator instance.

Differential D78206

[Target][ARM] Make Low Overhead Loops coexist with VPT blocks
Closed · Public

Authored by Pierre-vh on Apr 15 2020, 6:43 AM.

Details

Reviewers

dmgreen
samparker
SjoerdMeijer

Commits

rG835251f7d99a: [Target][ARM] Make Low Overhead Loops coexist with VPT blocks.

Summary

Previously, the LowOverheadLoops pass couldn't handle VPT blocks that used the vpt instruction, or loops containing multiple identical VCTPs.
This patch improves the LowOverheadLoops pass so it can handle those cases.

I'm still unsure about the changes in this patch, so comments/suggestions are welcome.

This patch will also need a follow-up ARMTargetTransformInfo change to take effect. In its current state, the TTI won't allow the vectorizer to tail-predicate loops that span more than one basic block or that contain compare instructions. Because VPT blocks are generated from comparisons (which create the predicate), such loops currently never make it to this pass.

However, with the right changes to the TTI and the right compiler options, you can generate this kind of code with these changes:

// C++
void test(int* A, int n, int x)  {             
    for(int i = 0; i < n; i++)  
      if (A[i] < x && A[i] > -x)
        A[i] = 0;               
}
// assembly
	dlstp.32	lr, r1
.LBB0_1:                                @ %vector.body
                                        @ =>This Inner Loop Header: Depth=1
	vldrw.u32	q1, [r0]
	vptt.s32	lt, q1, r2
	vcmpt.s32	gt, q1, r3
	vstrwt.32	q0, [r0], #16
	letp	lr, .LBB0_1
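
To make the intent of the assembly above easier to follow, here is a minimal lane-wise C++ model of the loop body. It is only a sketch and not generated code: it assumes four 32-bit lanes per iteration, that r3 holds -x, and that dlstp/letp implicitly disable the lanes past the end of the array. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 4; // assumed: one q-register of 32-bit lanes

// Scalar reference: the C++ loop from the summary.
void scalarTest(std::vector<int> &A, int x) {
  for (std::size_t i = 0; i < A.size(); ++i)
    if (A[i] < x && A[i] > -x)
      A[i] = 0;
}

// Lane-wise model of the tail-predicated vector loop.
void predicatedTest(std::vector<int> &A, int x) {
  const std::size_t n = A.size();
  for (std::size_t base = 0; base < n; base += kLanes) {
    std::array<bool, kLanes> P{};
    for (std::size_t l = 0; l < kLanes; ++l) {
      const bool tail = base + l < n;          // implicit tail predicate
      const bool lt = tail && A[base + l] < x; // vptt.s32 lt, q1, r2
      P[l] = lt && A[base + l] > -x;           // vcmpt.s32 gt, q1, r3
    }
    for (std::size_t l = 0; l < kLanes; ++l)
      if (P[l])
        A[base + l] = 0;                       // vstrwt.32 q0, [r0], #16
  }
}
```

Both functions should leave the array in the same state for any length, including lengths that are not a multiple of four.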

Diff Detail

Event Timeline

Pierre-vh created this revision. Apr 15 2020, 6:43 AM

Herald added a project: Restricted Project. Apr 15 2020, 6:43 AM

Herald added subscribers: llvm-commits, danielkiss, hiraditya, kristof.beyls.

Pierre-vh added a parent revision: D78201: [Target][ARM] Replace outdated getARMVPTBlockMask function. Apr 15 2020, 6:45 AM

Harbormaster failed remote builds in B53354: Diff 257701! Apr 15 2020, 7:05 AM

Removed a one-line change that didn't belong to this patch (a TTI change).

samparker added inline comments. Apr 17 2020, 6:30 AM

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
172	Something's amiss here! This should have been caught by a test.
845	isIdentical is not going to do what you're hoping for here. Use RDA to figure out if both VCTPs are operating on the same value (not just the same register). Also needs a test. I'm wondering whether we should also assert that the VCTP, if predicated, is 'Then' predicated. An example you had (last week?) confused me because I'm not sure why an Else predicate VCTP should appear here and how it maps to the current idea that all predicates are ANDed.
853	Following on from my above comment... the logic here was assuming that each value in VPR.P0 is rooted at the VCTP, and that subsequent instructions (only VCMP) can modify the VPR and have to be predicated on the VCTP. This should result in a predicate along the lines of VCTP && VCMP. Now we're trying to allow VPTs, which create a predicate and cannot be predicated by an existing VPT block - so how do we know that the resulting predicate is still rooted at VCTP and not 'disjoint'? Everything in this pass assumes that any instruction using VPR is predicated on the VCTP, and this is the place where we need to make that guarantee. So a couple of things that I can think of now are:
	- the isVectorPredicated function actually means 'isPredicatedOnVCTP', and that will no longer be true.
	- the notion of 'disjoint' has become more complicated, though a VPT doesn't use the VPR value defined by VCTP, it could be using arguments that are dependent upon the VCTP - and I have the feeling this is the only case that we can support.
	These nuances will also require more testing.
857	Format this differently so this statement is attached to the rest of the conditional blocks.
1235–1236	Document the new case(s).
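
The comment on line 845 distinguishes "same register" from "same value". The sketch below illustrates that distinction with a toy reaching-definition lookup inside a single block. It does not use the real ReachingDefAnalysis API; `Instr`, `reachingDef` and `readSameValue` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified instruction: it defines at most one
// register and reads at most one register (-1 means "none").
struct Instr {
  int Def;
  int Use;
};

// Index of the instruction whose definition of Reg reaches position Pos
// within a single block, or -1 if Reg is live-in to the block.
int reachingDef(const std::vector<Instr> &Block, int Pos, int Reg) {
  for (int I = Pos - 1; I >= 0; --I)
    if (Block[I].Def == Reg)
      return I;
  return -1;
}

// Two instructions operate on the same value only if the register they read
// has the same reaching definition at both positions. Comparing the register
// numbers alone would not notice a redefinition in between.
bool readSameValue(const std::vector<Instr> &Block, int A, int B) {
  return Block[A].Use == Block[B].Use &&
         reachingDef(Block, A, Block[A].Use) ==
             reachingDef(Block, B, Block[B].Use);
}
```

With a block where r2 is redefined between two VCTP-like readers, the register comparison still matches but `readSameValue` returns false.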

Updated the patch following review.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
172	This function was unused, which is why the mistake was not caught in a test - I fixed it now.
845	> Also needs a test. I'm wondering whether we should also assert that the VCTP, if predicated, is 'Then' predicated. An example you had (last week?) confused me because I'm not sure why an Else predicate VCTP should appear here and how it maps to the current idea that all predicates are ANDed.
	I found an example where an else-predicated VCTP is generated and I added it to the tests. It comes from this C++ code:
	void test(int * data, int N, int T) {
	  for(int i = 0; i < N; i++) {
	    int d = data[i];
	    if (d < T || d > -T)
	      data[i] = 0;
	  }
	}
853	> the notion of 'disjoint' has become more complicated, though a VPT doesn't use the VPR value defined by VCTP, it could be using arguments that are dependent upon the VCTP - and I have the feeling this is the only case that we can support.
	I added a check in `ValidateMVEInst`: it'll now refuse VPTs if none of their operands is defined by a predicated instruction, and I added a test for this. Is this enough?

Sorry, I forgot about this. Thanks for that example, I now see how the VCTP can be Else predicated, but I obviously don't quite understand how VPT predication works! The code below is what is generated from the example and now I don't understand why the VSTR is Else predicated. Is it because it needs the same inverted predicate as the VCTP, coming from the VCMP, as well as being ANDed with the VCTP? My intuition wanted it to be Then predicated after the VCTP has already done an inversion.

vctp.32 r1
vpst
vldrwt.u32      q1, [r0]
vptee.s32       ge, q1, r2
vcmpt.s32       le, q1, r3
vctpe.32        r1
vstrwe.32       q0, [r0], #16
subs    r1, #4
le      lr, .LBB0_1

The VPT Blocks pass changes the predicate to E when there is a VPNOT, so if it generated this code, there was a VPNOT between the VCTP/VCMP. This is why the last 2 instructions are else predicated.
The VSTR is else predicated as well because we want it to be executed only when the VCTP is. If it were then predicated, it wouldn't be the case - it would be executed even when both conditions evaluate to true (and the VCTP would be skipped).
(This is kind of related to D77798, which was a miscompilation issue that occurred because we initially thought that we had to change the predicate back to a "then" after VPR is written to.)
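
The explanation above can be checked with a small predicate-mask model of the block in question. This is a sketch under several assumptions that are not stated in the thread: one mask bit per 32-bit lane (the real VPR.P0 has one bit per byte), r2 holds T and r3 holds -T, and the predicate is flipped once when the block switches from then to else and not again for the following else instruction. The names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Mask = std::uint8_t; // one bit per 32-bit lane, four lanes
constexpr Mask kAll = 0xF;

// vctp.32: lanes numbered below N are active.
constexpr Mask vctp(int N) {
  return N >= 4 ? kAll : N <= 0 ? Mask{0} : static_cast<Mask>((1u << N) - 1u);
}

// Predicate under which the final vstrwe executes.
Mask storeMask(const std::array<int, 4> &D, int T, int N) {
  Mask Ge = 0, Le = 0;
  for (int L = 0; L < 4; ++L) {
    if (D[L] >= T)
      Ge |= static_cast<Mask>(1u << L);
    if (D[L] <= -T)
      Le |= static_cast<Mask>(1u << L);
  }
  Mask P0 = Ge;                           // vptee.s32 ge, q1, r2
  P0 = static_cast<Mask>(P0 & Le);        // vcmpt: then, result ANDed with P0
  P0 = static_cast<Mask>(~P0 & kAll);     // then -> else: predicate is flipped
  P0 = static_cast<Mask>(P0 & vctp(N));   // vctpe: result ANDed with P0
  return P0;                              // vstrwe: still else, no second flip
}
```

Under these assumptions the store mask is exactly "lane is within the tail and (d < T || d > -T)", which matches the C++ source. A then-predicated store would see the predicate flipped a second time, so it would no longer run under the mask the VCTP produced.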

Ah, thanks. Well, I guess I'm glad I'm not the only one to be confused by this! I'll carry on reviewing the patch now.

samparker added inline comments. May 4 2020, 12:44 AM

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
230	I'm still pondering about this... I think this should be checking that both operands are either predicated q-regs, scalar or are loop invariant, which should ensure that the result will be the same after the transform. This method also restricts us to instructions that are directly predicated, so it would be good to add a TODO here as well.
823	You don't have to call id() on the register.
1239	Does the block have to contain a vctp? Do we check that this is true?
1299	I might be viewing the nesting incorrectly here, but isn't this handled on line 1250?
1315	Surely we should have discovered all of these during the loop above?

Pierre-vh marked 2 inline comments as done. May 4 2020, 1:39 AM

Pierre-vh added inline comments.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
230	> I'm still pondering about this... I think this should be checking that both operands are either predicated q-regs, scalar or are loop invariant, which should ensure that the result will be the same after the transform.
	If I understand correctly, this function currently only handles the first case (predicated q-regs), but it should be expanded to allow any r-reg and loop invariants (registers that aren't defined inside any of the loop's basic blocks)?
	> This method also restricts us to instructions that are directly predicated, so it would be good to add a TODO here as well.
	What do you mean by "directly predicated"? Can you please give me an example of an indirectly predicated instruction in this context?
1239	It doesn't have to contain a VCTP. I'll change this comment to make it clear.
1299	The if at line 1250 is inside the if at line 1244. That else-if is from the if at line 1244.
	if (1244)
	  if (1250)
	else if (1299)
1315	The loop above catches the VCTP inside VPT blocks, but not the other ones. So for instance, if you have this (from `/test/CodeGen/Thumb2/LowOverheadLoops/vctp-in-vpt-2.mir`):
	MVE_VPST 4, implicit $vpr
	renamable $vpr = MVE_VCTP32 renamable $r2, 1, killed renamable $vpr
	renamable $r1, renamable $q0 = MVE_VLDRWU32_post killed renamable $r1, 16, 1, killed renamable $vpr :: (load 16 from %ir.lsr.iv24, align 4)
	renamable $vpr = MVE_VCTP32 renamable $r2, 0, $noreg
	renamable $r2, dead $cpsr = tSUBi8 killed renamable $r2, 4, 14, $noreg
	The first VCTP will be removed by the loop above, but the "secondary" one below won't - it'll be removed by this loop.

Pierre-vh updated this revision to Diff 261760. May 4 2020, 2:12 AM

Pierre-vh marked 3 inline comments as done.

samparker added inline comments. May 4 2020, 2:50 AM

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
230	By directly predicated, I mean in a VPT block. This pass also tries to understand which instructions are indirectly predicated too, by looking at their use-def chain in LowOverheadLoop::ValidateLiveOuts. Just checking q-regs is fine at the moment, but is it really acceptable if only one operand is predicated?

Pierre-vh added inline comments. May 4 2020, 3:16 AM

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
230	Right, I see. I'll change this so it requires that both operands are either a predicated q-reg, an r-reg, or a loop invariant. Can I accept every r-reg, or only the loop-invariant ones? Can I accept loop-invariant non-predicated q-regs as well?

samparker added inline comments. May 4 2020, 4:07 AM

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
230	I would say predicated or invariant for both register classes. I think that should guarantee what we need.
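
The rule agreed on in this exchange ("predicated or invariant" for both operands) can be sketched as follows. This is not the code that landed; `OperandInfo`, `isPredicatedOrInvariant` and `isAcceptableVPT` are hypothetical stand-ins, and "invariant" is modelled simply as "not defined inside the loop", following the description earlier in the thread.

```cpp
#include <cassert>
#include <set>

// Hypothetical summary of where a VPT operand comes from. A real pass would
// derive this from reaching-definition analysis.
struct OperandInfo {
  int DefId; // id of the defining instruction, -1 if defined outside the loop
};

// An operand is fine if it is loop invariant or if its definition is one of
// the instructions already known to be predicated.
bool isPredicatedOrInvariant(const OperandInfo &Op,
                             const std::set<int> &PredicatedDefs) {
  const bool Invariant = Op.DefId == -1;
  return Invariant || PredicatedDefs.count(Op.DefId) != 0;
}

// Both operands must qualify; one predicated operand is not enough.
bool isAcceptableVPT(const OperandInfo &Op1, const OperandInfo &Op2,
                     const std::set<int> &PredicatedDefs) {
  return isPredicatedOrInvariant(Op1, PredicatedDefs) &&
         isPredicatedOrInvariant(Op2, PredicatedDefs);
}
```

Compared with the earlier "at least one operand" check, a VPT whose second operand is defined in the loop by an unpredicated instruction is rejected here.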

Changed IsAcceptableVPT and added GetReachingDefs

samparker added inline comments. May 6 2020, 7:36 AM

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
263	I feel like this will only be safe if we're still guaranteeing that any predicated instruction is predicated upon the VCTP. I'm concerned about some code that may look like this:
	loop.ph:
	  a = ...
	  b = ...
	  c = ...
	  n = ...
	  z = ...
	  DLS
	loop:
	  VCTP
	  VPTT z, n
	  VSTRT
	  VPTTTT a, b
	  VLDRT d
	  VLDRT e
	  VADDT z, e, d
	  LE loop
	Is it possible now to have a VCTP in the loop with nothing actually predicated upon it?
842	I doubt CurrentPredicate is correct here as a VPT would be generating its own, maybe it should be cleared..?

Pierre-vh marked 2 inline comments as done. May 7 2020, 5:32 AM

Pierre-vh added inline comments.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
263	I did this with the assumption that VPT blocks that aren't related to the VCTP would be rejected.
	> Is it possible now to have a VCTP in the loop with nothing actually predicated upon it?
	I think it's possible if something clears VPR.P0, or overwrites it, right after the VCTP, like so:
	VCTP
	VMSR VPR.P0 R0
	...
	(Of course I don't know if that ever happens.) Other instructions, such as VCMP/VPT, "and" their result with VPR.P0, so those should be fine I believe.
	What do you think I should do here? Should I check that the Def is inside a VPT block that we know (e.g. by saving all predicated Defs in a set and checking whether Def is in the set)?
842	According to the Armv8-M Architecture Reference Manual, VPT is similar to VCMP: it `and`'s the result of the comparison with the current element mask (see page 1174). It doesn't overwrite VPR.P0 completely, so I don't think this needs to be cleared.

samparker added inline comments. May 18 2020, 5:33 AM

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
263	> I did this with the assumption that VPT blocks that aren't related to the VCTP would be rejected.
	That's the idea, but I suspect that this code no longer does so. I don't mind how you go about it, but yes, somehow we need to add more information into the predicate tracking and we can't accept anything that we cannot reason about.

Added note explaining how instructions that write to VPR.P0 work
Removed the logic to check VPTs, they're acceptable whenever a VCMP is
Adding VPT before VCTP test case
- It was causing a crash, which I fixed by adding a null-pointer check at line 404
Added new debug message at line

Is there a test for the case for something like this?

VPTTT
VLDRT
VCTPT
VSTRT

I just want to guarantee that we reject this.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
402–403	Looking at this again, I guess the logic should be if (Block.HasNonUniformPredicate() && !isVCTP(...)) because then we won't try to dereference a nullptr.
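
To illustrate why the suggested condition avoids the null dereference, here is a minimal sketch with hypothetical stand-in types (`Inst`, `Block` and `hasUnsupportedDivergence` are not the pass's real classes). With the original `!HasNonUniformPredicate() || !isVCTP(...)` form, a block without a divergent instruction takes the branch and the debug message then dereferences a null pointer. With `&&`, the pointer is only read once it is known to be non-null.

```cpp
#include <cassert>

// Hypothetical stand-in for a machine instruction.
struct Inst {
  bool IsVCTP;
};

// Hypothetical stand-in for VPTBlock: Divergent is null when every
// instruction in the block shares the entry predicate.
struct Block {
  const Inst *Divergent = nullptr;
  bool hasNonUniformPredicate() const { return Divergent != nullptr; }
};

// Suggested form: short-circuit evaluation guarantees that Divergent is
// non-null by the time it is dereferenced.
bool hasUnsupportedDivergence(const Block &B) {
  return B.hasNonUniformPredicate() && !B.Divergent->IsVCTP;
}
```

Note that this only demonstrates the null-safety of the refactored condition; blocks with a uniform predicate still have to be validated by the per-instruction checks that follow in the pass.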

Allow VCMPs before the VCTP
Add test case as requested
Revert the null-pointer check I added in the previous patch & refactor the condition as suggested

Good stuff, thanks for all your work on this. LGTM.

This revision is now accepted and ready to land. May 20 2020, 3:29 AM

Closed by commit rG835251f7d99a: [Target][ARM] Make Low Overhead Loops coexist with VPT blocks. (authored by Pierre-vh). May 20 2020, 4:51 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp	186 lines
llvm/test/CodeGen/Thumb2/LowOverheadLoops/cond-vector-reduce-mve-codegen.ll	17 lines
llvm/test/CodeGen/Thumb2/LowOverheadLoops/vctp-in-vpt-2.mir	18 lines
llvm/test/CodeGen/Thumb2/LowOverheadLoops/vpt-blocks.mir	714 lines

Diff 258689

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp

... 117 lines not shown ...	public:
}		}
};		};

struct PredicatedMI {		struct PredicatedMI {
MachineInstr *MI = nullptr;		MachineInstr *MI = nullptr;
SetVector<MachineInstr*> Predicates;		SetVector<MachineInstr*> Predicates;

public:		public:
PredicatedMI(MachineInstr *I, SetVector<MachineInstr*> &Preds) :		PredicatedMI(MachineInstr *I, SetVector<MachineInstr *> &Preds) : MI(I) {
MI(I) { Predicates.insert(Preds.begin(), Preds.end()); }		assert(I && "Instruction must not be null!");
		Predicates.insert(Preds.begin(), Preds.end());
		}
};		};

// Represent a VPT block, a list of instructions that begins with a VPST and		// Represent a VPT block, a list of instructions that begins with a VPT/VPST
// has a maximum of four proceeding instructions. All instructions within the		// and has a maximum of four proceeding instructions. All instructions within
// block are predicated upon the vpr and we allow instructions to define the		// the block are predicated upon the vpr and we allow instructions to define
// vpr within in the block too.		// the vpr within in the block too.
class VPTBlock {		class VPTBlock {
std::unique_ptr<PredicatedMI> VPST;		// The predicate then instruction, which is either a VPT, or a VPST
		// instruction.
		std::unique_ptr<PredicatedMI> PredicateThen;
PredicatedMI *Divergent = nullptr;		PredicatedMI *Divergent = nullptr;
SmallVector<PredicatedMI, 4> Insts;		SmallVector<PredicatedMI, 4> Insts;

public:		public:
VPTBlock(MachineInstr *MI, SetVector<MachineInstr*> &Preds) {		VPTBlock(MachineInstr *MI, SetVector<MachineInstr*> &Preds) {
VPST = std::make_unique<PredicatedMI>(MI, Preds);		PredicateThen = std::make_unique<PredicatedMI>(MI, Preds);
}		}

void addInst(MachineInstr *MI, SetVector<MachineInstr*> &Preds) {		void addInst(MachineInstr *MI, SetVector<MachineInstr*> &Preds) {
LLVM_DEBUG(dbgs() << "ARM Loops: Adding predicated MI: " << *MI);		LLVM_DEBUG(dbgs() << "ARM Loops: Adding predicated MI: " << *MI);
if (!Divergent && !set_difference(Preds, VPST->Predicates).empty()) {		if (!Divergent && !set_difference(Preds, PredicateThen->Predicates).empty()) {
Divergent = &Insts.back();		Divergent = &Insts.back();
LLVM_DEBUG(dbgs() << " - has divergent predicate: " << *Divergent->MI);		LLVM_DEBUG(dbgs() << " - has divergent predicate: " << *Divergent->MI);
}		}
Insts.emplace_back(MI, Preds);		Insts.emplace_back(MI, Preds);
assert(Insts.size() <= 4 && "Too many instructions in VPT block!");		assert(Insts.size() <= 4 && "Too many instructions in VPT block!");
}		}

// Have we found an instruction within the block which defines the vpr? If		// Have we found an instruction within the block which defines the vpr? If
// so, not all the instructions in the block will have the same predicate.		// so, not all the instructions in the block will have the same predicate.
bool HasNonUniformPredicate() const {		bool HasNonUniformPredicate() const {
return Divergent != nullptr;		return Divergent != nullptr;
}		}

// Is the given instruction part of the predicate set controlling the entry		// Is the given instruction part of the predicate set controlling the entry
// to the block.		// to the block.
bool IsPredicatedOn(MachineInstr *MI) const {		bool IsPredicatedOn(MachineInstr *MI) const {
return VPST->Predicates.count(MI);		return PredicateThen->Predicates.count(MI);
		}

		// Returns true if this is a VPT instruction.
		bool isVPT() const { return !isVPST(); }

		// Returns true if this is a VPST instruction.
		bool isVPST() const {
		return PredicateThen->MI->getOpcode() == ARM::MVE_VPST;
}		}

// Is the given instruction the only predicate which controls the entry to		// Is the given instruction the only predicate which controls the entry to
// the block.		// the block.
bool IsOnlyPredicatedOn(MachineInstr *MI) const {		bool IsOnlyPredicatedOn(MachineInstr *MI) const {
return IsPredicatedOn(MI) && VPST->Predicates.size() == 1;		return IsPredicatedOn(MI) && PredicateThen->Predicates.size() == 1;
}		}

unsigned size() const { return Insts.size(); }		unsigned size() const { return Insts.size(); }
SmallVectorImpl<PredicatedMI> &getInsts() { return Insts; }		SmallVectorImpl<PredicatedMI> &getInsts() { return Insts; }
MachineInstr *getVPST() const { return VPST->MI; }		MachineInstr *getPredicateThen() const { return PredicateThen->MI; }
PredicatedMI *getDivergent() const { return Divergent; }		PredicatedMI *getDivergent() const { return Divergent; }
};		};

struct LowOverheadLoop {		struct LowOverheadLoop {

MachineLoop &ML;		MachineLoop &ML;
MachineLoopInfo &MLI;		MachineLoopInfo &MLI;
ReachingDefAnalysis &RDA;		ReachingDefAnalysis &RDA;
const TargetRegisterInfo &TRI;		const TargetRegisterInfo &TRI;
MachineFunction *MF = nullptr;		MachineFunction *MF = nullptr;
MachineInstr *InsertPt = nullptr;		MachineInstr *InsertPt = nullptr;
MachineInstr *Start = nullptr;		MachineInstr *Start = nullptr;
MachineInstr *Dec = nullptr;		MachineInstr *Dec = nullptr;
MachineInstr *End = nullptr;		MachineInstr *End = nullptr;
MachineInstr *VCTP = nullptr;		MachineInstr *VCTP = nullptr;
		SmallPtrSet<MachineInstr*, 4> SecondaryVCTPs;
VPTBlock *CurrentBlock = nullptr;		VPTBlock *CurrentBlock = nullptr;
SetVector<MachineInstr*> CurrentPredicate;		SetVector<MachineInstr*> CurrentPredicate;
SmallVector<VPTBlock, 4> VPTBlocks;		SmallVector<VPTBlock, 4> VPTBlocks;
SmallPtrSet<MachineInstr*, 4> ToRemove;		SmallPtrSet<MachineInstr*, 4> ToRemove;
SmallPtrSet<MachineInstr*, 4> BlockMasksToRecompute;		SmallPtrSet<MachineInstr*, 4> BlockMasksToRecompute;
bool Revert = false;		bool Revert = false;
bool CannotTailPredicate = false;		bool CannotTailPredicate = false;

LowOverheadLoop(MachineLoop &ML, MachineLoopInfo &MLI,		LowOverheadLoop(MachineLoop &ML, MachineLoopInfo &MLI,
ReachingDefAnalysis &RDA, const TargetRegisterInfo &TRI)		ReachingDefAnalysis &RDA, const TargetRegisterInfo &TRI)
: ML(ML), MLI(MLI), RDA(RDA), TRI(TRI) {		: ML(ML), MLI(MLI), RDA(RDA), TRI(TRI) {
MF = ML.getHeader()->getParent();		MF = ML.getHeader()->getParent();
}		}

// If this is an MVE instruction, check that we know how to use tail		// If this is an MVE instruction, check that we know how to use tail
// predication with it. Record VPT blocks and return whether the		// predication with it. Record VPT blocks and return whether the
// instruction is valid for tail predication.		// instruction is valid for tail predication.
bool ValidateMVEInst(MachineInstr *MI);		bool ValidateMVEInst(MachineInstr *MI);

		// Returns true if at least one of the operands of MI (a VPT instr) is
		// defined by a predicated instruction of a previous VPTBlock.
		bool IsAcceptableVPT(MachineInstr *MI) {
		assert(isVPTOpcode(MI->getOpcode()) &&
		(MI->getOpcode() != ARM::MVE_VPST) && "Not a VPT!");

		MachineInstr *Op1Def = RDA.getMIOperand(MI, MI->getOperand(1));
		MachineInstr *Op2Def = RDA.getMIOperand(MI, MI->getOperand(2));
		if(Op1Def == nullptr && Op2Def == nullptr)
		return false;

		for (VPTBlock &Block : VPTBlocks)
		for (PredicatedMI &PMI : Block.getInsts())
		if (Op1Def == PMI.MI || Op2Def == PMI.MI)
		return true;
		return false;
		}

void AnalyseMVEInst(MachineInstr *MI) {		void AnalyseMVEInst(MachineInstr *MI) {
CannotTailPredicate = !ValidateMVEInst(MI);		CannotTailPredicate = !ValidateMVEInst(MI);
}		}

bool IsTailPredicationLegal() const {		bool IsTailPredicationLegal() const {
// For now, let's keep things really simple and only support a single		// For now, let's keep things really simple and only support a single
// block for tail predication.		// block for tail predication.
return !Revert && FoundAllComponents() && VCTP &&		return !Revert && FoundAllComponents() && VCTP &&
!CannotTailPredicate && ML.getNumBlocks() == 1;		!CannotTailPredicate && ML.getNumBlocks() == 1;
}		}

// Check that the predication in the loop will be equivalent once we		// Check that the predication in the loop will be equivalent once we
// perform the conversion. Also ensure that we can provide the number		// perform the conversion. Also ensure that we can provide the number
// of elements to the loop start instruction.		// of elements to the loop start instruction.
bool ValidateTailPredicate(MachineInstr *StartInsertPt);		bool ValidateTailPredicate(MachineInstr *StartInsertPt);

// Check that any values available outside of the loop will be the same		// Check that any values available outside of the loop will be the same
// after tail predication conversion.		// after tail predication conversion.
bool ValidateLiveOuts() const;		bool ValidateLiveOuts() const;

// Is it safe to define LR with DLS/WLS?		// Is it safe to define LR with DLS/WLS?
// LR can be defined if it is the operand to start, because it's the same		// LR can be defined if it is the operand to start, because it's the same
// value, or if it's going to be equivalent to the operand to Start.		// value, or if it's going to be equivalent to the operand to Start.
MachineInstr *isSafeToDefineLR();		MachineInstr *isSafeToDefineLR();

// Check the branch targets are within range and we satisfy our		// Check the branch targets are within range and we satisfy our
// restrictions.		// restrictions.
void CheckLegality(ARMBasicBlockUtils *BBUtils);		void CheckLegality(ARMBasicBlockUtils *BBUtils);

bool FoundAllComponents() const {		bool FoundAllComponents() const {
return Start && Dec && End;		return Start && Dec && End;
}		}
... 122 lines not shown ...
bool LowOverheadLoop::ValidateTailPredicate(MachineInstr *StartInsertPt) {		bool LowOverheadLoop::ValidateTailPredicate(MachineInstr *StartInsertPt) {
assert(VCTP && "VCTP instruction expected but is not set");		assert(VCTP && "VCTP instruction expected but is not set");
// All predication within the loop should be based on vctp. If the block		// All predication within the loop should be based on vctp. If the block
// isn't predicated on entry, check whether the vctp is within the block		// isn't predicated on entry, check whether the vctp is within the block
// and that all other instructions are then predicated on it.		// and that all other instructions are then predicated on it.
for (auto &Block : VPTBlocks) {		for (auto &Block : VPTBlocks) {
if (Block.IsPredicatedOn(VCTP))		if (Block.IsPredicatedOn(VCTP))
continue;		continue;
if (!Block.HasNonUniformPredicate() || !isVCTP(Block.getDivergent()->MI)) {		if (!Block.HasNonUniformPredicate() || !isVCTP(Block.getDivergent()->MI)) {
LLVM_DEBUG(dbgs() << "ARM Loops: Found unsupported diverging predicate: "		LLVM_DEBUG(dbgs() << "ARM Loops: Found unsupported diverging predicate: "
<< *Block.getDivergent()->MI);		<< *Block.getDivergent()->MI);
return false;		return false;
}		}
SmallVectorImpl<PredicatedMI> &Insts = Block.getInsts();		SmallVectorImpl<PredicatedMI> &Insts = Block.getInsts();
for (auto &PredMI : Insts) {		for (auto &PredMI : Insts) {
if (PredMI.Predicates.count(VCTP) || isVCTP(PredMI.MI))		if (PredMI.Predicates.count(VCTP) || isVCTP(PredMI.MI))
continue;		continue;
LLVM_DEBUG(dbgs() << "ARM Loops: Can't convert: " << *PredMI.MI		LLVM_DEBUG(dbgs() << "ARM Loops: Can't convert: " << *PredMI.MI
... 99 lines not shown ...	auto IsValidSub = [](MachineInstr *MI, unsigned ExpectedVecWidth) {
}		}
return MI->getOperand(ImmOpIdx).getImm() == ExpectedVecWidth;		return MI->getOperand(ImmOpIdx).getImm() == ExpectedVecWidth;
};		};

MBB = VCTP->getParent();		MBB = VCTP->getParent();
if (auto *Def = RDA.getUniqueReachingMIDef(&MBB->back(), NumElements)) {		if (auto *Def = RDA.getUniqueReachingMIDef(&MBB->back(), NumElements)) {
SmallPtrSet<MachineInstr*, 2> ElementChain;		SmallPtrSet<MachineInstr*, 2> ElementChain;
SmallPtrSet<MachineInstr*, 2> Ignore = { VCTP };		SmallPtrSet<MachineInstr*, 2> Ignore = { VCTP };
		Ignore.insert(SecondaryVCTPs.begin(), SecondaryVCTPs.end());

unsigned ExpectedVectorWidth = getTailPredVectorWidth(VCTP->getOpcode());		unsigned ExpectedVectorWidth = getTailPredVectorWidth(VCTP->getOpcode());

if (RDA.isSafeToRemove(Def, ElementChain, Ignore)) {		if (RDA.isSafeToRemove(Def, ElementChain, Ignore)) {
bool FoundSub = false;		bool FoundSub = false;

for (auto *MI : ElementChain) {		for (auto *MI : ElementChain) {
if (isMovRegOpcode(MI->getOpcode()))		if (isMovRegOpcode(MI->getOpcode()))
continue;		continue;

if (isSubImmOpcode(MI->getOpcode())) {		if (isSubImmOpcode(MI->getOpcode())) {
if (FoundSub || !IsValidSub(MI, ExpectedVectorWidth))		if (FoundSub || !IsValidSub(MI, ExpectedVectorWidth))
return false;		return false;
FoundSub = true;		FoundSub = true;
} else		} else
return false;		return false;
}		}

LLVM_DEBUG(dbgs() << "ARM Loops: Will remove element count chain:\n";		LLVM_DEBUG(dbgs() << "ARM Loops: Will remove element count chain:\n";
for (auto *MI : ElementChain)		for (auto *MI : ElementChain)
dbgs() << " - " << *MI);		dbgs() << " - " << *MI);
ToRemove.insert(ElementChain.begin(), ElementChain.end());		ToRemove.insert(ElementChain.begin(), ElementChain.end());
}		}
}		}

return true;		return true;
}		}

static bool isVectorPredicated(MachineInstr *MI) {		static bool isVectorPredicated(MachineInstr *MI) {
int PIdx = llvm::findFirstVPTPredOperandIdx(*MI);		int PIdx = llvm::findFirstVPTPredOperandIdx(*MI);
return PIdx != -1 && MI->getOperand(PIdx + 1).getReg() == ARM::VPR;		return PIdx != -1 && MI->getOperand(PIdx + 1).getReg() == ARM::VPR;
}		}

[111 lines not shown]	bool LowOverheadLoop::ValidateLiveOuts() const {
MachineBasicBlock *MBB = ML.getHeader();		MachineBasicBlock *MBB = ML.getHeader();

for (auto &MI : *MBB) {		for (auto &MI : *MBB) {
const MCInstrDesc &MCID = MI.getDesc();		const MCInstrDesc &MCID = MI.getDesc();
uint64_t Flags = MCID.TSFlags;		uint64_t Flags = MCID.TSFlags;
if ((Flags & ARMII::DomainMask) != ARMII::DomainMVE)		if ((Flags & ARMII::DomainMask) != ARMII::DomainMVE)
continue;		continue;

if (isVCTP(&MI) || MI.getOpcode() == ARM::MVE_VPST)		if (isVCTP(&MI) || isVPTOpcode(MI.getOpcode()))
continue;		continue;

// Predicated loads will write zeros to the falsely predicated bytes of the		// Predicated loads will write zeros to the falsely predicated bytes of the
// destination register.		// destination register.
if (isVectorPredicated(&MI)) {		if (isVectorPredicated(&MI)) {
if (MI.mayLoad())		if (MI.mayLoad())
FalseLanesZero.insert(&MI);		FalseLanesZero.insert(&MI);
Predicated.insert(&MI);		Predicated.insert(&MI);
[129 lines not shown]	void LowOverheadLoop::CheckLegality(ARMBasicBlockUtils *BBUtils) {
LLVM_DEBUG(if (CannotTailPredicate)		LLVM_DEBUG(if (CannotTailPredicate)
dbgs() << "ARM Loops: Couldn't validate tail predicate.\n");		dbgs() << "ARM Loops: Couldn't validate tail predicate.\n");
}		}

bool LowOverheadLoop::ValidateMVEInst(MachineInstr* MI) {		bool LowOverheadLoop::ValidateMVEInst(MachineInstr* MI) {
if (CannotTailPredicate)		if (CannotTailPredicate)
return false;		return false;

// Only support a single vctp.		if (isVCTP(MI)) {
if (isVCTP(MI) && VCTP)		// If we find another VCTP, check whether it uses the same value as the main VCTP.
		// If it does, store it in the SecondaryVCTPs set, else refuse it.
		if (VCTP) {
		if (!VCTP->getOperand(1).isIdenticalTo(MI->getOperand(1)) ||
		!RDA.hasSameReachingDef(VCTP, MI, MI->getOperand(1).getReg().id()))
		samparker [Done]: You don't have to call id() on the register.
return false;		return false;
		LLVM_DEBUG(dbgs() << "ARM Loops: Found secondary VCTP: " << *MI);
		SecondaryVCTPs.insert(MI);
		} else {
		LLVM_DEBUG(dbgs() << "ARM Loops: Found 'main' VCTP: " << *MI);
		VCTP = MI;
		}
		} else if (isVPTOpcode(MI->getOpcode())) {
		// We do not need to do anything special for VPSTs, but VPTs are tricky - we
		// need to check that at least one of their operands is defined by a
		// predicated instruction in a previous VPT block.
		if (MI->getOpcode() != ARM::MVE_VPST && !IsAcceptableVPT(MI)) {
		LLVM_DEBUG(dbgs() << "ARM Loops: Rejecting VPT - none of its operands "
		"are defined by a predicated instruction: "
		<< *MI);
		return false;
		}

// Start a new vpt block when we discover a vpt.
if (MI->getOpcode() == ARM::MVE_VPST) {
VPTBlocks.emplace_back(MI, CurrentPredicate);		VPTBlocks.emplace_back(MI, CurrentPredicate);
		samparker [Not Done]: I doubt CurrentPredicate is correct here as a VPT would be generating its own, maybe it should be cleared..?
		Pierre-vh (author) [Done]: According to the Armv8-M Architecture Reference Manual (https://static.docs.arm.com/ddi0553/bi/DDI0553B_i_armv8m_arm.pdf), VPT is similar to VCMP: it `and`'s the result of the comparison with the current element mask (see page 1174). It doesn't overwrite VPR.P0 completely, so I don't think this needs to be cleared.
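		Editorial sketch of the point made in the reply above, assuming the behaviour it describes (the comparison result is ANDed into the current element mask rather than replacing it). The types and function names below are hypothetical and are not part of the patch or of LLVM; the model uses one bit per 32-bit lane, whereas the real VPR.P0 is finer grained.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: one predicate bit per 32-bit lane (4 lanes per vector).
// The real VPR.P0 is finer grained; this is simplified for illustration.
using LaneMask = std::uint8_t;

// Models vctp.32: enable the first min(Remaining, 4) lanes.
inline LaneMask modelVCTP32(int Remaining) {
  if (Remaining <= 0)
    return 0;
  if (Remaining >= 4)
    return 0xF;
  return static_cast<LaneMask>((1u << Remaining) - 1u);
}

// Models a lane-wise "less than scalar" compare whose result is ANDed with
// the incoming mask instead of replacing it.
inline LaneMask modelCompareLT(const std::array<int, 4> &Q, int Scalar,
                               LaneMask Current) {
  LaneMask Result = 0;
  for (unsigned Lane = 0; Lane < 4; ++Lane)
    if (Q[Lane] < Scalar)
      Result = static_cast<LaneMask>(Result | (1u << Lane));
  return static_cast<LaneMask>(Result & Current);
}
```

		In this model, a lane that passes the comparison but lies beyond the remaining element count stays disabled, which is why the resulting predicate can still be treated as rooted at the VCTP.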
CurrentBlock = &VPTBlocks.back();		CurrentBlock = &VPTBlocks.back();
return true;		return true;
} else if (isVCTP(MI))		} else if (MI->getOpcode() == ARM::MVE_VPSEL ||
		samparker [Done]: isIdentical is not going to do what you're hoping for here. Use RDA to figure out if both VCTPs are operating on the same value (not just the same register). Also needs a test. I'm wondering whether we should also assert that the VCTP, if predicated, is 'Then' predicated. An example you had (last week?) confused me because I'm not sure why an Else predicate VCTP should appear here and how it maps to the current idea that all predicates are ANDed.
		Pierre-vh (author) [Done]:
		> Also needs a test. I'm wondering whether we should also assert that the VCTP, if predicated, is 'Then' predicated. An example you had (last week?) confused me because I'm not sure why an Else predicate VCTP should appear here and how it maps the current idea that all predicates are ANDed.
		I found an example where an else-predicated VCTP is generated and I added it to the tests. It comes from this C++ code:
		void test(int * data, int N, int T) {
		    for(int i = 0; i < N; i++) {
		        int d = data[i];
		        if (d < T || d > -T)
		            data[i] = 0;
		    }
		}
VCTP = MI;		MI->getOpcode() == ARM::MVE_VPNOT) {
else if (MI->getOpcode() == ARM::MVE_VPSEL ||
MI->getOpcode() == ARM::MVE_VPNOT)
return false;

// TODO: Allow VPSEL and VPNOT, we currently cannot because:		// TODO: Allow VPSEL and VPNOT, we currently cannot because:
// 1) It will use the VPR as a predicate operand, but doesn't have to be		// 1) It will use the VPR as a predicate operand, but doesn't have to be
// instead a VPT block, which means we can assert while building up		// instead a VPT block, which means we can assert while building up
// the VPT block because we don't find another VPST to being a new		// the VPT block because we don't find another VPT or VPST to being a new
// one.		// one.
// 2) VPSEL still requires a VPR operand even after tail predicating,		// 2) VPSEL still requires a VPR operand even after tail predicating,
// which means we can't remove it unless there is another		// which means we can't remove it unless there is another
		samparker [Not Done]: Following on from my above comment... the logic here was assuming that each value in VPR.P0 is rooted at the VCTP and then subsequent instructions (only VCMP) can modify the VPR and those instructions have to be predicated on the VCTP. This should result in a predicate along the lines of VCTP && VCMP. Now we're trying to allow VPTs, which create a predicate and cannot be predicated by an existing VPT block - so how do we know that the resulting predicate is still rooted at VCTP and not 'disjoint'? Everything in this pass assumes that any instruction using VPR is predicated on the VCTP and this is the place where we need to make that guarantee. So a couple of things that I can think of now are:
		- that the isVectorPredicated function actually means 'isPredicatedOnVCTP' and that will no longer be true.
		- the notion of 'disjoint' has become more complicated, though a VPT doesn't use the VPR value defined by VCTP, it could be using arguments that are dependent upon the VCTP - and I have the feeling this is the only case that we can support.
		These nuances will also require more testing.
		Pierre-vh (author) [Done]:
		> the notion of 'disjoint' has become more complicated, though a VPT doesn't use the VPR value defined by VCTP, it could be using arguments that are dependent upon the VCTP - and I have the feeling this is the only case that we can support.
		I added a check in `ValidateMVEInst` - it'll now refuse VPTs if none of their operands is defined by a predicated instruction, and I added a test for this. Is this enough?
// instruction, such as vcmp, that can provide the VPR def.		// instruction, such as vcmp, that can provide the VPR def.
		return false;
		}
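		Editorial sketch of the acceptance rule discussed in the thread above (a VPT is refused when none of its operands is defined by a predicated instruction). This is a toy model under stated assumptions: ToyInst, toyIsAcceptableVPT and the register-name map are hypothetical stand-ins, not the patch's IsAcceptableVPT nor LLVM's reaching-definition analysis.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction.
struct ToyInst {
  std::string Def;               // Register written by the instruction.
  std::vector<std::string> Uses; // Registers read by the instruction.
  bool Predicated;               // True if the instruction is predicated.
};

// ReachingDefs maps a register name to the toy instruction that defines the
// value reaching the VPT. Accept the VPT only if at least one of its operands
// is produced by a predicated instruction.
inline bool
toyIsAcceptableVPT(const ToyInst &VPT,
                   const std::map<std::string, const ToyInst *> &ReachingDefs) {
  for (const std::string &Reg : VPT.Uses) {
    auto It = ReachingDefs.find(Reg);
    if (It != ReachingDefs.end() && It->second->Predicated)
      return true;
  }
  return false;
}
```

		In this model, a VPT comparing a value loaded by a predicated load is accepted, while the same VPT fed only by unpredicated definitions is rejected.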

		samparker [Done]: Format this differently so this statement is attached to the rest of the conditional blocks.
bool IsUse = false;		bool IsUse = false;
bool IsDef = false;		bool IsDef = false;
const MCInstrDesc &MCID = MI->getDesc();		const MCInstrDesc &MCID = MI->getDesc();
for (int i = MI->getNumOperands() - 1; i >= 0; --i) {		for (int i = MI->getNumOperands() - 1; i >= 0; --i) {
const MachineOperand &MO = MI->getOperand(i);		const MachineOperand &MO = MI->getOperand(i);
if (!MO.isReg() || MO.getReg() != ARM::VPR)		if (!MO.isReg() || MO.getReg() != ARM::VPR)
continue;		continue;

[361 lines not shown]	if (int PIdx = llvm::findFirstVPTPredOperandIdx(*MI)) {
assert(MI->getOperand(PIdx).getImm() == ARMVCC::Then &&		assert(MI->getOperand(PIdx).getImm() == ARMVCC::Then &&
"Expected Then predicate!");		"Expected Then predicate!");
MI->getOperand(PIdx).setImm(ARMVCC::None);		MI->getOperand(PIdx).setImm(ARMVCC::None);
MI->getOperand(PIdx+1).setReg(0);		MI->getOperand(PIdx+1).setReg(0);
} else		} else
llvm_unreachable("trying to unpredicate a non-predicated instruction");		llvm_unreachable("trying to unpredicate a non-predicated instruction");
};		};

// There are a few scenarios which we have to fix up:		// There are a few scenarios which we have to fix up:
// 1) A VPT block with is only predicated by the vctp and has no internal vpr		// 1. VPT Blocks with non-uniform predicates:
		samparker [Not Done]: Document the new case(s).
// defs.		// - a. When the divergent instruction is a vctp
// 2) A VPT block which is only predicated by the vctp but has an internal		// - b. When the block uses a vpst, and is only predicated on the vctp
// vpr def.		// - c. When the block uses a vpt and contains one or more vctp
		samparker [Done]: Does the block have to contain a vctp? Do we check that this is true?
		Pierre-vh (author) [Done]: It doesn't have to contain a VCTP. I'll change this comment to make it clear.
// 3) A VPT block which is predicated upon the vctp as well as another vpr		// 2. VPT Blocks with uniform predicates:
// def.		// - a. The block uses a vpst, and is only predicated on the vctp
// 4) A VPT block which is not predicated upon a vctp, but contains it and
// all instructions within the block are predicated upon in.

for (auto &Block : LoLoop.getVPTBlocks()) {		for (auto &Block : LoLoop.getVPTBlocks()) {
SmallVectorImpl<PredicatedMI> &Insts = Block.getInsts();		SmallVectorImpl<PredicatedMI> &Insts = Block.getInsts();
if (Block.HasNonUniformPredicate()) {		if (Block.HasNonUniformPredicate()) {
PredicatedMI *Divergent = Block.getDivergent();		PredicatedMI *Divergent = Block.getDivergent();
if (isVCTP(Divergent->MI)) {		if (isVCTP(Divergent->MI)) {
// The vctp will be removed, so the block mask of the VPST/VPT will need		// The vctp will be removed, so the block mask of the vp(s)t will need
// to be recomputed.		// to be recomputed.
LoLoop.BlockMasksToRecompute.insert(Block.getVPST());		LoLoop.BlockMasksToRecompute.insert(Block.getPredicateThen());
} else if (Block.IsOnlyPredicatedOn(LoLoop.VCTP)) {		} else if (Block.isVPST() && Block.IsOnlyPredicatedOn(LoLoop.VCTP)) {
// The VPT block has a non-uniform predicate but it's entry is guarded		// The VPT block has a non-uniform predicate but it uses a vpst and its
// only by a vctp, which means we:		// entry is guarded only by a vctp, which means we:
// - Need to remove the original vpst.		// - Need to remove the original vpst.
// - Then need to unpredicate any following instructions, until		// - Then need to unpredicate any following instructions, until
// we come across the divergent vpr def.		// we come across the divergent vpr def.
// - Insert a new vpst to predicate the instruction(s) that following		// - Insert a new vpst to predicate the instruction(s) that following
// the divergent vpr def.		// the divergent vpr def.
// TODO: We could be producing more VPT blocks than necessary and could		// TODO: We could be producing more VPT blocks than necessary and could
// fold the newly created one into a proceeding one.		// fold the newly created one into a proceeding one.
for (auto I = ++MachineBasicBlock::iterator(Block.getVPST()),		for (auto I = ++MachineBasicBlock::iterator(Block.getPredicateThen()),
E = ++MachineBasicBlock::iterator(Divergent->MI); I != E; ++I)		E = ++MachineBasicBlock::iterator(Divergent->MI); I != E; ++I)
RemovePredicate(&*I);		RemovePredicate(&*I);

unsigned Size = 0;		unsigned Size = 0;
auto E = MachineBasicBlock::reverse_iterator(Divergent->MI);		auto E = MachineBasicBlock::reverse_iterator(Divergent->MI);
auto I = MachineBasicBlock::reverse_iterator(Insts.back().MI);		auto I = MachineBasicBlock::reverse_iterator(Insts.back().MI);
MachineInstr *InsertAt = nullptr;		MachineInstr *InsertAt = nullptr;
while (I != E) {		while (I != E) {
InsertAt = &*I;		InsertAt = &*I;
++Size;		++Size;
++I;		++I;
}		}
// Create a VPST with a null mask, we'll recompute it later.		// Create a VPST (with a null mask for now, we'll recompute it later).
MachineInstrBuilder MIB = BuildMI(*InsertAt->getParent(), InsertAt,		MachineInstrBuilder MIB = BuildMI(*InsertAt->getParent(), InsertAt,
InsertAt->getDebugLoc(),		InsertAt->getDebugLoc(),
TII->get(ARM::MVE_VPST));		TII->get(ARM::MVE_VPST));
MIB.addImm(0);		MIB.addImm(0);
LLVM_DEBUG(dbgs() << "ARM Loops: Removing VPST: " << *Block.getVPST());		LLVM_DEBUG(dbgs() << "ARM Loops: Removing VPST: " << *Block.getPredicateThen());
LLVM_DEBUG(dbgs() << "ARM Loops: Created VPST: " << *MIB);		LLVM_DEBUG(dbgs() << "ARM Loops: Created VPST: " << *MIB);
LoLoop.ToRemove.insert(Block.getVPST());		LoLoop.ToRemove.insert(Block.getPredicateThen());
LoLoop.BlockMasksToRecompute.insert(MIB.getInstr());		LoLoop.BlockMasksToRecompute.insert(MIB.getInstr());
}		}
} else if (Block.IsOnlyPredicatedOn(LoLoop.VCTP)) {		// Else, if the block uses a vpt, iterate over the block, removing the
// A vpt block which is only predicated upon vctp and has no internal vpr		// extra VCTPs it may contain.
// defs:		else if (Block.isVPT()) {
		bool RemovedVCTP = false;
		for (PredicatedMI &Elt : Block.getInsts()) {
		MachineInstr *MI = Elt.MI;
		if (isVCTP(MI)) {
		LLVM_DEBUG(dbgs() << "ARM Loops: Removing VCTP: " << *MI);
		LoLoop.ToRemove.insert(MI);
		RemovedVCTP = true;
		continue;
		}
		}
		if (RemovedVCTP)
		LoLoop.BlockMasksToRecompute.insert(Block.getPredicateThen());
		}
		} else if (Block.IsOnlyPredicatedOn(LoLoop.VCTP) && Block.isVPST()) {
		samparker [Not Done]: I might be viewing the nesting incorrectly here, but isn't this handled on line 1250?
		Pierre-vh (author) [Done]: The if at line 1250 is inside the if at line 1244. That else-if is from the if at line 1244.
		if (1244)
		    if (1250)
		else if (1299)
		// A vpt block starting with VPST, is only predicated upon vctp and has no
		// internal vpr defs:
// - Remove vpst.		// - Remove vpst.
// - Unpredicate the remaining instructions.		// - Unpredicate the remaining instructions.
LLVM_DEBUG(dbgs() << "ARM Loops: Removing VPST: " << *Block.getVPST());		LLVM_DEBUG(dbgs() << "ARM Loops: Removing VPST: " << *Block.getPredicateThen());
LoLoop.ToRemove.insert(Block.getVPST());		LoLoop.ToRemove.insert(Block.getPredicateThen());
for (auto &PredMI : Insts)		for (auto &PredMI : Insts)
RemovePredicate(PredMI.MI);		RemovePredicate(PredMI.MI);
}		}
}		}
LLVM_DEBUG(dbgs() << "ARM Loops: Removing VCTP: " << *LoLoop.VCTP);		LLVM_DEBUG(dbgs() << "ARM Loops: Removing remaining VCTPs...\n");
		// Remove the "main" VCTP
LoLoop.ToRemove.insert(LoLoop.VCTP);		LoLoop.ToRemove.insert(LoLoop.VCTP);
		LLVM_DEBUG(dbgs() << " " << *LoLoop.VCTP);
		// Remove remaining secondary VCTPs
		for (MachineInstr *VCTP : LoLoop.SecondaryVCTPs) {
		samparker [Not Done]: Surely we should have discovered all of these during the loop above?
		Pierre-vh (author) [Done]: The loop above catches the VCTP inside VPT blocks, but not the other ones. So for instance, if you have this (from `/test/CodeGen/Thumb2/LowOverheadLoops/vctp-in-vpt-2.mir`):
		MVE_VPST 4, implicit $vpr
		renamable $vpr = MVE_VCTP32 renamable $r2, 1, killed renamable $vpr
		renamable $r1, renamable $q0 = MVE_VLDRWU32_post killed renamable $r1, 16, 1, killed renamable $vpr :: (load 16 from %ir.lsr.iv24, align 4)
		renamable $vpr = MVE_VCTP32 renamable $r2, 0, $noreg
		renamable $r2, dead $cpsr = tSUBi8 killed renamable $r2, 4, 14, $noreg
		The first VCTP will be removed by the loop above, but the "secondary" one below won't - it'll be removed by this loop.
		// All VCTPs that aren't marked for removal yet should be unpredicated ones.
		// The predicated ones should have already been marked for removal when
		// visiting the VPT blocks.
		if (LoLoop.ToRemove.insert(VCTP).second) {
		assert(getVPTInstrPredicate(*VCTP) == ARMVCC::None &&
		"Removing Predicated VCTP without updating the block mask!");
		LLVM_DEBUG(dbgs() << " " << *VCTP);
		}
		}
}		}
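
Editorial sketch of the two-phase removal described in the reply about secondary VCTPs above: VCTPs inside VPT blocks are marked first, then a second pass over all VCTPs uses the result of the set insertion to tell which ones were not yet marked. ToyVCTP and both function names are hypothetical, and std::set stands in for the pass's ToRemove container.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for a VCTP instruction.
struct ToyVCTP {
  int Id;          // Unique identifier of the instruction.
  bool InVPTBlock; // True if the VCTP sits inside a VPT block.
};

// Phase 1: mark only the VCTPs found while visiting VPT blocks.
inline void markVCTPsInBlocks(const std::vector<ToyVCTP> &All,
                              std::set<int> &ToRemove) {
  for (const ToyVCTP &V : All)
    if (V.InVPTBlock)
      ToRemove.insert(V.Id);
}

// Phase 2: visit every VCTP; insert(...).second is true only for entries
// that phase 1 did not already mark. Returns the ids marked in this phase.
inline std::vector<int> markRemainingVCTPs(const std::vector<ToyVCTP> &All,
                                           std::set<int> &ToRemove) {
  std::vector<int> NewlyMarked;
  for (const ToyVCTP &V : All)
    if (ToRemove.insert(V.Id).second)
      NewlyMarked.push_back(V.Id);
  return NewlyMarked;
}
```

In the patch, the VCTPs newly marked in the second phase are the ones asserted to be unpredicated, since predicated ones should already have been handled together with their block mask.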

void ARMLowOverheadLoops::Expand(LowOverheadLoop &LoLoop) {		void ARMLowOverheadLoops::Expand(LowOverheadLoop &LoLoop) {

// Combine the LoopDec and LoopEnd instructions into LE(TP).		// Combine the LoopDec and LoopEnd instructions into LE(TP).
auto ExpandLoopEnd = [this](LowOverheadLoop &LoLoop) {		auto ExpandLoopEnd = [this](LowOverheadLoop &LoLoop) {
MachineInstr *End = LoLoop.End;		MachineInstr *End = LoLoop.End;
MachineBasicBlock *MBB = End->getParent();		MachineBasicBlock *MBB = End->getParent();
[113 lines not shown]

llvm/test/CodeGen/Thumb2/LowOverheadLoops/cond-vector-reduce-mve-codegen.ll

	[443 lines not shown]

	define dso_local arm_aapcs_vfpcc void @range_test(i32* noalias nocapture %arg, i32* noalias nocapture readonly %arg1, i32 %arg2, i32 %arg3) {			define dso_local arm_aapcs_vfpcc void @range_test(i32* noalias nocapture %arg, i32* noalias nocapture readonly %arg1, i32 %arg2, i32 %arg3) {
	; CHECK-LABEL: range_test:			; CHECK-LABEL: range_test:
	; CHECK: @ %bb.0: @ %bb			; CHECK: @ %bb.0: @ %bb
	; CHECK-NEXT: push {r7, lr}			; CHECK-NEXT: push {r7, lr}
	; CHECK-NEXT: cmp r3, #0			; CHECK-NEXT: cmp r3, #0
	; CHECK-NEXT: it eq			; CHECK-NEXT: it eq
	; CHECK-NEXT: popeq {r7, pc}			; CHECK-NEXT: popeq {r7, pc}
	; CHECK-NEXT: add.w r12, r3, #3			; CHECK-NEXT: dlstp.32 lr, r3
	; CHECK-NEXT: mov.w lr, #1
	; CHECK-NEXT: bic r12, r12, #3
	; CHECK-NEXT: sub.w r12, r12, #4
	; CHECK-NEXT: add.w lr, lr, r12, lsr #2
	; CHECK-NEXT: dls lr, lr
	; CHECK-NEXT: .LBB5_1: @ %bb12			; CHECK-NEXT: .LBB5_1: @ %bb12
	; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1			; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: vctp.32 r3			; CHECK-NEXT: vldrw.u32 q0, [r0]
	; CHECK-NEXT: vpst			; CHECK-NEXT: vptt.i32 ne, q0, zr
	; CHECK-NEXT: vldrwt.u32 q0, [r0]
	; CHECK-NEXT: vpttt.i32 ne, q0, zr
	; CHECK-NEXT: vcmpt.s32 le, q0, r2			; CHECK-NEXT: vcmpt.s32 le, q0, r2
	; CHECK-NEXT: vctpt.32 r3
	; CHECK-NEXT: vldrwt.u32 q1, [r1], #16			; CHECK-NEXT: vldrwt.u32 q1, [r1], #16
	; CHECK-NEXT: subs r3, #4
	; CHECK-NEXT: vmul.i32 q0, q1, q0			; CHECK-NEXT: vmul.i32 q0, q1, q0
	; CHECK-NEXT: vpst			; CHECK-NEXT: vpst
	; CHECK-NEXT: vstrwt.32 q0, [r0], #16			; CHECK-NEXT: vstrwt.32 q0, [r0], #16
	; CHECK-NEXT: le lr, .LBB5_1			; CHECK-NEXT: letp lr, .LBB5_1
	; CHECK-NEXT: @ %bb.2: @ %bb32			; CHECK-NEXT: @ %bb.2: @ %bb32
	; CHECK-NEXT: pop {r7, pc}			; CHECK-NEXT: pop {r7, pc}
	bb:			bb:
	%tmp = icmp eq i32 %arg3, 0			%tmp = icmp eq i32 %arg3, 0
	br i1 %tmp, label %bb32, label %bb4			br i1 %tmp, label %bb32, label %bb4

	bb4: ; preds = %bb			bb4: ; preds = %bb
	%tmp5 = add i32 %arg3, 3			%tmp5 = add i32 %arg3, 3
	[41 lines not shown]

llvm/test/CodeGen/Thumb2/LowOverheadLoops/vctp-in-vpt-2.mir

[112 lines not shown]	body: |
; CHECK: frame-setup CFI_INSTRUCTION offset $r7, -8		; CHECK: frame-setup CFI_INSTRUCTION offset $r7, -8
; CHECK: dead $r7 = frame-setup tMOVr $sp, 14 /* CC::al */, $noreg		; CHECK: dead $r7 = frame-setup tMOVr $sp, 14 /* CC::al */, $noreg
; CHECK: frame-setup CFI_INSTRUCTION def_cfa_register $r7		; CHECK: frame-setup CFI_INSTRUCTION def_cfa_register $r7
; CHECK: $sp = frame-setup tSUBspi $sp, 1, 14 /* CC::al */, $noreg		; CHECK: $sp = frame-setup tSUBspi $sp, 1, 14 /* CC::al */, $noreg
; CHECK: tCBZ $r2, %bb.3		; CHECK: tCBZ $r2, %bb.3
; CHECK: bb.1.bb3:		; CHECK: bb.1.bb3:
; CHECK: successors: %bb.2(0x80000000)		; CHECK: successors: %bb.2(0x80000000)
; CHECK: liveins: $r0, $r1, $r2, $r3		; CHECK: liveins: $r0, $r1, $r2, $r3
; CHECK: renamable $r12 = t2ADDri renamable $r2, 3, 14 /* CC::al */, $noreg, $noreg
; CHECK: renamable $lr = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
; CHECK: renamable $r12 = t2BICri killed renamable $r12, 3, 14 /* CC::al */, $noreg, $noreg
; CHECK: $vpr = VMSR_P0 killed $r3, 14 /* CC::al */, $noreg		; CHECK: $vpr = VMSR_P0 killed $r3, 14 /* CC::al */, $noreg
; CHECK: renamable $r12 = t2SUBri killed renamable $r12, 4, 14 /* CC::al */, $noreg, $noreg
; CHECK: VSTR_P0_off killed renamable $vpr, $sp, 0, 14 /* CC::al */, $noreg :: (store 4 into %stack.0)		; CHECK: VSTR_P0_off killed renamable $vpr, $sp, 0, 14 /* CC::al */, $noreg :: (store 4 into %stack.0)
; CHECK: $r3 = tMOVr $r0, 14 /* CC::al */, $noreg		; CHECK: $r3 = tMOVr $r0, 14 /* CC::al */, $noreg
; CHECK: renamable $lr = nuw nsw t2ADDrs killed renamable $lr, killed renamable $r12, 19, 14 /* CC::al */, $noreg, $noreg		; CHECK: $lr = MVE_DLSTP_32 killed renamable $r2
; CHECK: $lr = t2DLS killed renamable $lr
; CHECK: bb.2.bb9:		; CHECK: bb.2.bb9:
; CHECK: successors: %bb.2(0x7c000000), %bb.3(0x04000000)		; CHECK: successors: %bb.2(0x7c000000), %bb.3(0x04000000)
; CHECK: liveins: $lr, $r0, $r1, $r2, $r3		; CHECK: liveins: $lr, $r0, $r1, $r3
; CHECK: renamable $vpr = VLDR_P0_off $sp, 0, 14 /* CC::al */, $noreg :: (load 4 from %stack.0)		; CHECK: renamable $vpr = VLDR_P0_off $sp, 0, 14 /* CC::al */, $noreg :: (load 4 from %stack.0)
; CHECK: MVE_VPST 4, implicit $vpr		; CHECK: MVE_VPST 8, implicit $vpr
; CHECK: renamable $vpr = MVE_VCTP32 renamable $r2, 1, killed renamable $vpr		; CHECK: renamable $r1, renamable $q0 = MVE_VLDRWU32_post killed renamable $r1, 16, 1, renamable $vpr :: (load 16 from %ir.lsr.iv24, align 4)
; CHECK: renamable $r1, renamable $q0 = MVE_VLDRWU32_post killed renamable $r1, 16, 1, killed renamable $vpr :: (load 16 from %ir.lsr.iv24, align 4)
; CHECK: renamable $vpr = MVE_VCTP32 renamable $r2, 0, $noreg
; CHECK: renamable $r2, dead $cpsr = tSUBi8 killed renamable $r2, 4, 14 /* CC::al */, $noreg
; CHECK: MVE_VPST 4, implicit $vpr		; CHECK: MVE_VPST 4, implicit $vpr
; CHECK: renamable $vpr = MVE_VCMPi32r renamable $q0, $zr, 1, 1, killed renamable $vpr		; CHECK: renamable $vpr = MVE_VCMPi32r renamable $q0, $zr, 1, 1, killed renamable $vpr
; CHECK: renamable $r3, renamable $q1 = MVE_VLDRWU32_post killed renamable $r3, 16, 1, renamable $vpr :: (load 16 from %ir.lsr.iv1, align 4)		; CHECK: renamable $r3, renamable $q1 = MVE_VLDRWU32_post killed renamable $r3, 16, 1, renamable $vpr :: (load 16 from %ir.lsr.iv1, align 4)
; CHECK: renamable $q0 = nsw MVE_VMULi32 killed renamable $q1, killed renamable $q0, 0, $noreg, undef renamable $q0		; CHECK: renamable $q0 = nsw MVE_VMULi32 killed renamable $q1, killed renamable $q0, 0, $noreg, undef renamable $q0
; CHECK: MVE_VPST 8, implicit $vpr		; CHECK: MVE_VPST 8, implicit $vpr
; CHECK: MVE_VSTRWU32 killed renamable $q0, killed renamable $r0, 0, 1, killed renamable $vpr :: (store 16 into %ir.lsr.iv1, align 4)		; CHECK: MVE_VSTRWU32 killed renamable $q0, killed renamable $r0, 0, 1, killed renamable $vpr :: (store 16 into %ir.lsr.iv1, align 4)
; CHECK: $r0 = tMOVr $r3, 14 /* CC::al */, $noreg		; CHECK: $r0 = tMOVr $r3, 14 /* CC::al */, $noreg
; CHECK: $lr = t2LEUpdate killed renamable $lr, %bb.2		; CHECK: $lr = MVE_LETP killed renamable $lr, %bb.2
; CHECK: bb.3.bb27:		; CHECK: bb.3.bb27:
; CHECK: $sp = tADDspi $sp, 1, 14 /* CC::al */, $noreg		; CHECK: $sp = tADDspi $sp, 1, 14 /* CC::al */, $noreg
; CHECK: tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc		; CHECK: tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc
bb.0.bb:		bb.0.bb:
successors: %bb.3(0x30000000), %bb.1(0x50000000)		successors: %bb.3(0x30000000), %bb.1(0x50000000)
liveins: $r0, $r1, $r2, $r3, $lr		liveins: $r0, $r1, $r2, $r3, $lr

frame-setup tPUSH 14, $noreg, killed $lr, implicit-def $sp, implicit $sp		frame-setup tPUSH 14, $noreg, killed $lr, implicit-def $sp, implicit $sp
[48 lines not shown]

llvm/test/CodeGen/Thumb2/LowOverheadLoops/vpt-blocks.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -mtriple=thumbv8.1m.main -mattr=+mve -run-pass=arm-low-overhead-loops %s -o - | FileCheck %s

				--- |
				@_ZL3arr = internal global [10 x i32] [i32 1, i32 2, i32 3, i32 5, i32 5, i32 5, i32 -2, i32 0, i32 -8, i32 -1], align 4
				@.str = private unnamed_addr constant [5 x i8] c"%d, \00", align 1

				define arm_aapcs_vfpcc void @vpt_block(i32* nocapture %A, i32 %n, i32 %x) {
				entry:
				%cmp9 = icmp sgt i32 %n, 0
				%0 = add i32 %n, 3
				%1 = lshr i32 %0, 2
				%2 = shl nuw i32 %1, 2
				%3 = add i32 %2, -4
				%4 = lshr i32 %3, 2
				%5 = add nuw nsw i32 %4, 1
				br i1 %cmp9, label %vector.ph, label %for.cond.cleanup

				vector.ph: ; preds = %entry
				%sub = sub nsw i32 0, %x
				call void @llvm.set.loop.iterations.i32(i32 %5)
				br label %vector.body

				vector.body: ; preds = %vector.body, %vector.ph
				%lsr.iv1 = phi i32* [ %scevgep, %vector.body ], [ %A, %vector.ph ]
				%6 = phi i32 [ %5, %vector.ph ], [ %18, %vector.body ]
				%7 = phi i32 [ %n, %vector.ph ], [ %9, %vector.body ]
				%lsr.iv12 = bitcast i32* %lsr.iv1 to <4 x i32>*
				%8 = call <4 x i1> @llvm.arm.mve.vctp32(i32 %7)
				%9 = sub i32 %7, 4
				%wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %lsr.iv12, i32 4, <4 x i1> %8, <4 x i32> undef)
				%10 = insertelement <4 x i32> undef, i32 %x, i32 0
				%11 = shufflevector <4 x i32> %10, <4 x i32> undef, <4 x i32> zeroinitializer
				%12 = icmp slt <4 x i32> %wide.masked.load, %11
				%13 = insertelement <4 x i32> undef, i32 %sub, i32 0
				%14 = shufflevector <4 x i32> %13, <4 x i32> undef, <4 x i32> zeroinitializer
				%15 = icmp sgt <4 x i32> %wide.masked.load, %14
				%16 = and <4 x i1> %12, %15
				%17 = and <4 x i1> %16, %8
				call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> zeroinitializer, <4 x i32>* %lsr.iv12, i32 4, <4 x i1> %17)
				%scevgep = getelementptr i32, i32* %lsr.iv1, i32 4
				%18 = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 %6, i32 1)
				%19 = icmp ne i32 %18, 0
				br i1 %19, label %vector.body, label %for.cond.cleanup

				for.cond.cleanup: ; preds = %vector.body, %entry
				ret void
				}

				define arm_aapcs_vfpcc void @different_vcpt_reaching_def(i32* nocapture %A, i32 %n, i32 %x) {
				; Intentionally left blank - see MIR sequence below.
				entry:
				unreachable
				vector.body:
				unreachable
				for.cond.cleanup:
				unreachable
				}

				define arm_aapcs_vfpcc void @different_vcpt_operand(i32* nocapture %A, i32 %n, i32 %x) {
				; Intentionally left blank - see MIR sequence below.
				entry:
				unreachable
				vector.body:
				unreachable
				for.cond.cleanup:
				unreachable
				}

				define arm_aapcs_vfpcc void @else_vcpt(i32* nocapture %data, i32 %N, i32 %T) {
				entry:
				%cmp9 = icmp sgt i32 %N, 0
				%0 = add i32 %N, 3
				%1 = lshr i32 %0, 2
				%2 = shl nuw i32 %1, 2
				%3 = add i32 %2, -4
				%4 = lshr i32 %3, 2
				%5 = add nuw nsw i32 %4, 1
				br i1 %cmp9, label %vector.ph, label %for.cond.cleanup

				vector.ph: ; preds = %entry
				%sub = sub nsw i32 0, %T
				call void @llvm.set.loop.iterations.i32(i32 %5)
				br label %vector.body

				vector.body: ; preds = %vector.body, %vector.ph
				%lsr.iv1 = phi i32* [ %scevgep, %vector.body ], [ %data, %vector.ph ]
				%6 = phi i32 [ %5, %vector.ph ], [ %18, %vector.body ]
				%7 = phi i32 [ %N, %vector.ph ], [ %9, %vector.body ]
				%lsr.iv12 = bitcast i32* %lsr.iv1 to <4 x i32>*
				%8 = call <4 x i1> @llvm.arm.mve.vctp32(i32 %7)
				%9 = sub i32 %7, 4
				%wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %lsr.iv12, i32 4, <4 x i1> %8, <4 x i32> undef)
				%10 = insertelement <4 x i32> undef, i32 %T, i32 0
				%11 = shufflevector <4 x i32> %10, <4 x i32> undef, <4 x i32> zeroinitializer
				%12 = icmp slt <4 x i32> %wide.masked.load, %11
				%13 = insertelement <4 x i32> undef, i32 %sub, i32 0
				%14 = shufflevector <4 x i32> %13, <4 x i32> undef, <4 x i32> zeroinitializer
				%15 = icmp sgt <4 x i32> %wide.masked.load, %14
				%16 = or <4 x i1> %12, %15
				%17 = and <4 x i1> %16, %8
				call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> zeroinitializer, <4 x i32>* %lsr.iv12, i32 4, <4 x i1> %17)
				%scevgep = getelementptr i32, i32* %lsr.iv1, i32 4
				%18 = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 %6, i32 1)
				%19 = icmp ne i32 %18, 0
				br i1 %19, label %vector.body, label %for.cond.cleanup

				for.cond.cleanup: ; preds = %vector.body, %entry
				ret void
				}

				define arm_aapcs_vfpcc void @unrelated_vpt(i32* nocapture %A, i32 %n, i32 %x) {
				; Intentionally left blank - see MIR sequence below.
				entry:
				unreachable
				vector.body:
				unreachable
				for.cond.cleanup:
				unreachable
				}

				declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32 immarg, <4 x i1>, <4 x i32>)
				declare void @llvm.masked.store.v4i32.p0v4i32(<4 x i32>, <4 x i32>*, i32 immarg, <4 x i1>)
				declare void @llvm.set.loop.iterations.i32(i32)
				declare i32 @llvm.loop.decrement.reg.i32.i32.i32(i32, i32)
				declare <4 x i1> @llvm.arm.mve.vctp32(i32)
				...
				---
				name: vpt_block
				alignment: 2
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers: []
				liveins:
				- { reg: '$r0', virtual-reg: '' }
				- { reg: '$r1', virtual-reg: '' }
				- { reg: '$r2', virtual-reg: '' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 8
				offsetAdjustment: 0
				maxAlignment: 4
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack:
				- { id: 0, name: '', type: spill-slot, offset: -4, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$lr', callee-saved-restored: false,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 1, name: '', type: spill-slot, offset: -8, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r7', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				callSites: []
				constants: []
				machineFunctionInfo: {}
				body: |
				; CHECK-LABEL: name: vpt_block
				; CHECK: bb.0.entry:
				; CHECK: successors: %bb.1(0x80000000)
				; CHECK: liveins: $lr, $r0, $r1, $r2, $r7
				; CHECK: frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r7, killed $lr, implicit-def $sp, implicit $sp
				; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 8
				; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -4
				; CHECK: frame-setup CFI_INSTRUCTION offset $r7, -8
				; CHECK: tCMPi8 renamable $r1, 1, 14 /* CC::al */, $noreg, implicit-def $cpsr
				; CHECK: t2IT 11, 8, implicit-def $itstate
				; CHECK: frame-destroy tPOP_RET 11 /* CC::lt */, killed $cpsr, def $r7, def $pc, implicit killed $itstate
				; CHECK: renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0
				; CHECK: renamable $r3, dead $cpsr = nsw tRSB renamable $r2, 14 /* CC::al */, $noreg
				; CHECK: $lr = MVE_DLSTP_32 killed renamable $r1
				; CHECK: bb.1.vector.body:
				; CHECK: successors: %bb.1(0x7c000000), %bb.2(0x04000000)
				; CHECK: liveins: $lr, $q0, $r0, $r2, $r3
				; CHECK: renamable $q1 = MVE_VLDRWU32 renamable $r0, 0, 0, killed $noreg
				; CHECK: MVE_VPTv4s32r 4, renamable $q1, renamable $r2, 11, implicit-def $vpr
				; CHECK: renamable $vpr = MVE_VCMPs32r killed renamable $q1, renamable $r3, 12, 1, killed renamable $vpr
				; CHECK: renamable $r0 = MVE_VSTRWU32_post renamable $q0, killed renamable $r0, 16, 1, killed renamable $vpr
				; CHECK: $lr = MVE_LETP killed renamable $lr, %bb.1
				; CHECK: bb.2.for.cond.cleanup:
				; CHECK: frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc
				bb.0.entry:
				successors: %bb.1(0x80000000)
				liveins: $r0, $r1, $r2, $r7, $lr

				frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r7, killed $lr, implicit-def $sp, implicit $sp
				frame-setup CFI_INSTRUCTION def_cfa_offset 8
				frame-setup CFI_INSTRUCTION offset $lr, -4
				frame-setup CFI_INSTRUCTION offset $r7, -8
				tCMPi8 renamable $r1, 1, 14 /* CC::al */, $noreg, implicit-def $cpsr
				t2IT 11, 8, implicit-def $itstate
				frame-destroy tPOP_RET 11 /* CC::lt */, killed $cpsr, def $r7, def $pc, implicit killed $itstate
				renamable $r3, dead $cpsr = tADDi3 renamable $r1, 3, 14 /* CC::al */, $noreg
				renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0
				renamable $r3 = t2BICri killed renamable $r3, 3, 14 /* CC::al */, $noreg, $noreg
				renamable $r12 = t2SUBri killed renamable $r3, 4, 14 /* CC::al */, $noreg, $noreg
				renamable $r3, dead $cpsr = tMOVi8 1, 14 /* CC::al */, $noreg
				renamable $lr = nuw nsw t2ADDrs killed renamable $r3, killed renamable $r12, 19, 14 /* CC::al */, $noreg, $noreg
				renamable $r3, dead $cpsr = nsw tRSB renamable $r2, 14 /* CC::al */, $noreg
				t2DoLoopStart renamable $lr

				bb.1.vector.body:
				successors: %bb.1(0x7c000000), %bb.2(0x04000000)
				liveins: $lr, $q0, $r0, $r1, $r2, $r3

				renamable $vpr = MVE_VCTP32 renamable $r1, 0, $noreg
				MVE_VPST 8, implicit $vpr
				renamable $q1 = MVE_VLDRWU32 renamable $r0, 0, 1, killed renamable $vpr
				MVE_VPTv4s32r 2, renamable $q1, renamable $r2, 11, implicit-def $vpr
				renamable $vpr = MVE_VCMPs32r killed renamable $q1, renamable $r3, 12, 1, killed renamable $vpr
				renamable $vpr = MVE_VCTP32 renamable $r1, 1, killed renamable $vpr
				renamable $r0 = MVE_VSTRWU32_post renamable $q0, killed renamable $r0, 16, 1, killed renamable $vpr
				renamable $r1, dead $cpsr = tSUBi8 killed renamable $r1, 4, 14 /* CC::al */, $noreg
				renamable $lr = t2LoopDec killed renamable $lr, 1
				t2LoopEnd renamable $lr, %bb.1, implicit-def dead $cpsr
				tB %bb.2, 14 /* CC::al */, $noreg

				bb.2.for.cond.cleanup:
				frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc
				...
				---
				name: different_vcpt_reaching_def
				alignment: 2
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers: []
				liveins:
				- { reg: '$r0', virtual-reg: '' }
				- { reg: '$r1', virtual-reg: '' }
				- { reg: '$r2', virtual-reg: '' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 8
				offsetAdjustment: 0
				maxAlignment: 4
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack:
				- { id: 0, name: '', type: spill-slot, offset: -4, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$lr', callee-saved-restored: false,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 1, name: '', type: spill-slot, offset: -8, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r7', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				callSites: []
				constants: []
				machineFunctionInfo: {}
				body: |
				; CHECK-LABEL: name: different_vcpt_reaching_def
				; CHECK: bb.0.entry:
				; CHECK: successors: %bb.1(0x80000000)
				; CHECK: liveins: $lr, $r0, $r1, $r2, $r7
				; CHECK: frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r7, killed $lr, implicit-def $sp, implicit $sp
				; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 8
				; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -4
				; CHECK: frame-setup CFI_INSTRUCTION offset $r7, -8
				; CHECK: tCMPi8 renamable $r1, 1, 14 /* CC::al */, $noreg, implicit-def $cpsr
				; CHECK: t2IT 11, 8, implicit-def $itstate
				; CHECK: frame-destroy tPOP_RET 11 /* CC::lt */, killed $cpsr, def $r7, def $pc, implicit killed $itstate
				; CHECK: renamable $r3, dead $cpsr = tADDi3 renamable $r1, 3, 14 /* CC::al */, $noreg
				; CHECK: renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0
				; CHECK: renamable $r3 = t2BICri killed renamable $r3, 3, 14 /* CC::al */, $noreg, $noreg
				; CHECK: renamable $r12 = t2SUBri killed renamable $r3, 4, 14 /* CC::al */, $noreg, $noreg
				; CHECK: renamable $r3, dead $cpsr = tMOVi8 1, 14 /* CC::al */, $noreg
				; CHECK: renamable $lr = nuw nsw t2ADDrs killed renamable $r3, killed renamable $r12, 19, 14 /* CC::al */, $noreg, $noreg
				; CHECK: renamable $r3, dead $cpsr = nsw tRSB renamable $r2, 14 /* CC::al */, $noreg
				; CHECK: $lr = t2DLS killed renamable $lr
				; CHECK: bb.1.vector.body:
				; CHECK: successors: %bb.1(0x7c000000), %bb.2(0x04000000)
				; CHECK: liveins: $lr, $q0, $r0, $r1, $r2, $r3
				; CHECK: renamable $vpr = MVE_VCTP32 renamable $r1, 0, $noreg
				; CHECK: MVE_VPST 8, implicit $vpr
				; CHECK: renamable $r1 = MVE_VSTRWU32_post renamable $q0, killed renamable $r1, 16, 1, renamable $vpr
				; CHECK: renamable $q1 = MVE_VLDRWU32 renamable $r0, 0, 1, killed renamable $vpr
				; CHECK: MVE_VPTv4s32r 2, renamable $q1, renamable $r2, 11, implicit-def $vpr
				; CHECK: renamable $vpr = MVE_VCMPs32r killed renamable $q1, renamable $r3, 12, 1, killed renamable $vpr
				; CHECK: renamable $vpr = MVE_VCTP32 renamable $r1, 1, killed renamable $vpr
				; CHECK: renamable $r0 = MVE_VSTRWU32_post renamable $q0, killed renamable $r0, 16, 1, killed renamable $vpr
				; CHECK: renamable $r1, dead $cpsr = tSUBi8 killed renamable $r1, 4, 14 /* CC::al */, $noreg
				; CHECK: $lr = t2LEUpdate killed renamable $lr, %bb.1
				; CHECK: bb.2.for.cond.cleanup:
				; CHECK: frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc
				;
				; Tests that secondary VCTPs are refused when their operand's reaching definition is not the same as the main
				; VCTP's.
				;
				bb.0.entry:
				successors: %bb.1(0x80000000)
				liveins: $r0, $r1, $r2, $r7, $lr

				frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r7, killed $lr, implicit-def $sp, implicit $sp
				frame-setup CFI_INSTRUCTION def_cfa_offset 8
				frame-setup CFI_INSTRUCTION offset $lr, -4
				frame-setup CFI_INSTRUCTION offset $r7, -8
				tCMPi8 renamable $r1, 1, 14 /* CC::al */, $noreg, implicit-def $cpsr
				t2IT 11, 8, implicit-def $itstate
				frame-destroy tPOP_RET 11 /* CC::lt */, killed $cpsr, def $r7, def $pc, implicit killed $itstate
				renamable $r3, dead $cpsr = tADDi3 renamable $r1, 3, 14 /* CC::al */, $noreg
				renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0
				renamable $r3 = t2BICri killed renamable $r3, 3, 14 /* CC::al */, $noreg, $noreg
				renamable $r12 = t2SUBri killed renamable $r3, 4, 14 /* CC::al */, $noreg, $noreg
				renamable $r3, dead $cpsr = tMOVi8 1, 14 /* CC::al */, $noreg
				renamable $lr = nuw nsw t2ADDrs killed renamable $r3, killed renamable $r12, 19, 14 /* CC::al */, $noreg, $noreg
				renamable $r3, dead $cpsr = nsw tRSB renamable $r2, 14 /* CC::al */, $noreg
				t2DoLoopStart renamable $lr

				bb.1.vector.body:
				successors: %bb.1(0x7c000000), %bb.2(0x04000000)
				liveins: $lr, $q0, $r0, $r1, $r2, $r3

				renamable $vpr = MVE_VCTP32 renamable $r1, 0, $noreg
				MVE_VPST 8, implicit $vpr
				renamable $r1 = MVE_VSTRWU32_post renamable $q0, killed renamable $r1, 16, 1, renamable $vpr
				renamable $q1 = MVE_VLDRWU32 renamable $r0, 0, 1, killed renamable $vpr
				MVE_VPTv4s32r 2, renamable $q1, renamable $r2, 11, implicit-def $vpr
				renamable $vpr = MVE_VCMPs32r killed renamable $q1, renamable $r3, 12, 1, killed renamable $vpr
				renamable $vpr = MVE_VCTP32 renamable $r1, 1, killed renamable $vpr
				renamable $r0 = MVE_VSTRWU32_post renamable $q0, killed renamable $r0, 16, 1, killed renamable $vpr
				renamable $r1, dead $cpsr = tSUBi8 killed renamable $r1, 4, 14 /* CC::al */, $noreg
				renamable $lr = t2LoopDec killed renamable $lr, 1
				t2LoopEnd renamable $lr, %bb.1, implicit-def dead $cpsr
				tB %bb.2, 14 /* CC::al */, $noreg

				bb.2.for.cond.cleanup:
				frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc
				...
				---
				name: different_vcpt_operand
				alignment: 2
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers: []
				liveins:
				- { reg: '$r0', virtual-reg: '' }
				- { reg: '$r1', virtual-reg: '' }
				- { reg: '$r2', virtual-reg: '' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 8
				offsetAdjustment: 0
				maxAlignment: 4
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack:
				- { id: 0, name: '', type: spill-slot, offset: -4, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$lr', callee-saved-restored: false,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 1, name: '', type: spill-slot, offset: -8, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r7', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				callSites: []
				constants: []
				machineFunctionInfo: {}
				body: |
				; CHECK-LABEL: name: different_vcpt_operand
				; CHECK: bb.0.entry:
				; CHECK: successors: %bb.1(0x80000000)
				; CHECK: liveins: $lr, $r0, $r1, $r2, $r7
				; CHECK: frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r7, killed $lr, implicit-def $sp, implicit $sp
				; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 8
				; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -4
				; CHECK: frame-setup CFI_INSTRUCTION offset $r7, -8
				; CHECK: tCMPi8 renamable $r1, 1, 14 /* CC::al */, $noreg, implicit-def $cpsr
				; CHECK: t2IT 11, 8, implicit-def $itstate
				; CHECK: frame-destroy tPOP_RET 11 /* CC::lt */, killed $cpsr, def $r7, def $pc, implicit killed $itstate
				; CHECK: renamable $r3, dead $cpsr = tADDi3 renamable $r1, 3, 14 /* CC::al */, $noreg
				; CHECK: renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0
				; CHECK: renamable $r3 = t2BICri killed renamable $r3, 3, 14 /* CC::al */, $noreg, $noreg
				; CHECK: renamable $r12 = t2SUBri killed renamable $r3, 4, 14 /* CC::al */, $noreg, $noreg
				; CHECK: renamable $r3, dead $cpsr = tMOVi8 1, 14 /* CC::al */, $noreg
				; CHECK: renamable $lr = nuw nsw t2ADDrs killed renamable $r3, killed renamable $r12, 19, 14 /* CC::al */, $noreg, $noreg
				; CHECK: renamable $r3, dead $cpsr = nsw tRSB renamable $r2, 14 /* CC::al */, $noreg
				; CHECK: $lr = t2DLS killed renamable $lr
				; CHECK: bb.1.vector.body:
				; CHECK: successors: %bb.1(0x7c000000), %bb.2(0x04000000)
				; CHECK: liveins: $lr, $q0, $r0, $r1, $r2, $r3
				; CHECK: renamable $vpr = MVE_VCTP32 renamable $r1, 0, $noreg
				; CHECK: MVE_VPST 8, implicit $vpr
				; CHECK: renamable $q1 = MVE_VLDRWU32 renamable $r0, 0, 1, killed renamable $vpr
				; CHECK: MVE_VPTv4s32r 2, renamable $q1, renamable $r2, 11, implicit-def $vpr
				; CHECK: renamable $vpr = MVE_VCMPs32r killed renamable $q1, renamable $r3, 12, 1, killed renamable $vpr
				; CHECK: renamable $vpr = MVE_VCTP32 renamable $r2, 1, killed renamable $vpr
				; CHECK: renamable $r0 = MVE_VSTRWU32_post renamable $q0, killed renamable $r0, 16, 1, killed renamable $vpr
				; CHECK: renamable $r1, dead $cpsr = tSUBi8 killed renamable $r1, 4, 14 /* CC::al */, $noreg
				; CHECK: $lr = t2LEUpdate killed renamable $lr, %bb.1
				; CHECK: bb.2.for.cond.cleanup:
				; CHECK: frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc
				;
				; Tests that secondary VCTPs are refused when their operand is not the same register as the main VCTP's.
				;
				bb.0.entry:
				successors: %bb.1(0x80000000)
				liveins: $r0, $r1, $r2, $r7, $lr

				frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r7, killed $lr, implicit-def $sp, implicit $sp
				frame-setup CFI_INSTRUCTION def_cfa_offset 8
				frame-setup CFI_INSTRUCTION offset $lr, -4
				frame-setup CFI_INSTRUCTION offset $r7, -8
				tCMPi8 renamable $r1, 1, 14 /* CC::al */, $noreg, implicit-def $cpsr
				t2IT 11, 8, implicit-def $itstate
				frame-destroy tPOP_RET 11 /* CC::lt */, killed $cpsr, def $r7, def $pc, implicit killed $itstate
				renamable $r3, dead $cpsr = tADDi3 renamable $r1, 3, 14 /* CC::al */, $noreg
				renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0
				renamable $r3 = t2BICri killed renamable $r3, 3, 14 /* CC::al */, $noreg, $noreg
				renamable $r12 = t2SUBri killed renamable $r3, 4, 14 /* CC::al */, $noreg, $noreg
				renamable $r3, dead $cpsr = tMOVi8 1, 14 /* CC::al */, $noreg
				renamable $lr = nuw nsw t2ADDrs killed renamable $r3, killed renamable $r12, 19, 14 /* CC::al */, $noreg, $noreg
				renamable $r3, dead $cpsr = nsw tRSB renamable $r2, 14 /* CC::al */, $noreg
				t2DoLoopStart renamable $lr

				bb.1.vector.body:
				successors: %bb.1(0x7c000000), %bb.2(0x04000000)
				liveins: $lr, $q0, $r0, $r1, $r2, $r3

				renamable $vpr = MVE_VCTP32 renamable $r1, 0, $noreg
				MVE_VPST 8, implicit $vpr
				renamable $q1 = MVE_VLDRWU32 renamable $r0, 0, 1, killed renamable $vpr
				MVE_VPTv4s32r 2, renamable $q1, renamable $r2, 11, implicit-def $vpr
				renamable $vpr = MVE_VCMPs32r killed renamable $q1, renamable $r3, 12, 1, killed renamable $vpr
				renamable $vpr = MVE_VCTP32 renamable $r2, 1, killed renamable $vpr
				renamable $r0 = MVE_VSTRWU32_post renamable $q0, killed renamable $r0, 16, 1, killed renamable $vpr
				renamable $r1, dead $cpsr = tSUBi8 killed renamable $r1, 4, 14 /* CC::al */, $noreg
				renamable $lr = t2LoopDec killed renamable $lr, 1
				t2LoopEnd renamable $lr, %bb.1, implicit-def dead $cpsr
				tB %bb.2, 14 /* CC::al */, $noreg

				bb.2.for.cond.cleanup:
				frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc
				...
				---
				name: else_vcpt
				alignment: 2
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers: []
				liveins:
				- { reg: '$r0', virtual-reg: '' }
				- { reg: '$r1', virtual-reg: '' }
				- { reg: '$r2', virtual-reg: '' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 8
				offsetAdjustment: 0
				maxAlignment: 4
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack:
				- { id: 0, name: '', type: spill-slot, offset: -4, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$lr', callee-saved-restored: false,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 1, name: '', type: spill-slot, offset: -8, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r7', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				callSites: []
				constants: []
				machineFunctionInfo: {}
				body: |
				; CHECK-LABEL: name: else_vcpt
				; CHECK: bb.0.entry:
				; CHECK: successors: %bb.1(0x80000000)
				; CHECK: liveins: $lr, $r0, $r1, $r2, $r7
				; CHECK: frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r7, killed $lr, implicit-def $sp, implicit $sp
				; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 8
				; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -4
				; CHECK: frame-setup CFI_INSTRUCTION offset $r7, -8
				; CHECK: tCMPi8 renamable $r1, 1, 14 /* CC::al */, $noreg, implicit-def $cpsr
				; CHECK: t2IT 11, 8, implicit-def $itstate
				; CHECK: frame-destroy tPOP_RET 11 /* CC::lt */, killed $cpsr, def $r7, def $pc, implicit killed $itstate
				; CHECK: renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0
				; CHECK: renamable $r3, dead $cpsr = nsw tRSB renamable $r2, 14 /* CC::al */, $noreg
				; CHECK: $lr = MVE_DLSTP_32 killed renamable $r1
				; CHECK: bb.1.vector.body:
				; CHECK: successors: %bb.1(0x7c000000), %bb.2(0x04000000)
				; CHECK: liveins: $lr, $q0, $r0, $r2, $r3
				; CHECK: renamable $q1 = MVE_VLDRWU32 renamable $r0, 0, 0, killed $noreg
				; CHECK: MVE_VPTv4s32r 12, renamable $q1, renamable $r2, 10, implicit-def $vpr
				; CHECK: renamable $vpr = MVE_VCMPs32r killed renamable $q1, renamable $r3, 13, 1, killed renamable $vpr
				; CHECK: renamable $r0 = MVE_VSTRWU32_post renamable $q0, killed renamable $r0, 16, 2, killed renamable $vpr
				; CHECK: $lr = MVE_LETP killed renamable $lr, %bb.1
				; CHECK: bb.2.for.cond.cleanup:
				; CHECK: frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc
				;
				; Test including an else-predicated VCTP.
				;
				bb.0.entry:
				successors: %bb.1(0x80000000)
				liveins: $r0, $r1, $r2, $r7, $lr

				frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r7, killed $lr, implicit-def $sp, implicit $sp
				frame-setup CFI_INSTRUCTION def_cfa_offset 8
				frame-setup CFI_INSTRUCTION offset $lr, -4
				frame-setup CFI_INSTRUCTION offset $r7, -8
				tCMPi8 renamable $r1, 1, 14 /* CC::al */, $noreg, implicit-def $cpsr
				t2IT 11, 8, implicit-def $itstate
				frame-destroy tPOP_RET 11 /* CC::lt */, killed $cpsr, def $r7, def $pc, implicit killed $itstate
				renamable $r3, dead $cpsr = tADDi3 renamable $r1, 3, 14 /* CC::al */, $noreg
				renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0
				renamable $r3 = t2BICri killed renamable $r3, 3, 14 /* CC::al */, $noreg, $noreg
				renamable $r12 = t2SUBri killed renamable $r3, 4, 14 /* CC::al */, $noreg, $noreg
				renamable $r3, dead $cpsr = tMOVi8 1, 14 /* CC::al */, $noreg
				renamable $lr = nuw nsw t2ADDrs killed renamable $r3, killed renamable $r12, 19, 14 /* CC::al */, $noreg, $noreg
				renamable $r3, dead $cpsr = nsw tRSB renamable $r2, 14 /* CC::al */, $noreg
				t2DoLoopStart renamable $lr

				bb.1.vector.body:
				successors: %bb.1(0x7c000000), %bb.2(0x04000000)
				liveins: $lr, $q0, $r0, $r1, $r2, $r3

				renamable $vpr = MVE_VCTP32 renamable $r1, 0, $noreg
				MVE_VPST 8, implicit $vpr
				renamable $q1 = MVE_VLDRWU32 renamable $r0, 0, 1, killed renamable $vpr
				MVE_VPTv4s32r 14, renamable $q1, renamable $r2, 10, implicit-def $vpr
				renamable $vpr = MVE_VCMPs32r killed renamable $q1, renamable $r3, 13, 1, killed renamable $vpr
				renamable $vpr = MVE_VCTP32 renamable $r1, 2, killed renamable $vpr
				renamable $r0 = MVE_VSTRWU32_post renamable $q0, killed renamable $r0, 16, 2, killed renamable $vpr
				renamable $r1, dead $cpsr = tSUBi8 killed renamable $r1, 4, 14 /* CC::al */, $noreg
				renamable $lr = t2LoopDec killed renamable $lr, 1
				t2LoopEnd renamable $lr, %bb.1, implicit-def dead $cpsr
				tB %bb.2, 14 /* CC::al */, $noreg

				bb.2.for.cond.cleanup:
				frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc
				...
				---
				name: unrelated_vpt
				alignment: 2
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers: []
				liveins:
				- { reg: '$r0', virtual-reg: '' }
				- { reg: '$r1', virtual-reg: '' }
				- { reg: '$r2', virtual-reg: '' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 8
				offsetAdjustment: 0
				maxAlignment: 4
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack:
				- { id: 0, name: '', type: spill-slot, offset: -4, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$lr', callee-saved-restored: false,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 1, name: '', type: spill-slot, offset: -8, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r7', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				callSites: []
				constants: []
				machineFunctionInfo: {}
				body: |
				; CHECK-LABEL: name: unrelated_vpt
				; CHECK: bb.0.entry:
				; CHECK: successors: %bb.1(0x80000000)
				; CHECK: liveins: $lr, $r0, $r1, $r2, $r7
				; CHECK: frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r7, killed $lr, implicit-def $sp, implicit $sp
				; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 8
				; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -4
				; CHECK: frame-setup CFI_INSTRUCTION offset $r7, -8
				; CHECK: tCMPi8 renamable $r1, 1, 14 /* CC::al */, $noreg, implicit-def $cpsr
				; CHECK: t2IT 11, 8, implicit-def $itstate
				; CHECK: frame-destroy tPOP_RET 11 /* CC::lt */, killed $cpsr, def $r7, def $pc, implicit killed $itstate
				; CHECK: renamable $r3, dead $cpsr = tADDi3 renamable $r1, 3, 14 /* CC::al */, $noreg
				; CHECK: renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0
				; CHECK: renamable $r3 = t2BICri killed renamable $r3, 3, 14 /* CC::al */, $noreg, $noreg
				; CHECK: renamable $r12 = t2SUBri killed renamable $r3, 4, 14 /* CC::al */, $noreg, $noreg
				; CHECK: renamable $r3, dead $cpsr = tMOVi8 1, 14 /* CC::al */, $noreg
				; CHECK: renamable $lr = nuw nsw t2ADDrs killed renamable $r3, killed renamable $r12, 19, 14 /* CC::al */, $noreg, $noreg
				; CHECK: renamable $r3, dead $cpsr = nsw tRSB renamable $r2, 14 /* CC::al */, $noreg
				; CHECK: $lr = t2DLS killed renamable $lr
				; CHECK: bb.1.vector.body:
				; CHECK: successors: %bb.1(0x7c000000), %bb.2(0x04000000)
				; CHECK: liveins: $lr, $q0, $r0, $r1, $r2, $r3
				; CHECK: renamable $vpr = MVE_VCTP32 renamable $r1, 0, $noreg
				; CHECK: MVE_VPST 8, implicit $vpr
				; CHECK: renamable $q1 = MVE_VLDRWU32 renamable $r0, 0, 1, killed renamable $vpr
				; CHECK: MVE_VPTv4s32r 2, renamable $q0, renamable $r2, 11, implicit-def $vpr
				; CHECK: renamable $vpr = MVE_VCMPs32r killed renamable $q1, renamable $r3, 12, 1, killed renamable $vpr
				; CHECK: renamable $vpr = MVE_VCTP32 renamable $r1, 1, killed renamable $vpr
				; CHECK: renamable $r0 = MVE_VSTRWU32_post renamable $q0, killed renamable $r0, 16, 1, killed renamable $vpr
				; CHECK: renamable $r1, dead $cpsr = tSUBi8 killed renamable $r1, 4, 14 /* CC::al */, $noreg
				; CHECK: $lr = t2LEUpdate killed renamable $lr, %bb.1
				; CHECK: bb.2.for.cond.cleanup:
				; CHECK: frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc
				;
				; Tests that a VPT block will be refused if it uses a VPT instruction and its operands are
				; not related in any way to the VCTP.
				;
				bb.0.entry:
				successors: %bb.1(0x80000000)
				liveins: $r0, $r1, $r2, $r7, $lr
				frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r7, killed $lr, implicit-def $sp, implicit $sp
				frame-setup CFI_INSTRUCTION def_cfa_offset 8
				frame-setup CFI_INSTRUCTION offset $lr, -4
				frame-setup CFI_INSTRUCTION offset $r7, -8
				tCMPi8 renamable $r1, 1, 14 /* CC::al */, $noreg, implicit-def $cpsr
				t2IT 11, 8, implicit-def $itstate
				frame-destroy tPOP_RET 11 /* CC::lt */, killed $cpsr, def $r7, def $pc, implicit killed $itstate
				renamable $r3, dead $cpsr = tADDi3 renamable $r1, 3, 14 /* CC::al */, $noreg
				renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0
				renamable $r3 = t2BICri killed renamable $r3, 3, 14 /* CC::al */, $noreg, $noreg
				renamable $r12 = t2SUBri killed renamable $r3, 4, 14 /* CC::al */, $noreg, $noreg
				renamable $r3, dead $cpsr = tMOVi8 1, 14 /* CC::al */, $noreg
				renamable $lr = nuw nsw t2ADDrs killed renamable $r3, killed renamable $r12, 19, 14 /* CC::al */, $noreg, $noreg
				renamable $r3, dead $cpsr = nsw tRSB renamable $r2, 14 /* CC::al */, $noreg
				t2DoLoopStart renamable $lr

				bb.1.vector.body:
				successors: %bb.1(0x7c000000), %bb.2(0x04000000)
				liveins: $lr, $q0, $r0, $r1, $r2, $r3

				renamable $vpr = MVE_VCTP32 renamable $r1, 0, $noreg
				MVE_VPST 8, implicit $vpr
				renamable $q1 = MVE_VLDRWU32 renamable $r0, 0, 1, killed renamable $vpr
				MVE_VPTv4s32r 2, renamable $q0, renamable $r2, 11, implicit-def $vpr
				renamable $vpr = MVE_VCMPs32r killed renamable $q1, renamable $r3, 12, 1, killed renamable $vpr
				renamable $vpr = MVE_VCTP32 renamable $r1, 1, killed renamable $vpr
				renamable $r0 = MVE_VSTRWU32_post renamable $q0, killed renamable $r0, 16, 1, killed renamable $vpr
				renamable $r1, dead $cpsr = tSUBi8 killed renamable $r1, 4, 14 /* CC::al */, $noreg
				renamable $lr = t2LoopDec killed renamable $lr, 1
				t2LoopEnd renamable $lr, %bb.1, implicit-def dead $cpsr
				tB %bb.2, 14 /* CC::al */, $noreg

				bb.2.for.cond.cleanup:
				frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc
				...