Iterate through the loop and check that the observable values produced are the same whether tail predication happens or not.
We want to find out if the tail-predicated version of this loop will produce the same values as the loop in its original form. For this to be true, the newly inserted implicit predication must not change the results. All MVE loads and stores have to be predicated, so we know that any loaded operands or stored results are already equivalent.
Other explicitly predicated instructions will perform the same operation in the original loop and in the tail-predicated form. We call predicated instructions 'Known', and a user of a Known instruction is also Known if it overwrites a Known operand. This is because tail predication will cause the false lanes not to be updated, but we still know the output because we know the input.
For any 'Unknown' instruction, we can check that it is only consumed by Known instructions, because that means its unknown false lane(s) are replaced with known lane(s).
This should result in all instructions depending upon predicated inputs. It also means that, for a given iteration of the loop, a register should hold the same value whether or not tail predication has happened, except where the differences are masked away by its user(s) and are not observable elsewhere.
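As a very rough, self-contained sketch of the check described above (illustrative only, not the actual ARMLowOverheadLoops.cpp logic; the `Inst` model and `safeForTailPredication` are invented for the example, and the propagation of Known through overwritten Known operands is omitted), an unpredicated "Unknown" instruction is tolerable only when all of its users are predicated:

```cpp
#include <cstdio>
#include <vector>

// Each node models one vector instruction in the loop body.
struct Inst {
  const char *Name;
  bool Predicated;             // explicitly predicated, so its false lanes are well defined
  std::vector<Inst *> Users;   // instructions that consume this instruction's result
};

// An unpredicated ("Unknown") instruction is only acceptable if every user is
// predicated ("Known"), so its undefined false lanes can never be observed.
static bool safeForTailPredication(const Inst &I) {
  if (I.Predicated)
    return true;
  for (const Inst *U : I.Users)
    if (!U->Predicated)
      return false;
  return true;
}

int main() {
  Inst Store{"vstr", /*Predicated=*/true, {}};
  Inst Add{"vadd", /*Predicated=*/false, {&Store}};  // unpredicated, but only feeds a predicated store
  std::printf("vadd safe: %s\n", safeForTailPredication(Add) ? "yes" : "no");
  return 0;
}
```

In the real pass the "predicated" property comes from the VPR predicate operands on MVE instructions rather than a boolean flag, but the consumed-only-by-Known rule is the one the summary describes.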
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp

Line 555: Would it be good to spell out here that we have these 2 use cases? Loads/stores produced by the vectoriser (when it thinks tail-folding is good) will all be nicely masked, so this concerns the 2nd use case. Is that right?

Line 560: nit: analyse -> analysis?

Line 574: I appreciate this is bikeshedding, but here we go anyway: UnknownFalseLanes and KnownFalseZeros is a bit difficult to read (in the code below), but your descriptions/comments above are clear. Summarising that, we have a set of instructions with some behaviour that we (immediately) understand, and there are some instructions that need to be further analysed.

Line 589: Can you use one of your new RDA helper functions for this?
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp

Line 555: This pass shouldn't be concerned with whether the input is from the vectorizer or from intrinsics, as we can't know. On line 762 we enforce that all MVE memory operations are explicitly predicated.

Line 574: I did have Predicated at some point, but I thought that was misleading because some instructions won't be using the VPR. I like KnownFalseZeros because it's explicit about what we expect, and I'd much rather have code that doesn't require a big comment to make sense of what is going on. It doesn't have to be that, but its name should convey what's important about the set. The naming of UnknownFalseLanes is definitely less important though.

Line 589: They were just static helpers, and I don't think we'll really do enough of this logic to warrant more helpers.
I was just wondering about more corner cases for "negative" tests: do we have a test where lanes are swapped (if that makes sense)?
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp

Line 582: Is there a test for this?

llvm/test/CodeGen/Thumb2/LowOverheadLoops/unpredicated-min.mir

Line 1: nit: rename file unpredicated-min.mir -> unpredicated-max.mir?
> do we have a test where lanes are swapped (if that makes sense)?
I don't think there's one for this pass... but that was the reason for having a unit test for validForTailPredication, so we don't have to write a whole loop to test an instruction.
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp

Line 582: The only instructions that I can think of would be indexed loads/stores, but they'll be caught by isVectorPredicated. If/when we can enable reductions, we could have an instruction producing two scalar values, and then this logic will need to be reorganised a bit.
LGTM, with one nit inline.
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp

Line 582: Okidoki, then probably just remove this, or replace it with an assert?