This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Common inverse constant predicates to VPNOT
Closed · Public

Authored by dmgreen on Dec 2 2020, 2:11 AM.

Details

Reviewers

SjoerdMeijer
samtebbs
simon_tatham
efriedma
ostannard

Commits

rG384383e15c17: [ARM] Common inverse constant predicates to VPNOT

Summary

This scans through blocks looking for constants used as predicates in MVE instructions. When two constants are found that are the inverse of one another, the second can be replaced by a VPNOT of the first, potentially allowing that VPNOT to be folded away into an else predicate of a VPT block.
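As a rough illustration of what "inverse" means here, the check below mirrors the `~Imm & 0xffff` computation in the patch. It is a standalone sketch, not code from the pass: the names `kPredMask`, `invertPredicate` and `isInversePredicate` are made up for this example, and it assumes predicate constants are 16 bits wide (the width of VPR.P0).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper names; only the low 16 bits of a predicate constant
// are significant, matching the `& 0xffff` in the patch.
constexpr uint32_t kPredMask = 0xffff;

// Returns the 16-bit complement of a predicate constant.
constexpr uint32_t invertPredicate(uint32_t Imm) {
  return ~Imm & kPredMask;
}

// True when B is exactly the 16-bit inverse of A, i.e. a use of B could in
// principle be rewritten as a VPNOT of A.
inline bool isInversePredicate(uint32_t A, uint32_t B) {
  assert(A <= kPredMask && B <= kPredMask &&
         "expected 16-bit predicate constants");
  return invertPredicate(A) == B;
}
```

The constant pairs used in the tests of this revision (1234 and 64301, 52428 and 13107, 1 and 65534) all satisfy this check, since each pair sums to 65535.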

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision. Dec 2 2020, 2:11 AM

Herald added a project: Restricted Project. Dec 2 2020, 2:11 AM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls.

dmgreen requested review of this revision. Dec 2 2020, 2:11 AM

Should there be a range limit somewhere in this logic, beyond just 'in the same basic block'? I worry slightly that if there's a very long basic block with two completely separate predicated sections, and they happen to reuse the same constant, then it might not be a win to do this change. Something along the lines of

%mask1 = [complicated constant]
%mask2 = [exactly the inverse constant]
[do a thing predicated on %mask1]
[do a thing predicated on %mask2]
[10 instructions of unrelated code not VPT-predicated at all]
%mask3 = [same constant as %mask2]
[do a thing predicated on %mask3]

In that situation, it's fine to turn the use of %mask2 into a VPNOT of %mask1, so that the first VPT block has a T and then an E instruction. But you'd probably prefer that %mask3 was rematerialized from scratch rather than (as I understand this code would do) preserving %mask1 from ages ago in order to complement it again.

Our VPT block creation certainly isn't optimal in a lot of cases, that is very true. I think in general a VPNOT is going to be better than or equal to a MOV;VMSR pair, so long ranges or intervening instructions are probably OK. If a different mask is used between the two, though, it would be better not to try to use VPNOTs of previous values, which would just need to be spilled and reloaded.

I have made this track only a single predicate at a time, invalidating it as we scan through and reusing it, or a VPNOT of it, where we can.

My original motivating case for this isn't really a lot better; unfortunately scheduling comes along and ruins everything. I may have to add something extra to sort that out too. The test cases here (along with the newly added ones) look OK though, barring one that has some strange register allocation.

That seems like a reasonable precaution to me, yes – with only one VPR, there's only one previous value that it might be of obviously low cost to VPNOT.
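The single-predicate tracking discussed above can be modelled outside LLVM roughly as follows. This is a simplified sketch under stated assumptions, not the pass itself: `PredUse`, `Action` and `planPredicates` are hypothetical names, each predicated instruction is reduced to "uses a known 16-bit constant" or "uses something else", and register identity is ignored (the real code also checks that the register differs before counting a use as a reuse).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one predicated instruction in a basic block.
struct PredUse {
  bool IsConstant; // false: predicate is not a simple constant move
  uint32_t Imm;    // 16-bit constant mask, meaningful only if IsConstant
};

// What the scan decides for each use.
enum class Action { Materialize, Reuse, VPNot, Untracked };

inline std::vector<Action> planPredicates(const std::vector<PredUse> &Uses) {
  std::vector<Action> Plan;
  bool HaveLast = false; // is a previous constant mask still being tracked?
  uint32_t LastImm = 0;  // the single tracked mask
  for (const PredUse &U : Uses) {
    if (!U.IsConstant) {
      // Any other predicate invalidates the tracked mask.
      HaveLast = false;
      Plan.push_back(Action::Untracked);
      continue;
    }
    assert(U.Imm <= 0xffff && "expected a 16-bit predicate constant");
    const uint32_t NotImm = ~U.Imm & 0xffff;
    if (HaveLast && LastImm == U.Imm)
      Plan.push_back(Action::Reuse);       // same mask as last time
    else if (HaveLast && LastImm == NotImm)
      Plan.push_back(Action::VPNot);       // inverse of the last mask
    else
      Plan.push_back(Action::Materialize); // start tracking from scratch
    HaveLast = true;
    LastImm = U.Imm;
  }
  return Plan;
}
```

In this model the sequence 1234, 64301, 1234, 64301 yields one materialized mask followed by three VPNOTs, while an unrelated predicate in the middle resets the tracking so the next constant is materialized again.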

This revision is now accepted and ready to land. Dec 8 2020, 2:45 AM

dmgreen added a parent revision: D92369: [ARM] Improve handling of empty VPT blocks in tail predicated loops. Dec 8 2020, 7:26 AM

Closed by commit rG384383e15c17: [ARM] Common inverse constant predicates to VPNOT (authored by dmgreen). Dec 8 2020, 11:56 PM

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rG384383e15c17: [ARM] Common inverse constant predicates to VPNOT.

Revision Contents

Path (Size)

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp (86 lines)

llvm/test/CodeGen/Thumb2/mve-pred-constfold.ll (98 lines)

llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir (10 lines)

Diff 310429

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp

[... 61 lines hidden ...]	private:
bool RevertLoopWithCall(MachineLoop *ML);		bool RevertLoopWithCall(MachineLoop *ML);
bool ConvertTailPredLoop(MachineLoop ML, MachineDominatorTree DT);		bool ConvertTailPredLoop(MachineLoop ML, MachineDominatorTree DT);
MachineInstr &ReplaceRegisterUseWithVPNOT(MachineBasicBlock &MBB,		MachineInstr &ReplaceRegisterUseWithVPNOT(MachineBasicBlock &MBB,
MachineInstr &Instr,		MachineInstr &Instr,
MachineOperand &User,		MachineOperand &User,
Register Target);		Register Target);
bool ReduceOldVCCRValueUses(MachineBasicBlock &MBB);		bool ReduceOldVCCRValueUses(MachineBasicBlock &MBB);
bool ReplaceVCMPsByVPNOTs(MachineBasicBlock &MBB);		bool ReplaceVCMPsByVPNOTs(MachineBasicBlock &MBB);
		bool ReplaceConstByVPNOTs(MachineBasicBlock &MBB, MachineDominatorTree *DT);
bool ConvertVPSEL(MachineBasicBlock &MBB);		bool ConvertVPSEL(MachineBasicBlock &MBB);
};		};

char MVEVPTOptimisations::ID = 0;		char MVEVPTOptimisations::ID = 0;

} // end anonymous namespace		} // end anonymous namespace

INITIALIZE_PASS_BEGIN(MVEVPTOptimisations, DEBUG_TYPE,		INITIALIZE_PASS_BEGIN(MVEVPTOptimisations, DEBUG_TYPE,
[... 563 lines hidden ...]	bool MVEVPTOptimisations::ReplaceVCMPsByVPNOTs(MachineBasicBlock &MBB) {
}		}

for (MachineInstr *DeadInstruction : DeadInstructions)		for (MachineInstr *DeadInstruction : DeadInstructions)
DeadInstruction->eraseFromParent();		DeadInstruction->eraseFromParent();

return !DeadInstructions.empty();		return !DeadInstructions.empty();
}		}

		bool MVEVPTOptimisations::ReplaceConstByVPNOTs(MachineBasicBlock &MBB,
		                                               MachineDominatorTree *DT) {
		  // Scan through the block, looking for instructions that use constant moves
		  // into VPR that are the inverse of one another. These are expected to be
		  // COPYs to VCCRRegClass, from a t2MOVi or t2MOVi16. The last seen constant
		  // mask is kept, and it or VPNOTs of it are added or reused as we scan
		  // through the block.
		  unsigned LastVPTImm = 0;
		  Register LastVPTReg = 0;
		  SmallSet<MachineInstr *, 4> DeadInstructions;

		  for (MachineInstr &Instr : MBB.instrs()) {
		    // Look for predicated MVE instructions.
		    int PIdx = llvm::findFirstVPTPredOperandIdx(Instr);
		    if (PIdx == -1)
		      continue;
		    Register VPR = Instr.getOperand(PIdx + 1).getReg();
		    if (!VPR.isVirtual())
		      continue;

		    // From that we are looking for an instruction like %11:vccr = COPY %9:rgpr.
		    MachineInstr *Copy = MRI->getVRegDef(VPR);
		    if (!Copy || Copy->getOpcode() != TargetOpcode::COPY ||
		        !Copy->getOperand(1).getReg().isVirtual() ||
		        MRI->getRegClass(Copy->getOperand(1).getReg()) == &ARM::VCCRRegClass) {
		      LastVPTReg = 0;
		      continue;
		    }
		    Register GPR = Copy->getOperand(1).getReg();

		    // Find the Immediate used by the copy.
		    auto getImm = [&](Register GPR) -> unsigned {
		      MachineInstr *Def = MRI->getVRegDef(GPR);
		      if (Def && (Def->getOpcode() == ARM::t2MOVi ||
		                  Def->getOpcode() == ARM::t2MOVi16))
		        return Def->getOperand(1).getImm();
		      return -1U;
		    };
		    unsigned Imm = getImm(GPR);
		    if (Imm == -1U) {
		      LastVPTReg = 0;
		      continue;
		    }

		    unsigned NotImm = ~Imm & 0xffff;
		    if (LastVPTReg != 0 && LastVPTReg != VPR && LastVPTImm == Imm) {
		      Instr.getOperand(PIdx + 1).setReg(LastVPTReg);
		      if (MRI->use_empty(VPR)) {
		        DeadInstructions.insert(Copy);
		        if (MRI->hasOneUse(GPR))
		          DeadInstructions.insert(MRI->getVRegDef(GPR));
		      }
		      LLVM_DEBUG(dbgs() << "Reusing predicate: in " << Instr);
		    } else if (LastVPTReg != 0 && LastVPTImm == NotImm) {
		      // We have found the not of a previous constant. Create a VPNot of the
		      // earlier predicate reg and use it instead of the copy.
		      Register NewVPR = MRI->createVirtualRegister(&ARM::VCCRRegClass);
		      auto VPNot = BuildMI(MBB, &Instr, Instr.getDebugLoc(),
		                           TII->get(ARM::MVE_VPNOT), NewVPR)
		                       .addReg(LastVPTReg);
		      addUnpredicatedMveVpredNOp(VPNot);

		      // Use the new register and check if the def is now dead.
		      Instr.getOperand(PIdx + 1).setReg(NewVPR);
		      if (MRI->use_empty(VPR)) {
		        DeadInstructions.insert(Copy);
		        if (MRI->hasOneUse(GPR))
		          DeadInstructions.insert(MRI->getVRegDef(GPR));
		      }
		      LLVM_DEBUG(dbgs() << "Adding VPNot: " << *VPNot << " to replace use at "
		                        << Instr);
		      VPR = NewVPR;
		    }

		    LastVPTImm = Imm;
		    LastVPTReg = VPR;
		  }

		  for (MachineInstr *DI : DeadInstructions)
		    DI->eraseFromParent();

		  return !DeadInstructions.empty();
		}

// Replace VPSEL with a predicated VMOV in blocks with a VCTP. This is a		// Replace VPSEL with a predicated VMOV in blocks with a VCTP. This is a
// somewhat blunt approximation to allow tail predicated with vpsel		// somewhat blunt approximation to allow tail predicated with vpsel
// instructions. We turn a vselect into a VPSEL in ISEL, but they have slightly		// instructions. We turn a vselect into a VPSEL in ISEL, but they have slightly
// different semantics under tail predication. Until that is modelled we just		// different semantics under tail predication. Until that is modelled we just
// convert to a VMOVT (via a predicated VORR) instead.		// convert to a VMOVT (via a predicated VORR) instead.
bool MVEVPTOptimisations::ConvertVPSEL(MachineBasicBlock &MBB) {		bool MVEVPTOptimisations::ConvertVPSEL(MachineBasicBlock &MBB) {
bool HasVCTP = false;		bool HasVCTP = false;
SmallVector<MachineInstr *, 4> DeadInstructions;		SmallVector<MachineInstr *, 4> DeadInstructions;
[... 45 lines hidden ...]	bool MVEVPTOptimisations::runOnMachineFunction(MachineFunction &Fn) {

bool Modified = false;		bool Modified = false;
for (MachineLoop *ML : MLI->getBase().getLoopsInPreorder()) {		for (MachineLoop *ML : MLI->getBase().getLoopsInPreorder()) {
Modified |= RevertLoopWithCall(ML);		Modified |= RevertLoopWithCall(ML);
Modified |= ConvertTailPredLoop(ML, DT);		Modified |= ConvertTailPredLoop(ML, DT);
}		}

for (MachineBasicBlock &MBB : Fn) {		for (MachineBasicBlock &MBB : Fn) {
		Modified |= ReplaceConstByVPNOTs(MBB, DT);
Modified |= ReplaceVCMPsByVPNOTs(MBB);		Modified |= ReplaceVCMPsByVPNOTs(MBB);
Modified |= ReduceOldVCCRValueUses(MBB);		Modified |= ReduceOldVCCRValueUses(MBB);
Modified |= ConvertVPSEL(MBB);		Modified |= ConvertVPSEL(MBB);
}		}

LLVM_DEBUG(dbgs() << "**************************************\n");		LLVM_DEBUG(dbgs() << "**************************************\n");
return Modified;		return Modified;
}		}

/// createMVEVPTOptimisationsPass		/// createMVEVPTOptimisationsPass
FunctionPass *llvm::createMVEVPTOptimisationsPass() {		FunctionPass *llvm::createMVEVPTOptimisationsPass() {
return new MVEVPTOptimisations();		return new MVEVPTOptimisations();
}		}

llvm/test/CodeGen/Thumb2/mve-pred-constfold.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve -verify-machineinstrs %s -o - | FileCheck %s		; RUN: llc -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve -verify-machineinstrs %s -o - | FileCheck %s

define arm_aapcs_vfpcc void @reg(<8 x i16> %acc0, <8 x i16> %acc1, i32* nocapture %px, i16 signext %p0) {		define arm_aapcs_vfpcc void @reg(<8 x i16> %acc0, <8 x i16> %acc1, i32* nocapture %px, i16 signext %p0) {
; CHECK-LABEL: reg:		; CHECK-LABEL: reg:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .save {r4, r6, r7, lr}		; CHECK-NEXT: .save {r4, r6, r7, lr}
; CHECK-NEXT: push {r4, r6, r7, lr}		; CHECK-NEXT: push {r4, r6, r7, lr}
; CHECK-NEXT: .pad #8
; CHECK-NEXT: sub sp, #8
; CHECK-NEXT: movw r1, #52428		; CHECK-NEXT: movw r1, #52428
; CHECK-NEXT: vmsr p0, r1		; CHECK-NEXT: vmsr p0, r1
; CHECK-NEXT: movw r1, #13107		; CHECK-NEXT: vpstete
; CHECK-NEXT: vstr p0, [sp, #4] @ 4-byte Spill
; CHECK-NEXT: vpst
; CHECK-NEXT: vaddvt.s16 r12, q1		; CHECK-NEXT: vaddvt.s16 r12, q1
; CHECK-NEXT: vmsr p0, r1		; CHECK-NEXT: vaddve.s16 r2, q1
; CHECK-NEXT: vstr p0, [sp] @ 4-byte Spill
; CHECK-NEXT: vpst
; CHECK-NEXT: vaddvt.s16 r2, q1
; CHECK-NEXT: vldr p0, [sp, #4] @ 4-byte Reload
; CHECK-NEXT: vpst
; CHECK-NEXT: vaddvt.s16 r4, q0		; CHECK-NEXT: vaddvt.s16 r4, q0
; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload		; CHECK-NEXT: vaddve.s16 r6, q0
; CHECK-NEXT: vpst
; CHECK-NEXT: vaddvt.s16 r6, q0
; CHECK-NEXT: strd r6, r4, [r0]		; CHECK-NEXT: strd r6, r4, [r0]
; CHECK-NEXT: strd r2, r12, [r0, #8]		; CHECK-NEXT: strd r2, r12, [r0, #8]
; CHECK-NEXT: add sp, #8
; CHECK-NEXT: pop {r4, r6, r7, pc}		; CHECK-NEXT: pop {r4, r6, r7, pc}
entry:		entry:
%0 = tail call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 13107)		%0 = tail call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 13107)
%1 = tail call i32 @llvm.arm.mve.addv.predicated.v8i16.v8i1(<8 x i16> %acc0, i32 0, <8 x i1> %0)		%1 = tail call i32 @llvm.arm.mve.addv.predicated.v8i16.v8i1(<8 x i16> %acc0, i32 0, <8 x i1> %0)
%2 = tail call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 52428)		%2 = tail call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 52428)
%3 = tail call i32 @llvm.arm.mve.addv.predicated.v8i16.v8i1(<8 x i16> %acc0, i32 0, <8 x i1> %2)		%3 = tail call i32 @llvm.arm.mve.addv.predicated.v8i16.v8i1(<8 x i16> %acc0, i32 0, <8 x i1> %2)
%4 = tail call i32 @llvm.arm.mve.addv.predicated.v8i16.v8i1(<8 x i16> %acc1, i32 0, <8 x i1> %0)		%4 = tail call i32 @llvm.arm.mve.addv.predicated.v8i16.v8i1(<8 x i16> %acc1, i32 0, <8 x i1> %0)
%5 = tail call i32 @llvm.arm.mve.addv.predicated.v8i16.v8i1(<8 x i16> %acc1, i32 0, <8 x i1> %2)		%5 = tail call i32 @llvm.arm.mve.addv.predicated.v8i16.v8i1(<8 x i16> %acc1, i32 0, <8 x i1> %2)
[... 109 lines hidden ...]



define arm_aapcs_vfpcc i32 @const_mask_1(<4 x i32> %0, <4 x i32> %1, i32 %2) {		define arm_aapcs_vfpcc i32 @const_mask_1(<4 x i32> %0, <4 x i32> %1, i32 %2) {
; CHECK-LABEL: const_mask_1:		; CHECK-LABEL: const_mask_1:
; CHECK: @ %bb.0:		; CHECK: @ %bb.0:
; CHECK-NEXT: movs r1, #1		; CHECK-NEXT: movs r1, #1
; CHECK-NEXT: vmsr p0, r1		; CHECK-NEXT: vmsr p0, r1
; CHECK-NEXT: vpstt		; CHECK-NEXT: vpsttee
; CHECK-NEXT: vaddvat.s32 r0, q0
; CHECK-NEXT: vaddvat.s32 r0, q1
; CHECK-NEXT: movw r1, #65534
; CHECK-NEXT: vmsr p0, r1
; CHECK-NEXT: vpstt
; CHECK-NEXT: vaddvat.s32 r0, q0		; CHECK-NEXT: vaddvat.s32 r0, q0
; CHECK-NEXT: vaddvat.s32 r0, q1		; CHECK-NEXT: vaddvat.s32 r0, q1
		; CHECK-NEXT: vaddvae.s32 r0, q0
		; CHECK-NEXT: vaddvae.s32 r0, q1
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
%4 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 1)		%4 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 1)
%5 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %4)		%5 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %4)
%6 = add i32 %5, %2		%6 = add i32 %5, %2
%7 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %1, i32 0, <4 x i1> %4)		%7 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %1, i32 0, <4 x i1> %4)
%8 = add i32 %6, %7		%8 = add i32 %6, %7
%9 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 65534)		%9 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 65534)
%10 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %9)		%10 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %9)
[... 30 lines hidden ...]	; CHECK-NEXT: bx lr
ret i32 %13		ret i32 %13
}		}

define arm_aapcs_vfpcc i32 @const_mask_1234(<4 x i32> %0, <4 x i32> %1, i32 %2) {		define arm_aapcs_vfpcc i32 @const_mask_1234(<4 x i32> %0, <4 x i32> %1, i32 %2) {
; CHECK-LABEL: const_mask_1234:		; CHECK-LABEL: const_mask_1234:
; CHECK: @ %bb.0:		; CHECK: @ %bb.0:
; CHECK-NEXT: movw r1, #1234		; CHECK-NEXT: movw r1, #1234
; CHECK-NEXT: vmsr p0, r1		; CHECK-NEXT: vmsr p0, r1
; CHECK-NEXT: vpstt		; CHECK-NEXT: vpsttee
; CHECK-NEXT: vaddvat.s32 r0, q0
; CHECK-NEXT: vaddvat.s32 r0, q1
; CHECK-NEXT: movw r1, #64301
; CHECK-NEXT: vmsr p0, r1
; CHECK-NEXT: vpstt
; CHECK-NEXT: vaddvat.s32 r0, q0		; CHECK-NEXT: vaddvat.s32 r0, q0
; CHECK-NEXT: vaddvat.s32 r0, q1		; CHECK-NEXT: vaddvat.s32 r0, q1
		; CHECK-NEXT: vaddvae.s32 r0, q0
		; CHECK-NEXT: vaddvae.s32 r0, q1
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
%4 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 1234)		%4 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 1234)
%5 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %4)		%5 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %4)
%6 = add i32 %5, %2		%6 = add i32 %5, %2
%7 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %1, i32 0, <4 x i1> %4)		%7 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %1, i32 0, <4 x i1> %4)
%8 = add i32 %6, %7		%8 = add i32 %6, %7
%9 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 64301)		%9 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 64301)
%10 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %9)		%10 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %9)
%11 = add i32 %8, %10		%11 = add i32 %8, %10
%12 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %1, i32 0, <4 x i1> %9)		%12 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %1, i32 0, <4 x i1> %9)
%13 = add i32 %11, %12		%13 = add i32 %11, %12
ret i32 %13		ret i32 %13
}		}

define arm_aapcs_vfpcc i32 @const_mask_abab(<4 x i32> %0, <4 x i32> %1, i32 %2) {		define arm_aapcs_vfpcc i32 @const_mask_abab(<4 x i32> %0, <4 x i32> %1, i32 %2) {
; CHECK-LABEL: const_mask_abab:		; CHECK-LABEL: const_mask_abab:
; CHECK: @ %bb.0:		; CHECK: @ %bb.0:
; CHECK-NEXT: .pad #8
; CHECK-NEXT: sub sp, #8
; CHECK-NEXT: movw r1, #1234		; CHECK-NEXT: movw r1, #1234
; CHECK-NEXT: vmsr p0, r1		; CHECK-NEXT: vmsr p0, r1
; CHECK-NEXT: movw r1, #64301		; CHECK-NEXT: vpstete
; CHECK-NEXT: vstr p0, [sp, #4] @ 4-byte Spill
; CHECK-NEXT: vpst
; CHECK-NEXT: vaddvat.s32 r0, q0		; CHECK-NEXT: vaddvat.s32 r0, q0
; CHECK-NEXT: vmsr p0, r1		; CHECK-NEXT: vaddvae.s32 r0, q1
; CHECK-NEXT: vstr p0, [sp] @ 4-byte Spill
; CHECK-NEXT: vpst
; CHECK-NEXT: vaddvat.s32 r0, q1
; CHECK-NEXT: vldr p0, [sp, #4] @ 4-byte Reload
; CHECK-NEXT: vpst
; CHECK-NEXT: vaddvat.s32 r0, q1		; CHECK-NEXT: vaddvat.s32 r0, q1
; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload		; CHECK-NEXT: vaddvae.s32 r0, q0
; CHECK-NEXT: vpst
; CHECK-NEXT: vaddvat.s32 r0, q0
; CHECK-NEXT: add sp, #8
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
%4 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 1234)		%4 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 1234)
%5 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %4)		%5 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %4)
%6 = add i32 %5, %2		%6 = add i32 %5, %2
%7 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 64301)		%7 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 64301)
%8 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %1, i32 0, <4 x i1> %7)		%8 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %1, i32 0, <4 x i1> %7)
%9 = add i32 %6, %8		%9 = add i32 %6, %8
%10 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %1, i32 0, <4 x i1> %4)		%10 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %1, i32 0, <4 x i1> %4)
%11 = add i32 %9, %10		%11 = add i32 %9, %10
%12 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %7)		%12 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %7)
%13 = add i32 %11, %12		%13 = add i32 %11, %12
ret i32 %13		ret i32 %13
}		}

define arm_aapcs_vfpcc i32 @const_mask_abbreakab(<4 x i32> %0, <4 x i32> %1, i32 %2) {		define arm_aapcs_vfpcc i32 @const_mask_abbreakab(<4 x i32> %0, <4 x i32> %1, i32 %2) {
; CHECK-LABEL: const_mask_abbreakab:		; CHECK-LABEL: const_mask_abbreakab:
; CHECK: @ %bb.0:		; CHECK: @ %bb.0:
; CHECK-NEXT: .pad #8
; CHECK-NEXT: sub sp, #8
; CHECK-NEXT: movw r1, #1234		; CHECK-NEXT: movw r1, #1234
; CHECK-NEXT: vmsr p0, r1		; CHECK-NEXT: vmsr p0, r1
; CHECK-NEXT: movw r1, #64301		; CHECK-NEXT: vpste
; CHECK-NEXT: vstr p0, [sp, #4] @ 4-byte Spill
; CHECK-NEXT: vpst
; CHECK-NEXT: vaddvat.s32 r0, q0		; CHECK-NEXT: vaddvat.s32 r0, q0
; CHECK-NEXT: vmsr p0, r1		; CHECK-NEXT: vaddvae.s32 r0, q1
; CHECK-NEXT: vstr p0, [sp] @ 4-byte Spill
; CHECK-NEXT: vpst
; CHECK-NEXT: vaddvat.s32 r0, q1
; CHECK-NEXT: vadd.i32 q1, q0, r0		; CHECK-NEXT: vadd.i32 q1, q0, r0
; CHECK-NEXT: vldr p0, [sp, #4] @ 4-byte Reload		; CHECK-NEXT: vpnot
; CHECK-NEXT: vpst		; CHECK-NEXT: vpste
; CHECK-NEXT: vaddvat.s32 r0, q1		; CHECK-NEXT: vaddvat.s32 r0, q1
; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload		; CHECK-NEXT: vaddvae.s32 r0, q0
; CHECK-NEXT: vpst
; CHECK-NEXT: vaddvat.s32 r0, q0
; CHECK-NEXT: add sp, #8
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
%4 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 1234)		%4 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 1234)
%5 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %4)		%5 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %4)
%6 = add i32 %5, %2		%6 = add i32 %5, %2
%7 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 64301)		%7 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 64301)
%8 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %1, i32 0, <4 x i1> %7)		%8 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %1, i32 0, <4 x i1> %7)
%9 = add i32 %6, %8		%9 = add i32 %6, %8
%si = insertelement <4 x i32> undef, i32 %9, i32 0		%si = insertelement <4 x i32> undef, i32 %9, i32 0
[... 9 lines hidden ...]
define arm_aapcs_vfpcc i32 @const_mask_break(<4 x i32> %0, <4 x i32> %1, i32 %2) {		define arm_aapcs_vfpcc i32 @const_mask_break(<4 x i32> %0, <4 x i32> %1, i32 %2) {
; CHECK-LABEL: const_mask_break:		; CHECK-LABEL: const_mask_break:
; CHECK: @ %bb.0:		; CHECK: @ %bb.0:
; CHECK-NEXT: movw r1, #1234		; CHECK-NEXT: movw r1, #1234
; CHECK-NEXT: vmsr p0, r1		; CHECK-NEXT: vmsr p0, r1
; CHECK-NEXT: vpstt		; CHECK-NEXT: vpstt
; CHECK-NEXT: vaddvat.s32 r0, q0		; CHECK-NEXT: vaddvat.s32 r0, q0
; CHECK-NEXT: vaddvat.s32 r0, q1		; CHECK-NEXT: vaddvat.s32 r0, q1
; CHECK-NEXT: movw r1, #64301
; CHECK-NEXT: vadd.i32 q1, q0, r0		; CHECK-NEXT: vadd.i32 q1, q0, r0
; CHECK-NEXT: vmsr p0, r1		; CHECK-NEXT: vpnot
; CHECK-NEXT: vpstt		; CHECK-NEXT: vpstt
; CHECK-NEXT: vaddvat.s32 r0, q1		; CHECK-NEXT: vaddvat.s32 r0, q1
; CHECK-NEXT: vaddvat.s32 r0, q0		; CHECK-NEXT: vaddvat.s32 r0, q0
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
%4 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 1234)		%4 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 1234)
%5 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %4)		%5 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %4)
%6 = add i32 %5, %2		%6 = add i32 %5, %2
%7 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 64301)		%7 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 64301)
[... 41 lines hidden ...]	; CHECK-NEXT: bx lr
%12 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %7)		%12 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %7)
%13 = add i32 %11, %12		%13 = add i32 %11, %12
ret i32 %13		ret i32 %13
}		}

define arm_aapcs_vfpcc i32 @const_mask_threepredabab(<4 x i32> %0, <4 x i32> %1, i32 %2) {		define arm_aapcs_vfpcc i32 @const_mask_threepredabab(<4 x i32> %0, <4 x i32> %1, i32 %2) {
; CHECK-LABEL: const_mask_threepredabab:		; CHECK-LABEL: const_mask_threepredabab:
; CHECK: @ %bb.0:		; CHECK: @ %bb.0:
; CHECK-NEXT: .pad #8		; CHECK-NEXT: .pad #4
; CHECK-NEXT: sub sp, #8		; CHECK-NEXT: sub sp, #4
; CHECK-NEXT: movw r1, #1234		; CHECK-NEXT: movw r1, #1234
; CHECK-NEXT: vmsr p0, r1		; CHECK-NEXT: vmsr p0, r1
; CHECK-NEXT: movw r1, #64301		; CHECK-NEXT: vstr p0, [sp] @ 4-byte Spill
; CHECK-NEXT: vstr p0, [sp, #4] @ 4-byte Spill
; CHECK-NEXT: vpst		; CHECK-NEXT: vpst
; CHECK-NEXT: vaddvat.s32 r0, q0		; CHECK-NEXT: vaddvat.s32 r0, q0
; CHECK-NEXT: vmsr p0, r1		; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
; CHECK-NEXT: vstr p0, [sp] @ 4-byte Spill		; CHECK-NEXT: vpnot
; CHECK-NEXT: vpst		; CHECK-NEXT: vpst
; CHECK-NEXT: vaddvat.s32 r0, q1		; CHECK-NEXT: vaddvat.s32 r0, q1
; CHECK-NEXT: vpt.s32 gt, q1, q0		; CHECK-NEXT: vpt.s32 gt, q1, q0
; CHECK-NEXT: vaddvat.s32 r0, q1		; CHECK-NEXT: vaddvat.s32 r0, q1
; CHECK-NEXT: vldr p0, [sp, #4] @ 4-byte Reload
; CHECK-NEXT: vpst
; CHECK-NEXT: vaddvat.s32 r0, q1
; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload		; CHECK-NEXT: vldr p0, [sp] @ 4-byte Reload
; CHECK-NEXT: vpst		; CHECK-NEXT: vpste
; CHECK-NEXT: vaddvat.s32 r0, q0		; CHECK-NEXT: vaddvat.s32 r0, q1
; CHECK-NEXT: add sp, #8		; CHECK-NEXT: vaddvae.s32 r0, q0
		; CHECK-NEXT: add sp, #4
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
%4 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 1234)		%4 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 1234)
%5 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %4)		%5 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %0, i32 0, <4 x i1> %4)
%6 = add i32 %5, %2		%6 = add i32 %5, %2
%7 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 64301)		%7 = tail call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 64301)
%8 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %1, i32 0, <4 x i1> %7)		%8 = tail call i32 @llvm.arm.mve.addv.predicated.v4i32.v4i1(<4 x i32> %1, i32 0, <4 x i1> %7)
%9 = add i32 %6, %8		%9 = add i32 %6, %8
%n7 = icmp slt <4 x i32> %0, %1		%n7 = icmp slt <4 x i32> %0, %1
[... 22 lines hidden ...]

llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir

[... 830 lines hidden ...]	body: |
; (This means that these examples should not be optimized.)		; (This means that these examples should not be optimized.)
;		;
bb.0:		bb.0:
%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg		%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
%3:vccr = MVE_VPNOT %2:vccr, 1, %2:vccr		%3:vccr = MVE_VPNOT %2:vccr, 1, %2:vccr
%4:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %2:vccr, undef %4:mqpr		%4:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %2:vccr, undef %4:mqpr
%5:mqpr = MVE_VORR %4:mqpr, %4:mqpr, 1, %3:vccr, undef %5:mqpr		%5:mqpr = MVE_VORR %4:mqpr, %4:mqpr, 1, %3:vccr, undef %5:mqpr
bb.1:		bb.1:
%2:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg		%12:vccr = MVE_VCMPs32 %0:mqpr, %1:mqpr, 10, 0, $noreg
%3:vccr = MVE_VPNOT %2:vccr, 1, %2:vccr		%13:vccr = MVE_VPNOT %12:vccr, 1, %12:vccr
%4:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, %3:vccr, undef %4:mqpr		%14:mqpr = MVE_VORR %0:mqpr, %1:mqpr, 1, %13:vccr, undef %14:mqpr
%5:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %2:vccr, undef %5:mqpr		%15:mqpr = MVE_VORR %0:mqpr, %0:mqpr, 1, %12:vccr, undef %15:mqpr
%6:mqpr = MVE_VORR %1:mqpr, %0:mqpr, 1, %3:vccr, undef %6:mqpr		%16:mqpr = MVE_VORR %1:mqpr, %0:mqpr, 1, %13:vccr, undef %16:mqpr
tBX_RET 14, $noreg, implicit %0:mqpr		tBX_RET 14, $noreg, implicit %0:mqpr
...		...
---		---
name: spill_prevention_copies		name: spill_prevention_copies
alignment: 4		alignment: 4
body: |		body: |
;		;
; Tests that VPNOTs are replaced by a COPY instead of inserting a VPNOT		; Tests that VPNOTs are replaced by a COPY instead of inserting a VPNOT
[... 104 lines hidden ...]