This is an archive of the discontinued LLVM Phabricator instance.

[ARM][LowOverheadLoops] Insert loop start at end of block in more cases
AbandonedPublic

Authored by samtebbs on Oct 8 2020, 7:53 AM.

Details

Reviewers

SjoerdMeijer
dmgreen
simon_tatham
olista01
t.p.northover
samparker

Summary

This patch inserts the loop start instruction at the end of the block as long as the count register would have the same value and the insertion point kills it. This in turn allows tail-predication of more loops.
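The condition in the summary can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ToyOperand`, `ToyInstr`, `killsReg`, `sameValueAtEnd` and `canSinkLoopStartToEnd` are hypothetical names invented for illustration, not LLVM types or functions, and the block is modelled as a plain vector rather than a `MachineBasicBlock` queried through reaching-definition analysis.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for machine operands and instructions (hypothetical, not LLVM).
struct ToyOperand {
  std::string Reg;
  bool IsDef;   // operand writes the register
  bool IsKill;  // operand is the last use of the register's current value
};

struct ToyInstr {
  std::string Name;
  std::vector<ToyOperand> Operands;
};

// True if MI reads Reg and marks that use as a kill.
bool killsReg(const ToyInstr &MI, const std::string &Reg) {
  for (const ToyOperand &Op : MI.Operands)
    if (!Op.IsDef && Op.Reg == Reg && Op.IsKill)
      return true;
  return false;
}

// True if nothing strictly after index From redefines Reg, so Reg would
// still hold the same value at the end of the block.
bool sameValueAtEnd(const std::vector<ToyInstr> &Block, std::size_t From,
                    const std::string &Reg) {
  assert(From < Block.size() && "insertion point must be inside the block");
  for (std::size_t I = From + 1; I < Block.size(); ++I)
    for (const ToyOperand &Op : Block[I].Operands)
      if (Op.IsDef && Op.Reg == Reg)
        return false;
  return true;
}

// The summary's condition: the count register keeps its value until the end
// of the block, and the current insertion point kills it.
bool canSinkLoopStartToEnd(const std::vector<ToyInstr> &Block,
                           std::size_t InsertPt, const std::string &CountReg) {
  return sameValueAtEnd(Block, InsertPt, CountReg) &&
         killsReg(Block[InsertPt], CountReg);
}
```

In this model, a block shaped like `t2DoLoopStart killed $r4` followed by instructions that never write `$r4` satisfies the check, while appending a later write to `$r4` makes it fail.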

Diff Detail

Event Timeline

samtebbs created this revision. Oct 8 2020, 7:53 AM

Herald added a project: Restricted Project. Oct 8 2020, 7:53 AM

Herald added subscribers: llvm-commits, danielkiss, hiraditya, kristof.beyls.

samtebbs requested review of this revision. Oct 8 2020, 7:53 AM

Harbormaster completed remote builds in B74449: Diff 296974. Oct 8 2020, 8:29 AM

Fix test

Insert loop start at end of block in more cases

Hmm. Just a quick check - do we want that? I can see it improves some tail predication cases, that's good. But do we want that in general? The DLS instructions have a latency like any other, and earlier is better from that perspective. Or are we assuming that that latency will never matter into the LE instruction?

llvm/test/CodeGen/Thumb2/LowOverheadLoops/it-block-chain-store.mir
146–147	(On Diff #296983)	It is hard to be sure with these tests, but it looks like the old version was doing something with lr, and that might now be a different value?

In D89048#2321000, @dmgreen wrote:

Insert loop start at end of block in more cases

Hmm. Just a quick check - do we want that? I can see it improves some tail predication cases, that's good. But do we want that in general? The DLS instructions have a latency like any other, and earlier is better from that perspective. Or are we assuming that that latency will never matter into the LE instruction?

I wasn't so much concerned about latencies, but I was guessing that by moving the instruction, the goal is to get some more tail-predication "for free". That is suggesting to me that it is a work-around for the analysis of tail-predication not recognising that this is safe?

I wasn't so much concerned about latencies, but I was guessing that by moving the instruction, the goal is to get some more tail-predication "for free".

Indeed, this greatly simplifies the effort required to ensure that we can generate the LSTP version. And I can't see the latency of DLS having any real effect on the performance of a loop, especially not compared to all the other things that can go bad!

samparker added inline comments. Oct 9 2020, 1:59 AM

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1088	Can this logic be simplified? I don't think we should need to be concerned with LR, or at least the top-level conditional block should be guarded with: if (FirstNonTerminator == MBB->end() && RDA.isReachingDefLiveOut(Start, ARM::LR)) Then it looks like we just need to check whether InsertPt kills CountReg?

samparker added inline comments. Oct 9 2020, 2:01 AM

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1088	... along with your existing check for RDA.hasSameReachingDef(LastMI, &*InsertPt, CountReg).
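One way to read the suggested simplification is as a small decision function over facts that the pass would already have computed. The sketch below is an assumption-laden illustration, not the code from the patch: `BlockFacts` and `moveStartToBlockEnd` are hypothetical names, each field stands in for the query named in its comment, and the early acceptance when the count register is live-out is taken from the posted diff rather than from the comments above. It covers only the case where the block has no terminator.

```cpp
// Hypothetical summary of analysis results for one candidate block.
struct BlockFacts {
  bool NoTerminator;        // FirstNonTerminator == MBB->end()
  bool LRLiveOut;           // RDA.isReachingDefLiveOut(Start, ARM::LR)
  bool CountRegLiveOut;     // RDA.isReachingDefLiveOut(&*InsertPt, CountReg)
  bool CountRegSameAtEnd;   // RDA.hasSameReachingDef(LastMI, &*InsertPt, CountReg)
  bool InsertPtKillsCount;  // an operand of InsertPt kills CountReg
};

// Returns true if the loop start may be moved to the end of the block.
bool moveStartToBlockEnd(const BlockFacts &F) {
  // Top-level guard from the review comment.
  if (!F.NoTerminator || !F.LRLiveOut)
    return false;
  // The count register already reaches the end of the block unchanged.
  if (F.CountRegLiveOut)
    return true;
  // Otherwise require the same reaching definition at the last instruction
  // and that the insertion point kills the count register.
  return F.CountRegSameAtEnd && F.InsertPtKillsCount;
}
```

Keeping the guard at the top means the LR condition is checked once, and the remaining logic only reasons about the count register.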

Simplify logic and add check for LR being live-out.

samtebbs marked 3 inline comments as done. Oct 9 2020, 8:02 AM

samtebbs added inline comments.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1088	Agreed that it can be simplified. Thanks for pointing that out!
llvm/test/CodeGen/Thumb2/LowOverheadLoops/it-block-chain-store.mir
146–147	(On Diff #296983)	Indeed, I was missing an important check. Thanks.

Sorry, I didn't realise this had been updated.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1084	I seem to remember being concerned that though we've already chosen an InsertPt, we then compare reaching defs against Start instead... In the case where InsertPt != Start, it means there's a mov lr, after Start and so RDA.isReachingDefLiveOut(Start, ARM::LR) shouldn't be true and RDA.hasSameReachingDef(Start, &*FirstNonTerminator, ARM::LR) may also not be true. Would you mind making another patch to compare against InsertPt and then rebase this on top of that?
1091	nit: your indentations are too wide.

Rebase on top of https://reviews.llvm.org/D89549 and format code.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1084	Good idea. done.
1091	Cheers

And I can't see the latency of DLS having any real effect on the performance of a loop, ..

Unfortunately I'm not sure that is always exactly true, but I'm not against the general idea of moving the loop start closer to the loop, especially for DLSTP. If it simplifies tail predication and makes it more reliable, then that's certainly a good thing.

I have reverted the patch that this depends on (38f625d0d1360b0) because it's had issues for a while now and we could do with it being correct so that we have a better place to work from. That way we can re-do this whilst having a higher confidence that we are not introducing bugs.

The code that this change depends on has been reverted so I will close this and re-visit it once those changes have been re-worked.

Revision Contents

Path	Size
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp	26 lines
llvm/test/CodeGen/Thumb2/LowOverheadLoops/	226 lines, 29 lines, 28 lines, 22 lines (four files)

Diff 298617

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp

   auto TryAdjustInsertionPoint = [](MachineBasicBlock::iterator &InsertPt,

     MachineBasicBlock *MBB = InsertPt->getParent();
     MachineBasicBlock::iterator FirstNonTerminator =
       MBB->getFirstTerminator();
     unsigned CountReg = Start->getOperand(0).getReg();

     // Get the latest possible insertion point and check whether the semantics
     // will be maintained if Start was inserted there.
-    if (FirstNonTerminator == MBB->end()) {
-      if (RDA.isReachingDefLiveOut(&*InsertPt, CountReg) &&
-          RDA.isReachingDefLiveOut(&*InsertPt, ARM::LR))
-        InsertPt = FirstNonTerminator;
-    } else if (RDA.hasSameReachingDef(Start, &*FirstNonTerminator, CountReg) &&
+    if (FirstNonTerminator != MBB->end()) {
+      if (RDA.hasSameReachingDef(Start, &*FirstNonTerminator, CountReg) &&
           RDA.hasSameReachingDef(Start, &*FirstNonTerminator, ARM::LR))
         InsertPt = FirstNonTerminator;
+    } else if (RDA.isReachingDefLiveOut(&*InsertPt, ARM::LR)) {
+      if (RDA.isReachingDefLiveOut(&*InsertPt, CountReg))
+        InsertPt = FirstNonTerminator;
+      else {
+        // Allow inserting at the end of the block if it kills the count
+        // register and it would have the same value
+        auto LastMI = &*MBB->getLastNonDebugInstr();
+        if (RDA.hasSameReachingDef(LastMI, &*InsertPt, CountReg)) {
+          for (auto &Op : InsertPt->operands()) {
+            if (Op.isReg() && Op.getReg() == CountReg && Op.isKill()) {
+              InsertPt = FirstNonTerminator;
+              break;
+            }
+          }
+        }
+      }
+    }
   };

   if (!FindStartInsertionPoint(Start, Dec, StartInsertPt, StartInsertBB, RDA,
                                ToRemove)) {
     LLVM_DEBUG(dbgs() << "ARM Loops: Unable to find safe insertion point.\n");
     Revert = true;
     return;
   }

llvm/test/CodeGen/Thumb2/LowOverheadLoops/dls-kills-reg.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve -run-pass=arm-low-overhead-loops %s -o - | FileCheck %s

				--- |
				define arm_aapcs_vfpcc void @do_loop_start_kills_reg(float* %pSrc, i32 %blockSize, float* nocapture %pResult) {
				entry:
				%0 = add i32 %blockSize, 3
				%1 = icmp slt i32 %blockSize, 4
				%smin = select i1 %1, i32 %blockSize, i32 4
				%2 = sub i32 %0, %smin
				%3 = lshr i32 %2, 2
				%4 = add nuw nsw i32 %3, 1
				%5 = icmp slt i32 %blockSize, 4
				%smin3 = select i1 %5, i32 %blockSize, i32 4
				%6 = sub i32 %0, %smin3
				%7 = lshr i32 %6, 2
				%8 = add nuw nsw i32 %7, 1
				call void @llvm.set.loop.iterations.i32(i32 %8)
				br label %do.body.i

				do.body.i: ; preds = %do.body.i, %entry
				%blkCnt.0.i = phi i32 [ %13, %do.body.i ], [ %blockSize, %entry ]
				%sumVec.0.i = phi <4 x float> [ %12, %do.body.i ], [ zeroinitializer, %entry ]
				%pSrc.addr.0.i = phi float* [ %add.ptr.i, %do.body.i ], [ %pSrc, %entry ]
				%9 = phi i32 [ %8, %entry ], [ %14, %do.body.i ]
				%pSrc.addr.0.i2 = bitcast float* %pSrc.addr.0.i to <4 x float>*
				%10 = tail call <4 x i1> @llvm.arm.mve.vctp32(i32 %blkCnt.0.i)
				%11 = tail call fast <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* %pSrc.addr.0.i2, i32 4, <4 x i1> %10, <4 x float> zeroinitializer)
				%12 = tail call fast <4 x float> @llvm.arm.mve.add.predicated.v4f32.v4i1(<4 x float> %sumVec.0.i, <4 x float> %11, <4 x i1> %10, <4 x float> %sumVec.0.i)
				%add.ptr.i = getelementptr inbounds float, float* %pSrc.addr.0.i, i32 4
				%13 = add i32 %blkCnt.0.i, -4
				%14 = call i32 @llvm.loop.decrement.reg.i32(i32 %9, i32 1)
				%15 = icmp ne i32 %14, 0
				br i1 %15, label %do.body.i, label %arm_mean_f32_mve.exit

				arm_mean_f32_mve.exit: ; preds = %do.body.i
				%16 = extractelement <4 x float> %12, i32 3
				%add2.i.i = fadd fast float %16, %16
				%conv.i = uitofp i32 %blockSize to float
				%div.i = fdiv fast float %add2.i.i, %conv.i
				%17 = bitcast float %div.i to i32
				call void @llvm.set.loop.iterations.i32(i32 %4)
				br label %do.body

				do.body: ; preds = %do.body, %arm_mean_f32_mve.exit
				%blkCnt.0 = phi i32 [ %blockSize, %arm_mean_f32_mve.exit ], [ %26, %do.body ]
				%sumVec.0 = phi <4 x float> [ zeroinitializer, %arm_mean_f32_mve.exit ], [ %25, %do.body ]
				%pSrc.addr.0 = phi float* [ %pSrc, %arm_mean_f32_mve.exit ], [ %add.ptr, %do.body ]
				%18 = phi i32 [ %4, %arm_mean_f32_mve.exit ], [ %27, %do.body ]
				%pSrc.addr.01 = bitcast float* %pSrc.addr.0 to <4 x float>*
				%19 = tail call <4 x i1> @llvm.arm.mve.vctp32(i32 %blkCnt.0)
				%20 = tail call fast <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* %pSrc.addr.01, i32 4, <4 x i1> %19, <4 x float> zeroinitializer)
				%21 = insertelement <4 x i32> undef, i32 %17, i64 0
				%22 = shufflevector <4 x i32> %21, <4 x i32> undef, <4 x i32> zeroinitializer
				%23 = bitcast <4 x i32> %22 to <4 x float>
				%24 = tail call fast <4 x float> @llvm.arm.mve.sub.predicated.v4f32.v4i1(<4 x float> %20, <4 x float> %23, <4 x i1> %19, <4 x float> undef)
				%25 = tail call fast <4 x float> @llvm.arm.mve.fma.predicated.v4f32.v4i1(<4 x float> %24, <4 x float> %24, <4 x float> %sumVec.0, <4 x i1> %19)
				%add.ptr = getelementptr inbounds float, float* %pSrc.addr.0, i32 4
				%26 = add i32 %blkCnt.0, -4
				%27 = call i32 @llvm.loop.decrement.reg.i32(i32 %18, i32 1)
				%28 = icmp ne i32 %27, 0
				br i1 %28, label %do.body, label %do.end

				do.end: ; preds = %do.body
				%29 = extractelement <4 x float> %25, i32 3
				%add2.i = fadd fast float %29, %29
				%sub2 = add i32 %blockSize, -1
				%conv = uitofp i32 %sub2 to float
				%div = fdiv fast float %add2.i, %conv
				store float %div, float* %pResult, align 4
				ret void
				}

				declare <4 x float> @llvm.arm.mve.sub.predicated.v4f32.v4i1(<4 x float>, <4 x float>, <4 x i1>, <4 x float>)
				declare <4 x float> @llvm.arm.mve.fma.predicated.v4f32.v4i1(<4 x float>, <4 x float>, <4 x float>, <4 x i1>)
				declare <4 x i1> @llvm.arm.mve.vctp32(i32)
				declare <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>*, i32 immarg, <4 x i1>, <4 x float>)
				declare <4 x float> @llvm.arm.mve.add.predicated.v4f32.v4i1(<4 x float>, <4 x float>, <4 x i1>, <4 x float>)
				declare void @llvm.set.loop.iterations.i32(i32)
				declare i32 @llvm.loop.decrement.reg.i32(i32, i32)
				...
				---
				name: do_loop_start_kills_reg
				alignment: 2
				tracksRegLiveness: true
				registers: []
				liveins:
				- { reg: '$r0', virtual-reg: '' }
				- { reg: '$r1', virtual-reg: '' }
				- { reg: '$r2', virtual-reg: '' }
				frameInfo:
				stackSize: 8
				offsetAdjustment: 0
				maxAlignment: 4
				fixedStack: []
				stack:
				- { id: 0, name: '', type: spill-slot, offset: -4, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$lr', callee-saved-restored: false,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 1, name: '', type: spill-slot, offset: -8, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r4', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				callSites: []
				constants: []
				machineFunctionInfo: {}
				body: |
				; CHECK-LABEL: name: do_loop_start_kills_reg
				; CHECK: bb.0.entry:
				; CHECK: successors: %bb.1(0x80000000)
				; CHECK: liveins: $lr, $r0, $r1, $r2, $r4
				; CHECK: frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r4, killed $lr, implicit-def $sp, implicit $sp
				; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 8
				; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -4
				; CHECK: frame-setup CFI_INSTRUCTION offset $r4, -8
				; CHECK: renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0
				; CHECK: $r3 = tMOVr $r1, 14 /* CC::al */, $noreg
				; CHECK: $r12 = tMOVr $r0, 14 /* CC::al */, $noreg
				; CHECK: $lr = MVE_DLSTP_32 killed renamable $r3
				; CHECK: bb.1.do.body.i:
				; CHECK: successors: %bb.1(0x7c000000), %bb.2(0x04000000)
				; CHECK: liveins: $lr, $q0, $r0, $r1, $r2, $r12
				; CHECK: renamable $r12, renamable $q1 = MVE_VLDRWU32_post killed renamable $r12, 16, 0, $noreg :: (load 16 from %ir.pSrc.addr.0.i2, align 4)
				; CHECK: renamable $q0 = MVE_VADDf32 killed renamable $q0, killed renamable $q1, 0, killed $noreg, killed renamable $q0
				; CHECK: $lr = MVE_LETP killed renamable $lr, %bb.1
				; CHECK: bb.2.arm_mean_f32_mve.exit:
				; CHECK: successors: %bb.3(0x80000000)
				; CHECK: liveins: $q0, $r0, $r1, $r2
				; CHECK: $s4 = VMOVSR $r1, 14 /* CC::al */, $noreg
				; CHECK: renamable $s0 = nnan ninf nsz arcp contract afn reassoc VADDS killed renamable $s3, killed renamable $s3, 14 /* CC::al */, $noreg, implicit killed $q0
				; CHECK: $r3 = tMOVr $r1, 14 /* CC::al */, $noreg
				; CHECK: renamable $s4 = VUITOS killed renamable $s4, 14 /* CC::al */, $noreg
				; CHECK: renamable $s0 = nnan ninf nsz arcp contract afn reassoc VDIVS killed renamable $s0, killed renamable $s4, 14 /* CC::al */, $noreg
				; CHECK: renamable $r12 = VMOVRS killed renamable $s0, 14 /* CC::al */, $noreg
				; CHECK: renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0
				; CHECK: $lr = MVE_DLSTP_32 killed renamable $r3
				; CHECK: bb.3.do.body:
				; CHECK: successors: %bb.3(0x7c000000), %bb.4(0x04000000)
				; CHECK: liveins: $lr, $q0, $r0, $r1, $r2, $r12
				; CHECK: renamable $r0, renamable $q1 = MVE_VLDRWU32_post killed renamable $r0, 16, 0, $noreg :: (load 16 from %ir.pSrc.addr.01, align 4)
				; CHECK: renamable $q1 = MVE_VSUB_qr_f32 killed renamable $q1, renamable $r12, 0, $noreg, undef renamable $q1
				; CHECK: renamable $q0 = MVE_VFMAf32 killed renamable $q0, killed renamable $q1, killed renamable $q1, 0, killed $noreg
				; CHECK: $lr = MVE_LETP killed renamable $lr, %bb.3
				; CHECK: bb.4.do.end:
				; CHECK: liveins: $q0, $r1, $r2
				; CHECK: renamable $r0, dead $cpsr = tSUBi3 killed renamable $r1, 1, 14 /* CC::al */, $noreg
				; CHECK: renamable $s0 = nnan ninf nsz arcp contract afn reassoc VADDS killed renamable $s3, killed renamable $s3, 14 /* CC::al */, $noreg, implicit killed $q0
				; CHECK: $s2 = VMOVSR killed $r0, 14 /* CC::al */, $noreg
				; CHECK: renamable $s2 = VUITOS killed renamable $s2, 14 /* CC::al */, $noreg
				; CHECK: renamable $s0 = nnan ninf nsz arcp contract afn reassoc VDIVS killed renamable $s0, killed renamable $s2, 14 /* CC::al */, $noreg
				; CHECK: VSTRS killed renamable $s0, killed renamable $r2, 0, 14 /* CC::al */, $noreg :: (store 4 into %ir.pResult)
				; CHECK: frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r4, def $pc
				bb.0.entry:
				successors: %bb.1(0x80000000)
				liveins: $r0, $r1, $r2, $r4, $lr

				frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r4, killed $lr, implicit-def $sp, implicit $sp
				frame-setup CFI_INSTRUCTION def_cfa_offset 8
				frame-setup CFI_INSTRUCTION offset $lr, -4
				frame-setup CFI_INSTRUCTION offset $r4, -8
				$r3 = tMOVr $r1, 14 /* CC::al */, $noreg
				tCMPi8 renamable $r1, 4, 14 /* CC::al */, $noreg, implicit-def $cpsr
				t2IT 10, 8, implicit-def $itstate
				renamable $r3 = tMOVi8 $noreg, 4, 10 /* CC::ge */, killed $cpsr, implicit killed renamable $r3, implicit killed $itstate
				renamable $r12 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
				renamable $r3, dead $cpsr = tSUBrr renamable $r1, killed renamable $r3, 14 /* CC::al */, $noreg
				renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0
				renamable $r3, dead $cpsr = tADDi8 killed renamable $r3, 3, 14 /* CC::al */, $noreg
				renamable $lr = nuw nsw t2ADDrs killed renamable $r12, killed renamable $r3, 19, 14 /* CC::al */, $noreg, $noreg
				$r3 = tMOVr $r1, 14 /* CC::al */, $noreg
				$r12 = tMOVr $r0, 14 /* CC::al */, $noreg
				t2DoLoopStart renamable $lr
				$r4 = tMOVr $lr, 14 /* CC::al */, $noreg

				bb.1.do.body.i:
				successors: %bb.1(0x7c000000), %bb.2(0x04000000)
				liveins: $lr, $q0, $r0, $r1, $r2, $r3, $r4, $r12

				renamable $vpr = MVE_VCTP32 renamable $r3, 0, $noreg
				renamable $r3, dead $cpsr = tSUBi8 killed renamable $r3, 4, 14 /* CC::al */, $noreg
				renamable $lr = t2LoopDec killed renamable $lr, 1
				MVE_VPST 4, implicit $vpr
				renamable $r12, renamable $q1 = MVE_VLDRWU32_post killed renamable $r12, 16, 1, renamable $vpr :: (load 16 from %ir.pSrc.addr.0.i2, align 4)
				renamable $q0 = MVE_VADDf32 killed renamable $q0, killed renamable $q1, 1, killed renamable $vpr, renamable $q0
				t2LoopEnd renamable $lr, %bb.1, implicit-def dead $cpsr
				tB %bb.2, 14 /* CC::al */, $noreg

				bb.2.arm_mean_f32_mve.exit:
				successors: %bb.3(0x80000000)
				liveins: $q0, $r0, $r1, $r2, $r4

				$s4 = VMOVSR $r1, 14 /* CC::al */, $noreg
				$lr = tMOVr $r4, 14 /* CC::al */, $noreg
				renamable $s0 = nnan ninf nsz arcp contract afn reassoc VADDS killed renamable $s3, renamable $s3, 14 /* CC::al */, $noreg, implicit $q0
				$r3 = tMOVr $r1, 14 /* CC::al */, $noreg
				renamable $s4 = VUITOS killed renamable $s4, 14 /* CC::al */, $noreg
				t2DoLoopStart killed $r4
				renamable $s0 = nnan ninf nsz arcp contract afn reassoc VDIVS killed renamable $s0, killed renamable $s4, 14 /* CC::al */, $noreg
				renamable $r12 = VMOVRS killed renamable $s0, 14 /* CC::al */, $noreg
				renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0

				bb.3.do.body:
				successors: %bb.3(0x7c000000), %bb.4(0x04000000)
				liveins: $lr, $q0, $r0, $r1, $r2, $r3, $r12

				renamable $vpr = MVE_VCTP32 renamable $r3, 0, $noreg
				renamable $r3, dead $cpsr = tSUBi8 killed renamable $r3, 4, 14 /* CC::al */, $noreg
				renamable $lr = t2LoopDec killed renamable $lr, 1
				MVE_VPST 2, implicit $vpr
				renamable $r0, renamable $q1 = MVE_VLDRWU32_post killed renamable $r0, 16, 1, renamable $vpr :: (load 16 from %ir.pSrc.addr.01, align 4)
				renamable $q1 = MVE_VSUB_qr_f32 killed renamable $q1, renamable $r12, 1, renamable $vpr, undef renamable $q1
				renamable $q0 = MVE_VFMAf32 killed renamable $q0, killed renamable $q1, renamable $q1, 1, killed renamable $vpr
				t2LoopEnd renamable $lr, %bb.3, implicit-def dead $cpsr
				tB %bb.4, 14 /* CC::al */, $noreg

				bb.4.do.end:
				liveins: $q0, $r1, $r2

				renamable $r0, dead $cpsr = tSUBi3 killed renamable $r1, 1, 14 /* CC::al */, $noreg
				renamable $s0 = nnan ninf nsz arcp contract afn reassoc VADDS killed renamable $s3, renamable $s3, 14 /* CC::al */, $noreg, implicit $q0
				$s2 = VMOVSR killed $r0, 14 /* CC::al */, $noreg
				renamable $s2 = VUITOS killed renamable $s2, 14 /* CC::al */, $noreg
				renamable $s0 = nnan ninf nsz arcp contract afn reassoc VDIVS killed renamable $s0, killed renamable $s2, 14 /* CC::al */, $noreg
				VSTRS killed renamable $s0, killed renamable $r2, 0, 14 /* CC::al */, $noreg :: (store 4 into %ir.pResult)
				frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r4, def $pc

				...

llvm/test/CodeGen/Thumb2/LowOverheadLoops/it-block-mov.mir

body: |
; CHECK: bb.0:		; CHECK: bb.0:
; CHECK: successors: %bb.1(0x40000000), %bb.2(0x40000000)		; CHECK: successors: %bb.1(0x40000000), %bb.2(0x40000000)
; CHECK: liveins: $lr, $r0, $r1, $r2, $r4		; CHECK: liveins: $lr, $r0, $r1, $r2, $r4
; CHECK: frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r4, killed $lr, implicit-def $sp, implicit $sp		; CHECK: frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r4, killed $lr, implicit-def $sp, implicit $sp
; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 8		; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 8
; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -4		; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -4
; CHECK: frame-setup CFI_INSTRUCTION offset $r4, -8		; CHECK: frame-setup CFI_INSTRUCTION offset $r4, -8
; CHECK: tCMPi8 renamable $r1, 2, 14 /* CC::al */, $noreg, implicit-def $cpsr		; CHECK: tCMPi8 renamable $r1, 2, 14 /* CC::al */, $noreg, implicit-def $cpsr
; CHECK: renamable $r12 = t2MOVi 4, 14 /* CC::al */, $noreg, $noreg
; CHECK: tBcc %bb.2, 2 /* CC::hs */, killed $cpsr		; CHECK: tBcc %bb.2, 2 /* CC::hs */, killed $cpsr
; CHECK: bb.1:		; CHECK: bb.1:
; CHECK: liveins: $r2		; CHECK: liveins: $r2
; CHECK: renamable $s0 = VLDRS %const.0, 0, 14 /* CC::al */, $noreg		; CHECK: renamable $s0 = VLDRS %const.0, 0, 14 /* CC::al */, $noreg
; CHECK: VSTRS killed renamable $s0, killed renamable $r2, 0, 14 /* CC::al */, $noreg		; CHECK: VSTRS killed renamable $s0, killed renamable $r2, 0, 14 /* CC::al */, $noreg
; CHECK: tPOP_RET 14 /* CC::al */, $noreg, def $r4, def $pc		; CHECK: tPOP_RET 14 /* CC::al */, $noreg, def $r4, def $pc
; CHECK: bb.2:		; CHECK: bb.2:
; CHECK: successors: %bb.3(0x80000000)		; CHECK: successors: %bb.3(0x80000000)
; CHECK: liveins: $r0, $r1, $r2, $r12		; CHECK: liveins: $r0, $r1, $r2
; CHECK: renamable $r4, dead $cpsr = tMOVi8 1, 14 /* CC::al */, $noreg
; CHECK: tCMPi8 renamable $r1, 4, 14 /* CC::al */, $noreg, implicit-def $cpsr
; CHECK: t2IT 11, 8, implicit-def $itstate
; CHECK: $r12 = tMOVr renamable $r1, 11 /* CC::lt */, killed $cpsr, implicit killed renamable $r12, implicit killed $itstate
; CHECK: renamable $r3 = t2SUBrr renamable $r1, killed renamable $r12, 14 /* CC::al */, $noreg, $noreg
; CHECK: renamable $r3, dead $cpsr = tADDi8 killed renamable $r3, 3, 14 /* CC::al */, $noreg
; CHECK: $r12 = tMOVr $r1, 14 /* CC::al */, $noreg		; CHECK: $r12 = tMOVr $r1, 14 /* CC::al */, $noreg
; CHECK: renamable $r4 = nuw nsw t2ADDrs killed renamable $r4, killed renamable $r3, 19, 14 /* CC::al */, $noreg, $noreg
; CHECK: renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0		; CHECK: renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0
; CHECK: $r3 = tMOVr $r0, 14 /* CC::al */, $noreg		; CHECK: $r3 = tMOVr $r0, 14 /* CC::al */, $noreg
; CHECK: $lr = MVE_DLSTP_32 killed renamable $r12		; CHECK: $lr = MVE_DLSTP_32 killed renamable $r12
; CHECK: bb.3:		; CHECK: bb.3:
; CHECK: successors: %bb.3(0x7c000000), %bb.4(0x04000000)		; CHECK: successors: %bb.3(0x7c000000), %bb.4(0x04000000)
; CHECK: liveins: $lr, $q0, $r0, $r1, $r2, $r3, $r4		; CHECK: liveins: $lr, $q0, $r0, $r1, $r2, $r3
; CHECK: renamable $q1 = nnan ninf nsz MVE_VLDRWU32 renamable $r3, 0, 0, $noreg		; CHECK: renamable $q1 = nnan ninf nsz MVE_VLDRWU32 renamable $r3, 0, 0, $noreg
; CHECK: renamable $q0 = nnan ninf nsz MVE_VADDf32 killed renamable $q0, killed renamable $q1, 0, killed $noreg, killed renamable $q0		; CHECK: renamable $q0 = nnan ninf nsz MVE_VADDf32 killed renamable $q0, killed renamable $q1, 0, killed $noreg, killed renamable $q0
; CHECK: renamable $r3, dead $cpsr = nuw tADDi8 killed renamable $r3, 16, 14 /* CC::al */, $noreg		; CHECK: renamable $r3, dead $cpsr = nuw tADDi8 killed renamable $r3, 16, 14 /* CC::al */, $noreg
; CHECK: $lr = MVE_LETP killed renamable $lr, %bb.3		; CHECK: $lr = MVE_LETP killed renamable $lr, %bb.3
; CHECK: bb.4:		; CHECK: bb.4:
; CHECK: successors: %bb.5(0x80000000)		; CHECK: successors: %bb.5(0x80000000)
; CHECK: liveins: $q0, $r0, $r1, $r2, $r4		; CHECK: liveins: $q0, $r0, $r1, $r2
; CHECK: renamable $s4 = nnan ninf nsz VADDS renamable $s0, renamable $s1, 14 /* CC::al */, $noreg		; CHECK: renamable $s4 = nnan ninf nsz VADDS renamable $s0, renamable $s1, 14 /* CC::al */, $noreg
; CHECK: $r3 = tMOVr $r1, 14 /* CC::al */, $noreg		; CHECK: $r3 = tMOVr $r1, 14 /* CC::al */, $noreg
; CHECK: renamable $s4 = nnan ninf nsz VADDS renamable $s2, killed renamable $s4, 14 /* CC::al */, $noreg		; CHECK: renamable $s4 = nnan ninf nsz VADDS renamable $s2, killed renamable $s4, 14 /* CC::al */, $noreg
; CHECK: renamable $s0 = nnan ninf nsz VADDS killed renamable $s3, killed renamable $s4, 14 /* CC::al */, $noreg, implicit killed $q0		; CHECK: renamable $s0 = nnan ninf nsz VADDS killed renamable $s3, killed renamable $s4, 14 /* CC::al */, $noreg, implicit killed $q0
; CHECK: $s2 = VMOVSR $r1, 14 /* CC::al */, $noreg		; CHECK: $s2 = VMOVSR $r1, 14 /* CC::al */, $noreg
; CHECK: renamable $s2 = VUITOS killed renamable $s2, 14 /* CC::al */, $noreg		; CHECK: renamable $s2 = VUITOS killed renamable $s2, 14 /* CC::al */, $noreg
; CHECK: $lr = t2DLS killed $r4
; CHECK: renamable $s4 = nnan ninf nsz VDIVS killed renamable $s0, killed renamable $s2, 14 /* CC::al */, $noreg		; CHECK: renamable $s4 = nnan ninf nsz VDIVS killed renamable $s0, killed renamable $s2, 14 /* CC::al */, $noreg
; CHECK: renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0		; CHECK: renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0
		; CHECK: $lr = MVE_DLSTP_32 killed renamable $r3
; CHECK: bb.5:		; CHECK: bb.5:
; CHECK: successors: %bb.5(0x7c000000), %bb.6(0x04000000)		; CHECK: successors: %bb.5(0x7c000000), %bb.6(0x04000000)
; CHECK: liveins: $lr, $q0, $r0, $r1, $r2, $r3, $s4		; CHECK: liveins: $lr, $q0, $r0, $r1, $r2, $s4
; CHECK: renamable $vpr = MVE_VCTP32 renamable $r3, 0, $noreg
; CHECK: $r4 = VMOVRS $s4, 14 /* CC::al */, $noreg		; CHECK: $r4 = VMOVRS $s4, 14 /* CC::al */, $noreg
; CHECK: MVE_VPST 2, implicit $vpr		; CHECK: renamable $q2 = nnan ninf nsz MVE_VLDRWU32 renamable $r0, 0, 0, $noreg
; CHECK: renamable $q2 = nnan ninf nsz MVE_VLDRWU32 renamable $r0, 0, 1, renamable $vpr		; CHECK: renamable $q2 = nnan ninf nsz MVE_VSUB_qr_f32 killed renamable $q2, killed renamable $r4, 0, $noreg, undef renamable $q2
; CHECK: renamable $q2 = nnan ninf nsz MVE_VSUB_qr_f32 killed renamable $q2, killed renamable $r4, 1, renamable $vpr, undef renamable $q2		; CHECK: renamable $q0 = nnan ninf nsz MVE_VFMAf32 killed renamable $q0, killed renamable $q2, killed renamable $q2, 0, killed $noreg
; CHECK: renamable $q0 = nnan ninf nsz MVE_VFMAf32 killed renamable $q0, killed renamable $q2, killed renamable $q2, 1, killed renamable $vpr
; CHECK: renamable $r3, dead $cpsr = nsw tSUBi8 killed renamable $r3, 4, 14 /* CC::al */, $noreg
; CHECK: renamable $r0, dead $cpsr = nuw tADDi8 killed renamable $r0, 16, 14 /* CC::al */, $noreg		; CHECK: renamable $r0, dead $cpsr = nuw tADDi8 killed renamable $r0, 16, 14 /* CC::al */, $noreg
; CHECK: $lr = t2LEUpdate killed renamable $lr, %bb.5		; CHECK: $lr = MVE_LETP killed renamable $lr, %bb.5
; CHECK: bb.6:		; CHECK: bb.6:
; CHECK: liveins: $q0, $r1, $r2		; CHECK: liveins: $q0, $r1, $r2
; CHECK: renamable $s4 = nnan ninf nsz VADDS renamable $s0, renamable $s1, 14 /* CC::al */, $noreg		; CHECK: renamable $s4 = nnan ninf nsz VADDS renamable $s0, renamable $s1, 14 /* CC::al */, $noreg
; CHECK: renamable $r0, dead $cpsr = tSUBi3 killed renamable $r1, 1, 14 /* CC::al */, $noreg		; CHECK: renamable $r0, dead $cpsr = tSUBi3 killed renamable $r1, 1, 14 /* CC::al */, $noreg
; CHECK: renamable $s4 = nnan ninf nsz VADDS renamable $s2, killed renamable $s4, 14 /* CC::al */, $noreg		; CHECK: renamable $s4 = nnan ninf nsz VADDS renamable $s2, killed renamable $s4, 14 /* CC::al */, $noreg
; CHECK: renamable $s0 = nnan ninf nsz VADDS killed renamable $s3, killed renamable $s4, 14 /* CC::al */, $noreg, implicit killed $q0		; CHECK: renamable $s0 = nnan ninf nsz VADDS killed renamable $s3, killed renamable $s4, 14 /* CC::al */, $noreg, implicit killed $q0
; CHECK: $s2 = VMOVSR killed $r0, 14 /* CC::al */, $noreg		; CHECK: $s2 = VMOVSR killed $r0, 14 /* CC::al */, $noreg
; CHECK: renamable $s2 = VUITOS killed renamable $s2, 14 /* CC::al */, $noreg		; CHECK: renamable $s2 = VUITOS killed renamable $s2, 14 /* CC::al */, $noreg

llvm/test/CodeGen/Thumb2/LowOverheadLoops/mov-after-dlstp.mir

body: |
; CHECK-LABEL: name: arm_var_f32_mve		; CHECK-LABEL: name: arm_var_f32_mve
; CHECK: bb.0.entry:		; CHECK: bb.0.entry:
; CHECK: successors: %bb.1(0x80000000)		; CHECK: successors: %bb.1(0x80000000)
; CHECK: liveins: $lr, $r0, $r1, $r2, $r4		; CHECK: liveins: $lr, $r0, $r1, $r2, $r4
; CHECK: frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r4, killed $lr, implicit-def $sp, implicit $sp		; CHECK: frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r4, killed $lr, implicit-def $sp, implicit $sp
; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 8		; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 8
; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -4		; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -4
; CHECK: frame-setup CFI_INSTRUCTION offset $r4, -8		; CHECK: frame-setup CFI_INSTRUCTION offset $r4, -8
; CHECK: $r3 = tMOVr $r1, 14 /* CC::al */, $noreg
; CHECK: tCMPi8 renamable $r1, 4, 14 /* CC::al */, $noreg, implicit-def $cpsr
; CHECK: t2IT 10, 8, implicit-def $itstate
; CHECK: renamable $r3 = tMOVi8 $noreg, 4, 10 /* CC::ge */, killed $cpsr, implicit killed renamable $r3, implicit killed $itstate
; CHECK: renamable $r12 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
; CHECK: renamable $r3, dead $cpsr = tSUBrr renamable $r1, killed renamable $r3, 14 /* CC::al */, $noreg
; CHECK: renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0		; CHECK: renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0
; CHECK: renamable $r3, dead $cpsr = tADDi8 killed renamable $r3, 3, 14 /* CC::al */, $noreg
; CHECK: renamable $lr = nuw nsw t2ADDrs killed renamable $r12, killed renamable $r3, 19, 14 /* CC::al */, $noreg, $noreg
; CHECK: $r3 = tMOVr $r1, 14 /* CC::al */, $noreg		; CHECK: $r3 = tMOVr $r1, 14 /* CC::al */, $noreg
; CHECK: $r12 = tMOVr $r0, 14 /* CC::al */, $noreg		; CHECK: $r12 = tMOVr $r0, 14 /* CC::al */, $noreg
; CHECK: $r4 = tMOVr killed $lr, 14 /* CC::al */, $noreg
; CHECK: $lr = MVE_DLSTP_32 killed renamable $r3		; CHECK: $lr = MVE_DLSTP_32 killed renamable $r3
; CHECK: bb.1.do.body.i:		; CHECK: bb.1.do.body.i:
; CHECK: successors: %bb.1(0x7c000000), %bb.2(0x04000000)		; CHECK: successors: %bb.1(0x7c000000), %bb.2(0x04000000)
; CHECK: liveins: $lr, $q0, $r0, $r1, $r2, $r4, $r12		; CHECK: liveins: $lr, $q0, $r0, $r1, $r2, $r12
; CHECK: renamable $r12, renamable $q1 = MVE_VLDRWU32_post killed renamable $r12, 16, 0, $noreg :: (load 16 from %ir.pSrc.addr.0.i2, align 4)		; CHECK: renamable $r12, renamable $q1 = MVE_VLDRWU32_post killed renamable $r12, 16, 0, $noreg :: (load 16 from %ir.pSrc.addr.0.i2, align 4)
; CHECK: renamable $q0 = nnan ninf nsz arcp contract afn reassoc MVE_VADDf32 killed renamable $q0, killed renamable $q1, 0, killed $noreg, killed renamable $q0		; CHECK: renamable $q0 = nnan ninf nsz arcp contract afn reassoc MVE_VADDf32 killed renamable $q0, killed renamable $q1, 0, killed $noreg, killed renamable $q0
; CHECK: $lr = MVE_LETP killed renamable $lr, %bb.1		; CHECK: $lr = MVE_LETP killed renamable $lr, %bb.1
; CHECK: bb.2.arm_mean_f32_mve.exit:		; CHECK: bb.2.arm_mean_f32_mve.exit:
; CHECK: successors: %bb.3(0x80000000)		; CHECK: successors: %bb.3(0x80000000)
; CHECK: liveins: $q0, $r0, $r1, $r2, $r4		; CHECK: liveins: $q0, $r0, $r1, $r2
; CHECK: $s4 = VMOVSR $r1, 14 /* CC::al */, $noreg		; CHECK: $s4 = VMOVSR $r1, 14 /* CC::al */, $noreg
; CHECK: renamable $s0 = nnan ninf nsz arcp contract afn reassoc VADDS killed renamable $s3, killed renamable $s3, 14 /* CC::al */, $noreg, implicit killed $q0		; CHECK: renamable $s0 = nnan ninf nsz arcp contract afn reassoc VADDS killed renamable $s3, killed renamable $s3, 14 /* CC::al */, $noreg, implicit killed $q0
; CHECK: $lr = t2DLS killed $r4
; CHECK: renamable $s4 = VUITOS killed renamable $s4, 14 /* CC::al */, $noreg		; CHECK: renamable $s4 = VUITOS killed renamable $s4, 14 /* CC::al */, $noreg
; CHECK: renamable $s0 = nnan ninf nsz arcp contract afn reassoc VDIVS killed renamable $s0, killed renamable $s4, 14 /* CC::al */, $noreg		; CHECK: renamable $s0 = nnan ninf nsz arcp contract afn reassoc VDIVS killed renamable $s0, killed renamable $s4, 14 /* CC::al */, $noreg
; CHECK: renamable $r3 = VMOVRS killed renamable $s0, 14 /* CC::al */, $noreg		; CHECK: renamable $r3 = VMOVRS killed renamable $s0, 14 /* CC::al */, $noreg
; CHECK: renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0		; CHECK: renamable $q0 = MVE_VMOVimmi32 0, 0, $noreg, undef renamable $q0
; CHECK: renamable $q1 = MVE_VDUP32 killed renamable $r3, 0, $noreg, undef renamable $q1		; CHECK: renamable $q1 = MVE_VDUP32 killed renamable $r3, 0, $noreg, undef renamable $q1
; CHECK: $r3 = tMOVr $r1, 14 /* CC::al */, $noreg		; CHECK: $r3 = tMOVr $r1, 14 /* CC::al */, $noreg
		; CHECK: $lr = MVE_DLSTP_32 killed renamable $r3
; CHECK: bb.3.do.body:		; CHECK: bb.3.do.body:
; CHECK: successors: %bb.3(0x7c000000), %bb.4(0x04000000)		; CHECK: successors: %bb.3(0x7c000000), %bb.4(0x04000000)
; CHECK: liveins: $lr, $q0, $q1, $r0, $r1, $r2, $r3		; CHECK: liveins: $lr, $q0, $q1, $r0, $r1, $r2
; CHECK: renamable $vpr = MVE_VCTP32 renamable $r3, 0, $noreg		; CHECK: renamable $r0, renamable $q2 = MVE_VLDRWU32_post killed renamable $r0, 16, 0, $noreg :: (load 16 from %ir.pSrc.addr.01, align 4)
; CHECK: renamable $r3, dead $cpsr = tSUBi8 killed renamable $r3, 4, 14 /* CC::al */, $noreg		; CHECK: renamable $q2 = nnan ninf nsz arcp contract afn reassoc MVE_VSUBf32 killed renamable $q2, renamable $q1, 0, $noreg, undef renamable $q2
; CHECK: MVE_VPST 2, implicit $vpr		; CHECK: renamable $q0 = nnan ninf nsz arcp contract afn reassoc MVE_VFMAf32 killed renamable $q0, killed renamable $q2, killed renamable $q2, 0, killed $noreg
; CHECK: renamable $r0, renamable $q2 = MVE_VLDRWU32_post killed renamable $r0, 16, 1, renamable $vpr :: (load 16 from %ir.pSrc.addr.01, align 4)		; CHECK: $lr = MVE_LETP killed renamable $lr, %bb.3
; CHECK: renamable $q2 = nnan ninf nsz arcp contract afn reassoc MVE_VSUBf32 killed renamable $q2, renamable $q1, 1, renamable $vpr, undef renamable $q2
; CHECK: renamable $q0 = nnan ninf nsz arcp contract afn reassoc MVE_VFMAf32 killed renamable $q0, killed renamable $q2, killed renamable $q2, 1, killed renamable $vpr
; CHECK: $lr = t2LEUpdate killed renamable $lr, %bb.3
; CHECK: bb.4.do.end:		; CHECK: bb.4.do.end:
; CHECK: liveins: $q0, $r1, $r2		; CHECK: liveins: $q0, $r1, $r2
; CHECK: renamable $r0, dead $cpsr = tSUBi3 killed renamable $r1, 1, 14 /* CC::al */, $noreg		; CHECK: renamable $r0, dead $cpsr = tSUBi3 killed renamable $r1, 1, 14 /* CC::al */, $noreg
; CHECK: renamable $s0 = nnan ninf nsz arcp contract afn reassoc VADDS killed renamable $s3, killed renamable $s3, 14 /* CC::al */, $noreg, implicit killed $q0		; CHECK: renamable $s0 = nnan ninf nsz arcp contract afn reassoc VADDS killed renamable $s3, killed renamable $s3, 14 /* CC::al */, $noreg, implicit killed $q0
; CHECK: $s2 = VMOVSR killed $r0, 14 /* CC::al */, $noreg		; CHECK: $s2 = VMOVSR killed $r0, 14 /* CC::al */, $noreg
; CHECK: renamable $s2 = VUITOS killed renamable $s2, 14 /* CC::al */, $noreg		; CHECK: renamable $s2 = VUITOS killed renamable $s2, 14 /* CC::al */, $noreg
; CHECK: renamable $s0 = nnan ninf nsz arcp contract afn reassoc VDIVS killed renamable $s0, killed renamable $s2, 14 /* CC::al */, $noreg		; CHECK: renamable $s0 = nnan ninf nsz arcp contract afn reassoc VDIVS killed renamable $s0, killed renamable $s2, 14 /* CC::al */, $noreg
; CHECK: VSTRS killed renamable $s0, killed renamable $r2, 0, 14 /* CC::al */, $noreg :: (store 4 into %ir.pResult)		; CHECK: VSTRS killed renamable $s0, killed renamable $r2, 0, 14 /* CC::al */, $noreg :: (store 4 into %ir.pResult)

llvm/test/CodeGen/Thumb2/LowOverheadLoops/mov-operand.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve.fp -verify-machineinstrs -tail-predication=enabled -o - %s | FileCheck %s			; RUN: llc -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve.fp -verify-machineinstrs -tail-predication=enabled -o - %s | FileCheck %s

	define arm_aapcs_vfpcc void @arm_var_f32_mve(float* %pSrc, i32 %blockSize, float* nocapture %pResult) {			define arm_aapcs_vfpcc void @arm_var_f32_mve(float* %pSrc, i32 %blockSize, float* nocapture %pResult) {
	; CHECK-LABEL: arm_var_f32_mve:			; CHECK-LABEL: arm_var_f32_mve:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: .save {r4, lr}			; CHECK-NEXT: .save {r4, lr}
	; CHECK-NEXT: push {r4, lr}			; CHECK-NEXT: push {r4, lr}
	; CHECK-NEXT: mov r3, r1
	; CHECK-NEXT: cmp r1, #4
	; CHECK-NEXT: it ge
	; CHECK-NEXT: movge r3, #4
	; CHECK-NEXT: mov.w r12, #1
	; CHECK-NEXT: subs r3, r1, r3
	; CHECK-NEXT: vmov.i32 q0, #0x0			; CHECK-NEXT: vmov.i32 q0, #0x0
	; CHECK-NEXT: adds r3, #3
	; CHECK-NEXT: add.w lr, r12, r3, lsr #2
	; CHECK-NEXT: mov r3, r1			; CHECK-NEXT: mov r3, r1
	; CHECK-NEXT: mov r12, r0			; CHECK-NEXT: mov r12, r0
	; CHECK-NEXT: mov r4, lr
	; CHECK-NEXT: dlstp.32 lr, r3			; CHECK-NEXT: dlstp.32 lr, r3
	; CHECK-NEXT: .LBB0_1: @ %do.body.i			; CHECK-NEXT: .LBB0_1: @ %do.body.i
	; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1			; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: vldrw.u32 q1, [r12], #16			; CHECK-NEXT: vldrw.u32 q1, [r12], #16
	; CHECK-NEXT: vadd.f32 q0, q0, q1			; CHECK-NEXT: vadd.f32 q0, q0, q1
	; CHECK-NEXT: letp lr, .LBB0_1			; CHECK-NEXT: letp lr, .LBB0_1
	; CHECK-NEXT: @ %bb.2: @ %arm_mean_f32_mve.exit			; CHECK-NEXT: @ %bb.2: @ %arm_mean_f32_mve.exit
	; CHECK-NEXT: vmov s4, r1			; CHECK-NEXT: vmov s4, r1
	; CHECK-NEXT: vadd.f32 s0, s3, s3			; CHECK-NEXT: vadd.f32 s0, s3, s3
	; CHECK-NEXT: mov r3, r1			; CHECK-NEXT: mov r3, r1
	; CHECK-NEXT: vcvt.f32.u32 s4, s4			; CHECK-NEXT: vcvt.f32.u32 s4, s4
	; CHECK-NEXT: dls lr, r4
	; CHECK-NEXT: vdiv.f32 s0, s0, s4			; CHECK-NEXT: vdiv.f32 s0, s0, s4
	; CHECK-NEXT: vmov r12, s0			; CHECK-NEXT: vmov r12, s0
	; CHECK-NEXT: vmov.i32 q0, #0x0			; CHECK-NEXT: vmov.i32 q0, #0x0
				; CHECK-NEXT: dlstp.32 lr, r3
	; CHECK-NEXT: .LBB0_3: @ %do.body			; CHECK-NEXT: .LBB0_3: @ %do.body
	; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1			; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: vctp.32 r3			; CHECK-NEXT: vldrw.u32 q1, [r0], #16
	; CHECK-NEXT: subs r3, #4			; CHECK-NEXT: vsub.f32 q1, q1, r12
	; CHECK-NEXT: vpsttt			; CHECK-NEXT: vfma.f32 q0, q1, q1
	; CHECK-NEXT: vldrwt.u32 q1, [r0], #16			; CHECK-NEXT: letp lr, .LBB0_3
	; CHECK-NEXT: vsubt.f32 q1, q1, r12
	; CHECK-NEXT: vfmat.f32 q0, q1, q1
	; CHECK-NEXT: le lr, .LBB0_3
	; CHECK-NEXT: @ %bb.4: @ %do.end			; CHECK-NEXT: @ %bb.4: @ %do.end
	; CHECK-NEXT: subs r0, r1, #1			; CHECK-NEXT: subs r0, r1, #1
	; CHECK-NEXT: vadd.f32 s0, s3, s3			; CHECK-NEXT: vadd.f32 s0, s3, s3
	; CHECK-NEXT: vmov s2, r0			; CHECK-NEXT: vmov s2, r0
	; CHECK-NEXT: vcvt.f32.u32 s2, s2			; CHECK-NEXT: vcvt.f32.u32 s2, s2
	; CHECK-NEXT: vdiv.f32 s0, s0, s2			; CHECK-NEXT: vdiv.f32 s0, s0, s2
	; CHECK-NEXT: vstr s0, [r2]			; CHECK-NEXT: vstr s0, [r2]
	; CHECK-NEXT: pop {r4, pc}			; CHECK-NEXT: pop {r4, pc}