This is an archive of the discontinued LLVM Phabricator instance.


Differential D147079

InlineSpiller: Consider if all subranges are the same when avoiding redundant spills
Accepted · Public

Authored by arsenm on Mar 28 2023, 1:46 PM.

Details

Reviewers

MatzeB
qcolombet
kparzysz
foad

Summary

This avoids some redundant spills of subranges and avoids a compile failure.
It greatly reduces the number of spills in a loop.

The main range is not informative when multiple instructions are needed to fully define
a register. A common scenario is a lowered reg_sequence where every subregister
is sequentially defined, but each def changes the main range's value number. If
we look at specific lanes at the use index, we can see the value is actually the
same.
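
To make the main-range versus subrange distinction concrete, here is a minimal toy model. It does not use LLVM's LiveInterval classes: `Def`, `mainValueAt`, and `laneValueAt` are hypothetical names invented for this illustration, and value numbers are modeled simply as the position of the defining instruction in a list sorted by index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model, not LLVM's LiveInterval: each def writes some lanes
// of one virtual register at a given instruction index.
struct Def {
  int Index;         // position of the defining instruction
  uint32_t LaneMask; // lanes written by this def
};

// Value number of the "main range" at Idx: every def starts a new value, so
// this is the position of the last def at or before Idx (-1 if none).
// Assumes Defs is sorted by Index.
int mainValueAt(const std::vector<Def> &Defs, int Idx) {
  int VN = -1;
  for (std::size_t I = 0; I < Defs.size(); ++I)
    if (Defs[I].Index <= Idx)
      VN = static_cast<int>(I);
  return VN;
}

// Value number seen by the given lanes at Idx: only defs that write at least
// one of those lanes start a new value.
int laneValueAt(const std::vector<Def> &Defs, uint32_t Lanes, int Idx) {
  int VN = -1;
  for (std::size_t I = 0; I < Defs.size(); ++I)
    if (Defs[I].Index <= Idx && (Defs[I].LaneMask & Lanes) != 0)
      VN = static_cast<int>(I);
  return VN;
}
```

With two sequential defs, one writing lane `0x1` at index 10 and one writing lane `0x2` at index 20, the main value differs between index 15 and index 25, while the value seen by lane `0x1` is the same at both points. That is the situation the summary describes for a lowered reg_sequence.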

In this testcase, there are a large number of materialized 64-bit constant defs
which are hoisted outside of the loop by MachineLICM. These feed REG_SEQUENCEs,
which are not considered rematerializable inside the loop. After coalescing, the split
constant defs produce main ranges with an apparent phi def. There's no phi def if you look
at each individual subrange, and only half of the register is really redefined to a constant.

Fixes: SWDEV-380865

Diff Detail

Unit Tests: Failed

	Time	Test
	30 ms	Linux x64 > LLVM.Transforms/SampleProfile::pseudo-probe-peep.ll
		Script: -- : 'RUN: at line 2'; /var/lib/buildkite-agent/builds/linux-56-7f758798dd-v9p4k-1/llvm-project/phabricator-build/build/bin/llc -mtriple=x86_64-- -stop-after=peephole-opt -o - /var/lib/buildkite-agent/builds/linux-56-7f758798dd-v9p4k-1/llvm-project/phabricator-build/llvm/test/Transforms/SampleProfile/pseudo-probe-peep.ll | /var/lib/buildkite-agent/builds/linux-56-7f758798dd-v9p4k-1/llvm-project/phabricator-build/build/bin/FileCheck /var/lib/buildkite-agent/builds/linux-56-7f758798dd-v9p4k-1/llvm-project/phabricator-build/llvm/test/Transforms/SampleProfile/pseudo-probe-peep.ll

Event Timeline

arsenm created this revision. · Mar 28 2023, 1:46 PM

Herald added a project: Restricted Project. · Mar 28 2023, 1:46 PM

Herald added subscribers: kosarev, StephenFan, kerbowa and 3 others.

arsenm requested review of this revision. · Mar 28 2023, 1:46 PM

Herald added a project: Restricted Project. · Mar 28 2023, 1:46 PM

Herald added a subscriber: wdng.

arsenm added a parent revision: D147072: InlineSpiller: Consider copy bundles when looking for snippet copies. · Mar 28 2023, 1:46 PM

Harbormaster completed remote builds in B222334: Diff 509120. · Mar 29 2023, 1:41 AM

qcolombet accepted this revision. · May 5 2023, 7:10 AM

This revision is now accepted and ready to land. · May 5 2023, 7:10 AM

Rebase

Harbormaster completed remote builds in B257128: Diff 556647. · Sep 13 2023, 3:36 AM

arsenm mentioned this in rGd8127b2ba8a8: InlineSpiller: Consider if all subranges are the same when avoiding redundant…. · Oct 1 2023, 1:38 AM

d8127b2ba8a87a610851b9a462f2fc2526c36e37

This is causing infinite compile times on a bunch of graphics shaders. Can you please fix or revert?

Test case with llc -march=amdgcn -mcpu=gfx900:

define amdgpu_cs void @main(i32 %i, float %i72) {
bb:
  %i12 = call i64 @llvm.amdgcn.s.getpc()
  %i2 = lshr i64 %i12, 32
  %i3 = trunc i64 %i2 to i32
  %i4 = insertelement <2 x i32> zeroinitializer, i32 %i3, i64 1
  %i5 = bitcast <2 x i32> %i4 to i64
  %i6 = inttoptr i64 %i5 to ptr addrspace(4)
  %i7 = getelementptr i8, ptr addrspace(4) %i6, i64 48
  %i8 = load <4 x i32>, ptr addrspace(4) %i7, align 16
  %i9 = getelementptr i8, ptr addrspace(4) %i6, i64 64
  %i10 = load <4 x i32>, ptr addrspace(4) %i9, align 16
  %i11 = getelementptr i8, ptr addrspace(4) %i6, i64 240
  %i123 = load <8 x i32>, ptr addrspace(4) %i11, align 32
  %i13 = getelementptr i8, ptr addrspace(4) %i6, i64 272
  %i14 = load <8 x i32>, ptr addrspace(4) %i13, align 32
  %i15 = getelementptr i8, ptr addrspace(4) %i6, i64 304
  %i16 = load <8 x i32>, ptr addrspace(4) %i15, align 32
  %i17 = getelementptr i8, ptr addrspace(4) %i6, i64 336
  %i18 = load <8 x i32>, ptr addrspace(4) %i17, align 32
  %i19 = call <4 x i32> @llvm.amdgcn.s.buffer.load.v4i32(<4 x i32> zeroinitializer, i32 0, i32 0)
  %i20 = bitcast <4 x i32> %i19 to <4 x float>
  %i21 = extractelement <4 x float> %i20, i64 0
  %i27 = call <4 x i32> @llvm.amdgcn.s.buffer.load.v4i32(<4 x i32> zeroinitializer, i32 128, i32 0)
  %i28 = bitcast <4 x i32> %i27 to <4 x float>
  %i29 = extractelement <4 x float> %i28, i64 0
  %i30 = call <4 x i32> @llvm.amdgcn.s.buffer.load.v4i32(<4 x i32> zeroinitializer, i32 144, i32 0)
  %i31 = bitcast <4 x i32> %i30 to <4 x float>
  %i32 = extractelement <4 x float> %i31, i64 0
  %i39 = call <4 x i32> @llvm.amdgcn.s.buffer.load.v4i32(<4 x i32> zeroinitializer, i32 1, i32 0)
  %i40 = bitcast <4 x i32> %i39 to <4 x float>
  %i41 = extractelement <4 x float> %i40, i64 0
  %i42 = call <4 x i32> @llvm.amdgcn.s.buffer.load.v4i32(<4 x i32> zeroinitializer, i32 208, i32 0)
  %i43 = bitcast <4 x i32> %i42 to <4 x float>
  %i45 = extractelement <4 x float> %i43, i64 0
  %i52 = getelementptr i8, ptr addrspace(4) %i6, i64 496
  %i53 = load <8 x i32>, ptr addrspace(4) %i52, align 32
  %i54 = getelementptr i8, ptr addrspace(4) %i6, i64 528
  %i55 = load <8 x i32>, ptr addrspace(4) %i54, align 32
  %i56 = getelementptr i8, ptr addrspace(4) %i6, i64 752
  %i57 = load <8 x i32>, ptr addrspace(4) %i56, align 32
  %i58 = getelementptr i8, ptr addrspace(4) %i6, i64 784
  %i59 = load <8 x i32>, ptr addrspace(4) %i58, align 32
  %i60 = getelementptr i8, ptr addrspace(4) %i6, i64 944
  %i61 = load <4 x i32>, ptr addrspace(4) %i60, align 16
  %i67 = bitcast <4 x i32> %i61 to <4 x float>
  %i68 = extractelement <4 x float> %i67, i64 0
  %i69 = fmul float %i68, %i32
  %i726 = call float @llvm.amdgcn.image.sample.lz.2d.f32.f32(i32 1, float %i69, float 0.000000e+00, <8 x i32> %i14, <4 x i32> <i32 1, i32 1, i32 1, i32 1>, i1 false, i32 0, i32 0)
  %i76 = call <4 x float> @llvm.amdgcn.image.sample.lz.2d.v4f32.f32(i32 1, float 0.000000e+00, float 0.000000e+00, <8 x i32> %i16, <4 x i32> zeroinitializer, i1 false, i32 0, i32 0)
  %i77 = extractelement <4 x float> %i76, i64 0
  %i78 = call float @llvm.amdgcn.fmed3.f32(float %i77, float %i726, float 0.000000e+00)
  %i79 = call <4 x float> @llvm.amdgcn.image.sample.lz.2d.v4f32.f32(i32 1, float 0.000000e+00, float 0.000000e+00, <8 x i32> %i18, <4 x i32> %i8, i1 false, i32 0, i32 0)
  %i80 = extractelement <4 x float> %i79, i64 0
  %i81 = fcmp one float %i72, 0.000000e+00
  %i83 = icmp ne i32 %i, 0
  br i1 %i83, label %bb84, label %bb111

bb84:                                             ; preds = %bb
  br i1 %i81, label %bb85, label %bb102

bb85:                                             ; preds = %bb85, %bb84
  %i86 = phi float [ %i101, %bb85 ], [ 0.000000e+00, %bb84 ]
  %i87 = call <2 x float> @llvm.amdgcn.image.sample.lz.2d.v2f32.f32(i32 1, float 0.000000e+00, float 0.000000e+00, <8 x i32> %i53, <4 x i32> zeroinitializer, i1 false, i32 0, i32 0)
  %i88 = extractelement <2 x float> %i87, i64 0
  %i89 = call <2 x float> @llvm.amdgcn.image.sample.lz.2d.v2f32.f32(i32 1, float 0.000000e+00, float 0.000000e+00, <8 x i32> %i57, <4 x i32> zeroinitializer, i1 false, i32 0, i32 0)
  %i90 = extractelement <2 x float> %i89, i64 0
  %i91 = fsub float %i90, %i88
  %i95 = fmul float %i91, %i78
  %i99 = fadd float %i86, %i95
  %i101 = fsub float %i86, %i99
  br label %bb85

bb102:                                            ; preds = %bb84
  %i103 = call float @llvm.amdgcn.image.sample.lz.2d.f32.f32(i32 1, float 0.000000e+00, float 0.000000e+00, <8 x i32> %i123, <4 x i32> zeroinitializer, i1 false, i32 0, i32 0)
  %i105 = bitcast float %i103 to i32
  %i106 = insertelement <3 x i32> zeroinitializer, i32 %i105, i64 0
  call void @llvm.amdgcn.raw.buffer.store.v3i32(<3 x i32> %i106, <4 x i32> zeroinitializer, i32 0, i32 0, i32 0)
  %i109 = bitcast float %i80 to i32
  %i110 = insertelement <4 x i32> zeroinitializer, i32 %i109, i64 0
  call void @llvm.amdgcn.raw.buffer.store.v4i32(<4 x i32> %i110, <4 x i32> zeroinitializer, i32 0, i32 0, i32 0)
  ret void

bb111:                                            ; preds = %bb
  %i112 = call float @llvm.amdgcn.image.sample.lz.2d.f32.f32(i32 1, float 0.000000e+00, float 0.000000e+00, <8 x i32> %i14, <4 x i32> zeroinitializer, i1 false, i32 0, i32 0)
  br label %bb122

bb122:                                            ; preds = %bb122, %bb111
  %i1237 = phi float [ 0.000000e+00, %bb111 ], [ %i162, %bb122 ]
  %i125 = fmul float %i1237, %i21
  %i133 = fmul float %i45, %i125
  %i135 = fmul float %i133, %i29
  %i136 = fadd float %i135, %i41
  %i137 = call <2 x float> @llvm.amdgcn.image.sample.lz.2d.v2f32.f32(i32 1, float %i136, float 0.000000e+00, <8 x i32> %i55, <4 x i32> %i10, i1 false, i32 0, i32 0)
  %i138 = extractelement <2 x float> %i137, i64 0
  %i140 = call <2 x float> @llvm.amdgcn.image.sample.lz.2d.v2f32.f32(i32 1, float %i1237, float 0.000000e+00, <8 x i32> %i59, <4 x i32> zeroinitializer, i1 false, i32 0, i32 0)
  %i141 = extractelement <2 x float> %i140, i64 0
  %i142 = fsub float %i141, %i138
  %i145 = fmul float %i142, %i32
  %i147 = call <2 x float> @llvm.amdgcn.image.sample.lz.2d.v2f32.f32(i32 1, float 0.000000e+00, float 0.000000e+00, <8 x i32> zeroinitializer, <4 x i32> <i32 1, i32 1, i32 1, i32 1>, i1 false, i32 0, i32 0)
  %i148 = extractelement <2 x float> %i147, i64 0
  %i157 = fadd float %i148, %i145
  %i158 = fmul float %i157, %i78
  %i162 = fmul float %i158, %i112
  br label %bb122
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare float @llvm.amdgcn.fmed3.f32(float, float, float)

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i64 @llvm.amdgcn.s.getpc()

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(read)
declare <4 x float> @llvm.amdgcn.image.sample.lz.2d.v4f32.f32(i32 immarg, float, float, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg)

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(read)
declare float @llvm.amdgcn.image.sample.lz.2d.f32.f32(i32 immarg, float, float, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg)

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(read)
declare <2 x float> @llvm.amdgcn.image.sample.lz.2d.v2f32.f32(i32 immarg, float, float, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg)

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <4 x i32> @llvm.amdgcn.s.buffer.load.v4i32(<4 x i32>, i32, i32 immarg)

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(write)
declare void @llvm.amdgcn.raw.buffer.store.v3i32(<3 x i32>, <4 x i32>, i32, i32, i32 immarg)

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(write)
declare void @llvm.amdgcn.raw.buffer.store.v4i32(<4 x i32>, <4 x i32>, i32, i32, i32 immarg)

This patch causes out-of-memory errors in the OpenMP tests. Reverting the patch fixes these errors.

https://lab.llvm.org/buildbot/#/builders/193/builds/39450

I added a reverting commit: https://github.com/llvm/llvm-project/commit/e816c89c8406670e516c1b20af586358748bf77e

arsenm reopened this revision. · Oct 5 2023, 9:56 AM

This revision is now accepted and ready to land. · Oct 5 2023, 9:56 AM

Revision Contents

	Path	Size
	llvm/lib/CodeGen/InlineSpiller.cpp	33 lines
	llvm/test/CodeGen/AMDGPU/av_spill_cross_bb_usage.mir	16 lines
	llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll	343 lines
	llvm/test/CodeGen/AMDGPU/swdev380865.ll	99 lines
	llvm/test/CodeGen/Hexagon/regalloc-bad-undef.mir	4 lines
	llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll	42 lines
	llvm/test/CodeGen/Thumb2/mve-vst3.ll	57 lines

Diff 556647

llvm/lib/CodeGen/InlineSpiller.cpp

[489 lines not shown; enclosing context: #endif]
   // to mergeable list. In X86 AMX, 2 intructions are required to store.
   // We disable the merge for this case.
   if (MIS.begin() == MII)
     HSpiller.addToMergeableSpills(*MII, StackSlot, Original);
   ++NumSpills;
   return true;
 }

+/// Check if all subranges in \p LI and \p SLI have the same value number at \p
+/// Idx.
+static bool allSubRangeValNoSame(const LiveInterval &LI,
+                                 const LiveInterval &SLI,
+                                 const MachineInstr &MI,
+                                 const MachineRegisterInfo &MRI,
+                                 const TargetRegisterInfo &TRI, SlotIndex Idx) {
+  for (auto &SR : SLI.subranges()) {
+    VNInfo *SubVNI = SR.getVNInfoAt(Idx);
+
+    for (auto &SubLI : LI.subranges()) {
+      if (SubLI.LaneMask == SR.LaneMask) {
+        if (SubVNI != SubLI.getVNInfoAt(Idx))
+          return false;
+      } else if ((SubLI.LaneMask & SR.LaneMask).any()) {
+        // TODO: Check non-exact, overlapping subranges if they share the same
+        // def instruction
+        return false;
+      }
+    }
+  }
+
+  return true;
+}
+
 /// eliminateRedundantSpills - SLI:VNI is known to be on the stack. Remove any
 /// redundant spills of this value in SLI.reg and sibling copies.
 void InlineSpiller::eliminateRedundantSpills(LiveInterval &SLI, VNInfo *VNI) {
   assert(VNI && "Missing value");
   SmallVector<std::pair<LiveInterval*, VNInfo*>, 8> WorkList;
   WorkList.push_back(std::make_pair(&SLI, VNI));
   assert(StackInt && "No stack slot assigned yet.");

[13 lines not shown; enclosing context: do {]
     LLVM_DEBUG(dbgs() << "Merged to stack int: " << *StackInt << '\n');

     // Find all spills and copies of VNI.
     for (MachineInstr &MI :
          llvm::make_early_inc_range(MRI.use_nodbg_bundles(Reg))) {
       if (!MI.mayStore() && !TII.isCopyInstr(MI))
         continue;
       SlotIndex Idx = LIS.getInstructionIndex(MI);
-      if (LI->getVNInfoAt(Idx) != VNI)
+      // The main range value numbers will differ if multiple instructions are
+      // used to define its various subregisters. Check the subregister value
+      // numbers as a fallback.
+      if (LI->getVNInfoAt(Idx) != VNI &&
+          (!SLI.hasSubRanges() ||
+           !allSubRangeValNoSame(*LI, SLI, MI, MRI, TRI, Idx)))
         continue;

       // Follow sibling copies down the dominator tree.
       if (Register DstReg = isCopyOfBundle(MI, Reg, TII)) {
         if (isSibling(DstReg)) {
           LiveInterval &DstLI = LIS.getInterval(DstReg);
           VNInfo *DstVNI = DstLI.getVNInfoAt(Idx.getRegSlot());
           assert(DstVNI && "Missing defined value");
[1,158 lines not shown]
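
The comparison performed by the new helper can be sketched outside LLVM as follows. This is only an illustration under stated assumptions: `LaneValue` and `allLaneValuesSame` are hypothetical names, lane masks are plain integers, and value numbers are plain ints rather than `VNInfo` pointers looked up at a `SlotIndex`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one subrange sampled at a single index: the lanes
// it covers and the value number live in those lanes.
struct LaneValue {
  uint32_t LaneMask;
  int ValNo;
};

// Follows the shape of the helper in the diff: subranges with identical lane
// masks must agree on the value number, and any partial overlap between masks
// is rejected conservatively.
bool allLaneValuesSame(const std::vector<LaneValue> &A,
                       const std::vector<LaneValue> &B) {
  for (const LaneValue &SB : B) {
    for (const LaneValue &SA : A) {
      if (SA.LaneMask == SB.LaneMask) {
        if (SA.ValNo != SB.ValNo)
          return false;
      } else if ((SA.LaneMask & SB.LaneMask) != 0) {
        // Masks overlap without being equal: give up rather than guess.
        return false;
      }
    }
  }
  return true;
}
```

Note that in this sketch an empty `B` passes vacuously, since there is nothing to compare. The call site in the diff checks `SLI.hasSubRanges()` before calling the helper, which rules that case out there.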

llvm/test/CodeGen/AMDGPU/av_spill_cross_bb_usage.mir

[41 lines not shown; enclosing context: body: |]
; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr61, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (store (s32) into %stack.15, addrspace 5)		; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr61, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (store (s32) into %stack.15, addrspace 5)
; GCN-NEXT: renamable $vgpr44 = COPY $vgpr13, implicit $exec		; GCN-NEXT: renamable $vgpr44 = COPY $vgpr13, implicit $exec
; GCN-NEXT: renamable $vgpr43 = COPY $vgpr12, implicit $exec		; GCN-NEXT: renamable $vgpr43 = COPY $vgpr12, implicit $exec
; GCN-NEXT: S_CBRANCH_SCC1 %bb.2, implicit undef $scc		; GCN-NEXT: S_CBRANCH_SCC1 %bb.2, implicit undef $scc
; GCN-NEXT: S_BRANCH %bb.1		; GCN-NEXT: S_BRANCH %bb.1
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: bb.1:		; GCN-NEXT: bb.1:
; GCN-NEXT: successors: %bb.2(0x80000000)		; GCN-NEXT: successors: %bb.2(0x80000000)
; GCN-NEXT: liveins: $exec:0x000000000000000F, $sgpr30, $sgpr31, $vgpr0:0x0000000000000003, $vgpr1:0x0000000000000003, $vgpr2:0x0000000000000003, $vgpr3:0x0000000000000003, $vgpr4:0x0000000000000003, $vgpr5:0x0000000000000003, $vgpr6:0x0000000000000003, $vgpr7:0x0000000000000003, $vgpr8:0x0000000000000003, $vgpr9:0x0000000000000003, $vgpr40, $sgpr30_sgpr31, $vgpr10_vgpr11:0x000000000000000F, $vgpr14_vgpr15:0x000000000000000F, $vgpr41_vgpr42:0x000000000000000F, $vgpr43_vgpr44:0x000000000000000F, $vgpr45_vgpr46:0x000000000000000F, $vgpr56_vgpr57:0x000000000000000F, $vgpr58_vgpr59:0x000000000000000F, $vgpr60_vgpr61:0x000000000000000F		; GCN-NEXT: liveins: $exec:0x000000000000000F, $sgpr30, $sgpr31, $vgpr0:0x0000000000000003, $vgpr1:0x0000000000000003, $vgpr2:0x0000000000000003, $vgpr3:0x0000000000000003, $vgpr4:0x0000000000000003, $vgpr5:0x0000000000000003, $vgpr6:0x0000000000000003, $vgpr7:0x0000000000000003, $vgpr8:0x0000000000000003, $vgpr9:0x0000000000000003, $vgpr40, $sgpr30_sgpr31, $vgpr10_vgpr11:0x000000000000000F, $vgpr41_vgpr42:0x000000000000000F, $vgpr43_vgpr44:0x000000000000000F, $vgpr45_vgpr46:0x000000000000000F, $vgpr56_vgpr57:0x000000000000000F, $vgpr58_vgpr59:0x000000000000000F, $vgpr60_vgpr61:0x000000000000000F
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $vgpr57 = COPY $vgpr9, implicit $exec		; GCN-NEXT: renamable $vgpr57 = COPY $vgpr9, implicit $exec
; GCN-NEXT: renamable $vgpr56 = COPY $vgpr8, implicit $exec		; GCN-NEXT: renamable $vgpr56 = COPY $vgpr8, implicit $exec
; GCN-NEXT: renamable $vgpr59 = COPY $vgpr7, implicit $exec		; GCN-NEXT: renamable $vgpr59 = COPY $vgpr7, implicit $exec
; GCN-NEXT: renamable $vgpr58 = COPY $vgpr6, implicit $exec		; GCN-NEXT: renamable $vgpr58 = COPY $vgpr6, implicit $exec
; GCN-NEXT: renamable $vgpr61 = COPY $vgpr5, implicit $exec		; GCN-NEXT: renamable $vgpr61 = COPY $vgpr5, implicit $exec
; GCN-NEXT: renamable $vgpr60 = COPY $vgpr4, implicit $exec		; GCN-NEXT: renamable $vgpr60 = COPY $vgpr4, implicit $exec
; GCN-NEXT: renamable $vgpr42 = COPY $vgpr3, implicit $exec		; GCN-NEXT: renamable $vgpr42 = COPY $vgpr3, implicit $exec
; GCN-NEXT: renamable $vgpr41 = COPY $vgpr2, implicit $exec		; GCN-NEXT: renamable $vgpr41 = COPY $vgpr2, implicit $exec
; GCN-NEXT: renamable $vgpr46 = COPY $vgpr1, implicit $exec		; GCN-NEXT: renamable $vgpr46 = COPY $vgpr1, implicit $exec
; GCN-NEXT: renamable $vgpr45 = COPY $vgpr0, implicit $exec		; GCN-NEXT: renamable $vgpr45 = COPY $vgpr0, implicit $exec
; GCN-NEXT: renamable $sgpr16_sgpr17 = IMPLICIT_DEF		; GCN-NEXT: renamable $sgpr16_sgpr17 = IMPLICIT_DEF
; GCN-NEXT: $vgpr40 = V_WRITELANE_B32 $sgpr30, 0, $vgpr40, implicit-def $sgpr30_sgpr31, implicit $sgpr30_sgpr31		; GCN-NEXT: $vgpr40 = V_WRITELANE_B32 $sgpr30, 0, $vgpr40, implicit-def $sgpr30_sgpr31, implicit $sgpr30_sgpr31
; GCN-NEXT: $vgpr40 = V_WRITELANE_B32 $sgpr31, 1, $vgpr40, implicit $sgpr30_sgpr31		; GCN-NEXT: $vgpr40 = V_WRITELANE_B32 $sgpr31, 1, $vgpr40, implicit $sgpr30_sgpr31
; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr14, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 52, 0, 0, implicit $exec, implicit-def $vgpr14_vgpr15, implicit $vgpr14_vgpr15 :: (store (s32) into %stack.1, addrspace 5)		; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr10, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 52, 0, 0, implicit $exec, implicit-def $vgpr10_vgpr11, implicit $vgpr10_vgpr11 :: (store (s32) into %stack.1, addrspace 5)
; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr15, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 56, 0, 0, implicit $exec, implicit killed $vgpr14_vgpr15 :: (store (s32) into %stack.1 + 4, addrspace 5)		; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr11, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 56, 0, 0, implicit $exec, implicit killed $vgpr10_vgpr11 :: (store (s32) into %stack.1 + 4, addrspace 5)
; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr10, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 60, 0, 0, implicit $exec, implicit-def $vgpr10_vgpr11, implicit $vgpr10_vgpr11 :: (store (s32) into %stack.2, addrspace 5)
; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr11, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 64, 0, 0, implicit $exec, implicit killed $vgpr10_vgpr11 :: (store (s32) into %stack.2 + 4, addrspace 5)
; GCN-NEXT: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr16_sgpr17, 0, csr_amdgpu, implicit-def dead $vgpr0		; GCN-NEXT: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr16_sgpr17, 0, csr_amdgpu, implicit-def dead $vgpr0
; GCN-NEXT: $vgpr14 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 52, 0, 0, implicit $exec, implicit-def $vgpr14_vgpr15 :: (load (s32) from %stack.1, addrspace 5)		; GCN-NEXT: $vgpr14 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 60, 0, 0, implicit $exec, implicit-def $vgpr14_vgpr15 :: (load (s32) from %stack.2, addrspace 5)
; GCN-NEXT: $vgpr15 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 56, 0, 0, implicit $exec, implicit-def $vgpr14_vgpr15 :: (load (s32) from %stack.1 + 4, addrspace 5)		; GCN-NEXT: $vgpr15 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 64, 0, 0, implicit $exec, implicit-def $vgpr14_vgpr15 :: (load (s32) from %stack.2 + 4, addrspace 5)
; GCN-NEXT: renamable $vgpr0_vgpr1 = nofpexcept V_FMA_F64_e64 0, killed $vgpr45_vgpr46, 0, killed $vgpr41_vgpr42, 0, killed $vgpr60_vgpr61, 0, 0, implicit $mode, implicit $exec		; GCN-NEXT: renamable $vgpr0_vgpr1 = nofpexcept V_FMA_F64_e64 0, killed $vgpr45_vgpr46, 0, killed $vgpr41_vgpr42, 0, killed $vgpr60_vgpr61, 0, 0, implicit $mode, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORDX2 killed renamable $vgpr58_vgpr59, killed renamable $vgpr0_vgpr1, 0, 0, implicit $exec, implicit $flat_scr :: (store (s64))		; GCN-NEXT: FLAT_STORE_DWORDX2 killed renamable $vgpr58_vgpr59, killed renamable $vgpr0_vgpr1, 0, 0, implicit $exec, implicit $flat_scr :: (store (s64))
; GCN-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 60, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.2, addrspace 5)		; GCN-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 52, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.1, addrspace 5)
; GCN-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 64, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.2 + 4, addrspace 5)		; GCN-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 56, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.1 + 4, addrspace 5)
; GCN-NEXT: FLAT_STORE_DWORDX2 killed renamable $vgpr0_vgpr1, killed renamable $vgpr56_vgpr57, 0, 0, implicit $exec, implicit $flat_scr :: (store (s64))		; GCN-NEXT: FLAT_STORE_DWORDX2 killed renamable $vgpr0_vgpr1, killed renamable $vgpr56_vgpr57, 0, 0, implicit $exec, implicit $flat_scr :: (store (s64))
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: bb.2:		; GCN-NEXT: bb.2:
; GCN-NEXT: liveins: $vgpr40, $vgpr14_vgpr15:0x000000000000000F, $vgpr43_vgpr44:0x000000000000000F		; GCN-NEXT: liveins: $vgpr40, $vgpr14_vgpr15:0x000000000000000F, $vgpr43_vgpr44:0x000000000000000F
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $vgpr0_vgpr1 = V_MOV_B64_PSEUDO 0, implicit $exec		; GCN-NEXT: renamable $vgpr0_vgpr1 = V_MOV_B64_PSEUDO 0, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORDX2 undef renamable $vgpr0_vgpr1, killed renamable $vgpr43_vgpr44, 0, 0, implicit $exec, implicit $flat_scr :: (store (s64))		; GCN-NEXT: FLAT_STORE_DWORDX2 undef renamable $vgpr0_vgpr1, killed renamable $vgpr43_vgpr44, 0, 0, implicit $exec, implicit $flat_scr :: (store (s64))
; GCN-NEXT: FLAT_STORE_DWORDX2 killed renamable $vgpr0_vgpr1, killed renamable $vgpr14_vgpr15, 0, 0, implicit $exec, implicit $flat_scr :: (store (s64))		; GCN-NEXT: FLAT_STORE_DWORDX2 killed renamable $vgpr0_vgpr1, killed renamable $vgpr14_vgpr15, 0, 0, implicit $exec, implicit $flat_scr :: (store (s64))
[59 lines not shown]

llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll

[10,267 lines not shown]
	; GFX6-NEXT: ; def s[2:3]			; GFX6-NEXT: ; def s[2:3]
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ; def s33			; GFX6-NEXT: ; def s33
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: s_and_saveexec_b64 s[34:35], vcc			; GFX6-NEXT: s_and_saveexec_b64 s[34:35], vcc
	; GFX6-NEXT: s_cbranch_execz .LBB1_2			; GFX6-NEXT: s_cbranch_execz .LBB1_2
	; GFX6-NEXT: ; %bb.1: ; %bb0			; GFX6-NEXT: ; %bb.1: ; %bb0
	; GFX6-NEXT: s_mov_b64 s[36:37], exec			; GFX6-NEXT: s_mov_b64 s[0:1], exec
	; GFX6-NEXT: s_mov_b64 exec, 0xff			; GFX6-NEXT: s_mov_b64 exec, 0xff
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: v_writelane_b32 v4, s8, 0			; GFX6-NEXT: v_writelane_b32 v0, s8, 0
	; GFX6-NEXT: v_writelane_b32 v4, s9, 1			; GFX6-NEXT: v_writelane_b32 v0, s9, 1
	; GFX6-NEXT: v_writelane_b32 v4, s10, 2			; GFX6-NEXT: v_writelane_b32 v0, s10, 2
	; GFX6-NEXT: v_writelane_b32 v4, s11, 3			; GFX6-NEXT: v_writelane_b32 v0, s11, 3
	; GFX6-NEXT: v_writelane_b32 v4, s12, 4			; GFX6-NEXT: v_writelane_b32 v0, s12, 4
	; GFX6-NEXT: v_writelane_b32 v4, s13, 5			; GFX6-NEXT: v_writelane_b32 v0, s13, 5
	; GFX6-NEXT: v_writelane_b32 v4, s14, 6			; GFX6-NEXT: v_writelane_b32 v0, s14, 6
	; GFX6-NEXT: v_writelane_b32 v4, s15, 7			; GFX6-NEXT: v_writelane_b32 v0, s15, 7
	; GFX6-NEXT: s_mov_b32 s38, 0x84400			; GFX6-NEXT: s_mov_b32 s36, 0x84400
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], s38 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s36 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[36:37]			; GFX6-NEXT: s_mov_b64 exec, s[0:1]
	; GFX6-NEXT: s_mov_b64 s[36:37], exec			; GFX6-NEXT: s_mov_b64 s[0:1], exec
	; GFX6-NEXT: s_mov_b64 exec, 0xff			; GFX6-NEXT: s_mov_b64 exec, 0xff
	; GFX6-NEXT: s_mov_b32 s38, 0x83c00			; GFX6-NEXT: s_mov_b32 s36, 0x83c00
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], s38 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], s36 ; 4-byte Folded Reload
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: v_readlane_b32 s8, v4, 0			; GFX6-NEXT: v_readlane_b32 s8, v0, 0
	; GFX6-NEXT: v_readlane_b32 s9, v4, 1			; GFX6-NEXT: v_readlane_b32 s9, v0, 1
	; GFX6-NEXT: v_readlane_b32 s10, v4, 2			; GFX6-NEXT: v_readlane_b32 s10, v0, 2
	; GFX6-NEXT: v_readlane_b32 s11, v4, 3			; GFX6-NEXT: v_readlane_b32 s11, v0, 3
	; GFX6-NEXT: v_readlane_b32 s12, v4, 4			; GFX6-NEXT: v_readlane_b32 s12, v0, 4
	; GFX6-NEXT: v_readlane_b32 s13, v4, 5			; GFX6-NEXT: v_readlane_b32 s13, v0, 5
	; GFX6-NEXT: v_readlane_b32 s14, v4, 6			; GFX6-NEXT: v_readlane_b32 s14, v0, 6
	; GFX6-NEXT: v_readlane_b32 s15, v4, 7			; GFX6-NEXT: v_readlane_b32 s15, v0, 7
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[36:37]			; GFX6-NEXT: s_mov_b64 exec, s[0:1]
	; GFX6-NEXT: s_mov_b64 s[36:37], exec			; GFX6-NEXT: s_mov_b64 s[0:1], exec
	; GFX6-NEXT: s_mov_b64 exec, 0xff			; GFX6-NEXT: s_mov_b64 exec, 0xff
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: v_writelane_b32 v4, s16, 0			; GFX6-NEXT: v_writelane_b32 v0, s16, 0
	; GFX6-NEXT: v_writelane_b32 v4, s17, 1			; GFX6-NEXT: v_writelane_b32 v0, s17, 1
	; GFX6-NEXT: v_writelane_b32 v4, s18, 2			; GFX6-NEXT: v_writelane_b32 v0, s18, 2
	; GFX6-NEXT: v_writelane_b32 v4, s19, 3			; GFX6-NEXT: v_writelane_b32 v0, s19, 3
	; GFX6-NEXT: v_writelane_b32 v4, s20, 4			; GFX6-NEXT: v_writelane_b32 v0, s20, 4
	; GFX6-NEXT: v_writelane_b32 v4, s21, 5			; GFX6-NEXT: v_writelane_b32 v0, s21, 5
	; GFX6-NEXT: v_writelane_b32 v4, s22, 6			; GFX6-NEXT: v_writelane_b32 v0, s22, 6
	; GFX6-NEXT: v_writelane_b32 v4, s23, 7			; GFX6-NEXT: v_writelane_b32 v0, s23, 7
	; GFX6-NEXT: s_mov_b32 s38, 0x84c00			; GFX6-NEXT: s_mov_b32 s36, 0x84c00
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], s38 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s36 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[36:37]			; GFX6-NEXT: s_mov_b64 exec, s[0:1]
	; GFX6-NEXT: s_mov_b64 s[36:37], exec			; GFX6-NEXT: s_mov_b64 s[0:1], exec
	; GFX6-NEXT: s_mov_b64 exec, 0xff			; GFX6-NEXT: s_mov_b64 exec, 0xff
	; GFX6-NEXT: s_mov_b32 s38, 0x84400			; GFX6-NEXT: s_mov_b32 s36, 0x84400
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], s38 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], s36 ; 4-byte Folded Reload
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: v_readlane_b32 s16, v4, 0			; GFX6-NEXT: v_readlane_b32 s16, v0, 0
	; GFX6-NEXT: v_readlane_b32 s17, v4, 1			; GFX6-NEXT: v_readlane_b32 s17, v0, 1
	; GFX6-NEXT: v_readlane_b32 s18, v4, 2			; GFX6-NEXT: v_readlane_b32 s18, v0, 2
	; GFX6-NEXT: v_readlane_b32 s19, v4, 3			; GFX6-NEXT: v_readlane_b32 s19, v0, 3
	; GFX6-NEXT: v_readlane_b32 s20, v4, 4			; GFX6-NEXT: v_readlane_b32 s20, v0, 4
	; GFX6-NEXT: v_readlane_b32 s21, v4, 5			; GFX6-NEXT: v_readlane_b32 s21, v0, 5
	; GFX6-NEXT: v_readlane_b32 s22, v4, 6			; GFX6-NEXT: v_readlane_b32 s22, v0, 6
	; GFX6-NEXT: v_readlane_b32 s23, v4, 7			; GFX6-NEXT: v_readlane_b32 s23, v0, 7
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[36:37]			; GFX6-NEXT: s_mov_b64 exec, s[0:1]
	; GFX6-NEXT: s_mov_b64 s[36:37], exec			; GFX6-NEXT: s_mov_b64 s[0:1], exec
	; GFX6-NEXT: s_mov_b64 exec, 0xff			; GFX6-NEXT: s_mov_b64 exec, 0xff
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: v_writelane_b32 v4, s24, 0			; GFX6-NEXT: v_writelane_b32 v0, s24, 0
	; GFX6-NEXT: v_writelane_b32 v4, s25, 1			; GFX6-NEXT: v_writelane_b32 v0, s25, 1
	; GFX6-NEXT: v_writelane_b32 v4, s26, 2			; GFX6-NEXT: v_writelane_b32 v0, s26, 2
	; GFX6-NEXT: v_writelane_b32 v4, s27, 3			; GFX6-NEXT: v_writelane_b32 v0, s27, 3
	; GFX6-NEXT: v_writelane_b32 v4, s28, 4			; GFX6-NEXT: v_writelane_b32 v0, s28, 4
	; GFX6-NEXT: v_writelane_b32 v4, s29, 5			; GFX6-NEXT: v_writelane_b32 v0, s29, 5
	; GFX6-NEXT: v_writelane_b32 v4, s30, 6			; GFX6-NEXT: v_writelane_b32 v0, s30, 6
	; GFX6-NEXT: v_writelane_b32 v4, s31, 7			; GFX6-NEXT: v_writelane_b32 v0, s31, 7
	; GFX6-NEXT: s_mov_b32 s38, 0x85400			; GFX6-NEXT: s_mov_b32 s36, 0x85400
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], s38 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s36 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[36:37]			; GFX6-NEXT: s_mov_b64 exec, s[0:1]
	; GFX6-NEXT: s_mov_b64 s[36:37], exec			; GFX6-NEXT: s_mov_b64 s[0:1], exec
	; GFX6-NEXT: s_mov_b64 exec, 0xff			; GFX6-NEXT: s_mov_b64 exec, 0xff
	; GFX6-NEXT: s_mov_b32 s38, 0x84c00			; GFX6-NEXT: s_mov_b32 s36, 0x84c00
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], s38 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], s36 ; 4-byte Folded Reload
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: v_readlane_b32 s24, v4, 0			; GFX6-NEXT: v_readlane_b32 s24, v0, 0
	; GFX6-NEXT: v_readlane_b32 s25, v4, 1			; GFX6-NEXT: v_readlane_b32 s25, v0, 1
	; GFX6-NEXT: v_readlane_b32 s26, v4, 2			; GFX6-NEXT: v_readlane_b32 s26, v0, 2
	; GFX6-NEXT: v_readlane_b32 s27, v4, 3			; GFX6-NEXT: v_readlane_b32 s27, v0, 3
	; GFX6-NEXT: v_readlane_b32 s28, v4, 4			; GFX6-NEXT: v_readlane_b32 s28, v0, 4
	; GFX6-NEXT: v_readlane_b32 s29, v4, 5			; GFX6-NEXT: v_readlane_b32 s29, v0, 5
	; GFX6-NEXT: v_readlane_b32 s30, v4, 6			; GFX6-NEXT: v_readlane_b32 s30, v0, 6
	; GFX6-NEXT: v_readlane_b32 s31, v4, 7			; GFX6-NEXT: v_readlane_b32 s31, v0, 7
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[36:37]			; GFX6-NEXT: s_mov_b64 exec, s[0:1]
	; GFX6-NEXT: s_mov_b64 s[36:37], exec
	; GFX6-NEXT: s_mov_b64 exec, 15
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: v_writelane_b32 v4, s0, 0
	; GFX6-NEXT: v_writelane_b32 v4, s1, 1
	; GFX6-NEXT: v_writelane_b32 v4, s2, 2
	; GFX6-NEXT: v_writelane_b32 v4, s3, 3
	; GFX6-NEXT: s_mov_b32 s38, 0x85c00
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], s38 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[36:37]
	; GFX6-NEXT: s_mov_b64 s[0:1], exec			; GFX6-NEXT: s_mov_b64 s[0:1], exec
	; GFX6-NEXT: s_mov_b64 exec, 15			; GFX6-NEXT: s_mov_b64 exec, 15
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: v_writelane_b32 v4, s4, 0			; GFX6-NEXT: v_writelane_b32 v0, s4, 0
	; GFX6-NEXT: v_writelane_b32 v4, s5, 1			; GFX6-NEXT: v_writelane_b32 v0, s5, 1
	; GFX6-NEXT: v_writelane_b32 v4, s6, 2			; GFX6-NEXT: v_writelane_b32 v0, s6, 2
	; GFX6-NEXT: v_writelane_b32 v4, s7, 3			; GFX6-NEXT: v_writelane_b32 v0, s7, 3
	; GFX6-NEXT: s_mov_b32 s36, 0x86000			; GFX6-NEXT: s_mov_b32 s36, 0x85c00
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], s36 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s36 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[0:1]			; GFX6-NEXT: s_mov_b64 exec, s[0:1]
	; GFX6-NEXT: s_mov_b64 s[0:1], exec			; GFX6-NEXT: s_mov_b64 s[0:1], exec
	; GFX6-NEXT: s_mov_b64 exec, 3			; GFX6-NEXT: s_mov_b64 exec, 3
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: v_writelane_b32 v4, s2, 0			; GFX6-NEXT: v_writelane_b32 v0, s2, 0
	; GFX6-NEXT: v_writelane_b32 v4, s3, 1			; GFX6-NEXT: v_writelane_b32 v0, s3, 1
	; GFX6-NEXT: s_mov_b32 s4, 0x86400			; GFX6-NEXT: s_mov_b32 s4, 0x86000
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], s4 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s4 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[0:1]			; GFX6-NEXT: s_mov_b64 exec, s[0:1]
	; GFX6-NEXT: s_mov_b64 s[36:37], exec			; GFX6-NEXT: s_mov_b64 s[36:37], exec
	; GFX6-NEXT: s_mov_b64 exec, 0xff			; GFX6-NEXT: s_mov_b64 exec, 0xff
	; GFX6-NEXT: s_mov_b32 s38, 0x85400			; GFX6-NEXT: s_mov_b32 s38, 0x85400
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], s38 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], s38 ; 4-byte Folded Reload
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: v_readlane_b32 s0, v4, 0			; GFX6-NEXT: v_readlane_b32 s0, v0, 0
	; GFX6-NEXT: v_readlane_b32 s1, v4, 1			; GFX6-NEXT: v_readlane_b32 s1, v0, 1
	; GFX6-NEXT: v_readlane_b32 s2, v4, 2			; GFX6-NEXT: v_readlane_b32 s2, v0, 2
	; GFX6-NEXT: v_readlane_b32 s3, v4, 3			; GFX6-NEXT: v_readlane_b32 s3, v0, 3
	; GFX6-NEXT: v_readlane_b32 s4, v4, 4			; GFX6-NEXT: v_readlane_b32 s4, v0, 4
	; GFX6-NEXT: v_readlane_b32 s5, v4, 5			; GFX6-NEXT: v_readlane_b32 s5, v0, 5
	; GFX6-NEXT: v_readlane_b32 s6, v4, 6			; GFX6-NEXT: v_readlane_b32 s6, v0, 6
	; GFX6-NEXT: v_readlane_b32 s7, v4, 7			; GFX6-NEXT: v_readlane_b32 s7, v0, 7
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[36:37]			; GFX6-NEXT: s_mov_b64 exec, s[36:37]
	; GFX6-NEXT: s_mov_b64 s[44:45], exec			; GFX6-NEXT: s_mov_b64 s[44:45], exec
	; GFX6-NEXT: s_mov_b64 exec, 15			; GFX6-NEXT: s_mov_b64 exec, 15
	; GFX6-NEXT: v_mov_b32_e32 v7, 0x2180			; GFX6-NEXT: v_mov_b32_e32 v1, 0x2170
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v4, v7, s[40:43], 0 offen ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v0, v1, s[40:43], 0 offen ; 4-byte Folded Reload
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: v_readlane_b32 s36, v4, 0			; GFX6-NEXT: v_readlane_b32 s36, v0, 0
	; GFX6-NEXT: v_readlane_b32 s37, v4, 1			; GFX6-NEXT: v_readlane_b32 s37, v0, 1
	; GFX6-NEXT: v_readlane_b32 s38, v4, 2			; GFX6-NEXT: v_readlane_b32 s38, v0, 2
	; GFX6-NEXT: v_readlane_b32 s39, v4, 3			; GFX6-NEXT: v_readlane_b32 s39, v0, 3
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[44:45]			; GFX6-NEXT: s_mov_b64 exec, s[44:45]
	; GFX6-NEXT: s_mov_b64 vcc, s[34:35]			; GFX6-NEXT: s_mov_b64 vcc, s[34:35]
	; GFX6-NEXT: s_mov_b64 s[44:45], exec			; GFX6-NEXT: s_mov_b64 s[44:45], exec
	; GFX6-NEXT: s_mov_b64 exec, 3			; GFX6-NEXT: s_mov_b64 exec, 3
	; GFX6-NEXT: v_mov_b32_e32 v7, 0x2190			; GFX6-NEXT: v_mov_b32_e32 v1, 0x2180
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v4, v7, s[40:43], 0 offen ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v0, v1, s[40:43], 0 offen ; 4-byte Folded Reload
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: v_readlane_b32 s34, v4, 0			; GFX6-NEXT: v_readlane_b32 s34, v0, 0
	; GFX6-NEXT: v_readlane_b32 s35, v4, 1			; GFX6-NEXT: v_readlane_b32 s35, v0, 1
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[44:45]			; GFX6-NEXT: s_mov_b64 exec, s[44:45]
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ; use s[8:15],s[16:23],s[24:31],s[0:7],s[36:39],s[34:35]			; GFX6-NEXT: ; use s[8:15],s[16:23],s[24:31],s[0:7],s[36:39],s[34:35]
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: s_mov_b64 s[34:35], vcc			; GFX6-NEXT: s_mov_b64 s[34:35], vcc
	; GFX6-NEXT: s_mov_b64 s[4:5], exec			; GFX6-NEXT: s_mov_b64 s[4:5], exec
	; GFX6-NEXT: s_mov_b64 exec, 15			; GFX6-NEXT: s_mov_b64 exec, 15
	; GFX6-NEXT: s_mov_b32 s6, 0x85c00			; GFX6-NEXT: s_mov_b32 s6, 0x86200
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], s6 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], s6 ; 4-byte Folded Reload
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: v_readlane_b32 s0, v4, 0			; GFX6-NEXT: v_readlane_b32 s0, v0, 0
	; GFX6-NEXT: v_readlane_b32 s1, v4, 1			; GFX6-NEXT: v_readlane_b32 s1, v0, 1
	; GFX6-NEXT: v_readlane_b32 s2, v4, 2			; GFX6-NEXT: v_readlane_b32 s2, v0, 2
	; GFX6-NEXT: v_readlane_b32 s3, v4, 3			; GFX6-NEXT: v_readlane_b32 s3, v0, 3
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[4:5]			; GFX6-NEXT: s_mov_b64 exec, s[4:5]
	; GFX6-NEXT: s_mov_b32 s2, 0x83c00
	; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_mov_b32 s2, 0x84400
	; GFX6-NEXT: buffer_store_dword v13, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v14, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v15, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v16, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_mov_b32 s2, 0x84c00
	; GFX6-NEXT: buffer_store_dword v17, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v18, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v19, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v20, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
				; GFX6-NEXT: s_mov_b32 s2, 0x84c00
	; GFX6-NEXT: buffer_load_dword v17, off, s[40:43], s2 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v17, off, s[40:43], s2 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v18, off, s[40:43], s2 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v18, off, s[40:43], s2 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v19, off, s[40:43], s2 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v19, off, s[40:43], s2 offset:8 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v20, off, s[40:43], s2 offset:12 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v20, off, s[40:43], s2 offset:12 ; 4-byte Folded Reload
	; GFX6-NEXT: s_mov_b32 s2, 0x84400			; GFX6-NEXT: s_mov_b32 s2, 0x84400
	; GFX6-NEXT: buffer_load_dword v13, off, s[40:43], s2 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v13, off, s[40:43], s2 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v14, off, s[40:43], s2 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v14, off, s[40:43], s2 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v15, off, s[40:43], s2 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v15, off, s[40:43], s2 offset:8 ; 4-byte Folded Reload
	; GFX9-FLATSCR-NEXT: s_load_dwordx4 s[36:39], s[0:1], 0x24			; GFX9-FLATSCR-NEXT: s_load_dwordx4 s[36:39], s[0:1], 0x24
	; GFX9-FLATSCR-NEXT: v_mbcnt_lo_u32_b32 v0, -1, 0			; GFX9-FLATSCR-NEXT: v_mbcnt_lo_u32_b32 v0, -1, 0
	; GFX9-FLATSCR-NEXT: v_mbcnt_hi_u32_b32 v5, -1, v0			; GFX9-FLATSCR-NEXT: v_mbcnt_hi_u32_b32 v5, -1, v0
	; GFX9-FLATSCR-NEXT: v_lshlrev_b32_e32 v0, 8, v5			; GFX9-FLATSCR-NEXT: v_lshlrev_b32_e32 v0, 8, v5
	; GFX9-FLATSCR-NEXT: s_add_u32 flat_scratch_lo, s2, s5			; GFX9-FLATSCR-NEXT: s_add_u32 flat_scratch_lo, s2, s5
	; GFX9-FLATSCR-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-FLATSCR-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[1:4], v0, s[38:39] offset:240			; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[1:4], v0, s[38:39] offset:240
	; GFX9-FLATSCR-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-FLATSCR-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20b0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20c0
	; GFX9-FLATSCR-NEXT: v_mov_b32_e32 v6, 0			; GFX9-FLATSCR-NEXT: v_mov_b32_e32 v6, 0
	; GFX9-FLATSCR-NEXT: v_mov_b32_e32 v7, 1			; GFX9-FLATSCR-NEXT: v_mov_b32_e32 v7, 1
	; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)			; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[1:4], s0 ; 16-byte Folded Spill			; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[1:4], s0 ; 16-byte Folded Spill
	; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[8:11], v0, s[38:39] offset:224			; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[8:11], v0, s[38:39] offset:224
	; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[1:4], v0, s[38:39] offset:208			; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[1:4], v0, s[38:39] offset:208
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20a0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20b0
	; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)			; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[1:4], s0 ; 16-byte Folded Spill			; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[1:4], s0 ; 16-byte Folded Spill
	; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[20:23], v0, s[38:39] offset:192			; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[20:23], v0, s[38:39] offset:192
	; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[1:4], v0, s[38:39] offset:176			; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[1:4], v0, s[38:39] offset:176
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2090			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20a0
	; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)			; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[1:4], s0 ; 16-byte Folded Spill			; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[1:4], s0 ; 16-byte Folded Spill
	; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[16:19], v0, s[38:39] offset:160			; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[16:19], v0, s[38:39] offset:160
	; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[1:4], v0, s[38:39] offset:144			; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[1:4], v0, s[38:39] offset:144
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2080			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2080
	; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)			; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[1:4], s0 ; 16-byte Folded Spill			; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[1:4], s0 ; 16-byte Folded Spill
	; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[1:4], v0, s[38:39] offset:128			; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[1:4], v0, s[38:39] offset:128
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20c0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2090
	; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)			; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[1:4], s0 ; 16-byte Folded Spill			; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[1:4], s0 ; 16-byte Folded Spill
	; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[1:4], v0, s[38:39] offset:112			; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[1:4], v0, s[38:39] offset:112
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2060			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2060
	; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)			; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[1:4], s0 ; 16-byte Folded Spill			; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[1:4], s0 ; 16-byte Folded Spill
	; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[1:4], v0, s[38:39] offset:96			; GFX9-FLATSCR-NEXT: global_load_dwordx4 v[1:4], v0, s[38:39] offset:96
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2050			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2050
	; GFX9-FLATSCR-NEXT: ; def s[38:39]			; GFX9-FLATSCR-NEXT: ; def s[38:39]
	; GFX9-FLATSCR-NEXT: ;;#ASMEND			; GFX9-FLATSCR-NEXT: ;;#ASMEND
	; GFX9-FLATSCR-NEXT: ;;#ASMSTART			; GFX9-FLATSCR-NEXT: ;;#ASMSTART
	; GFX9-FLATSCR-NEXT: ; def s33			; GFX9-FLATSCR-NEXT: ; def s33
	; GFX9-FLATSCR-NEXT: ;;#ASMEND			; GFX9-FLATSCR-NEXT: ;;#ASMEND
	; GFX9-FLATSCR-NEXT: s_and_saveexec_b64 s[34:35], vcc			; GFX9-FLATSCR-NEXT: s_and_saveexec_b64 s[34:35], vcc
	; GFX9-FLATSCR-NEXT: s_cbranch_execz .LBB1_2			; GFX9-FLATSCR-NEXT: s_cbranch_execz .LBB1_2
	; GFX9-FLATSCR-NEXT: ; %bb.1: ; %bb0			; GFX9-FLATSCR-NEXT: ; %bb.1: ; %bb0
				; GFX9-FLATSCR-NEXT: v_mov_b32_e32 v0, v16
	; GFX9-FLATSCR-NEXT: ;;#ASMSTART			; GFX9-FLATSCR-NEXT: ;;#ASMSTART
	; GFX9-FLATSCR-NEXT: ; use s[0:7],s[8:15],s[16:23],s[24:31],s[40:43],s[38:39]			; GFX9-FLATSCR-NEXT: ; use s[0:7],s[8:15],s[16:23],s[24:31],s[40:43],s[38:39]
	; GFX9-FLATSCR-NEXT: ;;#ASMEND			; GFX9-FLATSCR-NEXT: ;;#ASMEND
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20d0			; GFX9-FLATSCR-NEXT: v_mov_b32_e32 v1, v17
	; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[0:3], s0 ; 16-byte Folded Spill			; GFX9-FLATSCR-NEXT: v_mov_b32_e32 v2, v18
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20e0			; GFX9-FLATSCR-NEXT: v_mov_b32_e32 v3, v19
	; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[16:19], s0 ; 16-byte Folded Spill
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20f0
	; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[20:23], s0 ; 16-byte Folded Spill
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2100
	; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[8:11], s0 ; 16-byte Folded Spill
	; GFX9-FLATSCR-NEXT: s_nop 0
	; GFX9-FLATSCR-NEXT: ;;#ASMSTART			; GFX9-FLATSCR-NEXT: ;;#ASMSTART
	; GFX9-FLATSCR-NEXT: ;;#ASMEND			; GFX9-FLATSCR-NEXT: ;;#ASMEND
	; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[8:11], off, s0 ; 16-byte Folded Reload
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20f0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20f0
	; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[20:23], off, s0 ; 16-byte Folded Reload			; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[8:11], off, s0 ; 16-byte Folded Reload
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20e0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20e0
	; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[16:19], off, s0 ; 16-byte Folded Reload			; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[20:23], off, s0 ; 16-byte Folded Reload
				; GFX9-FLATSCR-NEXT: v_mov_b32_e32 v19, v3
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20d0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20d0
				; GFX9-FLATSCR-NEXT: v_mov_b32_e32 v18, v2
				; GFX9-FLATSCR-NEXT: v_mov_b32_e32 v17, v1
				; GFX9-FLATSCR-NEXT: v_mov_b32_e32 v16, v0
	; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[0:3], off, s0 ; 16-byte Folded Reload			; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[0:3], off, s0 ; 16-byte Folded Reload
	; GFX9-FLATSCR-NEXT: ;;#ASMSTART			; GFX9-FLATSCR-NEXT: ;;#ASMSTART
	; GFX9-FLATSCR-NEXT: ;;#ASMEND			; GFX9-FLATSCR-NEXT: ;;#ASMEND
	; GFX9-FLATSCR-NEXT: ;;#ASMSTART			; GFX9-FLATSCR-NEXT: ;;#ASMSTART
	; GFX9-FLATSCR-NEXT: ;;#ASMEND			; GFX9-FLATSCR-NEXT: ;;#ASMEND
	; GFX9-FLATSCR-NEXT: ;;#ASMSTART			; GFX9-FLATSCR-NEXT: ;;#ASMSTART
	; GFX9-FLATSCR-NEXT: ;;#ASMEND			; GFX9-FLATSCR-NEXT: ;;#ASMEND
	; GFX9-FLATSCR-NEXT: ;;#ASMSTART			; GFX9-FLATSCR-NEXT: ;;#ASMSTART
	; GFX9-FLATSCR-NEXT: ;;#ASMEND			; GFX9-FLATSCR-NEXT: ;;#ASMEND
	; GFX9-FLATSCR-NEXT: ;;#ASMSTART			; GFX9-FLATSCR-NEXT: ;;#ASMSTART
	; GFX9-FLATSCR-NEXT: ;;#ASMEND			; GFX9-FLATSCR-NEXT: ;;#ASMEND
	; GFX9-FLATSCR-NEXT: ;;#ASMSTART			; GFX9-FLATSCR-NEXT: ;;#ASMSTART
	; GFX9-FLATSCR-NEXT: ;;#ASMEND			; GFX9-FLATSCR-NEXT: ;;#ASMEND
	; GFX9-FLATSCR-NEXT: .LBB1_2: ; %ret			; GFX9-FLATSCR-NEXT: .LBB1_2: ; %ret
	; GFX9-FLATSCR-NEXT: s_or_b64 exec, exec, s[34:35]			; GFX9-FLATSCR-NEXT: s_or_b64 exec, exec, s[34:35]
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20b0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20c0
	; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[12:15], off, s0 ; 16-byte Folded Reload			; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[12:15], off, s0 ; 16-byte Folded Reload
	; GFX9-FLATSCR-NEXT: v_lshlrev_b64 v[4:5], 8, v[5:6]			; GFX9-FLATSCR-NEXT: v_lshlrev_b64 v[4:5], 8, v[5:6]
	; GFX9-FLATSCR-NEXT: v_mov_b32_e32 v6, s37			; GFX9-FLATSCR-NEXT: v_mov_b32_e32 v6, s37
	; GFX9-FLATSCR-NEXT: v_add_co_u32_e32 v4, vcc, s36, v4			; GFX9-FLATSCR-NEXT: v_add_co_u32_e32 v4, vcc, s36, v4
	; GFX9-FLATSCR-NEXT: v_addc_co_u32_e32 v5, vcc, v6, v5, vcc			; GFX9-FLATSCR-NEXT: v_addc_co_u32_e32 v5, vcc, v6, v5, vcc
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20a0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20b0
	; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)			; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; GFX9-FLATSCR-NEXT: global_store_dwordx4 v[4:5], v[12:15], off offset:240			; GFX9-FLATSCR-NEXT: global_store_dwordx4 v[4:5], v[12:15], off offset:240
	; GFX9-FLATSCR-NEXT: global_store_dwordx4 v[4:5], v[8:11], off offset:224			; GFX9-FLATSCR-NEXT: global_store_dwordx4 v[4:5], v[8:11], off offset:224
	; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[6:9], off, s0 ; 16-byte Folded Reload			; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[6:9], off, s0 ; 16-byte Folded Reload
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2090			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20a0
	; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)			; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; GFX9-FLATSCR-NEXT: global_store_dwordx4 v[4:5], v[6:9], off offset:208			; GFX9-FLATSCR-NEXT: global_store_dwordx4 v[4:5], v[6:9], off offset:208
	; GFX9-FLATSCR-NEXT: global_store_dwordx4 v[4:5], v[20:23], off offset:192			; GFX9-FLATSCR-NEXT: global_store_dwordx4 v[4:5], v[20:23], off offset:192
	; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[20:23], off, s0 ; 16-byte Folded Reload			; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[20:23], off, s0 ; 16-byte Folded Reload
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2080			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2080
	; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)			; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; GFX9-FLATSCR-NEXT: global_store_dwordx4 v[4:5], v[20:23], off offset:176			; GFX9-FLATSCR-NEXT: global_store_dwordx4 v[4:5], v[20:23], off offset:176
	; GFX9-FLATSCR-NEXT: global_store_dwordx4 v[4:5], v[16:19], off offset:160			; GFX9-FLATSCR-NEXT: global_store_dwordx4 v[4:5], v[16:19], off offset:160
	; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[16:19], off, s0 ; 16-byte Folded Reload			; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[16:19], off, s0 ; 16-byte Folded Reload
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20c0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2090
	; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[6:9], off, s0 ; 16-byte Folded Reload			; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[6:9], off, s0 ; 16-byte Folded Reload
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2060			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2060
	; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[12:15], off, s0 ; 16-byte Folded Reload			; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[12:15], off, s0 ; 16-byte Folded Reload
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2050			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2050
	; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(2)			; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(2)
	; GFX9-FLATSCR-NEXT: global_store_dwordx4 v[4:5], v[16:19], off offset:144			; GFX9-FLATSCR-NEXT: global_store_dwordx4 v[4:5], v[16:19], off offset:144
	; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(2)			; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(2)
	; GFX9-FLATSCR-NEXT: global_store_dwordx4 v[4:5], v[6:9], off offset:128			; GFX9-FLATSCR-NEXT: global_store_dwordx4 v[4:5], v[6:9], off offset:128
	; GFX10-FLATSCR-NEXT: ; def s[34:35]			; GFX10-FLATSCR-NEXT: ; def s[34:35]
	; GFX10-FLATSCR-NEXT: ;;#ASMEND			; GFX10-FLATSCR-NEXT: ;;#ASMEND
	; GFX10-FLATSCR-NEXT: ;;#ASMSTART			; GFX10-FLATSCR-NEXT: ;;#ASMSTART
	; GFX10-FLATSCR-NEXT: ; def s38			; GFX10-FLATSCR-NEXT: ; def s38
	; GFX10-FLATSCR-NEXT: ;;#ASMEND			; GFX10-FLATSCR-NEXT: ;;#ASMEND
	; GFX10-FLATSCR-NEXT: v_cmpx_eq_u32_e32 0, v0			; GFX10-FLATSCR-NEXT: v_cmpx_eq_u32_e32 0, v0
	; GFX10-FLATSCR-NEXT: s_cbranch_execz .LBB1_2			; GFX10-FLATSCR-NEXT: s_cbranch_execz .LBB1_2
	; GFX10-FLATSCR-NEXT: ; %bb.1: ; %bb0			; GFX10-FLATSCR-NEXT: ; %bb.1: ; %bb0
	; GFX10-FLATSCR-NEXT: ;;#ASMSTART
	; GFX10-FLATSCR-NEXT: ; use s[0:7],s[8:15],s[16:23],s[24:31],s[40:43],s[34:35]
	; GFX10-FLATSCR-NEXT: ;;#ASMEND
	; GFX10-FLATSCR-NEXT: s_movk_i32 s0, 0x2010
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v88, v59			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v88, v59
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v92, v63			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v92, v63
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v87, v58			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v87, v58
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v86, v57			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v86, v57
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v85, v56			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v85, v56
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v91, v62			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v91, v62
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v90, v61			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v90, v61
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v89, v60			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v89, v60
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v60, v35			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v60, v35
	; GFX10-FLATSCR-NEXT: scratch_store_dwordx4 off, v[64:67], s0 ; 16-byte Folded Spill
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v68, v39			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v68, v39
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v59, v34			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v59, v34
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v58, v33			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v58, v33
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v57, v32			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v57, v32
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v67, v38			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v67, v38
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v66, v37			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v66, v37
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v65, v36			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v65, v36
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v36, v11			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v36, v11
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v42, v17			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v42, v17
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v47, v22			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v47, v22
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v46, v21			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v46, v21
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v51, v26			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v51, v26
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v50, v25			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v50, v25
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v55, v30			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v55, v30
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v54, v29			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v54, v29
	; GFX10-FLATSCR-NEXT: ;;#ASMSTART			; GFX10-FLATSCR-NEXT: ;;#ASMSTART
				; GFX10-FLATSCR-NEXT: ; use s[0:7],s[8:15],s[16:23],s[24:31],s[40:43],s[34:35]
				; GFX10-FLATSCR-NEXT: ;;#ASMEND
				; GFX10-FLATSCR-NEXT: ;;#ASMSTART
	; GFX10-FLATSCR-NEXT: ;;#ASMEND			; GFX10-FLATSCR-NEXT: ;;#ASMEND
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v8, v33			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v8, v33
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v28, v53			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v28, v53
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v24, v49			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v24, v49
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v20, v45			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v20, v45
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v16, v41			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v16, v41
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v12, v37			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v12, v37
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v9, v34			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v9, v34
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v13, v38			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v13, v38
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v14, v39			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v14, v39
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v15, v40			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v15, v40
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v33, v58			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v33, v58
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v34, v59			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v34, v59
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v35, v60			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v35, v60
	; GFX10-FLATSCR-NEXT: ;;#ASMSTART			; GFX10-FLATSCR-NEXT: ;;#ASMSTART
	; GFX10-FLATSCR-NEXT: ;;#ASMEND			; GFX10-FLATSCR-NEXT: ;;#ASMEND
				; GFX10-FLATSCR-NEXT: s_movk_i32 s0, 0x2010
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v36, v65			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v36, v65
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v37, v66			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v37, v66
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v38, v67			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v38, v67
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v39, v68			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v39, v68
	; GFX10-FLATSCR-NEXT: scratch_load_dwordx4 v[64:67], off, s0 ; 16-byte Folded Reload			; GFX10-FLATSCR-NEXT: scratch_load_dwordx4 v[64:67], off, s0 ; 16-byte Folded Reload
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v60, v89			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v60, v89
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v56, v85			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v56, v85
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v52, v81			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v52, v81

llvm/test/CodeGen/AMDGPU/swdev380865.ll

	; spills inside the loop, so we would repeatedly reload the same			; spills inside the loop, so we would repeatedly reload the same
	; values.			; values.

	define amdgpu_kernel void @_Z6kernelILi4000ELi1EEvPd(ptr addrspace(1) %x.coerce) {			define amdgpu_kernel void @_Z6kernelILi4000ELi1EEvPd(ptr addrspace(1) %x.coerce) {
	; CHECK-LABEL: _Z6kernelILi4000ELi1EEvPd:			; CHECK-LABEL: _Z6kernelILi4000ELi1EEvPd:
	; CHECK: ; %bb.0: ; %entry			; CHECK: ; %bb.0: ; %entry
	; CHECK-NEXT: s_mov_b64 s[0:1], 0			; CHECK-NEXT: s_mov_b64 s[0:1], 0
	; CHECK-NEXT: s_load_dword s2, s[0:1], 0x0			; CHECK-NEXT: s_load_dword s2, s[0:1], 0x0
	; CHECK-NEXT: ; implicit-def: $vgpr2			; CHECK-NEXT: s_load_dwordx2 s[6:7], s[0:1], 0x0
	; CHECK-NEXT: ; kill: killed $sgpr0_sgpr1			; CHECK-NEXT: ; kill: killed $sgpr0_sgpr1
	; CHECK-NEXT: s_mov_b32 s7, 0x401c0000
	; CHECK-NEXT: s_mov_b32 s5, 0x40280000
	; CHECK-NEXT: s_waitcnt lgkmcnt(0)
	; CHECK-NEXT: v_writelane_b32 v2, s2, 0
	; CHECK-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
	; CHECK-NEXT: s_mov_b32 s0, 0			; CHECK-NEXT: s_mov_b32 s0, 0
				; CHECK-NEXT: ; implicit-def: $vgpr2
	; CHECK-NEXT: s_mov_b32 s1, 0x40140000			; CHECK-NEXT: s_mov_b32 s1, 0x40140000
	; CHECK-NEXT: s_mov_b32 s1, 0x40180000			; CHECK-NEXT: s_mov_b32 s1, 0x40180000
				; CHECK-NEXT: s_waitcnt lgkmcnt(0)
				; CHECK-NEXT: v_writelane_b32 v2, s2, 0
	; CHECK-NEXT: v_writelane_b32 v2, s0, 1			; CHECK-NEXT: v_writelane_b32 v2, s0, 1
	; CHECK-NEXT: v_writelane_b32 v2, s1, 2			; CHECK-NEXT: v_writelane_b32 v2, s1, 2
	; CHECK-NEXT: s_mov_b32 s1, 0x40220000			; CHECK-NEXT: s_mov_b32 s1, 0x40240000
	; CHECK-NEXT: v_writelane_b32 v2, s0, 3			; CHECK-NEXT: v_writelane_b32 v2, s0, 3
				; CHECK-NEXT: v_mov_b32_e32 v0, s6
	; CHECK-NEXT: v_writelane_b32 v2, s1, 4			; CHECK-NEXT: v_writelane_b32 v2, s1, 4
	; CHECK-NEXT: s_mov_b32 s1, 0x40240000			; CHECK-NEXT: s_mov_b32 s3, 0x40260000
	; CHECK-NEXT: v_writelane_b32 v2, s0, 5			; CHECK-NEXT: s_mov_b32 s5, 0x40280000
	; CHECK-NEXT: v_writelane_b32 v2, s1, 6			; CHECK-NEXT: v_mov_b32_e32 v1, s7
	; CHECK-NEXT: s_mov_b32 s1, 0x40260000
	; CHECK-NEXT: v_writelane_b32 v2, s0, 7
	; CHECK-NEXT: s_waitcnt lgkmcnt(0)
	; CHECK-NEXT: v_mov_b32_e32 v0, s2
	; CHECK-NEXT: v_writelane_b32 v2, s1, 8
	; CHECK-NEXT: v_mov_b32_e32 v1, s3
	; CHECK-NEXT: .LBB0_1: ; %for.cond4.preheader			; CHECK-NEXT: .LBB0_1: ; %for.cond4.preheader
	; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1			; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: v_add_f64 v[0:1], v[0:1], 0			; CHECK-NEXT: v_add_f64 v[0:1], v[0:1], 0
	; CHECK-NEXT: s_mov_b32 s2, 0			; CHECK-NEXT: s_mov_b32 s6, 0
	; CHECK-NEXT: s_mov_b32 s3, 0x40140000			; CHECK-NEXT: s_mov_b32 s7, 0x40140000
	; CHECK-NEXT: v_writelane_b32 v2, s6, 9			; CHECK-NEXT: v_writelane_b32 v2, s0, 5
	; CHECK-NEXT: v_writelane_b32 v2, s7, 10			; CHECK-NEXT: v_add_f64 v[0:1], v[0:1], s[6:7]
	; CHECK-NEXT: v_writelane_b32 v2, s0, 11
	; CHECK-NEXT: v_readlane_b32 s6, v2, 1			; CHECK-NEXT: v_readlane_b32 s6, v2, 1
	; CHECK-NEXT: v_readlane_b32 s7, v2, 2			; CHECK-NEXT: v_readlane_b32 s7, v2, 2
	; CHECK-NEXT: v_add_f64 v[0:1], v[0:1], s[2:3]
	; CHECK-NEXT: s_mov_b32 s1, s7			; CHECK-NEXT: s_mov_b32 s1, s7
	; CHECK-NEXT: s_mov_b32 s0, s2			; CHECK-NEXT: s_mov_b32 s6, 0
	; CHECK-NEXT: v_writelane_b32 v2, s6, 1			; CHECK-NEXT: s_mov_b32 s7, 0x40140000
	; CHECK-NEXT: v_writelane_b32 v2, s7, 2			; CHECK-NEXT: s_mov_b32 s0, s6
	; CHECK-NEXT: v_readlane_b32 s6, v2, 9			; CHECK-NEXT: v_readlane_b32 s6, v2, 6
	; CHECK-NEXT: v_readlane_b32 s7, v2, 10
	; CHECK-NEXT: s_mov_b32 s6, s2
	; CHECK-NEXT: v_add_f64 v[0:1], v[0:1], s[0:1]			; CHECK-NEXT: v_add_f64 v[0:1], v[0:1], s[0:1]
	; CHECK-NEXT: v_readlane_b32 s0, v2, 3
	; CHECK-NEXT: v_readlane_b32 s1, v2, 4
	; CHECK-NEXT: s_mov_b32 s3, s1
	; CHECK-NEXT: s_mov_b32 s0, 0			; CHECK-NEXT: s_mov_b32 s0, 0
				; CHECK-NEXT: v_readlane_b32 s7, v2, 7
	; CHECK-NEXT: s_mov_b32 s1, 0x40140000			; CHECK-NEXT: s_mov_b32 s1, 0x40140000
	; CHECK-NEXT: s_mov_b32 s2, s0			; CHECK-NEXT: s_mov_b32 s6, s0
	; CHECK-NEXT: s_mov_b32 s1, s3
	; CHECK-NEXT: v_add_f64 v[0:1], v[0:1], s[6:7]			; CHECK-NEXT: v_add_f64 v[0:1], v[0:1], s[6:7]
	; CHECK-NEXT: v_writelane_b32 v2, s0, 3			; CHECK-NEXT: v_readlane_b32 s6, v2, 8
	; CHECK-NEXT: v_writelane_b32 v2, s1, 4			; CHECK-NEXT: v_readlane_b32 s7, v2, 9
	; CHECK-NEXT: v_readlane_b32 s0, v2, 5			; CHECK-NEXT: s_mov_b32 s6, s0
	; CHECK-NEXT: v_readlane_b32 s1, v2, 6			; CHECK-NEXT: v_readlane_b32 s0, v2, 3
	; CHECK-NEXT: v_add_f64 v[0:1], v[0:1], s[2:3]			; CHECK-NEXT: v_readlane_b32 s1, v2, 4
	; CHECK-NEXT: s_mov_b32 s3, s1			; CHECK-NEXT: v_add_f64 v[0:1], v[0:1], s[6:7]
	; CHECK-NEXT: s_mov_b32 s0, 0			; CHECK-NEXT: s_mov_b32 s6, 0
	; CHECK-NEXT: s_mov_b32 s1, 0x40140000			; CHECK-NEXT: s_mov_b32 s7, 0x40140000
	; CHECK-NEXT: s_mov_b32 s2, s0			; CHECK-NEXT: s_mov_b32 s0, s6
	; CHECK-NEXT: s_mov_b32 s1, s3			; CHECK-NEXT: s_mov_b32 s2, s6
	; CHECK-NEXT: v_writelane_b32 v2, s0, 5			; CHECK-NEXT: s_mov_b32 s4, s6
	; CHECK-NEXT: v_writelane_b32 v2, s1, 6			; CHECK-NEXT: v_add_f64 v[0:1], v[0:1], s[0:1]
	; CHECK-NEXT: v_add_f64 v[0:1], v[0:1], s[2:3]
	; CHECK-NEXT: v_readlane_b32 s0, v2, 7
	; CHECK-NEXT: v_readlane_b32 s1, v2, 8
	; CHECK-NEXT: s_mov_b32 s3, s1
	; CHECK-NEXT: s_mov_b32 s0, 0
	; CHECK-NEXT: s_mov_b32 s1, 0x40140000
	; CHECK-NEXT: s_mov_b32 s2, s0
	; CHECK-NEXT: s_mov_b32 s1, s3
	; CHECK-NEXT: v_add_f64 v[0:1], v[0:1], s[2:3]
	; CHECK-NEXT: v_writelane_b32 v2, s0, 7
	; CHECK-NEXT: v_writelane_b32 v2, s1, 8
	; CHECK-NEXT: s_mov_b32 s0, 0
	; CHECK-NEXT: s_mov_b32 s1, 0x40140000
	; CHECK-NEXT: s_mov_b32 s4, s0
	; CHECK-NEXT: v_readlane_b32 s0, v2, 0			; CHECK-NEXT: v_readlane_b32 s0, v2, 0
	; CHECK-NEXT: v_readlane_b32 s2, v2, 11			; CHECK-NEXT: v_add_f64 v[0:1], v[0:1], s[2:3]
	; CHECK-NEXT: v_add_f64 v[0:1], v[0:1], s[4:5]			; CHECK-NEXT: v_readlane_b32 s2, v2, 5
	; CHECK-NEXT: s_add_i32 s2, s2, s0			; CHECK-NEXT: s_add_i32 s2, s2, s0
	; CHECK-NEXT: v_writelane_b32 v2, s2, 11			; CHECK-NEXT: v_writelane_b32 v2, s2, 5
	; CHECK-NEXT: v_readlane_b32 s0, v2, 11			; CHECK-NEXT: v_readlane_b32 s0, v2, 5
	; CHECK-NEXT: s_cmpk_lt_i32 s0, 0xa00			; CHECK-NEXT: s_cmpk_lt_i32 s0, 0xa00
				; CHECK-NEXT: v_add_f64 v[0:1], v[0:1], s[4:5]
	; CHECK-NEXT: s_cbranch_scc1 .LBB0_1			; CHECK-NEXT: s_cbranch_scc1 .LBB0_1
	; CHECK-NEXT: ; %bb.2: ; %for.cond.cleanup.loopexit			; CHECK-NEXT: ; %bb.2: ; %for.cond.cleanup.loopexit
	; CHECK-NEXT: v_mov_b32_e32 v3, 0			; CHECK-NEXT: v_mov_b32_e32 v3, 0
	; CHECK-NEXT: v_mov_b32_e32 v4, 0			; CHECK-NEXT: v_mov_b32_e32 v4, 0
	; CHECK-NEXT: global_store_dwordx2 v[3:4], v[0:1], off			; CHECK-NEXT: global_store_dwordx2 v[3:4], v[0:1], off
	; CHECK-NEXT: ; kill: killed $vgpr2			; CHECK-NEXT: ; kill: killed $vgpr2
	; CHECK-NEXT: s_endpgm			; CHECK-NEXT: s_endpgm
	entry:			entry:

llvm/test/CodeGen/Hexagon/regalloc-bad-undef.mir

bb.0.entry:
%59 = A2_tfrsi 524288		%59 = A2_tfrsi 524288
undef %32.isub_hi = A2_tfrsi 0		undef %32.isub_hi = A2_tfrsi 0
%8 = S2_extractup undef %9, 6, 25		%8 = S2_extractup undef %9, 6, 25
%47 = A2_tfrpi 2		%47 = A2_tfrpi 2
%13 = A2_tfrpi -1		%13 = A2_tfrpi -1
%13 = S2_asl_r_p_acc %13, %47, %8.isub_lo		%13 = S2_asl_r_p_acc %13, %47, %8.isub_lo
%51 = A2_tfrpi 0		%51 = A2_tfrpi 0

; CHECK: $d2 = S2_extractup undef renamable $d0, 6, 25		; CHECK: $d0 = S2_extractup undef renamable $d0, 6, 25
; CHECK: $d0 = A2_tfrpi 2		; CHECK: $d1 = A2_tfrpi 2
; CHECK: $d13 = A2_tfrpi -1		; CHECK: $d13 = A2_tfrpi -1
; CHECK-NOT: undef $r4		; CHECK-NOT: undef $r4

bb.1.for.body:		bb.1.for.body:
successors: %bb.3.for.end, %bb.2.if.end82		successors: %bb.3.for.end, %bb.2.if.end82

ADJCALLSTACKDOWN 0, 0, implicit-def dead $r29, implicit-def dead $r30, implicit $r31, implicit $r30, implicit $r29		ADJCALLSTACKDOWN 0, 0, implicit-def dead $r29, implicit-def dead $r30, implicit $r31, implicit $r30, implicit $r29
J2_call @lrand48, implicit-def dead $d0, implicit-def dead $d1, implicit-def dead $d2, implicit-def dead $d3, implicit-def dead $d4, implicit-def dead $d5, implicit-def dead $d6, implicit-def dead $d7, implicit-def dead $r28, implicit-def dead $r31, implicit-def dead $p0, implicit-def dead $p1, implicit-def dead $p2, implicit-def dead $p3, implicit-def dead $m0, implicit-def dead $m1, implicit-def dead $lc0, implicit-def dead $lc1, implicit-def dead $sa0, implicit-def dead $sa1, implicit-def dead $usr, implicit-def $usr_ovf, implicit-def dead $cs0, implicit-def dead $cs1, implicit-def dead $w0, implicit-def dead $w1, implicit-def dead $w2, implicit-def dead $w3, implicit-def dead $w4, implicit-def dead $w5, implicit-def dead $w6, implicit-def dead $w7, implicit-def dead $w8, implicit-def dead $w9, implicit-def dead $w10, implicit-def dead $w11, implicit-def dead $w12, implicit-def dead $w13, implicit-def dead $w14, implicit-def dead $w15, implicit-def dead $q0, implicit-def dead $q1, implicit-def dead $q2, implicit-def dead $q3, implicit-def $r0		J2_call @lrand48, implicit-def dead $d0, implicit-def dead $d1, implicit-def dead $d2, implicit-def dead $d3, implicit-def dead $d4, implicit-def dead $d5, implicit-def dead $d6, implicit-def dead $d7, implicit-def dead $r28, implicit-def dead $r31, implicit-def dead $p0, implicit-def dead $p1, implicit-def dead $p2, implicit-def dead $p3, implicit-def dead $m0, implicit-def dead $m1, implicit-def dead $lc0, implicit-def dead $lc1, implicit-def dead $sa0, implicit-def dead $sa1, implicit-def dead $usr, implicit-def $usr_ovf, implicit-def dead $cs0, implicit-def dead $cs1, implicit-def dead $w0, implicit-def dead $w1, implicit-def dead $w2, implicit-def dead $w3, implicit-def dead $w4, implicit-def dead $w5, implicit-def dead $w6, implicit-def dead $w7, implicit-def dead $w8, implicit-def dead $w9, implicit-def dead $w10, implicit-def dead $w11, implicit-def dead $w12, implicit-def dead $w13, implicit-def dead $w14, implicit-def dead $w15, implicit-def dead $q0, implicit-def dead $q1, implicit-def dead $q2, implicit-def dead $q3, implicit-def $r0

llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll

middle.block: ; preds = %vector.body
%add61 = add i32 %k2.0156, 6		%add61 = add i32 %k2.0156, 6
%cmp3 = icmp ult i32 %add61, %sub		%cmp3 = icmp ult i32 %add61, %sub
br i1 %cmp3, label %for.body, label %for.cond.cleanup		br i1 %cmp3, label %for.body, label %for.cond.cleanup
}		}

define void @DCT_mve7(ptr nocapture readonly %S, ptr nocapture readonly %pIn, ptr nocapture %pOut) {		define void @DCT_mve7(ptr nocapture readonly %S, ptr nocapture readonly %pIn, ptr nocapture %pOut) {
; CHECK-LABEL: DCT_mve7:		; CHECK-LABEL: DCT_mve7:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .save {r4, r5, r6, r7, r8, r9, r10, r11, lr}		; CHECK-NEXT: .save {r4, r5, r6, r7, r8, r9, r11, lr}
; CHECK-NEXT: push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr}		; CHECK-NEXT: push.w {r4, r5, r6, r7, r8, r9, r11, lr}
; CHECK-NEXT: .pad #4
; CHECK-NEXT: sub sp, #4
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: .pad #72		; CHECK-NEXT: .pad #72
; CHECK-NEXT: sub sp, #72		; CHECK-NEXT: sub sp, #72
; CHECK-NEXT: str r1, [sp, #20] @ 4-byte Spill		; CHECK-NEXT: str r1, [sp, #20] @ 4-byte Spill
; CHECK-NEXT: ldr r1, [r0, #4]		; CHECK-NEXT: ldr r1, [r0, #4]
; CHECK-NEXT: subs r1, #7		; CHECK-NEXT: subs r1, #7
; CHECK-NEXT: str r1, [sp, #16] @ 4-byte Spill		; CHECK-NEXT: str r1, [sp, #16] @ 4-byte Spill
; CHECK-NEXT: ldr r1, [sp, #20] @ 4-byte Reload		; CHECK-NEXT: ldr r1, [sp, #20] @ 4-byte Reload
; CHECK-NEXT: adds r4, r0, #2		; CHECK-NEXT: adds r4, r0, #2
; CHECK-NEXT: ldr r6, [sp, #8] @ 4-byte Reload		; CHECK-NEXT: ldr r6, [sp, #8] @ 4-byte Reload
; CHECK-NEXT: add.w r8, r0, #1		; CHECK-NEXT: add.w r8, r0, #1
; CHECK-NEXT: mov r3, r9		; CHECK-NEXT: mov r3, r9
; CHECK-NEXT: vmov q4, q2		; CHECK-NEXT: vmov q4, q2
; CHECK-NEXT: vmov q5, q2		; CHECK-NEXT: vmov q5, q2
; CHECK-NEXT: vmov q3, q2		; CHECK-NEXT: vmov q3, q2
; CHECK-NEXT: vmov q6, q2
; CHECK-NEXT: vmov q1, q2		; CHECK-NEXT: vmov q1, q2
; CHECK-NEXT: mov r12, r7		; CHECK-NEXT: mov r12, r7
; CHECK-NEXT: vstrw.32 q2, [sp, #56] @ 16-byte Spill		; CHECK-NEXT: vstrw.32 q2, [sp, #56] @ 16-byte Spill
; CHECK-NEXT: dls lr, r6		; CHECK-NEXT: dls lr, r6
; CHECK-NEXT: .LBB6_3: @ %vector.body		; CHECK-NEXT: .LBB6_3: @ %vector.body
; CHECK-NEXT: @ Parent Loop BB6_2 Depth=1		; CHECK-NEXT: @ Parent Loop BB6_2 Depth=1
; CHECK-NEXT: @ => This Inner Loop Header: Depth=2		; CHECK-NEXT: @ => This Inner Loop Header: Depth=2
; CHECK-NEXT: vctp.32 r12		; CHECK-NEXT: vctp.32 r12
; CHECK-NEXT: add.w r10, r3, r5		; CHECK-NEXT: adds r6, r3, r5
; CHECK-NEXT: vpstt		; CHECK-NEXT: vpstt
; CHECK-NEXT: vldrwt.u32 q7, [r1], #16		; CHECK-NEXT: vldrwt.u32 q7, [r1], #16
; CHECK-NEXT: vldrwt.u32 q0, [r3], #16		; CHECK-NEXT: vldrwt.u32 q0, [r3], #16
; CHECK-NEXT: add.w r11, r10, r5		; CHECK-NEXT: add.w r11, r6, r5
; CHECK-NEXT: sub.w r12, r12, #4		; CHECK-NEXT: sub.w r12, r12, #4
; CHECK-NEXT: vpstt		; CHECK-NEXT: vpstt
; CHECK-NEXT: vfmat.f32 q5, q0, q7		; CHECK-NEXT: vfmat.f32 q5, q0, q7
; CHECK-NEXT: vldrwt.u32 q0, [r10]
; CHECK-NEXT: add.w r6, r11, r5
; CHECK-NEXT: vpstt
; CHECK-NEXT: vfmat.f32 q6, q0, q7
; CHECK-NEXT: vldrwt.u32 q0, [r11]		; CHECK-NEXT: vldrwt.u32 q0, [r11]
; CHECK-NEXT: vstrw.32 q6, [sp, #40] @ 16-byte Spill		; CHECK-NEXT: add.w r6, r11, r5
; CHECK-NEXT: vmov q6, q5		; CHECK-NEXT: vmov q6, q5
; CHECK-NEXT: vpst		; CHECK-NEXT: vpst
; CHECK-NEXT: vfmat.f32 q1, q0, q7		; CHECK-NEXT: vfmat.f32 q1, q0, q7
; CHECK-NEXT: vmov q5, q4		; CHECK-NEXT: vmov q5, q4
; CHECK-NEXT: vmov q4, q3		; CHECK-NEXT: vmov q4, q3
; CHECK-NEXT: vmov q3, q1		; CHECK-NEXT: vmov q3, q1
; CHECK-NEXT: vpst		; CHECK-NEXT: vpst
; CHECK-NEXT: vldrwt.u32 q0, [r6]		; CHECK-NEXT: vldrwt.u32 q0, [r6]
; CHECK-NEXT: ldr r1, [sp, #4] @ 4-byte Reload		; CHECK-NEXT: ldr r1, [sp, #4] @ 4-byte Reload
; CHECK-NEXT: add r9, r1		; CHECK-NEXT: add r9, r1
; CHECK-NEXT: ldr r1, [sp, #16] @ 4-byte Reload		; CHECK-NEXT: ldr r1, [sp, #16] @ 4-byte Reload
; CHECK-NEXT: cmp r0, r1		; CHECK-NEXT: cmp r0, r1
; CHECK-NEXT: blo.w .LBB6_2		; CHECK-NEXT: blo.w .LBB6_2
; CHECK-NEXT: .LBB6_5: @ %for.cond.cleanup		; CHECK-NEXT: .LBB6_5: @ %for.cond.cleanup
; CHECK-NEXT: add sp, #72		; CHECK-NEXT: add sp, #72
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: add sp, #4		; CHECK-NEXT: pop.w {r4, r5, r6, r7, r8, r9, r11, pc}
; CHECK-NEXT: pop.w {r4, r5, r6, r7, r8, r9, r10, r11, pc}
entry:		entry:
%NumInputs = getelementptr inbounds %struct.DCT_InstanceTypeDef, ptr %S, i32 0, i32 2		%NumInputs = getelementptr inbounds %struct.DCT_InstanceTypeDef, ptr %S, i32 0, i32 2
%i = load i32, ptr %NumInputs, align 4		%i = load i32, ptr %NumInputs, align 4
%NumFilters = getelementptr inbounds %struct.DCT_InstanceTypeDef, ptr %S, i32 0, i32 1		%NumFilters = getelementptr inbounds %struct.DCT_InstanceTypeDef, ptr %S, i32 0, i32 1
%i1 = load i32, ptr %NumFilters, align 4		%i1 = load i32, ptr %NumFilters, align 4
%pDCTCoefs = getelementptr inbounds %struct.DCT_InstanceTypeDef, ptr %S, i32 0, i32 0		%pDCTCoefs = getelementptr inbounds %struct.DCT_InstanceTypeDef, ptr %S, i32 0, i32 0
%i2 = load ptr, ptr %pDCTCoefs, align 4		%i2 = load ptr, ptr %pDCTCoefs, align 4
%cmp = icmp ugt i32 %i, 1		%cmp = icmp ugt i32 %i, 1
; CHECK-NEXT: vmov.i32 q3, #0x0		; CHECK-NEXT: vmov.i32 q3, #0x0
; CHECK-NEXT: ldr r5, [sp, #8] @ 4-byte Reload		; CHECK-NEXT: ldr r5, [sp, #8] @ 4-byte Reload
; CHECK-NEXT: adds r4, r0, #3		; CHECK-NEXT: adds r4, r0, #3
; CHECK-NEXT: str r1, [sp, #24] @ 4-byte Spill		; CHECK-NEXT: str r1, [sp, #24] @ 4-byte Spill
; CHECK-NEXT: add.w r8, r0, #2		; CHECK-NEXT: add.w r8, r0, #2
; CHECK-NEXT: adds r1, r0, #1		; CHECK-NEXT: adds r1, r0, #1
; CHECK-NEXT: mov r3, r12		; CHECK-NEXT: mov r3, r12
; CHECK-NEXT: vmov q5, q3		; CHECK-NEXT: vmov q5, q3
; CHECK-NEXT: vmov q6, q3
; CHECK-NEXT: vmov q4, q3		; CHECK-NEXT: vmov q4, q3
; CHECK-NEXT: vmov q7, q3		; CHECK-NEXT: vmov q7, q3
; CHECK-NEXT: vmov q2, q3		; CHECK-NEXT: vmov q2, q3
; CHECK-NEXT: mov r10, r7		; CHECK-NEXT: mov r10, r7
; CHECK-NEXT: vstrw.32 q3, [sp, #56] @ 16-byte Spill		; CHECK-NEXT: vstrw.32 q3, [sp, #56] @ 16-byte Spill
; CHECK-NEXT: vstrw.32 q3, [sp, #72] @ 16-byte Spill		; CHECK-NEXT: vstrw.32 q3, [sp, #72] @ 16-byte Spill
; CHECK-NEXT: dls lr, r5		; CHECK-NEXT: dls lr, r5
; CHECK-NEXT: .LBB7_3: @ %vector.body		; CHECK-NEXT: .LBB7_3: @ %vector.body
; CHECK-NEXT: @ Parent Loop BB7_2 Depth=1		; CHECK-NEXT: @ Parent Loop BB7_2 Depth=1
; CHECK-NEXT: @ => This Inner Loop Header: Depth=2		; CHECK-NEXT: @ => This Inner Loop Header: Depth=2
; CHECK-NEXT: vctp.32 r10		; CHECK-NEXT: vctp.32 r10
; CHECK-NEXT: add.w r11, r3, r6		; CHECK-NEXT: add.w r11, r3, r6
; CHECK-NEXT: vpstt		; CHECK-NEXT: vpsttt
; CHECK-NEXT: vldrwt.u32 q0, [r9], #16
; CHECK-NEXT: vldrwt.u32 q1, [r3], #16		; CHECK-NEXT: vldrwt.u32 q1, [r3], #16
; CHECK-NEXT: add.w r5, r11, r6		; CHECK-NEXT: vldrwt.u32 q0, [r9], #16
; CHECK-NEXT: sub.w r10, r10, #4
; CHECK-NEXT: vpstt
; CHECK-NEXT: vfmat.f32 q6, q1, q0
; CHECK-NEXT: vldrwt.u32 q1, [r11]		; CHECK-NEXT: vldrwt.u32 q1, [r11]
; CHECK-NEXT: vstrw.32 q6, [sp, #40] @ 16-byte Spill		; CHECK-NEXT: add.w r5, r11, r6
; CHECK-NEXT: vmov q6, q5		; CHECK-NEXT: vmov q6, q5
		; CHECK-NEXT: vmov q5, q3
; CHECK-NEXT: vpst		; CHECK-NEXT: vpst
; CHECK-NEXT: vfmat.f32 q7, q1, q0		; CHECK-NEXT: vfmat.f32 q7, q1, q0
; CHECK-NEXT: vmov q5, q3
; CHECK-NEXT: vmov q3, q4		; CHECK-NEXT: vmov q3, q4
; CHECK-NEXT: vmov q4, q2		; CHECK-NEXT: vmov q4, q2
; CHECK-NEXT: vpst		; CHECK-NEXT: vpst
; CHECK-NEXT: vldrwt.u32 q1, [r5]		; CHECK-NEXT: vldrwt.u32 q1, [r5]
; CHECK-NEXT: vldrw.u32 q2, [sp, #56] @ 16-byte Reload		; CHECK-NEXT: vldrw.u32 q2, [sp, #56] @ 16-byte Reload
; CHECK-NEXT: adds r7, r5, r6		; CHECK-NEXT: adds r7, r5, r6
		; CHECK-NEXT: adds r5, r7, r6
		; CHECK-NEXT: sub.w r10, r10, #4
; CHECK-NEXT: vpstt		; CHECK-NEXT: vpstt
; CHECK-NEXT: vfmat.f32 q2, q1, q0		; CHECK-NEXT: vfmat.f32 q2, q1, q0
; CHECK-NEXT: vldrwt.u32 q1, [r7]		; CHECK-NEXT: vldrwt.u32 q1, [r7]
; CHECK-NEXT: vstrw.32 q2, [sp, #56] @ 16-byte Spill		; CHECK-NEXT: vstrw.32 q2, [sp, #56] @ 16-byte Spill
; CHECK-NEXT: vldrw.u32 q2, [sp, #72] @ 16-byte Reload		; CHECK-NEXT: vldrw.u32 q2, [sp, #72] @ 16-byte Reload
; CHECK-NEXT: adds r5, r7, r6		; CHECK-NEXT: adds r7, r5, r6
; CHECK-NEXT: vpstt		; CHECK-NEXT: vpstt
; CHECK-NEXT: vfmat.f32 q2, q1, q0		; CHECK-NEXT: vfmat.f32 q2, q1, q0
; CHECK-NEXT: vldrwt.u32 q1, [r5]		; CHECK-NEXT: vldrwt.u32 q1, [r5]
; CHECK-NEXT: adds r7, r5, r6
; CHECK-NEXT: vstrw.32 q2, [sp, #72] @ 16-byte Spill		; CHECK-NEXT: vstrw.32 q2, [sp, #72] @ 16-byte Spill
; CHECK-NEXT: vmov q2, q4		; CHECK-NEXT: vmov q2, q4
; CHECK-NEXT: vmov q4, q3
; CHECK-NEXT: vpstt		; CHECK-NEXT: vpstt
; CHECK-NEXT: vfmat.f32 q2, q1, q0		; CHECK-NEXT: vfmat.f32 q2, q1, q0
; CHECK-NEXT: vldrwt.u32 q1, [r7]		; CHECK-NEXT: vldrwt.u32 q1, [r7]
		; CHECK-NEXT: vmov q4, q3
; CHECK-NEXT: adds r5, r7, r6		; CHECK-NEXT: adds r5, r7, r6
; CHECK-NEXT: vmov q3, q5
; CHECK-NEXT: vpstt		; CHECK-NEXT: vpstt
; CHECK-NEXT: vfmat.f32 q4, q1, q0		; CHECK-NEXT: vfmat.f32 q4, q1, q0
; CHECK-NEXT: vldrwt.u32 q1, [r5]		; CHECK-NEXT: vldrwt.u32 q1, [r5]
		; CHECK-NEXT: vmov q3, q5
; CHECK-NEXT: vmov q5, q6		; CHECK-NEXT: vmov q5, q6
; CHECK-NEXT: add r5, r6		; CHECK-NEXT: add r5, r6
; CHECK-NEXT: vpstt		; CHECK-NEXT: vpstt
; CHECK-NEXT: vfmat.f32 q5, q1, q0		; CHECK-NEXT: vfmat.f32 q5, q1, q0
; CHECK-NEXT: vldrwt.u32 q1, [r5]		; CHECK-NEXT: vldrwt.u32 q1, [r5]
; CHECK-NEXT: vldrw.u32 q6, [sp, #40] @ 16-byte Reload		; CHECK-NEXT: vldrw.u32 q6, [sp, #40] @ 16-byte Reload
; CHECK-NEXT: vpst		; CHECK-NEXT: vpst
; CHECK-NEXT: vfmat.f32 q3, q1, q0		; CHECK-NEXT: vfmat.f32 q3, q1, q0

llvm/test/CodeGen/Thumb2/mve-vst3.ll

	; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}			; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}
	; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}			; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
	; CHECK-NEXT: .pad #128			; CHECK-NEXT: .pad #128
	; CHECK-NEXT: sub sp, #128			; CHECK-NEXT: sub sp, #128
	; CHECK-NEXT: vldrw.u32 q3, [r0, #176]			; CHECK-NEXT: vldrw.u32 q3, [r0, #176]
	; CHECK-NEXT: vldrw.u32 q2, [r0, #64]			; CHECK-NEXT: vldrw.u32 q2, [r0, #64]
	; CHECK-NEXT: vldrw.u32 q1, [r0]			; CHECK-NEXT: vldrw.u32 q1, [r0]
	; CHECK-NEXT: vldrw.u32 q0, [r0, #128]			; CHECK-NEXT: vldrw.u32 q0, [r0, #128]
	; CHECK-NEXT: vstrw.32 q3, [sp, #112] @ 16-byte Spill			; CHECK-NEXT: vstrw.32 q3, [sp, #80] @ 16-byte Spill
	; CHECK-NEXT: vldrw.u32 q3, [r0, #160]			; CHECK-NEXT: vldrw.u32 q3, [r0, #160]
	; CHECK-NEXT: vmov.f32 s24, s9
	; CHECK-NEXT: vldrw.u32 q5, [r0, #144]
	; CHECK-NEXT: vstrw.32 q3, [sp, #96] @ 16-byte Spill
	; CHECK-NEXT: vldrw.u32 q3, [r0, #96]
	; CHECK-NEXT: vmov.f32 s26, s6
	; CHECK-NEXT: vldrw.u32 q7, [r0, #112]			; CHECK-NEXT: vldrw.u32 q7, [r0, #112]
				; CHECK-NEXT: vldrw.u32 q4, [r0, #48]
				; CHECK-NEXT: vstrw.32 q3, [sp, #112] @ 16-byte Spill
				; CHECK-NEXT: vldrw.u32 q3, [r0, #96]
				; CHECK-NEXT: vmov.f32 s25, s1
				; CHECK-NEXT: vldrw.u32 q5, [r0, #144]
	; CHECK-NEXT: vstrw.32 q3, [sp, #32] @ 16-byte Spill			; CHECK-NEXT: vstrw.32 q3, [sp, #32] @ 16-byte Spill
	; CHECK-NEXT: vldrw.u32 q3, [r0, #80]			; CHECK-NEXT: vldrw.u32 q3, [r0, #80]
	; CHECK-NEXT: vmov.f32 s27, s10			; CHECK-NEXT: vmov.f32 s24, s9
	; CHECK-NEXT: vldrw.u32 q4, [r0, #48]
	; CHECK-NEXT: vstrw.32 q3, [sp, #48] @ 16-byte Spill			; CHECK-NEXT: vstrw.32 q3, [sp, #48] @ 16-byte Spill
	; CHECK-NEXT: vldrw.u32 q3, [r0, #32]			; CHECK-NEXT: vldrw.u32 q3, [r0, #32]
	; CHECK-NEXT: vmov.f32 s25, s1			; CHECK-NEXT: vmov.f32 s26, s6
	; CHECK-NEXT: vstrw.32 q3, [sp, #16] @ 16-byte Spill			; CHECK-NEXT: vstrw.32 q3, [sp, #16] @ 16-byte Spill
	; CHECK-NEXT: vldrw.u32 q3, [r0, #16]			; CHECK-NEXT: vldrw.u32 q3, [r0, #16]
	; CHECK-NEXT: vstrw.32 q6, [r1, #16]			; CHECK-NEXT: vmov.f32 s27, s10
	; CHECK-NEXT: vmov.f32 s24, s2			; CHECK-NEXT: vstrw.32 q3, [sp, #96] @ 16-byte Spill
	; CHECK-NEXT: vstrw.32 q3, [sp, #80] @ 16-byte Spill
	; CHECK-NEXT: vmov.f32 s27, s3
	; CHECK-NEXT: vmov.f32 s14, s0			; CHECK-NEXT: vmov.f32 s14, s0
	; CHECK-NEXT: vldrw.u32 q0, [sp, #112] @ 16-byte Reload
	; CHECK-NEXT: vmov.f32 s12, s4			; CHECK-NEXT: vmov.f32 s12, s4
	; CHECK-NEXT: vmov.f32 s15, s5			; CHECK-NEXT: vstrw.32 q6, [r1, #16]
	; CHECK-NEXT: vmov.f32 s13, s8			; CHECK-NEXT: vmov.f32 s13, s8
				; CHECK-NEXT: vmov.f32 s15, s5
	; CHECK-NEXT: vstrw.32 q3, [sp, #64] @ 16-byte Spill			; CHECK-NEXT: vstrw.32 q3, [sp, #64] @ 16-byte Spill
				; CHECK-NEXT: vldrw.u32 q3, [sp, #80] @ 16-byte Reload
				; CHECK-NEXT: vmov.f32 s24, s2
				; CHECK-NEXT: vmov.f32 s27, s3
				; CHECK-NEXT: vmov.f32 s2, s12
				; CHECK-NEXT: vmov.f32 s0, s16
				; CHECK-NEXT: vmov.f32 s1, s28
				; CHECK-NEXT: vmov.f32 s3, s17
	; CHECK-NEXT: vmov.f32 s25, s7			; CHECK-NEXT: vmov.f32 s25, s7
	; CHECK-NEXT: vmov.f32 s6, s0
	; CHECK-NEXT: vmov.f32 s13, s1
	; CHECK-NEXT: vmov.f32 s0, s2
	; CHECK-NEXT: vmov.f32 s4, s16
	; CHECK-NEXT: vmov.f32 s5, s28
	; CHECK-NEXT: vmov.f32 s7, s17
	; CHECK-NEXT: vmov.f32 s1, s19
	; CHECK-NEXT: vstrw.32 q1, [sp] @ 16-byte Spill
	; CHECK-NEXT: vmov.f32 s2, s31
	; CHECK-NEXT: vldrw.u32 q1, [sp, #32] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q1, [sp, #32] @ 16-byte Reload
	; CHECK-NEXT: vmov.f32 s26, s11			; CHECK-NEXT: vmov.f32 s26, s11
	; CHECK-NEXT: vldrw.u32 q2, [sp, #16] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q2, [sp, #16] @ 16-byte Reload
	; CHECK-NEXT: vstrw.32 q0, [sp, #112] @ 16-byte Spill			; CHECK-NEXT: vstrw.32 q0, [sp] @ 16-byte Spill
	; CHECK-NEXT: vldrw.u32 q0, [sp, #96] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q0, [sp, #112] @ 16-byte Reload
	; CHECK-NEXT: vmov.f32 s15, s30			; CHECK-NEXT: vmov.f32 s15, s30
	; CHECK-NEXT: vstrw.32 q6, [r1, #32]			; CHECK-NEXT: vstrw.32 q6, [r1, #32]
	; CHECK-NEXT: vmov.f32 s17, s1			; CHECK-NEXT: vmov.f32 s17, s1
	; CHECK-NEXT: vldrw.u32 q6, [sp, #80] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q6, [sp, #96] @ 16-byte Reload
	; CHECK-NEXT: vmov.f32 s30, s0			; CHECK-NEXT: vmov.f32 s30, s0
	; CHECK-NEXT: vmov.f32 s0, s2			; CHECK-NEXT: vmov.f32 s0, s2
	; CHECK-NEXT: vmov.f32 s1, s11			; CHECK-NEXT: vmov.f32 s1, s11
	; CHECK-NEXT: vmov.f32 s2, s7			; CHECK-NEXT: vmov.f32 s2, s7
	; CHECK-NEXT: vmov.f32 s14, s18			; CHECK-NEXT: vmov.f32 s14, s18
	; CHECK-NEXT: vstrw.32 q0, [sp, #96] @ 16-byte Spill			; CHECK-NEXT: vstrw.32 q0, [sp, #112] @ 16-byte Spill
	; CHECK-NEXT: vmov.f32 s18, s10			; CHECK-NEXT: vmov.f32 s18, s10
	; CHECK-NEXT: vldrw.u32 q0, [sp, #48] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q0, [sp, #48] @ 16-byte Reload
	; CHECK-NEXT: vmov.f32 s28, s8			; CHECK-NEXT: vmov.f32 s28, s8
	; CHECK-NEXT: vmov.f32 s31, s9			; CHECK-NEXT: vmov.f32 s31, s9
	; CHECK-NEXT: vldrw.u32 q2, [sp, #80] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q2, [sp, #96] @ 16-byte Reload
	; CHECK-NEXT: vmov.f32 s12, s29			; CHECK-NEXT: vmov.f32 s12, s29
	; CHECK-NEXT: vmov.f32 s29, s4			; CHECK-NEXT: vmov.f32 s29, s4
	; CHECK-NEXT: vstrw.32 q3, [r1, #160]			; CHECK-NEXT: vstrw.32 q3, [r1, #160]
	; CHECK-NEXT: vmov.f32 s16, s5			; CHECK-NEXT: vmov.f32 s16, s5
	; CHECK-NEXT: vstrw.32 q7, [r1, #96]			; CHECK-NEXT: vstrw.32 q7, [r1, #96]
	; CHECK-NEXT: vmov.f32 s19, s6			; CHECK-NEXT: vmov.f32 s19, s6
	; CHECK-NEXT: vmov.f32 s4, s8			; CHECK-NEXT: vmov.f32 s4, s8
	; CHECK-NEXT: vstrw.32 q4, [r1, #112]			; CHECK-NEXT: vstrw.32 q4, [r1, #112]
	; CHECK-NEXT: vmov.f32 s6, s20			; CHECK-NEXT: vmov.f32 s6, s20
	; CHECK-NEXT: vmov.f32 s20, s22			; CHECK-NEXT: vmov.f32 s20, s22
	; CHECK-NEXT: vmov.f32 s5, s0			; CHECK-NEXT: vmov.f32 s5, s0
	; CHECK-NEXT: vmov.f32 s8, s1			; CHECK-NEXT: vmov.f32 s8, s1
	; CHECK-NEXT: vmov.f32 s11, s2			; CHECK-NEXT: vmov.f32 s11, s2
	; CHECK-NEXT: vmov.f32 s22, s3			; CHECK-NEXT: vmov.f32 s22, s3
	; CHECK-NEXT: vldrw.u32 q0, [sp, #96] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q0, [sp, #112] @ 16-byte Reload
	; CHECK-NEXT: vmov.f32 s7, s9			; CHECK-NEXT: vmov.f32 s7, s9
	; CHECK-NEXT: vstrw.32 q0, [r1, #128]			; CHECK-NEXT: vstrw.32 q0, [r1, #128]
	; CHECK-NEXT: vldrw.u32 q0, [sp] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q0, [sp] @ 16-byte Reload
	; CHECK-NEXT: vmov.f32 s9, s21			; CHECK-NEXT: vmov.f32 s9, s21
	; CHECK-NEXT: vstrw.32 q1, [r1, #48]			; CHECK-NEXT: vstrw.32 q1, [r1, #48]
	; CHECK-NEXT: vstrw.32 q0, [r1, #144]			; CHECK-NEXT: vstrw.32 q0, [r1, #144]
	; CHECK-NEXT: vldrw.u32 q0, [sp, #112] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q0, [sp, #80] @ 16-byte Reload
	; CHECK-NEXT: vmov.f32 s21, s27			; CHECK-NEXT: vmov.f32 s21, s27
	; CHECK-NEXT: vstrw.32 q2, [r1, #64]			; CHECK-NEXT: vstrw.32 q2, [r1, #64]
	; CHECK-NEXT: vstrw.32 q0, [r1, #176]			; CHECK-NEXT: vstrw.32 q0, [r1, #176]
	; CHECK-NEXT: vldrw.u32 q0, [sp, #64] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q0, [sp, #64] @ 16-byte Reload
	; CHECK-NEXT: vstrw.32 q5, [r1, #80]			; CHECK-NEXT: vstrw.32 q5, [r1, #80]
	; CHECK-NEXT: vstrw.32 q0, [r1]			; CHECK-NEXT: vstrw.32 q0, [r1]
	; CHECK-NEXT: add sp, #128			; CHECK-NEXT: add sp, #128
	; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}			; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}