This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

-
llvm/
-
lib/Target/AMDGPU/
-
Target/
-
AMDGPU/
-
AMDGPUTargetMachine.cpp
-
SIDefines.h
-
SIFrameLowering.h
2/5
SIFrameLowering.cpp
-
SIISelLowering.cpp
1/2
SIInstrInfo.h
-
SIInstrInfo.cpp
-
SIInstructions.td
5/23
SILowerSGPRSpills.cpp
1/4
SIMachineFunctionInfo.h
1/2
SIMachineFunctionInfo.cpp
4/7
SIRegisterInfo.cpp
-
test/CodeGen/
-
CodeGen/
-
AMDGPU/
-
GlobalISel/
-
assert-align.ll
-
call-outgoing-stack-args.ll
-
image-waterfall-loop-O0.ll
-
localizer.ll
-
abi-attribute-hints-undefined-behavior.ll
-
branch-relax-spill.ll
-
call-alias-register-usage-agpr.ll
-
call-alias-register-usage1.ll
-
callee-frame-setup.ll
-
cf-loop-on-constant.ll
-
collapse-endcf.ll
-
control-flow-fastregalloc.ll
-
cross-block-use-is-not-abi-copy.ll
1/2
csr-sgpr-spill-live-ins.mir
-
dwarf-multi-register-use-crash.ll
-
fix-frame-reg-in-custom-csr-spills.ll
-
flat-scratch-init.ll
-
fold-reload-into-exec.mir
-
fold-reload-into-m0.mir
-
frame-setup-without-sgpr-to-vgpr-spills.ll
-
gfx-call-non-gfx-func.ll
-
gfx-callable-argument-types.ll
-
gfx-callable-preserved-registers.ll
-
gfx-callable-return-types.ll
-
indirect-call.ll
-
kernel-vgpr-spill-mubuf-with-voffset.ll
-
load-constant-i16.ll
-
mubuf-legalize-operands.ll
-
mul24-pass-ordering.ll
1
need-fp-from-vgpr-spills.ll
-
no-source-locations-in-prologue.ll
-
partial-sgpr-to-vgpr-spills.ll
-
scc-clobbered-sgpr-to-vmem-spill.ll
-
sgpr-spill-dead-frame-in-dbg-value.mir
-
sgpr-spill-fi-skip-processing-stack-arg-dbg-value.mir
-
sgpr-spill-no-vgprs.ll
-
sgpr-spill-partially-undef.mir
-
sgpr-spill-update-only-slot-indexes.ll
-
sgpr-spill-vmem-large-frame.mir
-
sgpr-spills-split-regalloc.ll
-
si-spill-sgpr-stack.ll
-
sibling-call.ll
-
spill-csr-frame-ptr-reg-copy.ll
-
spill-offset-calculation.ll
-
spill-reg-tuple-super-reg-use.mir
-
spill-scavenge-offset.ll
-
spill-sgpr-csr-live-ins.mir
-
spill-sgpr-stack-no-sgpr.ll
4/6
spill-sgpr-to-virtual-vgpr.mir
-
spill-vgpr-to-agpr-update-regscavenger.ll
-
spill-writelane-vgprs.ll
-
spill192.mir
-
spill224.mir
-
tail-call-amdgpu-gfx.ll
-
tuple-allocation-failure.ll
-
unstructured-cfg-def-use-issue.ll
-
vgpr-tuple-allocation.ll
-
wwm-register-spill-during-regalloc.ll
-
wwm-reserved-spill.ll
-
MIR/AMDGPU/
-
AMDGPU/
-
machine-function-info-after-pei.ll
-
machine-function-info-no-ir.mir
-
machine-function-info.ll
-
sgpr-for-exec-copy-invalid-reg.mir
-
stack-id-assert.mir

Differential D124196

[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
AcceptedPublic

Authored by cdevadas on Apr 21 2022, 12:18 PM.


Details

Reviewers

arsenm
rampitec
sebastian-ne

Commits

rG7a98f084c4d1: [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
rG40ba0942e2ab: [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs

Summary

Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes, and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes many restrictions: SGPR spilling fails when
there are not enough VGPRs, and we are forced to spill
the leftover SGPRs into memory during PEI. The custom
spill handling during PEI has many edge cases and breaks
the compiler from time to time.

This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.
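To make the idea of "spilling an SGPR into a VGPR lane" concrete, here is a minimal C++ sketch of the data movement involved. It is a toy model and not code from the patch: the 64-lane wave size, the `Vgpr` type, and the helper names `writeLane`, `readLane`, and `spillRoundTrip` are assumptions made for illustration. The lane numbers 4 and 5 mirror the `V_WRITELANE_B32` lines quoted later in this review.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: a 64-wide wave, so one VGPR holds 64 independent 32-bit lanes.
constexpr std::size_t kWaveSize = 64;
using Vgpr = std::array<std::uint32_t, kWaveSize>;

// Models the effect of a writelane: store one scalar value into a single
// lane and leave every other lane untouched.
inline void writeLane(Vgpr &v, std::uint32_t sgprValue, std::size_t lane) {
  v.at(lane) = sgprValue;
}

// Models the effect of a readlane: read one lane back into a scalar.
inline std::uint32_t readLane(const Vgpr &v, std::size_t lane) {
  return v.at(lane);
}

// Spill two scalar values into lanes 4 and 5 of one VGPR, clobber the
// scalars, then restore them from the lanes.
inline bool spillRoundTrip() {
  Vgpr laneVgpr{}; // hypothetical register that holds the spill lanes
  std::uint32_t s4 = 0x1234, s5 = 0xBEEF;
  writeLane(laneVgpr, s4, 4);
  writeLane(laneVgpr, s5, 5);
  s4 = s5 = 0; // the SGPRs get reused for something else
  s4 = readLane(laneVgpr, 4);
  s5 = readLane(laneVgpr, 5);
  return s4 == 0x1234 && s5 == 0xBEEF;
}
```

In this picture, the change proposed here only affects which register plays the role of `laneVgpr`: a virtual register that the later VGPR allocation assigns, rather than a physical VGPR picked up front.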

Spill to virtual registers will always be successful,
even in the high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer which is an unproblematic case.

By spilling CSRs into virtual VGPR lanes, we might end up
with broken CFIs that can potentially corrupt the frame
unwinding in the debugger causing either a crash or a
terrible debugging experience. This occurs when regalloc
tries to spill or split the liverange of these virtual VGPRs.
The CFIs should also be inserted at these intermediate
points to correctly propagate the CFI entries. It is not
currently implemented in the compiler. As a short-term fix,
we continue to spill CSR SGPRs into physical VGPR lanes for
the debugger to correctly compute the unwind information.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline


cdevadas added inline comments.Apr 26 2022, 9:22 AM

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
380	I don't think we can do any special handling for them at assembly emission. IMPLICIT_DEF is handled/printed by the generic part of AsmPrinter and it won't reach the target-specific emitInstruction at all.
410–411	Will do.
llvm/test/CodeGen/AMDGPU/csr-sgpr-spill-live-ins.mir
17–19	Will do.
llvm/test/CodeGen/AMDGPU/spill-sgpr-to-virtual-vgpr.mir
27	The auto-generator didn't generate the tied-operands correctly. I can hand-modify this test to show the tied operand. It's the simplest case.
59	This test is already hand-modified to check the tied operands.
195	I couldn't write one successfully. Will try some unstructured flow to force one.

cdevadas added inline comments.Apr 27 2022, 3:53 AM

llvm/test/CodeGen/AMDGPU/spill-sgpr-to-virtual-vgpr.mir
195	I don't think such a case exists. A fall-through block will have only one successor and that becomes the nearest dominator for its children. It would be true even for any unstructured flow.

Fixed the review comments.
Moved UpdateLaneVGPRDomInstr lambda into a separate function.
Implemented getClearedProperties to clear certain MF properties.
Test pre-commit + rebase.
Fixed the tied operand cases in certain tests.

Harbormaster completed remote builds in B161576: Diff 425478.Apr 27 2022, 4:33 AM

As a follow up I think we need to address the loss of being able to share VGPR lanes for unrelated spills

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
268–269	Typo "the the". It's also not necessarily unstructured
297	IsDominatesChecked is confusingly named and expresses the code not the intent. SeenSpillInBlock?
llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll
207–209	This is an unfortunate regression but what I expected

arsenm added inline comments.Apr 27 2022, 2:07 PM

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
318	As part of the follow up to allow spill slot sharing, I think we can move all of this allocation stuff out of SIMachineFunctionInfo and into SILowerSGPRSpills

cdevadas added inline comments.Apr 27 2022, 8:24 PM

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
318	Ya, will try to move it entirely out of SIMachineFunctionInfo.

Addressed the review comments.

Harbormaster completed remote builds in B161733: Diff 425689.Apr 27 2022, 8:31 PM

cdevadas mentioned this in D124192: [AMDGPU] Callee must always spill writelane VGPRs.Jun 21 2022, 8:20 AM

Code rebase.

Herald added subscribers: kosarev, jsilvanus. · View Herald TranscriptJun 27 2022, 10:13 AM

Harbormaster completed remote builds in B172249: Diff 440294.Jun 27 2022, 10:14 AM

LGTM. Might want to introduce an asm printer flag on the implicit_def to mark it's for SGPR spills in the comment

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
271	Remove "Is there a better way to handle it?"
311	Extra ()s

This revision is now accepted and ready to land.Jun 27 2022, 5:29 PM

Should also remove the SpillSGPRToVGPR option and handling

In D124196#3616260, @arsenm wrote:

Should also remove the SpillSGPRToVGPR option and handling

I don't want to do it separately. A follow-up patch?

In D124196#3616270, @cdevadas wrote:

In D124196#3616260, @arsenm wrote:

Should also remove the SpillSGPRToVGPR option and handling

I don't want to do it separately. A follow-up patch?

Typo in my earlier comment. I want to do that as a separate patch.
I've identified a few more cleanups that can be done while removing the SpillSGPRToVGPR option.

In D124196#3616270, @cdevadas wrote:

In D124196#3616260, @arsenm wrote:

Should also remove the SpillSGPRToVGPR option and handling

I don't want to do it separately. A follow-up patch?

Yes, that's fine

Code rebase.

Harbormaster completed remote builds in B172564: Diff 440735.Jun 28 2022, 12:59 PM

arsenm accepted this revision.Jun 28 2022, 3:39 PM

What happens when the register allocator decides to split a live range of virtual registers here, i.e. if it introduces a COPY?

cdevadas removed a parent revision: D124195: [AMDGPU] Separate out SGPR spills to VGPR lanes during PEI.Jun 29 2022, 9:17 AM

In D124196#3618878, @nhaehnle wrote:

What happens when the register allocator decides to split a live range of virtual registers here, i.e. if it introduces a COPY?

This is totally broken as soon as any of these spill. We need WWM spills if they do. We should boost their priority and they need a guaranteed register to save and restore exec. I’m not sure the best way to go about this

This revision now requires changes to proceed.Jun 29 2022, 1:28 PM

Implemented WWM register spill. Reserved SGPR(s) needed for saving EXEC while manipulating the WWM spills. Included the reserved SGPRs serialization.
I couldn't reproduce the WWM COPY situation yet, even after running the internal PSDB tests, so I hope this patch is good to go.
I am working on a follow-up patch to implement WWM Copy.

Harbormaster completed remote builds in B189651: Diff 464220.Sep 30 2022, 5:16 AM

cdevadas added a parent revision: D134951: [CodeGen][RegAllocFast] Add MRI delegate callback to notify VReg spill.Sep 30 2022, 5:16 AM

AFAIK, the WWM register has some unmodeled liveness behavior, which makes it impossible to allocate wwm register together with normal vector register in one pass now.
For example(a typical if-then):

bb0:
  %0 = ...
  s_cbranch_execz %bb2

bb1:
  %1 = wwm_operation
  ... = %1
  %0 = ...

bb2:
  ... = %0

VGPR %0 was dead in bb1 and WWM-VGPR %1 was defined and used in bb1. As there is no live-range conflict between them, they have a chance to get assigned the same physical register. If this happens, certain lane of %0 might be overwritten when writing to %1. I am not sure if moving the SIPreAllocateWWMRegs between the sgpr allocation and the vgpr allocation might help your case? The key point is to request the SIPreAllocateWWMRegs allocate the wwm register usage introduced in SILowerSGPRSpills.
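The hazard described above can be reproduced in a small simulation. The C++ sketch below is illustrative only: the 8-lane wave, the `PhysVgpr` type, the constant values, and the function names are assumptions, and exec is modeled as a plain bit mask. It assumes %0 and %1 were assigned the same physical register and reports which lanes of %0 hold the wrong value in bb2.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: a tiny 8-lane wave keeps the example readable.
constexpr std::size_t kLanes = 8;
using PhysVgpr = std::array<std::uint32_t, kLanes>;

// Ordinary vector write: only lanes enabled in the exec mask are written.
inline void vectorWrite(PhysVgpr &r, std::uint32_t exec, std::uint32_t value) {
  for (std::size_t l = 0; l < kLanes; ++l)
    if (exec & (1u << l))
      r[l] = value;
}

// Whole-wave write: every lane is written, regardless of exec.
inline void wholeWaveWrite(PhysVgpr &r, std::uint32_t value) { r.fill(value); }

// Runs the if-then example with %0 and %1 sharing one physical register and
// returns a mask of the lanes in which bb2 reads a wrong value for %0.
inline std::uint32_t corruptedLanes(std::uint32_t execInBb1) {
  const std::uint32_t allLanes = (1u << kLanes) - 1;
  PhysVgpr shared{};

  vectorWrite(shared, allLanes, 100);  // bb0: %0 = ... (all lanes active)
  wholeWaveWrite(shared, 7);           // bb1: %1 = wwm_operation
  vectorWrite(shared, execInBb1, 200); // bb1: %0 = ... (active lanes only)

  // bb2: lanes that ran bb1 expect 200, lanes that skipped it expect 100.
  std::uint32_t bad = 0;
  for (std::size_t l = 0; l < kLanes; ++l) {
    const std::uint32_t bit = 1u << l;
    const std::uint32_t expected = (execInBb1 & bit) ? 200u : 100u;
    if (shared[l] != expected)
      bad |= bit;
  }
  return bad;
}
```

With lanes 0 to 3 active in bb1, the whole-wave write overwrites lanes 4 to 7 of %0, which the ordinary liveness model treats as dead in bb1 even though those lanes still carry the bb0 value.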

In D124196#3829110, @ruiling wrote:
AFAIK, the WWM register has some unmodeled liveness behavior, which makes it impossible to allocate wwm register together with normal vector register in one pass now.
For example(a typical if-then):
bb0:
  %0 = ...
  s_cbranch_execz %bb2

bb1:
  %1 = wwm_operation
  ... = %1
  %0 = ...

bb2:
  ... = %0
VGPR %0 was dead in bb1 and WWM-VGPR %1 was defined and used in bb1. As there is no live-range conflict between them, they have a chance to get assigned the same physical register. If this happens, certain lane of %0 might be overwritten when writing to %1. I am not sure if moving the SIPreAllocateWWMRegs between the sgpr allocation and the vgpr allocation might help your case? The key point is to request the SIPreAllocateWWMRegs allocate the wwm register usage introduced in SILowerSGPRSpills.

Agree, allowing wwm-register allocation together with the regular vector registers would be error prone with miscomputed liveness data.
But I guess, they are edge cases. Unlike the other WWM operations, the writelane/readlane for SGPR spill stores/restores, in most cases, would span across different blocks and such a liveness miscomputation would be a rare combination.

IIRC, SIPreAllocateWWMRegs can help allocate only when we have enough free VGPRs. There is no live-range spill/split incorporated in this custom pass. It won’t help in the case of large functions with more SGPR spills.
The best approach would be to introduce another regalloc pipeline between the existing SGPR and VGPR allocations. The new pipeline should allocate only the WWM-registers.
It would, however, increase the compile time complexity further. But I’m not sure we have a better choice.

Agree, allowing wwm-register allocation together with the regular vector registers would be error prone with miscomputed liveness data.
But I guess, they are edge cases. Unlike the other WWM operations, the writelane/readlane for SGPR spill stores/restores, in most cases, would span across different blocks and such a liveness miscomputation would be a rare combination.

I think we need to make sure the idea is correct in all possible cases we can think of. The writelane/readlane operations share the same behavior as WWM operations with regard to the issue here: they may write to a VGPR lane whose corresponding thread is inactive. "Spanning across different blocks" won't help with the problem. Even if the writelane/readlane operations span more than one thousand blocks, they can still be nested in an outer if-then structure.

cdevadas removed a parent revision: D134951: [CodeGen][RegAllocFast] Add MRI delegate callback to notify VReg spill.Oct 3 2022, 9:37 PM

In D124196#3829974, @ruiling wrote:

Agree, allowing wwm-register allocation together with the regular vector registers would be error prone with miscomputed liveness data.
But I guess, they are edge cases. Unlike the other WWM operations, the writelane/readlane for SGPR spill stores/restores, in most cases, would span across different blocks and such a liveness miscomputation would be a rare combination.

I think we need to make sure the idea is correct in all possible cases we can think of. The writelane/readlane operations share the same behavior as WWM operations with regard to the issue here: they may write to a VGPR lane whose corresponding thread is inactive. "Spanning across different blocks" won't help with the problem. Even if the writelane/readlane operations span more than one thousand blocks, they can still be nested in an outer if-then structure.

Yes, we should fix this case. And we don't see a better way other than introducing a new regalloc pipeline for wwm registers alone. The effort for that is yet to be estimated, and we are planning a follow-up patch to split the vgpr allocation.

Moved VRegFlags into AMDGPU files. Introduced the MRI delegate callbacks and used the delegate method to propagate the virtual register flags.

Herald added a subscriber: arphaman. · View Herald TranscriptOct 25 2022, 7:29 AM

Harbormaster completed remote builds in B194175: Diff 470483.Oct 25 2022, 7:30 AM

Simplified addDelegate function to reflect the recent changes made in D134950.

Harbormaster completed remote builds in B194341: Diff 470714.Oct 25 2022, 10:52 PM

Pierre-vh added a subscriber: Pierre-vh.Oct 26 2022, 1:16 AM

Pierre-vh added inline comments.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1988	Why does SCC need to be dead? What happens if another instruction right after uses it?

cdevadas added inline comments.Oct 26 2022, 1:43 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1988	The code here is only to manipulate exec mask and no other instruction depends on the SCC that it produces, and we should mark it dead to avoid unwanted side effects. We don't have an alternate instruction that doesn't clobber SCC.

Pierre-vh added inline comments.Oct 26 2022, 1:50 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

1988

Ah that makes sense, but shouldn't this check that it's not inserting in a place where SCC is alive?
I was trying out this patch and I have a case where it's causing issues:

S_CMP_EQ_U32 killed renamable $sgpr6, killed renamable $sgpr7, implicit-def $scc
renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr4, 4, $vgpr0(tied-def 0), implicit-def $sgpr4_sgpr5, implicit $sgpr4_sgpr5
renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr5, 5, $vgpr0(tied-def 0), implicit killed $sgpr4_sgpr5
$sgpr10_sgpr11 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
$agpr4 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec
$exec = S_MOV_B64 killed $sgpr10_sgpr11
S_CBRANCH_SCC1 %bb.5, implicit killed $scc

Insertion is between the S_CMP and the S_CBRANCH.

cdevadas added inline comments.Oct 26 2022, 1:59 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1988	Yes, the check is already in place. See the code above, the if condition, that inserts two separate move instructions when SCC is live and the else part uses SCC when it is free. Not sure why RegScavenger returned false. It should have returned SCC as clobbered.

cdevadas added inline comments.Oct 26 2022, 2:10 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1988	See test llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll A similar situation is handled. RS returned the correct liveness info for SCC.
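The SCC handling discussed in this exchange can be summarized with a short C++ sketch. It is only an illustration of the decision described in the comments, not the code in SIRegisterInfo.cpp: the function name `saveExecAndEnableAllLanes`, the use of strings to stand in for instructions, and the 64-bit opcode spellings are assumptions. When SCC is live, exec is saved and set with two moves that leave SCC alone; when SCC is free, a single `S_OR_SAVEEXEC_B64` is used and its SCC result is treated as dead.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns a textual instruction sequence that saves exec into saveReg and
// then enables all lanes. sccLive would come from liveness information
// (the register scavenger in the discussion above).
inline std::vector<std::string>
saveExecAndEnableAllLanes(bool sccLive, const std::string &saveReg) {
  if (sccLive) {
    // SCC must be preserved: use two moves, neither of which writes SCC.
    return {saveReg + " = S_MOV_B64 $exec", "$exec = S_MOV_B64 -1"};
  }
  // SCC is free: one instruction saves exec and ORs in all lanes; it also
  // clobbers SCC, which is acceptable only because SCC is not live here.
  return {saveReg + " = S_OR_SAVEEXEC_B64 -1"};
}
```

In the problem case reported above, the insertion point sits between an `S_CMP` and the `S_CBRANCH_SCC1` that consumes its result, so a correct liveness query would have to select the two-move form.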

Rebase after recent changes in D134950.

Harbormaster completed remote builds in B194472: Diff 470903.Oct 26 2022, 12:43 PM

cdevadas added a parent revision: D134951: [CodeGen][RegAllocFast] Add MRI delegate callback to notify VReg spill.Oct 27 2022, 11:39 PM

cdevadas mentioned this in D124195: [AMDGPU] Separate out SGPR spills to VGPR lanes during PEI.Oct 28 2022, 11:34 AM

Code rebase.

Harbormaster completed remote builds in B195465: Diff 472291.Nov 1 2022, 7:10 AM

Rebase

Harbormaster completed remote builds in B195608: Diff 472479.Nov 1 2022, 7:00 PM

Pierre-vh added inline comments.Nov 2 2022, 3:37 AM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1357	I think this is missing and it's what's causing verification errors with "Using an undefined physical register" that I was talking about. The current code just tells the scavenger to enter that block but it doesn't update it to the right instruction, so eliminateFrameIndex is working with information from the start of the BB, not from the MI it's dealing with

cdevadas added inline comments.Nov 2 2022, 4:05 AM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1357	An entirely different problem and needs to be implemented separately. The code that handles the register liveness update is implemented in `PEI::replaceFrameIndices` and it tracks the loops and invokes RS->forward() appropriately to update the liveness info. I guess we should bring this code into VGPR to AGPR spill path.

Included the patch provided by @Pierre-vh to correctly update the register liveness in the RegisterScavenger during VGPR -> AGPR spilling.
This patch avoids a crash that occurred when SGPR spilling to virtual VGPR lanes was enabled.

diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
--- a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
@@ -1511,52 +1511,52 @@ void SIFrameLowering::processFunctionBeforeFrameFinalized(
                      && EnableSpillVGPRToAGPR;

           if (FuncInfo->allocateVGPRSpillToAGPR(MF, FI,
                                                 TRI->isAGPR(MRI, VReg))) {
-            // FIXME: change to enterBasicBlockEnd()
-            RS->enterBasicBlock(MBB);
+            RS->enterBasicBlockEnd(MBB);
+            RS->backward(MI);
             TRI->eliminateFrameIndex(MI, 0, FIOp, RS);
             SpillFIs.set(FI);
             continue;

Included the new test llvm/test/CodeGen/AMDGPU/spill-vgpr-to-agpr-update-regscavenger.ll.

Harbormaster completed remote builds in B195744: Diff 472666.Nov 2 2022, 10:19 AM

Ping

arsenm added inline comments.Nov 14 2022, 1:32 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1355–1357	D137574 is in flight to invert the direction, should we land that first / separately?
llvm/lib/Target/AMDGPU/SIInstrInfo.h
628	static?
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
63	Is this introducing a new computation in the pass pipeline (I assume not since I don't see a pass pipeline test update)
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
669	Reg.isVirtual()
675	Reg.isVirtual()

arsenm added inline comments.Nov 14 2022, 1:32 PM

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
487–492	I don't like having state here for a single operation that's happening in one pass and isn't valid for multiple uses. I don't really understand how this is being set and passed around
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
651	Isn't this always required?

cdevadas added inline comments.Nov 15 2022, 10:30 AM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1355–1357	Alex's patch has landed. But this code is still needed to update the liveness for each instruction as eliminateFrameIndex is called here.
llvm/lib/Target/AMDGPU/SIInstrInfo.h
628	Will change.
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
63	It isn't.
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
487–492	CurrentVRegSpilled is needed to track the virtual register (live range) to which the physical register was assigned. It is needed only for fast regalloc. We need this mapping to correctly track the WWM spills, because RegAllocFast spills/restores the physical registers directly as there is no VRM. It will be set appropriately by the delegate MRI_NoteVirtualRegisterSpill, which is inserted in the RegAllocFast spill/reload functions. SIMachineFunctionInfo is where the delegates are currently handled and I don't have a better place to move it.
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
651	No. They are reserved only if RA inserts any whole wave spill.

Rebase + Suggestions incorporated.

Harbormaster completed remote builds in B197797: Diff 475518.Nov 15 2022, 10:45 AM

Ping

cdevadas removed a parent revision: D134951: [CodeGen][RegAllocFast] Add MRI delegate callback to notify VReg spill.Nov 22 2022, 12:13 PM

Rebase + Incorporated changes after D138515 to move the handling of physReg to current VirtReg mapping entirely into the generic design.

Harbormaster completed remote builds in B199233: Diff 477532.Nov 23 2022, 9:26 AM

cdevadas added a parent revision: D138517: [CodeGen] Use cloneVirtualRegister in LiveIntervals and LiveRangeEdit.Nov 23 2022, 9:26 AM

cdevadas mentioned this in D138515: [CodeGen][RegAllocFast] Map PhysReg to its current VirtReg.Nov 23 2022, 9:50 AM

Implemented the WWM spill during RegAllocFast using the additional argument to the spiller interface introduced with patch D138656.

Harbormaster completed remote builds in B199400: Diff 477752.Nov 24 2022, 5:03 AM

cdevadas mentioned this in D138656: [CodeGen] Additional Register argument to storeRegToStackSlot/loadRegFromStackSlot.Nov 24 2022, 5:50 AM

cdevadas removed a parent revision: D138517: [CodeGen] Use cloneVirtualRegister in LiveIntervals and LiveRangeEdit.Nov 24 2022, 5:53 AM

cdevadas added a parent revision: D138656: [CodeGen] Additional Register argument to storeRegToStackSlot/loadRegFromStackSlot.

rebase

Harbormaster completed remote builds in B203386: Diff 483233.Dec 15 2022, 10:38 AM

arsenm accepted this revision.Dec 15 2022, 10:45 AM

This revision is now accepted and ready to land.Dec 15 2022, 10:45 AM

This revision was landed with ongoing or failed builds.Dec 16 2022, 10:27 PM

Closed by commit rG40ba0942e2ab: [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs (authored by cdevadas). · Explain Why

This revision was automatically updated to reflect the committed changes.

cdevadas added a commit: rG40ba0942e2ab: [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs.

This patch causes OpenMC offloaded via OpenMP on AMDGPUs to crash at runtime. It looks like some corruption in the memory address.
You can find build instructions here: https://github.com/jtramm/openmc_offloading_builder

The commit before this one works fine though, assuming you cherry picked https://reviews.llvm.org/rGee1d000d43321590771a2f047c8c55d07d09ad28 first as it landed after.
I assume other codes will be impacted too.

@jtramm @ronlieb @jhuber6 FYI

This revision is now accepted and ready to land.Dec 19 2022, 11:14 PM

In D124196#4007017, @jdoerfert wrote:

This patch causes OpenMC offloaded via OpenMP on AMDGPUs to crash at runtime. It looks like some corruption in the memory address.
You can find build instructions here: https://github.com/jtramm/openmc_offloading_builder

The commit before this one works fine though, assuming you cherry picked https://reviews.llvm.org/rGee1d000d43321590771a2f047c8c55d07d09ad28 first as it landed after.
I assume other codes will be impacted too.

@jtramm @ronlieb @jhuber6 FYI

Thanks. Going to take a look.

cdevadas added a reverting change: rGa3028239a751: Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs".Dec 21 2022, 2:50 AM

Rebased after whole-wave copy implementation.

cdevadas added a parent revision: D143762: [AMDGPU] Enable whole wave register copy.Feb 10 2023, 10:00 AM

cdevadas mentioned this in D143754: [MachineInstr] Introduce generic predicated copy opcode.Feb 10 2023, 10:05 AM

cdevadas removed a parent revision: D143762: [AMDGPU] Enable whole wave register copy.May 8 2023, 4:36 AM

Rebased
Incorporated the downstream code

Harbormaster completed remote builds in B232828: Diff 523333.May 18 2023, 4:09 AM

yassingh added a parent revision: D143762: [AMDGPU] Enable whole wave register copy.May 18 2023, 4:12 AM

cdevadas edited the summary of this revision. (Show Details)May 18 2023, 5:06 AM

rebase

Harbormaster completed remote builds in B236912: Diff 528813.Jun 6 2023, 5:44 AM

rebase

Harbormaster completed remote builds in B239970: Diff 532865.Jun 20 2023, 4:23 AM

arsenm added inline comments.Jun 21 2023, 5:27 PM

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
70	It shouldn't have been SSA to begin with and this doesn't de-SSA
71	Add a comment explaining the new vregs?
380	You don't need to specially handle the instruction, see AsmPrinterFlags

Just a few more nits

This revision now requires changes to proceed.Jun 22 2023, 10:55 AM

yassingh added inline comments.Jun 26 2023, 4:53 AM

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
70	Removing this line causes machine-verifier to crash in a few tests. Any hints, @cdevadas?
380	Tried adding a new flag here D153754

Review comments

Harbormaster completed remote builds in B241203: Diff 534590.Jun 26 2023, 8:59 AM

yassingh added inline comments.Jun 26 2023, 9:07 AM

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
70	Removing this line works fine when running the whole pipeline as the compiler knows the code here is not in SSA form. However, when SILowerSGPRSpills and related passes are run in isolation the verifier assumes the code to be in SSA form(possibly a bug there, also we are introducing virtual vgprs maybe that's the reason). I can leave the line as it is or is there some way to update the test files to let the compiler know the input isn't SSA? I tried "isSSA: false", didn't work.

cdevadas added inline comments.Jun 26 2023, 9:41 AM

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
70	Seems reasonable to retain this line for now. The compiler might not be able to decide that this pass is run post phi-elimination and assume SSA form by default. There must be a serialized option to control it for MIR tests.

yassingh added inline comments.Jun 26 2023, 9:23 PM

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
70	Yeah, MIRParser::isSSA recomputes the SSA information and sets it to true, also it doesn't expose a way to override it.

Rebase over ancestor patch changes.

Harbormaster completed remote builds in B241700: Diff 535257.Jun 28 2023, 12:09 AM

arsenm accepted this revision.Jun 28 2023, 9:25 AM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1355–1356	This is a pre-existing issue that should be fixed, but we should not be scanning the entire block from the end on every spill. The block iteration should be reversed and we should lazily call enterBasicBlockEnd on the first seen spill
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
70	this is kind of a mir parser bug
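The suggestion for SIFrameLowering.cpp above (iterate the block in reverse and enter the block end lazily, on the first spill seen) can be sketched as follows. This is a hypothetical C++ model, not LLVM code: `MockScavenger`, `Instr`, and `processSpillsReverse` are stand-ins invented for illustration, and the scavenger is reduced to a cursor that counts how many instructions it steps over.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a backward liveness tracker.
struct MockScavenger {
  int enterEndCalls = 0;       // how often the tracker was reset to block end
  std::size_t stepsWalked = 0; // total instructions stepped over
  std::size_t pos = 0;         // index of the instruction currently tracked

  void enterBlockEnd(std::size_t blockSize) {
    ++enterEndCalls;
    pos = blockSize;
  }

  // Step backward until the tracker is positioned at instruction idx.
  void backwardTo(std::size_t idx) {
    while (pos > idx) {
      --pos;
      ++stepsWalked;
    }
  }
};

struct Instr {
  bool isSpill;
};

// Visits spills from the last instruction to the first. The tracker is
// initialized only when the first spill is found and is never rewound, so
// each instruction is stepped over at most once per block.
inline std::vector<std::size_t>
processSpillsReverse(const std::vector<Instr> &block, MockScavenger &rs) {
  std::vector<std::size_t> handled;
  bool entered = false;
  for (std::size_t i = block.size(); i-- > 0;) {
    if (!block[i].isSpill)
      continue;
    if (!entered) {
      rs.enterBlockEnd(block.size());
      entered = true;
    }
    rs.backwardTo(i);
    handled.push_back(i); // the frame index would be eliminated here
  }
  return handled;
}
```

By contrast, re-entering the block end for every spill walks the tail of the block once per spill, which is the repeated scanning the comment objects to.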

This revision is now accepted and ready to land.Jun 28 2023, 9:25 AM

cdevadas mentioned this in D143762: [AMDGPU] Enable whole wave register copy.Jul 4 2023, 6:57 AM

fix comment

Harbormaster completed remote builds in B243664: Diff 537989.Jul 6 2023, 11:53 PM

Rebase before merge

This revision was landed with ongoing or failed builds.Jul 7 2023, 10:46 AM

Closed by commit rG7a98f084c4d1: [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs (authored by cdevadas, committed by yassingh). · Explain Why

This revision was automatically updated to reflect the committed changes.

yassingh added a commit: rG7a98f084c4d1: [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs.

Harbormaster completed remote builds in B243813: Diff 538200.Jul 7 2023, 12:52 PM

cdevadas mentioned this in D150388: [CodeGen]Allow targets to use target specific COPY instructions for live range splitting.Jul 16 2023, 12:01 PM

Still breaks OpenMC... https://github.com/llvm/llvm-project/issues/63983

This revision is now accepted and ready to land.Jul 20 2023, 9:29 AM

vitalybuka added a reverting change: D156381: Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting".Jul 26 2023, 4:00 PM

vitalybuka added a reverting change: rGa496c8be6e63: Revert "[CodeGen]Allow targets to use target specific COPY instructions for….Jul 26 2023, 10:13 PM

Revision Contents

Path

Size

llvm/

lib/

Target/

AMDGPU/

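AMDGPUTargetMachine.cpp

3 lines

SIDefines.h

7 lines

SIFrameLowering.h

4 lines

SIFrameLowering.cpp

45 lines

SIISelLowering.cpp

8 lines

SIInstrInfo.h

5 lines

SIInstrInfo.cpp

60 lines

SIInstructions.td

2 lines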

SILowerSGPRSpills.cpp

108 lines

SIMachineFunctionInfo.h

49 lines

SIMachineFunctionInfo.cpp

32 lines

SIRegisterInfo.cpp

64 lines

test/

CodeGen/

AMDGPU/

GlobalISel/

assert-align.ll

1 line

call-outgoing-stack-args.ll

8 lines

image-waterfall-loop-O0.ll

209 lines

localizer.ll

3 lines

abi-attribute-hints-undefined-behavior.ll

1 line

branch-relax-spill.ll

8 lines

call-alias-register-usage-agpr.ll

6 lines

call-alias-register-usage1.ll

2 lines

callee-frame-setup.ll

31 lines

cf-loop-on-constant.ll

91 lines

collapse-endcf.ll

108 lines

control-flow-fastregalloc.ll

11 lines

cross-block-use-is-not-abi-copy.ll

4 lines

csr-sgpr-spill-live-ins.mir

9 lines

dwarf-multi-register-use-crash.ll

77 lines

fix-frame-reg-in-custom-csr-spills.ll

1 line

flat-scratch-init.ll

80 lines

fold-reload-into-exec.mir

58 lines

fold-reload-into-m0.mir

16 lines

frame-setup-without-sgpr-to-vgpr-spills.ll

1 line

gfx-call-non-gfx-func.ll

14 lines

gfx-callable-argument-types.ll

1180 lines

gfx-callable-preserved-registers.ll

286 lines

gfx-callable-return-types.ll

41 lines

indirect-call.ll

324 lines

kernel-vgpr-spill-mubuf-with-voffset.ll

28 lines

load-constant-i16.ll

296 lines

mubuf-legalize-operands.ll

29 lines

mul24-pass-ordering.ll

47 lines

need-fp-from-vgpr-spills.ll

68 lines

no-source-locations-in-prologue.ll

38 lines

partial-sgpr-to-vgpr-spills.ll

1053 lines

scc-clobbered-sgpr-to-vmem-spill.ll

387 lines

sgpr-spill-dead-frame-in-dbg-value.mir

26 lines

sgpr-spill-fi-skip-processing-stack-arg-dbg-value.mir

4 lines

sgpr-spill-no-vgprs.ll

290 lines

sgpr-spill-partially-undef.mir

14 lines

sgpr-spill-update-only-slot-indexes.ll

9 lines

sgpr-spill-vmem-large-frame.mir

4 lines

sgpr-spills-split-regalloc.ll

177 lines

si-spill-sgpr-stack.ll

10 lines

sibling-call.ll

8 lines

spill-csr-frame-ptr-reg-copy.ll

18 lines

spill-offset-calculation.ll

20 lines

spill-reg-tuple-super-reg-use.mir

36 lines

spill-scavenge-offset.ll

480 lines

spill-sgpr-csr-live-ins.mir

5 lines

spill-sgpr-stack-no-sgpr.ll

45 lines

spill-sgpr-to-virtual-vgpr.mir

320 lines

spill-vgpr-to-agpr-update-regscavenger.ll

85 lines

spill-writelane-vgprs.ll

9 lines

spill192.mir

29 lines

spill224.mir

33 lines

tail-call-amdgpu-gfx.ll

3 lines

tuple-allocation-failure.ll

224 lines

unstructured-cfg-def-use-issue.ll

85 lines

vgpr-tuple-allocation.ll

246 lines

wwm-register-spill-during-regalloc.ll

166 lines

wwm-reserved-spill.ll

216 lines

MIR/

AMDGPU/

machine-function-info-after-pei.ll

1 line

machine-function-info-no-ir.mir

29 lines

machine-function-info.ll

4 lines

sgpr-for-exec-copy-invalid-reg.mir

12 lines

stack-id-assert.mir

2 lines

Diff 477532

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

Show First 20 Lines • Show All 1,514 Lines • ▼ Show 20 Lines	bool GCNTargetMachine::parseMachineFunctionInfo(
auto parseOptionalRegister = [&](const yaml::StringValue &RegName,		auto parseOptionalRegister = [&](const yaml::StringValue &RegName,
Register &RegVal) {		Register &RegVal) {
return !RegName.Value.empty() && parseRegister(RegName, RegVal);		return !RegName.Value.empty() && parseRegister(RegName, RegVal);
};		};

if (parseOptionalRegister(YamlMFI.VGPRForAGPRCopy, MFI->VGPRForAGPRCopy))		if (parseOptionalRegister(YamlMFI.VGPRForAGPRCopy, MFI->VGPRForAGPRCopy))
return true;		return true;

		if (parseOptionalRegister(YamlMFI.SGPRForEXECCopy, MFI->SGPRForEXECCopy))
		return true;

auto diagnoseRegisterClass = [&](const yaml::StringValue &RegName) {		auto diagnoseRegisterClass = [&](const yaml::StringValue &RegName) {
// Create a diagnostic for a the register string literal.		// Create a diagnostic for a the register string literal.
const MemoryBuffer &Buffer =		const MemoryBuffer &Buffer =
*PFS.SM->getMemoryBuffer(PFS.SM->getMainFileID());		*PFS.SM->getMemoryBuffer(PFS.SM->getMainFileID());
Error = SMDiagnostic(*PFS.SM, SMLoc(), Buffer.getBufferIdentifier(), 1,		Error = SMDiagnostic(*PFS.SM, SMLoc(), Buffer.getBufferIdentifier(), 1,
RegName.Value.size(), SourceMgr::DK_Error,		RegName.Value.size(), SourceMgr::DK_Error,
"incorrect register class for field", RegName.Value,		"incorrect register class for field", RegName.Value,
None, None);		None, None);

llvm/lib/Target/AMDGPU/SIDefines.h

Show First 20 Lines • Show All 899 Lines • ▼ Show 20 Lines	enum Offset_COV5 : unsigned {
MULTIGRID_SYNC_ARG_OFFSET = 88,		MULTIGRID_SYNC_ARG_OFFSET = 88,
HEAP_PTR_OFFSET = 96,		HEAP_PTR_OFFSET = 96,
PRIVATE_BASE_OFFSET = 192,		PRIVATE_BASE_OFFSET = 192,
SHARED_BASE_OFFSET = 196,		SHARED_BASE_OFFSET = 196,
QUEUE_PTR_OFFSET = 200,		QUEUE_PTR_OFFSET = 200,
};		};

} // namespace ImplicitArg		} // namespace ImplicitArg

		namespace VirtRegFlag {
		// Virtual Register Flags.
		enum Register_Flag : uint8_t { WWM_REG = 0 };

		} // namespace VirtRegFlag

} // namespace AMDGPU		} // namespace AMDGPU

#define R_00B028_SPI_SHADER_PGM_RSRC1_PS 0x00B028		#define R_00B028_SPI_SHADER_PGM_RSRC1_PS 0x00B028
#define S_00B028_VGPRS(x) (((x) & 0x3F) << 0)		#define S_00B028_VGPRS(x) (((x) & 0x3F) << 0)
#define S_00B028_SGPRS(x) (((x) & 0x0F) << 6)		#define S_00B028_SGPRS(x) (((x) & 0x0F) << 6)
#define S_00B028_MEM_ORDERED(x) (((x) & 0x1) << 25)		#define S_00B028_MEM_ORDERED(x) (((x) & 0x1) << 25)
#define G_00B028_MEM_ORDERED(x) (((x) >> 25) & 0x1)		#define G_00B028_MEM_ORDERED(x) (((x) >> 25) & 0x1)
#define C_00B028_MEM_ORDERED 0xFDFFFFFF		#define C_00B028_MEM_ORDERED 0xFDFFFFFF

llvm/lib/Target/AMDGPU/SIFrameLowering.h

Show All 28 Lines	void emitEpilogue(MachineFunction &MF,
MachineBasicBlock &MBB) const override;		MachineBasicBlock &MBB) const override;
StackOffset getFrameIndexReference(const MachineFunction &MF, int FI,		StackOffset getFrameIndexReference(const MachineFunction &MF, int FI,
Register &FrameReg) const override;		Register &FrameReg) const override;

void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs,		void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs,
RegScavenger *RS = nullptr) const override;		RegScavenger *RS = nullptr) const override;
void determineCalleeSavesSGPR(MachineFunction &MF, BitVector &SavedRegs,		void determineCalleeSavesSGPR(MachineFunction &MF, BitVector &SavedRegs,
RegScavenger *RS = nullptr) const;		RegScavenger *RS = nullptr) const;
void determinePrologEpilogSGPRSaves(MachineFunction &MF,		void determinePrologEpilogSGPRSaves(MachineFunction &MF, BitVector &SavedRegs,
BitVector &SavedRegs) const;		bool NeedExecCopyReservedReg) const;
void emitCSRSpillStores(MachineFunction &MF, MachineBasicBlock &MBB,		void emitCSRSpillStores(MachineFunction &MF, MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI, DebugLoc &DL,		MachineBasicBlock::iterator MBBI, DebugLoc &DL,
LivePhysRegs &LiveRegs, Register FrameReg,		LivePhysRegs &LiveRegs, Register FrameReg,
Register FramePtrRegScratchCopy) const;		Register FramePtrRegScratchCopy) const;
void emitCSRSpillRestores(MachineFunction &MF, MachineBasicBlock &MBB,		void emitCSRSpillRestores(MachineFunction &MF, MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI, DebugLoc &DL,		MachineBasicBlock::iterator MBBI, DebugLoc &DL,
LivePhysRegs &LiveRegs, Register FrameReg,		LivePhysRegs &LiveRegs, Register FrameReg,
Register FramePtrRegScratchCopy) const;		Register FramePtrRegScratchCopy) const;

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

Show First 20 Lines • Show All 60 Lines • ▼ Show 20 Lines if (LiveRegs.available(MRI, Reg))

return Reg; return Reg;

} }

return MCRegister(); return MCRegister();

} }

static void getVGPRSpillLaneOrTempRegister( static void getVGPRSpillLaneOrTempRegister(

MachineFunction &MF, LivePhysRegs &LiveRegs, Register SGPR, MachineFunction &MF, LivePhysRegs &LiveRegs, Register SGPR,

const TargetRegisterClass &RC = AMDGPU::SReg_32_XM0_XEXECRegClass) { const TargetRegisterClass &RC = AMDGPU::SReg_32_XM0_XEXECRegClass,

bool IncludeScratchCopy = true) {

SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>(); SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();

MachineFrameInfo &FrameInfo = MF.getFrameInfo(); MachineFrameInfo &FrameInfo = MF.getFrameInfo();

const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>(); const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();

const SIRegisterInfo *TRI = ST.getRegisterInfo(); const SIRegisterInfo *TRI = ST.getRegisterInfo();

unsigned Size = TRI->getSpillSize(RC); unsigned Size = TRI->getSpillSize(RC);

Align Alignment = TRI->getSpillAlign(RC); Align Alignment = TRI->getSpillAlign(RC);

// We need to save and restore the given SGPR. // We need to save and restore the given SGPR.

// 1: Try to save the given register into an unused scratch SGPR. The LiveRegs // 1: Try to save the given register into an unused scratch SGPR. The LiveRegs

// should have all the callee saved registers marked as used. // should have all the callee saved registers marked as used. For certain

if (IncludeScratchCopy)

ScratchSGPR = findUnusedRegister(MF.getRegInfo(), LiveRegs, RC);

if (!ScratchSGPR) { if (!ScratchSGPR) {

int FI = FrameInfo.CreateStackObject(Size, Alignment, true, nullptr, int FI = FrameInfo.CreateStackObject(Size, Alignment, true, nullptr,

TargetStackID::SGPRSpill); TargetStackID::SGPRSpill);

if (TRI->spillSGPRToVGPR() && if (TRI->spillSGPRToVGPR() &&

MFI->allocateSGPRSpillToVGPRLane(MF, FI, /* IsPrologEpilog */ true)) { MFI->allocateSGPRSpillToVGPRLane(MF, FI, /* IsPrologEpilog */ true)) {

// 2: There's no free lane to spill, and no free register to save the // 2: There's no free lane to spill, and no free register to save the

▲ Show 20 Lines • Show All 1,252 Lines • ▼ Show 20 Lines for (MachineBasicBlock &MBB : MF) {

// finalization. // finalization.

unsigned FIOp = AMDGPU::getNamedOperandIdx(MI.getOpcode(), unsigned FIOp = AMDGPU::getNamedOperandIdx(MI.getOpcode(),

AMDGPU::OpName::vaddr); AMDGPU::OpName::vaddr);

int FI = MI.getOperand(FIOp).getIndex(); int FI = MI.getOperand(FIOp).getIndex();

TII->getNamedOperand(MI, AMDGPU::OpName::vdata)->getReg(); TII->getNamedOperand(MI, AMDGPU::OpName::vdata)->getReg();

if (FuncInfo->allocateVGPRSpillToAGPR(MF, FI, if (FuncInfo->allocateVGPRSpillToAGPR(MF, FI,

TRI->isAGPR(MRI, VReg))) { TRI->isAGPR(MRI, VReg))) {

// FIXME: change to enterBasicBlockEnd() RS->enterBasicBlockEnd(MBB);

RS->enterBasicBlock(MBB); RS->backward(MI);

arsenm (Not Done): This is a pre-existing issue that should be fixed, but we should not be scanning the entire block from the end on every spill. The block iteration should be reversed and we should lazily call enterBasicBlockEnd on the first seen spill.
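The suggestion in the comment above (walk the block bottom-up and enter the block end only when the first spill is seen) can be sketched as follows. This is a minimal, self-contained illustration and not the patch's code: `MockScavenger`, `MockInstr`, `backwardTo` and `processSpillsBottomUp` are hypothetical stand-ins that only count calls, and they do not reproduce the real `RegScavenger` interface.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a backward-walking register scavenger.
// It only records how often each operation is invoked.
struct MockScavenger {
  int EnterEndCalls = 0;
  int BackwardSteps = 0;
  std::size_t Pos = 0; // index of the instruction the scavenger sits on

  // Position the scavenger one past the last instruction of the block.
  void enterBasicBlockEnd(std::size_t BlockSize) {
    ++EnterEndCalls;
    Pos = BlockSize;
  }

  // Step backward until the scavenger sits on instruction Idx.
  void backwardTo(std::size_t Idx) {
    while (Pos > Idx) {
      --Pos;
      ++BackwardSteps;
    }
  }
};

// Hypothetical instruction: we only care whether it is a spill.
struct MockInstr {
  bool IsSpill;
};

// Walk the block bottom-up. The scavenger is initialized lazily, on the
// first spill encountered, and then only ever moves backward, so the block
// is scanned at most once no matter how many spills it contains.
std::vector<std::size_t>
processSpillsBottomUp(const std::vector<MockInstr> &Block, MockScavenger &RS) {
  std::vector<std::size_t> Handled;
  bool Entered = false;
  for (std::size_t I = Block.size(); I-- > 0;) {
    if (!Block[I].IsSpill)
      continue;
    if (!Entered) {
      RS.enterBasicBlockEnd(Block.size());
      Entered = true;
    }
    RS.backwardTo(I);
    Handled.push_back(I); // frame-index elimination would happen here
  }
  return Handled;
}
```

For a four-instruction block with spills at positions 0 and 2, this sketch enters the block end once and takes four backward steps in total; a block without spills never touches the scavenger.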

TRI->eliminateFrameIndex(MI, 0, FIOp, RS); TRI->eliminateFrameIndex(MI, 0, FIOp, RS);

Pierre-vh (Not Done):

RS->enterBasicBlock(MBB);

+ RS->forward(MI);

TRI->eliminateFrameIndex(MI, 0, FIOp, RS);

SpillFIs.set(FI);

I think this is missing, and it is what causes the verification errors with "Using an undefined physical register" that I was talking about.
The current code only tells the scavenger to enter the block; it does not advance the scavenger to the right instruction, so eliminateFrameIndex works with liveness information from the start of the BB rather than from the MI it is handling.
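The stale-liveness problem described in the comment above can be illustrated with a small model. This is only a sketch under simplifying assumptions: `MockLiveness`, `MockInstr` and `forwardTo` are hypothetical names, registers are plain integers, and the tracker only records definitions seen so far (no kills), which is far simpler than what the real scavenger tracks.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical instruction that defines at most one register (-1 = none).
struct MockInstr {
  int DefReg;
};

// Hypothetical forward liveness tracker, loosely modelled on the idea of
// entering a block and then advancing to the instruction of interest.
struct MockLiveness {
  const std::vector<MockInstr> *Block = nullptr;
  std::size_t Next = 0;   // first instruction not yet processed
  std::set<int> Defined;  // registers known to be defined at this point

  // Start at the top of the block with only the live-ins defined.
  void enterBasicBlock(const std::vector<MockInstr> &B,
                       const std::set<int> &LiveIns) {
    Block = &B;
    Next = 0;
    Defined = LiveIns;
  }

  // Process every instruction before index Idx (simplified: defs only).
  void forwardTo(std::size_t Idx) {
    for (; Next < Idx && Next < Block->size(); ++Next)
      if ((*Block)[Next].DefReg >= 0)
        Defined.insert((*Block)[Next].DefReg);
  }

  bool isDefined(int Reg) const { return Defined.count(Reg) != 0; }
};
```

In this model, a register defined by the first instruction of a block is reported as undefined right after `enterBasicBlock` and becomes defined only once the tracker has been advanced past that instruction, which mirrors the mismatch the comment describes.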

cdevadas (Author, Done): This is an entirely different problem and needs to be implemented separately. The code that handles the register liveness update is implemented in PEI::replaceFrameIndices; it tracks the loops and invokes RS->forward() appropriately to update the liveness info. I guess we should bring this code into the VGPR-to-AGPR spill path.

arsenm (Not Done): D137574 is in flight to invert the direction; should we land that first / separately?

cdevadas (Author, Done): Alex's patch has landed, but this code is still needed to update the liveness for each instruction, since eliminateFrameIndex is called here.

SpillFIs.set(FI); SpillFIs.set(FI);

continue; continue;

} }

} else if (TII->isStoreToStackSlot(MI, FrameIndex) || } else if (TII->isStoreToStackSlot(MI, FrameIndex) ||

TII->isLoadFromStackSlot(MI, FrameIndex)) TII->isLoadFromStackSlot(MI, FrameIndex))

if (!MFI.isFixedObjectIndex(FrameIndex)) if (!MFI.isFixedObjectIndex(FrameIndex))

NonVGPRSpillFIs.set(FrameIndex); NonVGPRSpillFIs.set(FrameIndex);

} }

▲ Show 20 Lines • Show All 79 Lines • ▼ Show 20 Lines if (UnusedLowVGPR && (TRI->getHWRegIndex(UnusedLowVGPR) <

MRI.freezeReservedRegs(MF); MRI.freezeReservedRegs(MF);

} }

// The special SGPR spills like the one needed for FP, BP or any reserved // The special SGPR spills like the one needed for FP, BP or any reserved

// registers delayed until frame lowering. // registers delayed until frame lowering.

void SIFrameLowering::determinePrologEpilogSGPRSaves( void SIFrameLowering::determinePrologEpilogSGPRSaves(

MachineFunction &MF, BitVector &SavedVGPRs) const { MachineFunction &MF, BitVector &SavedVGPRs,

bool NeedExecCopyReservedReg) const {

MachineFrameInfo &FrameInfo = MF.getFrameInfo(); MachineFrameInfo &FrameInfo = MF.getFrameInfo();

MachineRegisterInfo &MRI = MF.getRegInfo();

SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>(); SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();

const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>(); const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();

const SIRegisterInfo *TRI = ST.getRegisterInfo(); const SIRegisterInfo *TRI = ST.getRegisterInfo();

LivePhysRegs LiveRegs; LivePhysRegs LiveRegs;

LiveRegs.init(*TRI); LiveRegs.init(*TRI);

// Initially mark callee saved registers as used so we will not choose them // Initially mark callee saved registers as used so we will not choose them

// while looking for scratch SGPRs. // while looking for scratch SGPRs.

const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs(); const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();

for (unsigned I = 0; CSRegs[I]; ++I) for (unsigned I = 0; CSRegs[I]; ++I)

LiveRegs.addReg(CSRegs[I]); LiveRegs.addReg(CSRegs[I]);

if (NeedExecCopyReservedReg) {

assert(ReservedReg && "Should have reserved an SGPR for EXEC copy.");

const TargetRegisterClass &RC = ST.isWave32()

? AMDGPU::SReg_32_XM0_XEXECRegClass

: AMDGPU::SGPR_64RegClass;

if (UnusedScratchReg) {

// If found any unused scratch SGPR, reserve the register itself for Exec

// copy and there is no need for any spills in that case.

MFI->setSGPRForEXECCopy(UnusedScratchReg);

LiveRegs.addReg(UnusedScratchReg);

} else {

// Needs spill.

assert(!MFI->hasPrologEpilogSGPRSpillEntry(ReservedReg) &&

"Re-reserving spill slot for EXEC copy register");

getVGPRSpillLaneOrTempRegister(MF, LiveRegs, ReservedReg, RC,

/* IncludeScratchCopy */ false);

}

// hasFP only knows about stack objects that already exist. We're now // hasFP only knows about stack objects that already exist. We're now

// determining the stack slots that will be created, so we have to predict // determining the stack slots that will be created, so we have to predict

// them. Stack objects force FP usage with calls. // them. Stack objects force FP usage with calls.

// //

// Note a new VGPR CSR may be introduced if one is used for the spill, but we // Note a new VGPR CSR may be introduced if one is used for the spill, but we

// don't want to report it here. // don't want to report it here.

// //

// FIXME: Is this really hasReservedCallFrame? // FIXME: Is this really hasReservedCallFrame?

Show All 22 Lines void SIFrameLowering::determineCalleeSaves(MachineFunction &MF,

RegScavenger *RS) const { RegScavenger *RS) const {

TargetFrameLowering::determineCalleeSaves(MF, SavedVGPRs, RS); TargetFrameLowering::determineCalleeSaves(MF, SavedVGPRs, RS);

SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>(); SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();

if (MFI->isEntryFunction()) if (MFI->isEntryFunction())

return; return;

const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>(); const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();

const SIRegisterInfo *TRI = ST.getRegisterInfo(); const SIRegisterInfo *TRI = ST.getRegisterInfo();

const SIInstrInfo *TII = ST.getInstrInfo();

bool NeedExecCopyReservedReg = false;

for (MachineBasicBlock &MBB : MF) { for (MachineBasicBlock &MBB : MF) {

for (MachineInstr &MI : MBB) { for (MachineInstr &MI : MBB) {

// WRITELANE instructions used for SGPR spills can overwrite the inactive // WRITELANE instructions used for SGPR spills can overwrite the inactive

// lanes of VGPRs and callee must spill and restore them even if they are // lanes of VGPRs and callee must spill and restore them even if they are

// marked Caller-saved. // marked Caller-saved.

// TODO: Handle this elsewhere at an early point. Walking through all MBBs // TODO: Handle this elsewhere at an early point. Walking through all MBBs

// here would be a bad heuristic. A better way should be by calling // here would be a bad heuristic. A better way should be by calling

// allocateWWMSpill during the regalloc pipeline whenever a physical // allocateWWMSpill during the regalloc pipeline whenever a physical

// register is allocated for the intended virtual registers. That will // register is allocated for the intended virtual registers. That will

// also help excluding the general use of WRITELANE/READLANE intrinsics // also help excluding the general use of WRITELANE/READLANE intrinsics

// that won't really need any such special handling. // that won't really need any such special handling.

if (MI.getOpcode() == AMDGPU::V_WRITELANE_B32) if (MI.getOpcode() == AMDGPU::V_WRITELANE_B32)

MFI->allocateWWMSpill(MF, MI.getOperand(0).getReg()); MFI->allocateWWMSpill(MF, MI.getOperand(0).getReg());

else if (MI.getOpcode() == AMDGPU::V_READLANE_B32) else if (MI.getOpcode() == AMDGPU::V_READLANE_B32)

MFI->allocateWWMSpill(MF, MI.getOperand(1).getReg()); MFI->allocateWWMSpill(MF, MI.getOperand(1).getReg());

else if (TII->isWWMRegSpillOpcode(MI.getOpcode()))

NeedExecCopyReservedReg = true;

} }

// Ignore the SGPRs the default implementation found. // Ignore the SGPRs the default implementation found.

SavedVGPRs.clearBitsNotInMask(TRI->getAllVectorRegMask()); SavedVGPRs.clearBitsNotInMask(TRI->getAllVectorRegMask());

// Do not save AGPRs prior to GFX90A because there was no easy way to do so. // Do not save AGPRs prior to GFX90A because there was no easy way to do so.

// In gfx908 there was do AGPR loads and stores and thus spilling also // In gfx908 there was do AGPR loads and stores and thus spilling also

// require a temporary VGPR. // require a temporary VGPR.

if (!ST.hasGFX90AInsts()) if (!ST.hasGFX90AInsts())

SavedVGPRs.clearBitsInMask(TRI->getAllAGPRRegMask()); SavedVGPRs.clearBitsInMask(TRI->getAllAGPRRegMask());

determinePrologEpilogSGPRSaves(MF, SavedVGPRs); determinePrologEpilogSGPRSaves(MF, SavedVGPRs, NeedExecCopyReservedReg);

// The Whole-Wave VGPRs need to be specially inserted in the prolog, so don't // The Whole-Wave VGPRs need to be specially inserted in the prolog, so don't

// allow the default insertion to handle them. // allow the default insertion to handle them.

for (auto &Reg : MFI->getWWMSpills()) for (auto &Reg : MFI->getWWMSpills())

SavedVGPRs.reset(Reg.first); SavedVGPRs.reset(Reg.first);

// Mark all lane VGPRs as BB LiveIns. // Mark all lane VGPRs as BB LiveIns.

for (MachineBasicBlock &MBB : MF) { for (MachineBasicBlock &MBB : MF) {


llvm/lib/Target/AMDGPU/SIISelLowering.cpp


Show First 20 Lines • Show All 12,504 Lines • ▼ Show 20 Lines	for (unsigned I = 0, E = MRI.getNumVirtRegs(); I != E; ++I) {
if (!RC)		if (!RC)
continue;		continue;
int NewClassID = getAlignedAGPRClassID(RC->getID());		int NewClassID = getAlignedAGPRClassID(RC->getID());
if (NewClassID != -1)		if (NewClassID != -1)
MRI.setRegClass(Reg, TRI->getRegClass(NewClassID));		MRI.setRegClass(Reg, TRI->getRegClass(NewClassID));
}		}
}		}

		// Reserve the SGPR(s) to save/restore EXEC for WWM spill/copy handling.
		unsigned MaxNumSGPRs = ST.getMaxNumSGPRs(MF);
		Register SReg =
		ST.isWave32()
		? AMDGPU::SGPR_32RegClass.getRegister(MaxNumSGPRs - 1)
		: AMDGPU::SGPR_64RegClass.getRegister((MaxNumSGPRs / 2) - 1);
		Info->setSGPRForEXECCopy(SReg);

TargetLoweringBase::finalizeLowering(MF);		TargetLoweringBase::finalizeLowering(MF);
}		}

void SITargetLowering::computeKnownBitsForFrameIndex(		void SITargetLowering::computeKnownBitsForFrameIndex(
const int FI, KnownBits &Known, const MachineFunction &MF) const {		const int FI, KnownBits &Known, const MachineFunction &MF) const {
TargetLowering::computeKnownBitsForFrameIndex(FI, Known, MF);		TargetLowering::computeKnownBitsForFrameIndex(FI, Known, MF);

// Set the high bits to zero based on the maximum allowed scratch size per		// Set the high bits to zero based on the maximum allowed scratch size per

llvm/lib/Target/AMDGPU/SIInstrInfo.h

Show First 20 Lines • Show All 619 Lines • ▼ Show 20 Lines	public:
static bool isSGPRSpill(const MachineInstr &MI) {		static bool isSGPRSpill(const MachineInstr &MI) {
return MI.getDesc().TSFlags & SIInstrFlags::SGPRSpill;		return MI.getDesc().TSFlags & SIInstrFlags::SGPRSpill;
}		}

bool isSGPRSpill(uint16_t Opcode) const {		bool isSGPRSpill(uint16_t Opcode) const {
return get(Opcode).TSFlags & SIInstrFlags::SGPRSpill;		return get(Opcode).TSFlags & SIInstrFlags::SGPRSpill;
}		}

		static bool isWWMRegSpillOpcode(uint16_t Opcode) {
		arsenm (Not Done): static?
		cdevadas (Author, Done): Will change.
		return Opcode == AMDGPU::SI_SPILL_WWM_V32_SAVE ||
		Opcode == AMDGPU::SI_SPILL_WWM_V32_RESTORE;
		}

static bool isDPP(const MachineInstr &MI) {		static bool isDPP(const MachineInstr &MI) {
return MI.getDesc().TSFlags & SIInstrFlags::DPP;		return MI.getDesc().TSFlags & SIInstrFlags::DPP;
}		}

bool isDPP(uint16_t Opcode) const {		bool isDPP(uint16_t Opcode) const {
return get(Opcode).TSFlags & SIInstrFlags::DPP;		return get(Opcode).TSFlags & SIInstrFlags::DPP;
}		}


llvm/lib/Target/AMDGPU/SIInstrInfo.cpp


Show First 20 Lines • Show All 1,505 Lines • ▼ Show 20 Lines	case 64:
return AMDGPU::SI_SPILL_AV512_SAVE;		return AMDGPU::SI_SPILL_AV512_SAVE;
case 128:		case 128:
return AMDGPU::SI_SPILL_AV1024_SAVE;		return AMDGPU::SI_SPILL_AV1024_SAVE;
default:		default:
llvm_unreachable("unknown register size");		llvm_unreachable("unknown register size");
}		}
}		}

		static unsigned getWWMRegSpillSaveOpcode(unsigned Size) {
		// Currently, there is only 32-bit WWM register spills needed.
		if (Size != 4)
		llvm_unreachable("unknown wwm register spill size");

		return AMDGPU::SI_SPILL_WWM_V32_SAVE;
		}

		static unsigned getVectorRegSpillSaveOpcode(Register Reg,
		const TargetRegisterClass *RC,
		unsigned Size,
		const SIRegisterInfo &TRI,
		const SIMachineFunctionInfo &MFI,
		const MachineRegisterInfo &MRI) {
		// Choose the right opcode if spilling a WWM register.
		if (MFI.checkFlag(MRI, Reg, AMDGPU::VirtRegFlag::WWM_REG))
		return getWWMRegSpillSaveOpcode(Size);

		return TRI.isVectorSuperClass(RC) ? getAVSpillSaveOpcode(Size)
		: TRI.isAGPRClass(RC) ? getAGPRSpillSaveOpcode(Size)
		: getVGPRSpillSaveOpcode(Size);
		}

void SIInstrInfo::storeRegToStackSlot(MachineBasicBlock &MBB,		void SIInstrInfo::storeRegToStackSlot(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,		MachineBasicBlock::iterator MI,
Register SrcReg, bool isKill,		Register SrcReg, bool isKill,
int FrameIndex,		int FrameIndex,
const TargetRegisterClass *RC,		const TargetRegisterClass *RC,
const TargetRegisterInfo *TRI) const {		const TargetRegisterInfo *TRI) const {
MachineFunction *MF = MBB.getParent();		MachineFunction *MF = MBB.getParent();
SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();		SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
Show All 30 Lines	BuildMI(MBB, MI, DL, OpDesc)
.addMemOperand(MMO)		.addMemOperand(MMO)
.addReg(MFI->getStackPtrOffsetReg(), RegState::Implicit);		.addReg(MFI->getStackPtrOffsetReg(), RegState::Implicit);

if (RI.spillSGPRToVGPR())		if (RI.spillSGPRToVGPR())
FrameInfo.setStackID(FrameIndex, TargetStackID::SGPRSpill);		FrameInfo.setStackID(FrameIndex, TargetStackID::SGPRSpill);
return;		return;
}		}

unsigned Opcode = RI.isVectorSuperClass(RC) ? getAVSpillSaveOpcode(SpillSize)		unsigned Opcode =
: RI.isAGPRClass(RC) ? getAGPRSpillSaveOpcode(SpillSize)		getVectorRegSpillSaveOpcode(SrcReg, RC, SpillSize, RI, *MFI, MRI);
: getVGPRSpillSaveOpcode(SpillSize);
MFI->setHasSpilledVGPRs();		MFI->setHasSpilledVGPRs();

BuildMI(MBB, MI, DL, get(Opcode))		BuildMI(MBB, MI, DL, get(Opcode))
.addReg(SrcReg, getKillRegState(isKill)) // data		.addReg(SrcReg, getKillRegState(isKill)) // data
.addFrameIndex(FrameIndex) // addr		.addFrameIndex(FrameIndex) // addr
.addReg(MFI->getStackPtrOffsetReg()) // scratch_offset		.addReg(MFI->getStackPtrOffsetReg()) // scratch_offset
.addImm(0) // offset		.addImm(0) // offset
.addMemOperand(MMO);		.addMemOperand(MMO);
▲ Show 20 Lines • Show All 102 Lines • ▼ Show 20 Lines	case 64:
return AMDGPU::SI_SPILL_AV512_RESTORE;		return AMDGPU::SI_SPILL_AV512_RESTORE;
case 128:		case 128:
return AMDGPU::SI_SPILL_AV1024_RESTORE;		return AMDGPU::SI_SPILL_AV1024_RESTORE;
default:		default:
llvm_unreachable("unknown register size");		llvm_unreachable("unknown register size");
}		}
}		}

		static unsigned getWWMRegSpillRestoreOpcode(unsigned Size) {
		// Currently, there is only 32-bit WWM register spills needed.
		if (Size != 4)
		llvm_unreachable("unknown wwm register spill size");

		return AMDGPU::SI_SPILL_WWM_V32_RESTORE;
		}

		static unsigned getVectorRegSpillRestoreOpcode(Register Reg,
		const TargetRegisterClass *RC,
		unsigned Size,
		const SIRegisterInfo &TRI,
		const SIMachineFunctionInfo &MFI,
		const MachineRegisterInfo &MRI) {
		// Choose the right opcode if restoring a WWM register.
		if (MFI.checkFlag(MRI, Reg, AMDGPU::VirtRegFlag::WWM_REG))
		return getWWMRegSpillRestoreOpcode(Size);

		return TRI.isVectorSuperClass(RC) ? getAVSpillRestoreOpcode(Size)
		: TRI.isAGPRClass(RC) ? getAGPRSpillRestoreOpcode(Size)
		: getVGPRSpillRestoreOpcode(Size);
		}

void SIInstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB,		void SIInstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,		MachineBasicBlock::iterator MI,
Register DestReg, int FrameIndex,		Register DestReg, int FrameIndex,
const TargetRegisterClass *RC,		const TargetRegisterClass *RC,
const TargetRegisterInfo *TRI) const {		const TargetRegisterInfo *TRI) const {
MachineFunction *MF = MBB.getParent();		MachineFunction *MF = MBB.getParent();
SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();		SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
MachineFrameInfo &FrameInfo = MF->getFrameInfo();		MachineFrameInfo &FrameInfo = MF->getFrameInfo();
		MachineRegisterInfo &MRI = MF->getRegInfo();
const DebugLoc &DL = MBB.findDebugLoc(MI);		const DebugLoc &DL = MBB.findDebugLoc(MI);
unsigned SpillSize = TRI->getSpillSize(*RC);		unsigned SpillSize = TRI->getSpillSize(*RC);

MachinePointerInfo PtrInfo		MachinePointerInfo PtrInfo
= MachinePointerInfo::getFixedStack(*MF, FrameIndex);		= MachinePointerInfo::getFixedStack(*MF, FrameIndex);

MachineMemOperand *MMO = MF->getMachineMemOperand(		MachineMemOperand *MMO = MF->getMachineMemOperand(
PtrInfo, MachineMemOperand::MOLoad, FrameInfo.getObjectSize(FrameIndex),		PtrInfo, MachineMemOperand::MOLoad, FrameInfo.getObjectSize(FrameIndex),
FrameInfo.getObjectAlign(FrameIndex));		FrameInfo.getObjectAlign(FrameIndex));

if (RI.isSGPRClass(RC)) {		if (RI.isSGPRClass(RC)) {
MFI->setHasSpilledSGPRs();		MFI->setHasSpilledSGPRs();
assert(DestReg != AMDGPU::M0 && "m0 should not be reloaded into");		assert(DestReg != AMDGPU::M0 && "m0 should not be reloaded into");
assert(DestReg != AMDGPU::EXEC_LO && DestReg != AMDGPU::EXEC_HI &&		assert(DestReg != AMDGPU::EXEC_LO && DestReg != AMDGPU::EXEC_HI &&
DestReg != AMDGPU::EXEC && "exec should not be spilled");		DestReg != AMDGPU::EXEC && "exec should not be spilled");

// FIXME: Maybe this should not include a memoperand because it will be		// FIXME: Maybe this should not include a memoperand because it will be
// lowered to non-memory instructions.		// lowered to non-memory instructions.
const MCInstrDesc &OpDesc = get(getSGPRSpillRestoreOpcode(SpillSize));		const MCInstrDesc &OpDesc = get(getSGPRSpillRestoreOpcode(SpillSize));
if (DestReg.isVirtual() && SpillSize == 4) {		if (DestReg.isVirtual() && SpillSize == 4) {
MachineRegisterInfo &MRI = MF->getRegInfo();
MRI.constrainRegClass(DestReg, &AMDGPU::SReg_32_XM0_XEXECRegClass);		MRI.constrainRegClass(DestReg, &AMDGPU::SReg_32_XM0_XEXECRegClass);
}		}

if (RI.spillSGPRToVGPR())		if (RI.spillSGPRToVGPR())
FrameInfo.setStackID(FrameIndex, TargetStackID::SGPRSpill);		FrameInfo.setStackID(FrameIndex, TargetStackID::SGPRSpill);
BuildMI(MBB, MI, DL, OpDesc, DestReg)		BuildMI(MBB, MI, DL, OpDesc, DestReg)
.addFrameIndex(FrameIndex) // addr		.addFrameIndex(FrameIndex) // addr
.addMemOperand(MMO)		.addMemOperand(MMO)
.addReg(MFI->getStackPtrOffsetReg(), RegState::Implicit);		.addReg(MFI->getStackPtrOffsetReg(), RegState::Implicit);

return;		return;
}		}

unsigned Opcode = RI.isVectorSuperClass(RC)		unsigned Opcode =
? getAVSpillRestoreOpcode(SpillSize)		getVectorRegSpillRestoreOpcode(DestReg, RC, SpillSize, RI, *MFI, MRI);
: RI.isAGPRClass(RC) ? getAGPRSpillRestoreOpcode(SpillSize)
: getVGPRSpillRestoreOpcode(SpillSize);
BuildMI(MBB, MI, DL, get(Opcode), DestReg)		BuildMI(MBB, MI, DL, get(Opcode), DestReg)
.addFrameIndex(FrameIndex) // vaddr		.addFrameIndex(FrameIndex) // vaddr
.addReg(MFI->getStackPtrOffsetReg()) // scratch_offset		.addReg(MFI->getStackPtrOffsetReg()) // scratch_offset
.addImm(0) // offset		.addImm(0) // offset
.addMemOperand(MMO);		.addMemOperand(MMO);
}		}

void SIInstrInfo::insertNoop(MachineBasicBlock &MBB,		void SIInstrInfo::insertNoop(MachineBasicBlock &MBB,

llvm/lib/Target/AMDGPU/SIInstructions.td

	Show First 20 Lines • Show All 847 Lines • ▼ Show 20 Lines
	defm SI_SPILL_AV128 : SI_SPILL_VGPR <AV_128, 1>;			defm SI_SPILL_AV128 : SI_SPILL_VGPR <AV_128, 1>;
	defm SI_SPILL_AV160 : SI_SPILL_VGPR <AV_160, 1>;			defm SI_SPILL_AV160 : SI_SPILL_VGPR <AV_160, 1>;
	defm SI_SPILL_AV192 : SI_SPILL_VGPR <AV_192, 1>;			defm SI_SPILL_AV192 : SI_SPILL_VGPR <AV_192, 1>;
	defm SI_SPILL_AV224 : SI_SPILL_VGPR <AV_224, 1>;			defm SI_SPILL_AV224 : SI_SPILL_VGPR <AV_224, 1>;
	defm SI_SPILL_AV256 : SI_SPILL_VGPR <AV_256, 1>;			defm SI_SPILL_AV256 : SI_SPILL_VGPR <AV_256, 1>;
	defm SI_SPILL_AV512 : SI_SPILL_VGPR <AV_512, 1>;			defm SI_SPILL_AV512 : SI_SPILL_VGPR <AV_512, 1>;
	defm SI_SPILL_AV1024 : SI_SPILL_VGPR <AV_1024, 1>;			defm SI_SPILL_AV1024 : SI_SPILL_VGPR <AV_1024, 1>;

				defm SI_SPILL_WWM_V32 : SI_SPILL_VGPR <VGPR_32>;

	def SI_PC_ADD_REL_OFFSET : SPseudoInstSI <			def SI_PC_ADD_REL_OFFSET : SPseudoInstSI <
	(outs SReg_64:$dst),			(outs SReg_64:$dst),
	(ins si_ga:$ptr_lo, si_ga:$ptr_hi),			(ins si_ga:$ptr_lo, si_ga:$ptr_hi),
	[(set SReg_64:$dst,			[(set SReg_64:$dst,
	(i64 (SIpc_add_rel_offset tglobaladdr:$ptr_lo, tglobaladdr:$ptr_hi)))]> {			(i64 (SIpc_add_rel_offset tglobaladdr:$ptr_lo, tglobaladdr:$ptr_hi)))]> {
	let Defs = [SCC];			let Defs = [SCC];
	}			}


llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp

Show All 14 Lines
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPU.h"		#include "AMDGPU.h"
#include "GCNSubtarget.h"		#include "GCNSubtarget.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"		#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "SIMachineFunctionInfo.h"		#include "SIMachineFunctionInfo.h"
#include "llvm/CodeGen/LiveIntervals.h"		#include "llvm/CodeGen/LiveIntervals.h"
		#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineFrameInfo.h"		#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/RegisterScavenging.h"		#include "llvm/CodeGen/RegisterScavenging.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "si-lower-sgpr-spills"		#define DEBUG_TYPE "si-lower-sgpr-spills"

using MBBVector = SmallVector<MachineBasicBlock *, 4>;		using MBBVector = SmallVector<MachineBasicBlock *, 4>;

namespace {		namespace {

class SILowerSGPRSpills : public MachineFunctionPass {		class SILowerSGPRSpills : public MachineFunctionPass {
private:		private:
const SIRegisterInfo *TRI = nullptr;		const SIRegisterInfo *TRI = nullptr;
const SIInstrInfo *TII = nullptr;		const SIInstrInfo *TII = nullptr;
LiveIntervals *LIS = nullptr;		LiveIntervals *LIS = nullptr;
SlotIndexes *Indexes = nullptr;		SlotIndexes *Indexes = nullptr;
		MachineDominatorTree *MDT = nullptr;

// Save and Restore blocks of the current function. Typically there is a		// Save and Restore blocks of the current function. Typically there is a
// single save block, unless Windows EH funclets are involved.		// single save block, unless Windows EH funclets are involved.
MBBVector SaveBlocks;		MBBVector SaveBlocks;
MBBVector RestoreBlocks;		MBBVector RestoreBlocks;

public:		public:
static char ID;		static char ID;

SILowerSGPRSpills() : MachineFunctionPass(ID) {}		SILowerSGPRSpills() : MachineFunctionPass(ID) {}

void calculateSaveRestoreBlocks(MachineFunction &MF);		void calculateSaveRestoreBlocks(MachineFunction &MF);
bool spillCalleeSavedRegs(MachineFunction &MF);		bool spillCalleeSavedRegs(MachineFunction &MF);
		void updateLaneVGPRDomInstr(
		int FI, MachineBasicBlock *MBB, MachineBasicBlock::iterator InsertPt,
		DenseMap<Register, MachineBasicBlock::iterator> &LaneVGPRDomInstr);

bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
		AU.addRequired<MachineDominatorTree>();
		arsenm [Not Done]: Is this introducing a new computation in the pass pipeline? (I assume not, since I don't see a pass pipeline test update.)
		cdevadas (author) [Done]: It isn't.
AU.setPreservesAll();		AU.setPreservesAll();
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
}		}

		MachineFunctionProperties getClearedProperties() const override {
		return MachineFunctionProperties()
		.set(MachineFunctionProperties::Property::IsSSA)
		arsenm [Not Done]: It shouldn't have been SSA to begin with, and this doesn't de-SSA.
		yassingh [Not Done]: Removing this line causes the machine verifier to crash in a few tests. Any hints, @cdevadas?
		yassingh [Not Done]: Removing this line works fine when running the whole pipeline, as the compiler knows the code here is not in SSA form. However, when SILowerSGPRSpills and related passes are run in isolation, the verifier assumes the code is in SSA form (possibly a bug there; we are also introducing virtual VGPRs, so maybe that's the reason). I can leave the line as it is, or is there some way to update the test files to let the compiler know the input isn't SSA? I tried "isSSA: false"; it didn't work.
		cdevadas (author) [Done]: Seems reasonable to retain this line for now. The compiler might not be able to decide that this pass is run post phi-elimination and assume SSA form by default. There must be a serialized option to control it for MIR tests.
		yassingh [Not Done]: Yeah, MIRParser::isSSA recomputes the SSA information and sets it to true; it also doesn't expose a way to override it.
		arsenm [Not Done]: This is kind of a MIR parser bug.
		.set(MachineFunctionProperties::Property::NoVRegs);
		arsenm [Not Done]: Add a comment explaining the new vregs?
		}
};		};
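The exchange above about `getClearedProperties` turns on how a pass tells the pass manager which function-level properties no longer hold after it runs. A minimal sketch of that idea, assuming a toy two-property bitset; `ToyProperty`, `ToyFunctionProperties`, and `clearedBySpillLowering` are hypothetical names for illustration and not LLVM's `MachineFunctionProperties` API:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical subset of machine-function properties used in this sketch.
enum class ToyProperty : std::size_t { IsSSA = 0, NoVRegs = 1, Count = 2 };

class ToyFunctionProperties {
  std::bitset<static_cast<std::size_t>(ToyProperty::Count)> Bits;

public:
  // Mark a property as set; returns *this so calls can be chained.
  ToyFunctionProperties &set(ToyProperty P) {
    Bits.set(static_cast<std::size_t>(P));
    return *this;
  }

  bool has(ToyProperty P) const {
    return Bits.test(static_cast<std::size_t>(P));
  }

  // Drop every property that is set in Cleared.
  void reset(const ToyFunctionProperties &Cleared) { Bits &= ~Cleared.Bits; }
};

// What the pass in this sketch reports as invalidated. It may introduce
// virtual registers, so NoVRegs can no longer be assumed afterwards.
ToyFunctionProperties clearedBySpillLowering() {
  return ToyFunctionProperties()
      .set(ToyProperty::IsSSA)
      .set(ToyProperty::NoVRegs);
}
```

In this model a driver would call `Props.reset(clearedBySpillLowering())` after the pass runs, which is the role the reviewers expect `getClearedProperties` to play instead of clearing properties by hand inside `runOnMachineFunction`.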

} // end anonymous namespace		} // end anonymous namespace

char SILowerSGPRSpills::ID = 0;		char SILowerSGPRSpills::ID = 0;

INITIALIZE_PASS_BEGIN(SILowerSGPRSpills, DEBUG_TYPE,		INITIALIZE_PASS_BEGIN(SILowerSGPRSpills, DEBUG_TYPE,
"SI lower SGPR spill instructions", false, false)		"SI lower SGPR spill instructions", false, false)
INITIALIZE_PASS_DEPENDENCY(LiveIntervals)		INITIALIZE_PASS_DEPENDENCY(LiveIntervals)
INITIALIZE_PASS_DEPENDENCY(VirtRegMap)		INITIALIZE_PASS_DEPENDENCY(VirtRegMap)
		INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
INITIALIZE_PASS_END(SILowerSGPRSpills, DEBUG_TYPE,		INITIALIZE_PASS_END(SILowerSGPRSpills, DEBUG_TYPE,
"SI lower SGPR spill instructions", false, false)		"SI lower SGPR spill instructions", false, false)

char &llvm::SILowerSGPRSpillsID = SILowerSGPRSpills::ID;		char &llvm::SILowerSGPRSpillsID = SILowerSGPRSpills::ID;

/// Insert spill code for the callee-saved registers used in the function.		/// Insert spill code for the callee-saved registers used in the function.
static void insertCSRSaves(MachineBasicBlock &SaveBlock,		static void insertCSRSaves(MachineBasicBlock &SaveBlock,
ArrayRef<CalleeSavedInfo> CSI, SlotIndexes *Indexes,		ArrayRef<CalleeSavedInfo> CSI, SlotIndexes *Indexes,
[... 163 lines not shown ...]	if (!CSI.empty()) {
insertCSRRestores(*RestoreBlock, CSI, Indexes, LIS);		insertCSRRestores(*RestoreBlock, CSI, Indexes, LIS);
return true;		return true;
}		}
}		}

return false;		return false;
}		}

		void SILowerSGPRSpills::updateLaneVGPRDomInstr(
		int FI, MachineBasicBlock *MBB, MachineBasicBlock::iterator InsertPt,
		DenseMap<Register, MachineBasicBlock::iterator> &LaneVGPRDomInstr) {
		// For the Def of a virtual LaneVGPR to dominate all its uses, we should
		// insert an IMPLICIT_DEF before the dominating spill. Switching to a
		// depth first order doesn't really help since the machine function can be in
		// the unstructured control flow post-SSA. For each virtual register, hence
		arsenm [Not Done]: Typo "the the". It's also not necessarily unstructured.
		// finding the common dominator to get either the dominating spill or a block
		// dominating all spills. Is there a better way to handle it?
		arsenm [Not Done]: Remove "Is there a better way to handle it?"
		SIMachineFunctionInfo *FuncInfo =
		MBB->getParent()->getInfo<SIMachineFunctionInfo>();
		ArrayRef<SIRegisterInfo::SpilledReg> VGPRSpills =
		FuncInfo->getSGPRSpillToVGPRLanes(FI);
		Register PrevLaneVGPR;
		for (auto &Spill : VGPRSpills) {
		if (PrevLaneVGPR == Spill.VGPR)
		continue;

		PrevLaneVGPR = Spill.VGPR;
		auto I = LaneVGPRDomInstr.find(Spill.VGPR);
		if (Spill.Lane == 0 && I == LaneVGPRDomInstr.end()) {
		// Initially add the spill instruction itself for Insertion point.
		LaneVGPRDomInstr[Spill.VGPR] = InsertPt;
		} else {
		assert(I != LaneVGPRDomInstr.end());
		auto PrevInsertPt = I->second;
		MachineBasicBlock *DomMBB = PrevInsertPt->getParent();
		if (DomMBB == MBB) {
		// The insertion point earlier selected in a predecessor block whose
		// spills are currently being lowered. The earlier InsertPt would be
		// the one just before the block terminator and it should be changed
		// if we insert any new spill in it.
		if (MDT->dominates(&InsertPt, &PrevInsertPt))
		I->second = InsertPt;

		arsenm [Not Done]: IsDominatesChecked is confusingly named and expresses the code, not the intent. SeenSpillInBlock?
		continue;
		}

		// Find the common dominator block between PrevInsertPt and the
		// current spill.
		DomMBB = MDT->findNearestCommonDominator(DomMBB, MBB);
		if (DomMBB == MBB)
		I->second = InsertPt;
		else if (DomMBB != PrevInsertPt->getParent())
		I->second = &(*DomMBB->getFirstTerminator());
		}
		}
		}

		arsenm [Not Done]: Extra ()s
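The comment in `updateLaneVGPRDomInstr` describes folding every spill of a lane VGPR into one point that dominates all of them. A minimal sketch of that folding step, assuming a toy dominator tree stored as an immediate-dominator array; `ToyDomTree` and `commonDominatorOfSpills` are hypothetical and only stand in for `MachineDominatorTree`:

```cpp
#include <cassert>
#include <vector>

// Toy dominator tree: Idom[B] is the immediate dominator of block B, and the
// entry block (index 0) is its own immediate dominator.
// This is a hypothetical stand-in for MachineDominatorTree, not LLVM's API.
struct ToyDomTree {
  std::vector<int> Idom;

  // Depth of a block = number of idom steps up to the entry block.
  int depth(int B) const {
    int D = 0;
    while (B != 0) {
      B = Idom[B];
      ++D;
    }
    return D;
  }

  // Walk the deeper block up until both meet; the meeting point is the
  // nearest block that dominates both A and B.
  int findNearestCommonDominator(int A, int B) const {
    int DA = depth(A), DB = depth(B);
    while (DA > DB) { A = Idom[A]; --DA; }
    while (DB > DA) { B = Idom[B]; --DB; }
    while (A != B) { A = Idom[A]; B = Idom[B]; }
    return A;
  }
};

// Fold the blocks that contain spills for one lane VGPR into a single block
// where a definition would dominate every spill.
int commonDominatorOfSpills(const ToyDomTree &DT,
                            const std::vector<int> &SpillBlocks) {
  assert(!SpillBlocks.empty() && "need at least one spill block");
  int Dom = SpillBlocks.front();
  for (int B : SpillBlocks)
    Dom = DT.findNearestCommonDominator(Dom, B);
  return Dom;
}
```

The sketch works at block granularity only. The patch additionally tracks an instruction iterator inside the chosen block, which this model leaves out.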
bool SILowerSGPRSpills::runOnMachineFunction(MachineFunction &MF) {		bool SILowerSGPRSpills::runOnMachineFunction(MachineFunction &MF) {
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
TII = ST.getInstrInfo();		TII = ST.getInstrInfo();
TRI = &TII->getRegisterInfo();		TRI = &TII->getRegisterInfo();

LIS = getAnalysisIfAvailable<LiveIntervals>();		LIS = getAnalysisIfAvailable<LiveIntervals>();
Indexes = getAnalysisIfAvailable<SlotIndexes>();		Indexes = getAnalysisIfAvailable<SlotIndexes>();
		MDT = &getAnalysis<MachineDominatorTree>();

assert(SaveBlocks.empty() && RestoreBlocks.empty());		assert(SaveBlocks.empty() && RestoreBlocks.empty());

// First, expose any CSR SGPR spills. This is mostly the same as what PEI		// First, expose any CSR SGPR spills. This is mostly the same as what PEI
// does, but somewhat simpler.		// does, but somewhat simpler.
calculateSaveRestoreBlocks(MF);		calculateSaveRestoreBlocks(MF);
bool HasCSRs = spillCalleeSavedRegs(MF);		bool HasCSRs = spillCalleeSavedRegs(MF);

MachineFrameInfo &MFI = MF.getFrameInfo();		MachineFrameInfo &MFI = MF.getFrameInfo();
MachineRegisterInfo &MRI = MF.getRegInfo();
SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();		SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();

if (!MFI.hasStackObjects() && !HasCSRs) {		if (!MFI.hasStackObjects() && !HasCSRs) {
SaveBlocks.clear();		SaveBlocks.clear();
RestoreBlocks.clear();		RestoreBlocks.clear();
return false;		return false;
}		}

bool MadeChange = false;		bool MadeChange = false;
bool NewReservedRegs = false;

// TODO: CSR VGPRs will never be spilled to AGPRs. These can probably be		// TODO: CSR VGPRs will never be spilled to AGPRs. These can probably be
// handled as SpilledToReg in regular PrologEpilogInserter.		// handled as SpilledToReg in regular PrologEpilogInserter.
const bool HasSGPRSpillToVGPR = TRI->spillSGPRToVGPR() &&		const bool HasSGPRSpillToVGPR = TRI->spillSGPRToVGPR() &&
(HasCSRs || FuncInfo->hasSpilledSGPRs());		(HasCSRs || FuncInfo->hasSpilledSGPRs());
if (HasSGPRSpillToVGPR) {		if (HasSGPRSpillToVGPR) {
// Process all SGPR spills before frame offsets are finalized. Ideally SGPRs		// Process all SGPR spills before frame offsets are finalized. Ideally SGPRs
// are spilled to VGPRs, in which case we can eliminate the stack usage.		// are spilled to VGPRs, in which case we can eliminate the stack usage.
//		//
// This operates under the assumption that only other SGPR spills are users		// This operates under the assumption that only other SGPR spills are users
// of the frame index.		// of the frame index.

// To track the spill frame indices handled in this pass.		// To track the spill frame indices handled in this pass.
BitVector SpillFIs(MFI.getObjectIndexEnd(), false);		BitVector SpillFIs(MFI.getObjectIndexEnd(), false);

		// To track the IMPLICIT_DEF insertion point for the lane vgprs.
		DenseMap<Register, MachineBasicBlock::iterator> LaneVGPRDomInstr;
		arsenm [Not Done]: Seems worthwhile for this to be its own real function instead of a lambda.
		cdevadas (author) [Done]: Yes indeed. Will do.

for (MachineBasicBlock &MBB : MF) {		for (MachineBasicBlock &MBB : MF) {
for (MachineInstr &MI : llvm::make_early_inc_range(MBB)) {		for (MachineInstr &MI : llvm::make_early_inc_range(MBB)) {
if (!TII->isSGPRSpill(MI))		if (!TII->isSGPRSpill(MI))
continue;		continue;

int FI = TII->getNamedOperand(MI, AMDGPU::OpName::addr)->getIndex();		int FI = TII->getNamedOperand(MI, AMDGPU::OpName::addr)->getIndex();
assert(MFI.getStackID(FI) == TargetStackID::SGPRSpill);		assert(MFI.getStackID(FI) == TargetStackID::SGPRSpill);
		MachineInstrSpan MIS(&MI, &MBB);
if (FuncInfo->allocateSGPRSpillToVGPRLane(MF, FI)) {		if (FuncInfo->allocateSGPRSpillToVGPRLane(MF, FI)) {
NewReservedRegs = true;
bool Spilled = TRI->eliminateSGPRToVGPRSpillFrameIndex(		bool Spilled = TRI->eliminateSGPRToVGPRSpillFrameIndex(
MI, FI, nullptr, Indexes, LIS);		MI, FI, nullptr, Indexes, LIS);
(void)Spilled;		(void)Spilled;
assert(Spilled && "failed to spill SGPR to VGPR when allocated");		assert(Spilled && "failed to spill SGPR to VGPR when allocated");
SpillFIs.set(FI);		SpillFIs.set(FI);
		updateLaneVGPRDomInstr(FI, &MBB, MIS.begin(), LaneVGPRDomInstr);
}		}
}		}
}		}

// FIXME: Adding to live-ins redundant with reserving registers.		for (auto Reg : FuncInfo->getSGPRSpillVGPRs()) {
for (MachineBasicBlock &MBB : MF) {		auto InsertPt = LaneVGPRDomInstr[Reg];
for (auto Reg : FuncInfo->getSGPRSpillVGPRs())		// Insert the IMPLICIT_DEF at the identified points.
		arsenm [Not Done]: This could be the end iterator.
MBB.addLiveIn(Reg);		auto MIB =
MBB.sortUniqueLiveIns();		BuildMI(InsertPt->getParent(), InsertPt, InsertPt->getDebugLoc(),
		TII->get(AMDGPU::IMPLICIT_DEF), Reg);
		arsenm [Not Done]: It might be worth adding a target comment flag for this implicit def, to note that it's for SGPR spilling.
		cdevadas (author) [Done]: I don't think we can do any special handling for them at assembly emission. IMPLICIT_DEF is handled/printed by the generic part of AsmPrinter and it won't reach the target-specific emitInstruction at all.
		arsenm [Not Done]: You don't need to specially handle the instruction; see AsmPrinterFlags.
		yassingh [Not Done]: Tried adding a new flag here: D153754
		FuncInfo->setFlag(Reg, AMDGPU::VirtRegFlag::WWM_REG);
		if (LIS) {
		LIS->InsertMachineInstrInMaps(*MIB);
		arsenm [Not Done]: Typo " in case if multiple spills"
		LIS->createAndComputeVirtRegInterval(Reg);
		}
		}

		for (MachineBasicBlock &MBB : MF) {
// FIXME: The dead frame indices are replaced with a null register from		// FIXME: The dead frame indices are replaced with a null register from
// the debug value instructions. We should instead, update it with the		// the debug value instructions. We should instead, update it with the
// correct register value. But not sure the register value alone is		// correct register value. But not sure the register value alone is
// adequate to lower the DIExpression. It should be worked out later.		// adequate to lower the DIExpression. It should be worked out later.
for (MachineInstr &MI : MBB) {		for (MachineInstr &MI : MBB) {
if (MI.isDebugValue() && MI.getOperand(0).isFI() &&		if (MI.isDebugValue() && MI.getOperand(0).isFI() &&
!MFI.isFixedObjectIndex(MI.getOperand(0).getIndex()) &&		!MFI.isFixedObjectIndex(MI.getOperand(0).getIndex()) &&
SpillFIs[MI.getOperand(0).getIndex()]) {		SpillFIs[MI.getOperand(0).getIndex()]) {
MI.getOperand(0).ChangeToRegister(Register(), false /*isDef*/);		MI.getOperand(0).ChangeToRegister(Register(), false /*isDef*/);
}		}
}		}
}		}

// All those frame indices which are dead by now should be removed from the		// All those frame indices which are dead by now should be removed from the
// function frame. Otherwise, there is a side effect such as re-mapping of		// function frame. Otherwise, there is a side effect such as re-mapping of
// free frame index ids by the later pass(es) like "stack slot coloring"		// free frame index ids by the later pass(es) like "stack slot coloring"
// which in turn could mess-up with the book keeping of "frame index to VGPR		// which in turn could mess-up with the book keeping of "frame index to VGPR
// lane".		// lane".
FuncInfo->removeDeadFrameIndices(MFI, /*ResetSGPRSpillStackIDs*/ false);		FuncInfo->removeDeadFrameIndices(MFI, /*ResetSGPRSpillStackIDs*/ false);

		MachineRegisterInfo &MRI = MF.getRegInfo();
		const TargetRegisterClass *RC =
		ST.isWave32() ? &AMDGPU::SGPR_32RegClass : &AMDGPU::SGPR_64RegClass;
		arsenm [Not Done]: Should implement MachineFunctionPass::getClearedProperties instead of clearing these here.
		cdevadas (author) [Done]: Will do.
		// Shift back the reserved SGPR for EXEC copy into the lowest range.
		// This SGPR is reserved to handle the whole-wave spill/copy operations
		// that might get inserted during vgpr regalloc.
		Register UnusedLowSGPR = TRI->findUnusedRegister(MRI, RC, MF);
		if (UnusedLowSGPR && TRI->getHWRegIndex(UnusedLowSGPR) <
		TRI->getHWRegIndex(FuncInfo->getSGPRForEXECCopy()))
		FuncInfo->setSGPRForEXECCopy(UnusedLowSGPR);

MadeChange = true;		MadeChange = true;
		} else {
		// No SGPR spills and hence there won't be any WWM spills/copies. Reset the
		// SGPR reserved for EXEC copy.
		FuncInfo->setSGPRForEXECCopy(AMDGPU::NoRegister);
}		}

SaveBlocks.clear();		SaveBlocks.clear();
RestoreBlocks.clear();		RestoreBlocks.clear();

// Updated the reserved registers with any VGPRs added for SGPR spills.
if (NewReservedRegs)
MRI.freezeReservedRegs(MF);

return MadeChange;		return MadeChange;
}		}
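Near the end of `runOnMachineFunction`, the new code either moves the SGPR reserved for the EXEC copy to a lower unused register or drops the reservation when no SGPR spills to VGPRs exist. A minimal sketch of that decision, assuming registers are modeled by their hardware index alone; `ToyNoReg` and `pickSGPRForEXECCopy` are hypothetical names, and the handling of a missing current register is an assumption of this sketch rather than something the patch states:

```cpp
#include <cassert>

// Registers are modeled by hardware index only (an assumption of this
// sketch); ToyNoReg stands for "no register".
constexpr int ToyNoReg = -1;

// Decide which SGPR stays reserved for the EXEC copy.
//  - Without SGPR-to-VGPR spills there are no WWM spills/copies to guard,
//    so the reservation is released.
//  - Otherwise prefer the unused candidate only when it exists and has a
//    lower hardware index than the current reservation.
int pickSGPRForEXECCopy(int Current, int UnusedLow, bool HasSGPRSpillToVGPR) {
  if (!HasSGPRSpillToVGPR)
    return ToyNoReg;
  if (UnusedLow != ToyNoReg && UnusedLow < Current)
    return UnusedLow;
  return Current;
}
```

For example, `pickSGPRForEXECCopy(10, 4, true)` selects index 4, while the same call with `false` as the last argument releases the reservation.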

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h

[... 294 lines not shown ...]	struct SIMachineFunctionInfo final : public yaml::MachineFunctionInfo {

unsigned BytesInStackArgArea = 0;		unsigned BytesInStackArgArea = 0;
bool ReturnsVoid = true;		bool ReturnsVoid = true;

Optional<SIArgumentInfo> ArgInfo;		Optional<SIArgumentInfo> ArgInfo;
SIMode Mode;		SIMode Mode;
Optional<FrameIndex> ScavengeFI;		Optional<FrameIndex> ScavengeFI;
StringValue VGPRForAGPRCopy;		StringValue VGPRForAGPRCopy;
		StringValue SGPRForEXECCopy;

SIMachineFunctionInfo() = default;		SIMachineFunctionInfo() = default;
SIMachineFunctionInfo(const llvm::SIMachineFunctionInfo &,		SIMachineFunctionInfo(const llvm::SIMachineFunctionInfo &,
const TargetRegisterInfo &TRI,		const TargetRegisterInfo &TRI,
const llvm::MachineFunction &MF);		const llvm::MachineFunction &MF);

void mappingImpl(yaml::IO &YamlIO) override;		void mappingImpl(yaml::IO &YamlIO) override;
~SIMachineFunctionInfo() = default;		~SIMachineFunctionInfo() = default;
[... 25 lines not shown ...]	static void mapping(IO &YamlIO, SIMachineFunctionInfo &MFI) {
YamlIO.mapOptional("mode", MFI.Mode, SIMode());		YamlIO.mapOptional("mode", MFI.Mode, SIMode());
YamlIO.mapOptional("highBitsOf32BitAddress",		YamlIO.mapOptional("highBitsOf32BitAddress",
MFI.HighBitsOf32BitAddress, 0u);		MFI.HighBitsOf32BitAddress, 0u);
YamlIO.mapOptional("occupancy", MFI.Occupancy, 0);		YamlIO.mapOptional("occupancy", MFI.Occupancy, 0);
YamlIO.mapOptional("wwmReservedRegs", MFI.WWMReservedRegs);		YamlIO.mapOptional("wwmReservedRegs", MFI.WWMReservedRegs);
YamlIO.mapOptional("scavengeFI", MFI.ScavengeFI);		YamlIO.mapOptional("scavengeFI", MFI.ScavengeFI);
YamlIO.mapOptional("vgprForAGPRCopy", MFI.VGPRForAGPRCopy,		YamlIO.mapOptional("vgprForAGPRCopy", MFI.VGPRForAGPRCopy,
StringValue()); // Don't print out when it's empty.		StringValue()); // Don't print out when it's empty.
		YamlIO.mapOptional("sgprForEXECCopy", MFI.SGPRForEXECCopy,
		StringValue()); // Don't print out when it's empty.
}		}
};		};

} // end namespace yaml		} // end namespace yaml

// A CSR SGPR value can be preserved inside a callee using one of the following		// A CSR SGPR value can be preserved inside a callee using one of the following
// methods.		// methods.
// 1. Copy to an unused scratch SGPR.		// 1. Copy to an unused scratch SGPR.
[... 20 lines not shown ...]	PrologEpilogSGPRSaveRestoreInfo(SGPRSaveKind K, Register R)
: Kind(K), Reg(R) {}		: Kind(K), Reg(R) {}
Register getReg() const { return Reg; }		Register getReg() const { return Reg; }
int getIndex() const { return Index; }		int getIndex() const { return Index; }
SGPRSaveKind getKind() const { return Kind; }		SGPRSaveKind getKind() const { return Kind; }
};		};

/// This class keeps track of the SPI_SP_INPUT_ADDR config register, which		/// This class keeps track of the SPI_SP_INPUT_ADDR config register, which
/// tells the hardware which interpolation parameters to load.		/// tells the hardware which interpolation parameters to load.
class SIMachineFunctionInfo final : public AMDGPUMachineFunction {		class SIMachineFunctionInfo final : public AMDGPUMachineFunction,
		private MachineRegisterInfo::Delegate {
friend class GCNTargetMachine;		friend class GCNTargetMachine;

// State of MODE register, assumed FP mode.		// State of MODE register, assumed FP mode.
AMDGPU::SIModeRegisterDefaults Mode;		AMDGPU::SIModeRegisterDefaults Mode;

// Registers that may be reserved for spilling purposes. These may be the same		// Registers that may be reserved for spilling purposes. These may be the same
// as the input registers.		// as the input registers.
Register ScratchRSrcReg = AMDGPU::PRIVATE_RSRC_REG;		Register ScratchRSrcReg = AMDGPU::PRIVATE_RSRC_REG;
[... 83 lines not shown ...]	private:

// The hard-wired high half of the address of the global information table		// The hard-wired high half of the address of the global information table
// for AMDPAL OS type. 0xffffffff represents no hard-wired high half, since		// for AMDPAL OS type. 0xffffffff represents no hard-wired high half, since
// current hardware only allows a 16 bit value.		// current hardware only allows a 16 bit value.
unsigned GITPtrHigh;		unsigned GITPtrHigh;

unsigned HighBitsOf32BitAddress;		unsigned HighBitsOf32BitAddress;

		// Flags associated with the virtual registers.
		IndexedMap<uint8_t, VirtReg2IndexFunctor> VRegFlags;

// Current recorded maximum possible occupancy.		// Current recorded maximum possible occupancy.
unsigned Occupancy;		unsigned Occupancy;

mutable Optional<bool> UsesAGPRs;		mutable Optional<bool> UsesAGPRs;

MCPhysReg getNextUserSGPR() const;		MCPhysReg getNextUserSGPR() const;
		arsenm [Not Done]: I don't like having state here for a single operation that's happening in one pass and isn't valid for multiple uses. I don't really understand how this is being set and passed around.
		cdevadas (author) [Done]: CurrentVRegSpilled is needed to track the virtual register (live range) for which the physical register was assigned, and it is needed only for fast regalloc. We need this mapping to correctly track the WWM spills, as RegAllocFast spills/restores the physical registers directly because there is no VRM. This will be appropriately set with the delegate MRI_NoteVirtualRegisterSpill, which is inserted in the RegAllocFast spill/reload functions. SIMachineFunctionInfo is where the delegates are currently handled and I don't have a better place to move it.

MCPhysReg getNextSystemSGPR() const;		MCPhysReg getNextSystemSGPR() const;

		// MachineRegisterInfo callback functions to notify events.
		void MRI_NoteNewVirtualRegister(Register Reg) override;
		void MRI_NotecloneVirtualRegister(Register NewReg, Register SrcReg) override;

public:		public:
struct VGPRSpillToAGPR {		struct VGPRSpillToAGPR {
SmallVector<MCPhysReg, 32> Lanes;		SmallVector<MCPhysReg, 32> Lanes;
bool FullyAllocated = false;		bool FullyAllocated = false;
bool IsDead = false;		bool IsDead = false;
};		};

private:		private:
// To track VGPR + lane index for each subregister of the SGPR spilled to		// To track virtual VGPR + lane index for each subregister of the SGPR spilled
// frameindex key during SILowerSGPRSpills pass.		// to frameindex key during SILowerSGPRSpills pass.
DenseMap<int, std::vector<SIRegisterInfo::SpilledReg>> SGPRSpillToVGPRLanes;		DenseMap<int, std::vector<SIRegisterInfo::SpilledReg>> SGPRSpillToVGPRLanes;
// To track VGPR + lane index for spilling special SGPRs like Frame Pointer		// To track physical VGPR + lane index for spilling special SGPRs like Frame
// identified during PrologEpilogInserter.		// Pointer identified during PrologEpilogInserter.
DenseMap<int, std::vector<SIRegisterInfo::SpilledReg>>		DenseMap<int, std::vector<SIRegisterInfo::SpilledReg>>
PrologEpilogSGPRSpillToVGPRLanes;		PrologEpilogSGPRSpillToVGPRLanes;
unsigned NumVGPRSpillLanes = 0;		unsigned NumVGPRSpillLanes = 0;
unsigned NumVGPRPrologEpilogSpillLanes = 0;		unsigned NumVGPRPrologEpilogSpillLanes = 0;
SmallVector<Register, 2> SpillVGPRs;		SmallVector<Register, 2> SpillVGPRs;
using WWMSpillsMap = MapVector<Register, int>;		using WWMSpillsMap = MapVector<Register, int>;
// To track the registers used in instructions that can potentially modify the		// To track the registers used in instructions that can potentially modify the
// inactive lanes. The WWM instructions and the writelane instructions for		// inactive lanes. The WWM instructions and the writelane instructions for
[... 13 lines not shown ...]	private:
using PrologEpilogSGPRSpillsMap =		using PrologEpilogSGPRSpillsMap =
DenseMap<Register, PrologEpilogSGPRSaveRestoreInfo>;		DenseMap<Register, PrologEpilogSGPRSaveRestoreInfo>;
// To track the SGPR spill method used for a CSR SGPR register during		// To track the SGPR spill method used for a CSR SGPR register during
// frame lowering. Even though the SGPR spills are handled during		// frame lowering. Even though the SGPR spills are handled during
// SILowerSGPRSpills pass, some special handling needed later during the		// SILowerSGPRSpills pass, some special handling needed later during the
// PrologEpilogInserter.		// PrologEpilogInserter.
PrologEpilogSGPRSpillsMap PrologEpilogSGPRSpills;		PrologEpilogSGPRSpillsMap PrologEpilogSGPRSpills;

		// To save/restore EXEC MASK around WWM spills and copies.
		Register SGPRForEXECCopy;

DenseMap<int, VGPRSpillToAGPR> VGPRToAGPRSpills;		DenseMap<int, VGPRSpillToAGPR> VGPRToAGPRSpills;

// AGPRs used for VGPR spills.		// AGPRs used for VGPR spills.
SmallVector<MCPhysReg, 32> SpillAGPR;		SmallVector<MCPhysReg, 32> SpillAGPR;

// VGPRs used for AGPR spills.		// VGPRs used for AGPR spills.
SmallVector<MCPhysReg, 32> SpillVGPR;		SmallVector<MCPhysReg, 32> SpillVGPR;

[... 107 lines not shown ...]	public:
ArrayRef<SIRegisterInfo::SpilledReg>		ArrayRef<SIRegisterInfo::SpilledReg>
getPrologEpilogSGPRSpillToVGPRLanes(int FrameIndex) const {		getPrologEpilogSGPRSpillToVGPRLanes(int FrameIndex) const {
auto I = PrologEpilogSGPRSpillToVGPRLanes.find(FrameIndex);		auto I = PrologEpilogSGPRSpillToVGPRLanes.find(FrameIndex);
return (I == PrologEpilogSGPRSpillToVGPRLanes.end())		return (I == PrologEpilogSGPRSpillToVGPRLanes.end())
? ArrayRef<SIRegisterInfo::SpilledReg>()		? ArrayRef<SIRegisterInfo::SpilledReg>()
: makeArrayRef(I->second);		: makeArrayRef(I->second);
}		}

		void setFlag(Register Reg, uint8_t Flag) {
		assert(Reg.isVirtual());
		arsenm [Not Done]: Reg.isVirtual()
		if (VRegFlags.inBounds(Reg))
		VRegFlags[Reg] |= (uint8_t)1 << Flag;
		}

		bool checkFlag(const MachineRegisterInfo &MRI, Register Reg,
		uint8_t Flag) const {
		arsenm [Not Done]: Reg.isVirtual()
		if (!Reg.isVirtual()) {
		// See if a virtReg is available for the physReg. If found, check the
		// flags of the virtual register.
		Register VirtReg = MRI.getPhysToCurrentVirtReg();
		if (!VirtReg)
		return false;

		Reg = VirtReg;
		}

		return VRegFlags.inBounds(Reg) && VRegFlags[Reg] & ((uint8_t)1 << Flag);
		}
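`setFlag` and `checkFlag` above store one bit per flag in a per-register byte and silently ignore registers outside the table. A minimal sketch of that bookkeeping, assuming a plain vector indexed by a dense register index; `ToyVirtRegFlag`, `toySetFlag`, and `toyCheckFlag` are hypothetical, and the numeric value given to the WWM flag is an assumption:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical flag positions. Only a WWM register flag is named in the
// patch; the value 0 used here is an assumption.
enum ToyVirtRegFlag : uint8_t { TOY_WWM_REG = 0 };

// Set bit `Flag` for register index Idx; out-of-range indices are ignored,
// matching the inBounds() guard in the patch.
void toySetFlag(std::vector<uint8_t> &Table, unsigned Idx, uint8_t Flag) {
  assert(Flag < 8 && "flag must fit in a byte");
  if (Idx < Table.size())
    Table[Idx] |= static_cast<uint8_t>(1u << Flag);
}

// Return true only if Idx is in range and bit `Flag` is set.
bool toyCheckFlag(const std::vector<uint8_t> &Table, unsigned Idx,
                  uint8_t Flag) {
  assert(Flag < 8 && "flag must fit in a byte");
  return Idx < Table.size() && (Table[Idx] & (1u << Flag)) != 0;
}
```

The sketch parenthesizes the mask test explicitly. In the patch the unparenthesized form relies on `&` binding tighter than `&&`, which yields the same result.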

void allocateWWMSpill(MachineFunction &MF, Register VGPR, uint64_t Size = 4,		void allocateWWMSpill(MachineFunction &MF, Register VGPR, uint64_t Size = 4,
Align Alignment = Align(4));		Align Alignment = Align(4));

void splitWWMSpillRegisters(		void splitWWMSpillRegisters(
MachineFunction &MF,		MachineFunction &MF,
SmallVectorImpl<std::pair<Register, int>> &CalleeSavedRegs,		SmallVectorImpl<std::pair<Register, int>> &CalleeSavedRegs,
SmallVectorImpl<std::pair<Register, int>> &ScratchRegs) const;		SmallVectorImpl<std::pair<Register, int>> &ScratchRegs) const;

ArrayRef<MCPhysReg> getAGPRSpillVGPRs() const {		ArrayRef<MCPhysReg> getAGPRSpillVGPRs() const {
return SpillAGPR;		return SpillAGPR;
}		}

		Register getSGPRForEXECCopy() const { return SGPRForEXECCopy; }

		void setSGPRForEXECCopy(Register Reg) { SGPRForEXECCopy = Reg; }

ArrayRef<MCPhysReg> getVGPRSpillAGPRs() const {		ArrayRef<MCPhysReg> getVGPRSpillAGPRs() const {
return SpillVGPR;		return SpillVGPR;
}		}

MCPhysReg getVGPRToAGPRSpill(int FrameIndex, unsigned Lane) const {		MCPhysReg getVGPRToAGPRSpill(int FrameIndex, unsigned Lane) const {
auto I = VGPRToAGPRSpills.find(FrameIndex);		auto I = VGPRToAGPRSpills.find(FrameIndex);
return (I == VGPRToAGPRSpills.end()) ? (MCPhysReg)AMDGPU::NoRegister		return (I == VGPRToAGPRSpills.end()) ? (MCPhysReg)AMDGPU::NoRegister
: I->second.Lanes[Lane];		: I->second.Lanes[Lane];
[... 418 lines not shown ...]

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

[... 56 lines not shown ...]	SIMachineFunctionInfo::SIMachineFunctionInfo(const MachineFunction &MF)
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const Function &F = MF.getFunction();		const Function &F = MF.getFunction();
FlatWorkGroupSizes = ST.getFlatWorkGroupSizes(F);		FlatWorkGroupSizes = ST.getFlatWorkGroupSizes(F);
WavesPerEU = ST.getWavesPerEU(F);		WavesPerEU = ST.getWavesPerEU(F);

Occupancy = ST.computeOccupancy(F, getLDSSize());		Occupancy = ST.computeOccupancy(F, getLDSSize());
CallingConv::ID CC = F.getCallingConv();		CallingConv::ID CC = F.getCallingConv();

		const_cast<MachineFunction &>(MF).getRegInfo().addDelegate(this);
		VRegFlags.reserve(256);

// FIXME: Should have analysis or something rather than attribute to detect		// FIXME: Should have analysis or something rather than attribute to detect
// calls.		// calls.
const bool HasCalls = F.hasFnAttribute("amdgpu-calls");		const bool HasCalls = F.hasFnAttribute("amdgpu-calls");

const bool IsKernel = CC == CallingConv::AMDGPU_KERNEL ||		const bool IsKernel = CC == CallingConv::AMDGPU_KERNEL ||
CC == CallingConv::SPIR_KERNEL;		CC == CallingConv::SPIR_KERNEL;

if (IsKernel) {		if (IsKernel) {
[... 231 lines not shown ...]	bool SIMachineFunctionInfo::isCalleeSavedReg(const MCPhysReg *CSRegs,
}		}

return false;		return false;
}		}

bool SIMachineFunctionInfo::allocateVGPRForSGPRSpills(MachineFunction &MF,		bool SIMachineFunctionInfo::allocateVGPRForSGPRSpills(MachineFunction &MF,
int FI,		int FI,
unsigned LaneIndex) {		unsigned LaneIndex) {
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const SIRegisterInfo *TRI = ST.getRegisterInfo();
MachineRegisterInfo &MRI = MF.getRegInfo();		MachineRegisterInfo &MRI = MF.getRegInfo();
Register LaneVGPR;		Register LaneVGPR;
if (!LaneIndex) {		if (!LaneIndex) {
LaneVGPR = TRI->findUnusedRegister(MRI, &AMDGPU::VGPR_32RegClass, MF);		LaneVGPR = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
		arsenm [Not Done]: As part of the follow-up to allow spill slot sharing, I think we can move all of this allocation stuff out of SIMachineFunctionInfo and into SILowerSGPRSpills.
		cdevadas (author) [Done]: Ya, will try to move it entirely out of SIMachineFunctionInfo.
if (LaneVGPR == AMDGPU::NoRegister) {
// We have no VGPRs left for spilling SGPRs. Reset because we will not
// partially spill the SGPR to VGPRs.
SGPRSpillToVGPRLanes.erase(FI);
return false;
}

SpillVGPRs.push_back(LaneVGPR);		SpillVGPRs.push_back(LaneVGPR);
// Add this register as live-in to all blocks to avoid machine verifier
// complaining about use of an undefined physical register.
for (MachineBasicBlock &BB : MF)
BB.addLiveIn(LaneVGPR);
} else {		} else {
LaneVGPR = SpillVGPRs.back();		LaneVGPR = SpillVGPRs.back();
}		}

SGPRSpillToVGPRLanes[FI].push_back(		SGPRSpillToVGPRLanes[FI].push_back(
SIRegisterInfo::SpilledReg(LaneVGPR, LaneIndex));		SIRegisterInfo::SpilledReg(LaneVGPR, LaneIndex));
return true;		return true;
}		}
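
For readers following the lane bookkeeping in allocateVGPRForSGPRSpills above, a minimal standalone sketch of the packing rule may help: a new VGPR is opened whenever the lane index is zero, otherwise the most recently opened one is reused. Everything in it is hypothetical and not LLVM code: the LanePacker class, the LaneSlot struct, the plain integer VGPR ids, and in particular the assumption that the caller derives the lane index as a running count modulo the wave size (that caller is not shown in this diff).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for SIRegisterInfo::SpilledReg: (VGPR id, lane).
struct LaneSlot {
  unsigned VGPR;
  unsigned Lane;
};

// Toy allocator: hands out one lane per spilled 32-bit SGPR and opens a new
// VGPR each time the lane index wraps back to zero.
class LanePacker {
public:
  explicit LanePacker(unsigned WaveSize) : WaveSize(WaveSize) {
    assert(WaveSize > 0 && "wave size must be non-zero");
  }

  LaneSlot allocate() {
    // Assumption: the lane index is the running lane count modulo wave size.
    unsigned LaneIndex = NumLanesUsed % WaveSize;
    if (LaneIndex == 0)
      SpillVGPRs.push_back(NextVGPR++); // corresponds to the !LaneIndex branch
    ++NumLanesUsed;
    // Otherwise reuse the last VGPR, like SpillVGPRs.back() in the patch.
    return {SpillVGPRs.back(), LaneIndex};
  }

  std::size_t numVGPRs() const { return SpillVGPRs.size(); }

private:
  unsigned WaveSize;
  unsigned NumLanesUsed = 0;
  unsigned NextVGPR = 0; // toy ids; the patch creates virtual registers
  std::vector<unsigned> SpillVGPRs;
};
```

With a toy wave size of 4, the first four calls to allocate() land in lanes 0 to 3 of one VGPR and the fifth opens a second VGPR at lane 0. Real wave sizes are 32 or 64.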
[... 190 lines not shown ...]	MCPhysReg SIMachineFunctionInfo::getNextUserSGPR() const {
assert(NumSystemSGPRs == 0 && "System SGPRs must be added after user SGPRs");		assert(NumSystemSGPRs == 0 && "System SGPRs must be added after user SGPRs");
return AMDGPU::SGPR0 + NumUserSGPRs;		return AMDGPU::SGPR0 + NumUserSGPRs;
}		}

MCPhysReg SIMachineFunctionInfo::getNextSystemSGPR() const {		MCPhysReg SIMachineFunctionInfo::getNextSystemSGPR() const {
return AMDGPU::SGPR0 + NumUserSGPRs + NumSystemSGPRs;		return AMDGPU::SGPR0 + NumUserSGPRs + NumSystemSGPRs;
}		}

		void SIMachineFunctionInfo::MRI_NoteNewVirtualRegister(Register Reg) {
		VRegFlags.grow(Reg);
		}

		void SIMachineFunctionInfo::MRI_NotecloneVirtualRegister(Register NewReg,
		Register SrcReg) {
		VRegFlags.grow(NewReg);
		VRegFlags[NewReg] = VRegFlags[SrcReg];
		}

Register		Register
SIMachineFunctionInfo::getGITPtrLoReg(const MachineFunction &MF) const {		SIMachineFunctionInfo::getGITPtrLoReg(const MachineFunction &MF) const {
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
if (!ST.isAmdPalOS())		if (!ST.isAmdPalOS())
return Register();		return Register();
Register GitPtrLo = AMDGPU::SGPR0; // Low GIT address passed in		Register GitPtrLo = AMDGPU::SGPR0; // Low GIT address passed in
if (ST.hasMergedShaders()) {		if (ST.hasMergedShaders()) {
switch (MF.getFunction().getCallingConv()) {		switch (MF.getFunction().getCallingConv()) {
[... 91 lines not shown ...]	: ExplicitKernArgSize(MFI.getExplicitKernArgSize()),
BytesInStackArgArea(MFI.getBytesInStackArgArea()),		BytesInStackArgArea(MFI.getBytesInStackArgArea()),
ReturnsVoid(MFI.returnsVoid()),		ReturnsVoid(MFI.returnsVoid()),
ArgInfo(convertArgumentInfo(MFI.getArgInfo(), TRI)), Mode(MFI.getMode()) {		ArgInfo(convertArgumentInfo(MFI.getArgInfo(), TRI)), Mode(MFI.getMode()) {
for (Register Reg : MFI.getWWMReservedRegs())		for (Register Reg : MFI.getWWMReservedRegs())
WWMReservedRegs.push_back(regToString(Reg, TRI));		WWMReservedRegs.push_back(regToString(Reg, TRI));

if (MFI.getVGPRForAGPRCopy())		if (MFI.getVGPRForAGPRCopy())
VGPRForAGPRCopy = regToString(MFI.getVGPRForAGPRCopy(), TRI);		VGPRForAGPRCopy = regToString(MFI.getVGPRForAGPRCopy(), TRI);

		if (MFI.getSGPRForEXECCopy())
		SGPRForEXECCopy = regToString(MFI.getSGPRForEXECCopy(), TRI);

auto SFI = MFI.getOptionalScavengeFI();		auto SFI = MFI.getOptionalScavengeFI();
if (SFI)		if (SFI)
ScavengeFI = yaml::FrameIndex(*SFI, MF.getFrameInfo());		ScavengeFI = yaml::FrameIndex(*SFI, MF.getFrameInfo());
}		}

void yaml::SIMachineFunctionInfo::mappingImpl(yaml::IO &YamlIO) {		void yaml::SIMachineFunctionInfo::mappingImpl(yaml::IO &YamlIO) {
MappingTraits<SIMachineFunctionInfo>::mapping(YamlIO, *this);		MappingTraits<SIMachineFunctionInfo>::mapping(YamlIO, *this);
}		}
[... 111 lines not shown ...]

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

[... 640 lines not shown ...]	BitVector SIRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
}		}

if (hasBasePointer(MF)) {		if (hasBasePointer(MF)) {
MCRegister BasePtrReg = getBaseRegister();		MCRegister BasePtrReg = getBaseRegister();
reserveRegisterTuples(Reserved, BasePtrReg);		reserveRegisterTuples(Reserved, BasePtrReg);
assert(!isSubRegister(ScratchRSrcReg, BasePtrReg));		assert(!isSubRegister(ScratchRSrcReg, BasePtrReg));
}		}

		// SGPR used to preserve EXEC MASK around WWM spill/copy instructions.
		Register ExecCopyReg = MFI->getSGPRForEXECCopy();
		if (ExecCopyReg)
		arsenm (Not Done): Isn't this always required?
		cdevadas (Author, Done): No. They are reserved only if RA inserts any whole wave spill.
		reserveRegisterTuples(Reserved, ExecCopyReg);

// Reserve VGPRs/AGPRs.		// Reserve VGPRs/AGPRs.
//		//
unsigned MaxNumVGPRs = ST.getMaxNumVGPRs(MF);		unsigned MaxNumVGPRs = ST.getMaxNumVGPRs(MF);
unsigned MaxNumAGPRs = MaxNumVGPRs;		unsigned MaxNumAGPRs = MaxNumVGPRs;
unsigned TotalNumVGPRs = AMDGPU::VGPR_32RegClass.getNumRegs();		unsigned TotalNumVGPRs = AMDGPU::VGPR_32RegClass.getNumRegs();

// Reserve all the AGPRs if there are no instructions to use it.		// Reserve all the AGPRs if there are no instructions to use it.
if (!ST.hasMAIInsts()) {		if (!ST.hasMAIInsts()) {
[... 49 lines not shown ...]	BitVector SIRegisterInfo::getReservedRegs(const MachineFunction &MF) const {

// FIXME: Stop using reserved registers for this.		// FIXME: Stop using reserved registers for this.
for (MCPhysReg Reg : MFI->getAGPRSpillVGPRs())		for (MCPhysReg Reg : MFI->getAGPRSpillVGPRs())
reserveRegisterTuples(Reserved, Reg);		reserveRegisterTuples(Reserved, Reg);

for (MCPhysReg Reg : MFI->getVGPRSpillAGPRs())		for (MCPhysReg Reg : MFI->getVGPRSpillAGPRs())
reserveRegisterTuples(Reserved, Reg);		reserveRegisterTuples(Reserved, Reg);

for (auto Reg : MFI->getSGPRSpillVGPRs())
reserveRegisterTuples(Reserved, Reg);

return Reserved;		return Reserved;
}		}

bool SIRegisterInfo::isAsmClobberable(const MachineFunction &MF,		bool SIRegisterInfo::isAsmClobberable(const MachineFunction &MF,
MCRegister PhysReg) const {		MCRegister PhysReg) const {
return !MF.getRegInfo().isReserved(PhysReg);		return !MF.getRegInfo().isReserved(PhysReg);
}		}

[... 299 lines not shown ...]	static unsigned getNumSubRegsForSpillOp(unsigned Op) {
case AMDGPU::SI_SPILL_S32_SAVE:		case AMDGPU::SI_SPILL_S32_SAVE:
case AMDGPU::SI_SPILL_S32_RESTORE:		case AMDGPU::SI_SPILL_S32_RESTORE:
case AMDGPU::SI_SPILL_V32_SAVE:		case AMDGPU::SI_SPILL_V32_SAVE:
case AMDGPU::SI_SPILL_V32_RESTORE:		case AMDGPU::SI_SPILL_V32_RESTORE:
case AMDGPU::SI_SPILL_A32_SAVE:		case AMDGPU::SI_SPILL_A32_SAVE:
case AMDGPU::SI_SPILL_A32_RESTORE:		case AMDGPU::SI_SPILL_A32_RESTORE:
case AMDGPU::SI_SPILL_AV32_SAVE:		case AMDGPU::SI_SPILL_AV32_SAVE:
case AMDGPU::SI_SPILL_AV32_RESTORE:		case AMDGPU::SI_SPILL_AV32_RESTORE:
		case AMDGPU::SI_SPILL_WWM_V32_SAVE:
		case AMDGPU::SI_SPILL_WWM_V32_RESTORE:
return 1;		return 1;
default: llvm_unreachable("Invalid spill opcode");		default: llvm_unreachable("Invalid spill opcode");
}		}
}		}

static int getOffsetMUBUFStore(unsigned Opc) {		static int getOffsetMUBUFStore(unsigned Opc) {
switch (Opc) {		switch (Opc) {
case AMDGPU::BUFFER_STORE_DWORD_OFFEN:		case AMDGPU::BUFFER_STORE_DWORD_OFFEN:
[... 916 lines not shown ...]	bool SIRegisterInfo::eliminateSGPRToVGPRSpillFrameIndex(
case AMDGPU::SI_SPILL_S64_RESTORE:		case AMDGPU::SI_SPILL_S64_RESTORE:
case AMDGPU::SI_SPILL_S32_RESTORE:		case AMDGPU::SI_SPILL_S32_RESTORE:
return restoreSGPR(MI, FI, RS, Indexes, LIS, true);		return restoreSGPR(MI, FI, RS, Indexes, LIS, true);
default:		default:
llvm_unreachable("not an SGPR spill instruction");		llvm_unreachable("not an SGPR spill instruction");
}		}
}		}

		static void insertScratchExecCopy(MachineFunction &MF, MachineBasicBlock &MBB,
		MachineBasicBlock::iterator MBBI,
		const DebugLoc &DL, Register Reg,
		RegScavenger *RS) {
		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
		const SIInstrInfo *TII = ST.getInstrInfo();
		bool IsWave32 = ST.isWave32();
		if (RS->isRegUsed(AMDGPU::SCC)) {
		// Insert two move instructions, one to save the original value of EXEC and
		// the other to turn on all bits in EXEC. This is required as we can't use
		// the single instruction S_OR_SAVEEXEC that clobbers SCC.
		unsigned MovOpc = IsWave32 ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
		MCRegister Exec = IsWave32 ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
		BuildMI(MBB, MBBI, DL, TII->get(MovOpc), Reg).addReg(Exec, RegState::Kill);
		BuildMI(MBB, MBBI, DL, TII->get(MovOpc), Exec).addImm(-1);
		} else {
		const unsigned OrSaveExec =
		IsWave32 ? AMDGPU::S_OR_SAVEEXEC_B32 : AMDGPU::S_OR_SAVEEXEC_B64;
		auto SaveExec =
		BuildMI(MBB, MBBI, DL, TII->get(OrSaveExec), Reg).addImm(-1);
		SaveExec->getOperand(3).setIsDead(); // Mark SCC as dead.
		Pierre-vh (Not Done): Why does SCC need to be dead? What happens if another instruction right after uses it?
		cdevadas (Author, Done): The code here only manipulates the exec mask, and no other instruction depends on the SCC that it produces, so we should mark it dead to avoid unwanted side effects. We don't have an alternative instruction that doesn't clobber SCC.
		Pierre-vh (Not Done): Ah, that makes sense, but shouldn't this check that it's not inserting in a place where SCC is alive? I was trying out this patch and I have a case where it's causing issues:
		    S_CMP_EQ_U32 killed renamable $sgpr6, killed renamable $sgpr7, implicit-def $scc
		    renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr4, 4, $vgpr0(tied-def 0), implicit-def $sgpr4_sgpr5, implicit $sgpr4_sgpr5
		    renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr5, 5, $vgpr0(tied-def 0), implicit killed $sgpr4_sgpr5
		    $sgpr10_sgpr11 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
		    $agpr4 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec
		    $exec = S_MOV_B64 killed $sgpr10_sgpr11
		    S_CBRANCH_SCC1 %bb.5, implicit killed $scc
		Insertion is between the S_CMP and the S_CBRANCH.
		cdevadas (Author, Done): Yes, the check is already in place. See the code above: the if condition inserts two separate move instructions when SCC is live, and the else part uses SCC when it is free. Not sure why RegScavenger returned false. It should have returned SCC as clobbered.
		cdevadas (Author, Done): See the test llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll, where a similar situation is handled. RS returned the correct liveness info for SCC.
		}
		}

		static void restoreExec(MachineFunction &MF, MachineBasicBlock &MBB,
		MachineBasicBlock::iterator MBBI, const DebugLoc &DL,
		Register Reg) {
		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
		const SIInstrInfo *TII = ST.getInstrInfo();
		unsigned ExecMov = ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
		MCRegister Exec = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
		BuildMI(MBB, MBBI, DL, TII->get(ExecMov), Exec).addReg(Reg, RegState::Kill);
		}
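
The SCC discussion above comes down to a choice between two ways of turning on all lanes. A minimal standalone model of that choice is sketched below. It is not LLVM code: WaveState and the helper names are hypothetical, the wave64 case is the only one modelled, and the S_OR_SAVEEXEC_B64 semantics are simplified to "save EXEC, OR in the source, set SCC to whether the new EXEC is non-zero".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the two pieces of wave state that matter here.
struct WaveState {
  uint64_t Exec; // execution mask (wave64)
  bool SCC;      // scalar condition code
};

// Models "S_MOV_B64 Saved, EXEC" followed by "S_MOV_B64 EXEC, -1".
// Neither move writes SCC, so a live SCC value survives.
uint64_t saveExecWithMoves(WaveState &W) {
  uint64_t Saved = W.Exec;
  W.Exec = ~uint64_t(0);
  return Saved;
}

// Models "S_OR_SAVEEXEC_B64 Saved, -1": one instruction, but it writes SCC.
uint64_t saveExecWithOrSaveExec(WaveState &W) {
  uint64_t Saved = W.Exec;
  W.Exec = ~uint64_t(0) | Saved;
  W.SCC = (W.Exec != 0); // the clobber the reviewers are discussing
  return Saved;
}

// Mirrors the if/else in insertScratchExecCopy: fall back to the two-move
// form whenever SCC is live at the insertion point.
uint64_t enableAllLanes(WaveState &W, bool SCCIsLive) {
  assert((SCCIsLive || true) && "either path is valid when SCC is dead");
  return SCCIsLive ? saveExecWithMoves(W) : saveExecWithOrSaveExec(W);
}

// Models the S_MOV_B64 emitted by restoreExec.
void restoreExecMask(WaveState &W, uint64_t Saved) { W.Exec = Saved; }
```

In this model, a compare result held in SCC across the spill is preserved only on the two-move path, which is why the liveness answer from the register scavenger matters for the case reported in the thread.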

bool SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,		bool SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
int SPAdj, unsigned FIOperandNum,		int SPAdj, unsigned FIOperandNum,
RegScavenger *RS) const {		RegScavenger *RS) const {
MachineFunction *MF = MI->getParent()->getParent();		MachineFunction *MF = MI->getParent()->getParent();
MachineBasicBlock *MBB = MI->getParent();		MachineBasicBlock *MBB = MI->getParent();
SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();		SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
MachineFrameInfo &FrameInfo = MF->getFrameInfo();		MachineFrameInfo &FrameInfo = MF->getFrameInfo();
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
[... 62 lines not shown ...]	switch (MI->getOpcode()) {
case AMDGPU::SI_SPILL_AV512_SAVE:		case AMDGPU::SI_SPILL_AV512_SAVE:
case AMDGPU::SI_SPILL_AV256_SAVE:		case AMDGPU::SI_SPILL_AV256_SAVE:
case AMDGPU::SI_SPILL_AV224_SAVE:		case AMDGPU::SI_SPILL_AV224_SAVE:
case AMDGPU::SI_SPILL_AV192_SAVE:		case AMDGPU::SI_SPILL_AV192_SAVE:
case AMDGPU::SI_SPILL_AV160_SAVE:		case AMDGPU::SI_SPILL_AV160_SAVE:
case AMDGPU::SI_SPILL_AV128_SAVE:		case AMDGPU::SI_SPILL_AV128_SAVE:
case AMDGPU::SI_SPILL_AV96_SAVE:		case AMDGPU::SI_SPILL_AV96_SAVE:
case AMDGPU::SI_SPILL_AV64_SAVE:		case AMDGPU::SI_SPILL_AV64_SAVE:
case AMDGPU::SI_SPILL_AV32_SAVE: {		case AMDGPU::SI_SPILL_AV32_SAVE:
		case AMDGPU::SI_SPILL_WWM_V32_SAVE: {
const MachineOperand *VData = TII->getNamedOperand(*MI,		const MachineOperand *VData = TII->getNamedOperand(*MI,
AMDGPU::OpName::vdata);		AMDGPU::OpName::vdata);
assert(TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() ==		assert(TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() ==
MFI->getStackPtrOffsetReg());		MFI->getStackPtrOffsetReg());

unsigned Opc = ST.enableFlatScratch() ? AMDGPU::SCRATCH_STORE_DWORD_SADDR		unsigned Opc = ST.enableFlatScratch() ? AMDGPU::SCRATCH_STORE_DWORD_SADDR
: AMDGPU::BUFFER_STORE_DWORD_OFFSET;		: AMDGPU::BUFFER_STORE_DWORD_OFFSET;
auto *MBB = MI->getParent();		auto *MBB = MI->getParent();
		bool IsWWMRegSpill = TII->isWWMRegSpillOpcode(MI->getOpcode());
		if (IsWWMRegSpill)
		insertScratchExecCopy(*MF, *MBB, MI, DL, MFI->getSGPRForEXECCopy(), RS);

buildSpillLoadStore(		buildSpillLoadStore(
*MBB, MI, DL, Opc, Index, VData->getReg(), VData->isKill(), FrameReg,		*MBB, MI, DL, Opc, Index, VData->getReg(), VData->isKill(), FrameReg,
TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(),		TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(),
*MI->memoperands_begin(), RS);		*MI->memoperands_begin(), RS);
MFI->addToSpilledVGPRs(getNumSubRegsForSpillOp(MI->getOpcode()));		MFI->addToSpilledVGPRs(getNumSubRegsForSpillOp(MI->getOpcode()));
		if (IsWWMRegSpill)
		restoreExec(*MF, *MBB, MI, DL, MFI->getSGPRForEXECCopy());

MI->eraseFromParent();		MI->eraseFromParent();
return true;		return true;
}		}
case AMDGPU::SI_SPILL_V32_RESTORE:		case AMDGPU::SI_SPILL_V32_RESTORE:
case AMDGPU::SI_SPILL_V64_RESTORE:		case AMDGPU::SI_SPILL_V64_RESTORE:
case AMDGPU::SI_SPILL_V96_RESTORE:		case AMDGPU::SI_SPILL_V96_RESTORE:
case AMDGPU::SI_SPILL_V128_RESTORE:		case AMDGPU::SI_SPILL_V128_RESTORE:
case AMDGPU::SI_SPILL_V160_RESTORE:		case AMDGPU::SI_SPILL_V160_RESTORE:
[... 16 lines not shown ...]	switch (MI->getOpcode()) {
case AMDGPU::SI_SPILL_AV64_RESTORE:		case AMDGPU::SI_SPILL_AV64_RESTORE:
case AMDGPU::SI_SPILL_AV96_RESTORE:		case AMDGPU::SI_SPILL_AV96_RESTORE:
case AMDGPU::SI_SPILL_AV128_RESTORE:		case AMDGPU::SI_SPILL_AV128_RESTORE:
case AMDGPU::SI_SPILL_AV160_RESTORE:		case AMDGPU::SI_SPILL_AV160_RESTORE:
case AMDGPU::SI_SPILL_AV192_RESTORE:		case AMDGPU::SI_SPILL_AV192_RESTORE:
case AMDGPU::SI_SPILL_AV224_RESTORE:		case AMDGPU::SI_SPILL_AV224_RESTORE:
case AMDGPU::SI_SPILL_AV256_RESTORE:		case AMDGPU::SI_SPILL_AV256_RESTORE:
case AMDGPU::SI_SPILL_AV512_RESTORE:		case AMDGPU::SI_SPILL_AV512_RESTORE:
case AMDGPU::SI_SPILL_AV1024_RESTORE: {		case AMDGPU::SI_SPILL_AV1024_RESTORE:
		case AMDGPU::SI_SPILL_WWM_V32_RESTORE: {
const MachineOperand *VData = TII->getNamedOperand(*MI,		const MachineOperand *VData = TII->getNamedOperand(*MI,
AMDGPU::OpName::vdata);		AMDGPU::OpName::vdata);
assert(TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() ==		assert(TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() ==
MFI->getStackPtrOffsetReg());		MFI->getStackPtrOffsetReg());

unsigned Opc = ST.enableFlatScratch() ? AMDGPU::SCRATCH_LOAD_DWORD_SADDR		unsigned Opc = ST.enableFlatScratch() ? AMDGPU::SCRATCH_LOAD_DWORD_SADDR
: AMDGPU::BUFFER_LOAD_DWORD_OFFSET;		: AMDGPU::BUFFER_LOAD_DWORD_OFFSET;
auto *MBB = MI->getParent();		auto *MBB = MI->getParent();
		bool IsWWMRegSpill = TII->isWWMRegSpillOpcode(MI->getOpcode());
		if (IsWWMRegSpill)
		insertScratchExecCopy(*MF, *MBB, MI, DL, MFI->getSGPRForEXECCopy(), RS);

buildSpillLoadStore(		buildSpillLoadStore(
*MBB, MI, DL, Opc, Index, VData->getReg(), VData->isKill(), FrameReg,		*MBB, MI, DL, Opc, Index, VData->getReg(), VData->isKill(), FrameReg,
TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(),		TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(),
*MI->memoperands_begin(), RS);		*MI->memoperands_begin(), RS);
		if (IsWWMRegSpill)
		restoreExec(*MF, *MBB, MI, DL, MFI->getSGPRForEXECCopy());

MI->eraseFromParent();		MI->eraseFromParent();
return true;		return true;
}		}

default: {		default: {
// Other access to frame index		// Other access to frame index
const DebugLoc &DL = MI->getDebugLoc();		const DebugLoc &DL = MI->getDebugLoc();

[... 1,002 lines not shown ...]

llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -o - %s | FileCheck %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -o - %s | FileCheck %s

	declare hidden i32 addrspace(1)* @ext(i8 addrspace(1)*)			declare hidden i32 addrspace(1)* @ext(i8 addrspace(1)*)

	define i32 addrspace(1)* @call_assert_align() {			define i32 addrspace(1)* @call_assert_align() {
	; CHECK-LABEL: call_assert_align:			; CHECK-LABEL: call_assert_align:
	; CHECK: ; %bb.0: ; %entry			; CHECK: ; %bb.0: ; %entry
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_mov_b32 s16, s33			; CHECK-NEXT: s_mov_b32 s16, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_or_saveexec_b64 s[18:19], -1			; CHECK-NEXT: s_or_saveexec_b64 s[18:19], -1
	; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; CHECK-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[18:19]			; CHECK-NEXT: s_mov_b64 exec, s[18:19]
				; CHECK-NEXT: ; implicit-def: $vgpr40
	; CHECK-NEXT: s_addk_i32 s32, 0x400			; CHECK-NEXT: s_addk_i32 s32, 0x400
	; CHECK-NEXT: v_writelane_b32 v40, s30, 0			; CHECK-NEXT: v_writelane_b32 v40, s30, 0
	; CHECK-NEXT: v_mov_b32_e32 v0, 0			; CHECK-NEXT: v_mov_b32_e32 v0, 0
	; CHECK-NEXT: v_mov_b32_e32 v1, 0			; CHECK-NEXT: v_mov_b32_e32 v1, 0
	; CHECK-NEXT: v_writelane_b32 v41, s16, 0			; CHECK-NEXT: v_writelane_b32 v41, s16, 0
	; CHECK-NEXT: v_writelane_b32 v40, s31, 1			; CHECK-NEXT: v_writelane_b32 v40, s31, 1
	; CHECK-NEXT: s_getpc_b64 s[16:17]			; CHECK-NEXT: s_getpc_b64 s[16:17]
	; CHECK-NEXT: s_add_u32 s16, s16, ext@rel32@lo+4			; CHECK-NEXT: s_add_u32 s16, s16, ext@rel32@lo+4
	[... 36 lines not shown ...]

llvm/test/CodeGen/AMDGPU/GlobalISel/call-outgoing-stack-args.ll

	[... 240 lines not shown ...]
	; MUBUF-NEXT: s_or_saveexec_b64 s[6:7], -1			; MUBUF-NEXT: s_or_saveexec_b64 s[6:7], -1
	; MUBUF-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; MUBUF-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; MUBUF-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; MUBUF-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; MUBUF-NEXT: s_mov_b64 exec, s[6:7]			; MUBUF-NEXT: s_mov_b64 exec, s[6:7]
	; MUBUF-NEXT: s_addk_i32 s32, 0x400			; MUBUF-NEXT: s_addk_i32 s32, 0x400
	; MUBUF-NEXT: v_mov_b32_e32 v0, 9			; MUBUF-NEXT: v_mov_b32_e32 v0, 9
	; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4
	; MUBUF-NEXT: v_mov_b32_e32 v0, 10			; MUBUF-NEXT: v_mov_b32_e32 v0, 10
				; MUBUF-NEXT: ; implicit-def: $vgpr40
	; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8
	; MUBUF-NEXT: v_mov_b32_e32 v0, 11			; MUBUF-NEXT: v_mov_b32_e32 v0, 11
	; MUBUF-NEXT: v_writelane_b32 v40, s30, 0			; MUBUF-NEXT: v_writelane_b32 v40, s30, 0
	; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12
	; MUBUF-NEXT: v_mov_b32_e32 v0, 12			; MUBUF-NEXT: v_mov_b32_e32 v0, 12
	; MUBUF-NEXT: v_writelane_b32 v41, s4, 0			; MUBUF-NEXT: v_writelane_b32 v41, s4, 0
	; MUBUF-NEXT: v_writelane_b32 v40, s31, 1			; MUBUF-NEXT: v_writelane_b32 v40, s31, 1
	; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:16			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:16
	[... 21 lines not shown ...]
	; FLATSCR-NEXT: s_or_saveexec_b64 s[2:3], -1			; FLATSCR-NEXT: s_or_saveexec_b64 s[2:3], -1
	; FLATSCR-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill			; FLATSCR-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
	; FLATSCR-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill			; FLATSCR-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
	; FLATSCR-NEXT: s_mov_b64 exec, s[2:3]			; FLATSCR-NEXT: s_mov_b64 exec, s[2:3]
	; FLATSCR-NEXT: s_add_i32 s32, s32, 16			; FLATSCR-NEXT: s_add_i32 s32, s32, 16
	; FLATSCR-NEXT: v_mov_b32_e32 v0, 9			; FLATSCR-NEXT: v_mov_b32_e32 v0, 9
	; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:4			; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:4
	; FLATSCR-NEXT: v_mov_b32_e32 v0, 10			; FLATSCR-NEXT: v_mov_b32_e32 v0, 10
				; FLATSCR-NEXT: ; implicit-def: $vgpr40
	; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:8			; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:8
	; FLATSCR-NEXT: v_mov_b32_e32 v0, 11			; FLATSCR-NEXT: v_mov_b32_e32 v0, 11
	; FLATSCR-NEXT: v_writelane_b32 v40, s30, 0			; FLATSCR-NEXT: v_writelane_b32 v40, s30, 0
	; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:12			; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:12
	; FLATSCR-NEXT: v_mov_b32_e32 v0, 12			; FLATSCR-NEXT: v_mov_b32_e32 v0, 12
	; FLATSCR-NEXT: v_writelane_b32 v41, s0, 0			; FLATSCR-NEXT: v_writelane_b32 v41, s0, 0
	; FLATSCR-NEXT: v_writelane_b32 v40, s31, 1			; FLATSCR-NEXT: v_writelane_b32 v40, s31, 1
	; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:16			; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:16
	[... 24 lines not shown ...]
	; MUBUF-NEXT: s_mov_b32 s33, s32			; MUBUF-NEXT: s_mov_b32 s33, s32
	; MUBUF-NEXT: s_or_saveexec_b64 s[6:7], -1			; MUBUF-NEXT: s_or_saveexec_b64 s[6:7], -1
	; MUBUF-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; MUBUF-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; MUBUF-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; MUBUF-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; MUBUF-NEXT: s_mov_b64 exec, s[6:7]			; MUBUF-NEXT: s_mov_b64 exec, s[6:7]
	; MUBUF-NEXT: buffer_load_dword v1, v0, s[0:3], 0 offen			; MUBUF-NEXT: buffer_load_dword v1, v0, s[0:3], 0 offen
	; MUBUF-NEXT: buffer_load_dword v2, v0, s[0:3], 0 offen offset:4			; MUBUF-NEXT: buffer_load_dword v2, v0, s[0:3], 0 offen offset:4
	; MUBUF-NEXT: s_addk_i32 s32, 0x400			; MUBUF-NEXT: s_addk_i32 s32, 0x400
	; MUBUF-NEXT: v_writelane_b32 v40, s30, 0			; MUBUF-NEXT: ; implicit-def: $vgpr40
	; MUBUF-NEXT: v_writelane_b32 v41, s4, 0			; MUBUF-NEXT: v_writelane_b32 v41, s4, 0
				; MUBUF-NEXT: v_writelane_b32 v40, s30, 0
	; MUBUF-NEXT: v_writelane_b32 v40, s31, 1			; MUBUF-NEXT: v_writelane_b32 v40, s31, 1
	; MUBUF-NEXT: s_getpc_b64 s[4:5]			; MUBUF-NEXT: s_getpc_b64 s[4:5]
	; MUBUF-NEXT: s_add_u32 s4, s4, external_void_func_byval@rel32@lo+4			; MUBUF-NEXT: s_add_u32 s4, s4, external_void_func_byval@rel32@lo+4
	; MUBUF-NEXT: s_addc_u32 s5, s5, external_void_func_byval@rel32@hi+12			; MUBUF-NEXT: s_addc_u32 s5, s5, external_void_func_byval@rel32@hi+12
	; MUBUF-NEXT: s_waitcnt vmcnt(1)			; MUBUF-NEXT: s_waitcnt vmcnt(1)
	; MUBUF-NEXT: buffer_store_dword v1, off, s[0:3], s32			; MUBUF-NEXT: buffer_store_dword v1, off, s[0:3], s32
	; MUBUF-NEXT: s_waitcnt vmcnt(1)			; MUBUF-NEXT: s_waitcnt vmcnt(1)
	; MUBUF-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:4			; MUBUF-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:4
	[... 65 lines not shown ...]
	; FLATSCR-NEXT: s_mov_b32 s0, s33			; FLATSCR-NEXT: s_mov_b32 s0, s33
	; FLATSCR-NEXT: s_mov_b32 s33, s32			; FLATSCR-NEXT: s_mov_b32 s33, s32
	; FLATSCR-NEXT: s_or_saveexec_b64 s[2:3], -1			; FLATSCR-NEXT: s_or_saveexec_b64 s[2:3], -1
	; FLATSCR-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill			; FLATSCR-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
	; FLATSCR-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill			; FLATSCR-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
	; FLATSCR-NEXT: s_mov_b64 exec, s[2:3]			; FLATSCR-NEXT: s_mov_b64 exec, s[2:3]
	; FLATSCR-NEXT: scratch_load_dwordx2 v[1:2], v0, off			; FLATSCR-NEXT: scratch_load_dwordx2 v[1:2], v0, off
	; FLATSCR-NEXT: s_add_i32 s32, s32, 16			; FLATSCR-NEXT: s_add_i32 s32, s32, 16
	; FLATSCR-NEXT: v_writelane_b32 v40, s30, 0			; FLATSCR-NEXT: ; implicit-def: $vgpr40
	; FLATSCR-NEXT: v_writelane_b32 v41, s0, 0			; FLATSCR-NEXT: v_writelane_b32 v41, s0, 0
				; FLATSCR-NEXT: v_writelane_b32 v40, s30, 0
	; FLATSCR-NEXT: v_writelane_b32 v40, s31, 1			; FLATSCR-NEXT: v_writelane_b32 v40, s31, 1
	; FLATSCR-NEXT: s_getpc_b64 s[0:1]			; FLATSCR-NEXT: s_getpc_b64 s[0:1]
	; FLATSCR-NEXT: s_add_u32 s0, s0, external_void_func_byval@rel32@lo+4			; FLATSCR-NEXT: s_add_u32 s0, s0, external_void_func_byval@rel32@lo+4
	; FLATSCR-NEXT: s_addc_u32 s1, s1, external_void_func_byval@rel32@hi+12			; FLATSCR-NEXT: s_addc_u32 s1, s1, external_void_func_byval@rel32@hi+12
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[1:2], s32			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[1:2], s32
	; FLATSCR-NEXT: scratch_load_dwordx2 v[1:2], v0, off offset:8			; FLATSCR-NEXT: scratch_load_dwordx2 v[1:2], v0, off offset:8
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	[... 40 lines not shown ...]

llvm/test/CodeGen/AMDGPU/GlobalISel/image-waterfall-loop-O0.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1031 -verify-machineinstrs -o - %s | FileCheck %s			; RUN: llc -global-isel -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1031 -verify-machineinstrs -o - %s | FileCheck %s

	; Make sure the waterfall loop does not fail the verifier after regalloc fast			; Make sure the waterfall loop does not fail the verifier after regalloc fast
	define <4 x float> @waterfall_loop(<8 x i32> %vgpr_srd) {			define <4 x float> @waterfall_loop(<8 x i32> %vgpr_srd) {
	; CHECK-LABEL: waterfall_loop:			; CHECK-LABEL: waterfall_loop:
	; CHECK: ; %bb.0: ; %bb			; CHECK: ; %bb.0: ; %bb
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_waitcnt_vscnt null, 0x0			; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
	; CHECK-NEXT: s_xor_saveexec_b32 s4, -1			; CHECK-NEXT: s_xor_saveexec_b32 s4, -1
	; CHECK-NEXT: buffer_store_dword v8, off, s[0:3], s32 offset:44 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:48 ; 4-byte Folded Spill
				; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:52 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b32 exec_lo, s4			; CHECK-NEXT: s_mov_b32 exec_lo, s4
	; CHECK-NEXT: v_mov_b32_e32 v15, v1			; CHECK-NEXT: v_mov_b32_e32 v14, v1
	; CHECK-NEXT: v_mov_b32_e32 v14, v2			; CHECK-NEXT: v_mov_b32_e32 v13, v2
	; CHECK-NEXT: v_mov_b32_e32 v13, v3			; CHECK-NEXT: v_mov_b32_e32 v12, v3
	; CHECK-NEXT: v_mov_b32_e32 v12, v4			; CHECK-NEXT: v_mov_b32_e32 v11, v4
	; CHECK-NEXT: v_mov_b32_e32 v11, v5			; CHECK-NEXT: v_mov_b32_e32 v10, v5
	; CHECK-NEXT: v_mov_b32_e32 v10, v6			; CHECK-NEXT: v_mov_b32_e32 v9, v6
	; CHECK-NEXT: v_mov_b32_e32 v9, v7			; CHECK-NEXT: v_mov_b32_e32 v8, v7
	; CHECK-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7 killed $exec			; CHECK-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7 killed $exec
	; CHECK-NEXT: v_mov_b32_e32 v1, v15			; CHECK-NEXT: v_mov_b32_e32 v1, v14
	; CHECK-NEXT: v_mov_b32_e32 v2, v14			; CHECK-NEXT: v_mov_b32_e32 v2, v13
	; CHECK-NEXT: v_mov_b32_e32 v3, v13			; CHECK-NEXT: v_mov_b32_e32 v3, v12
	; CHECK-NEXT: v_mov_b32_e32 v4, v12			; CHECK-NEXT: v_mov_b32_e32 v4, v11
	; CHECK-NEXT: v_mov_b32_e32 v5, v11			; CHECK-NEXT: v_mov_b32_e32 v5, v10
	; CHECK-NEXT: v_mov_b32_e32 v6, v10			; CHECK-NEXT: v_mov_b32_e32 v6, v9
	; CHECK-NEXT: v_mov_b32_e32 v7, v9			; CHECK-NEXT: v_mov_b32_e32 v7, v8
	; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
	; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
	; CHECK-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
	; CHECK-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
	; CHECK-NEXT: buffer_store_dword v5, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v5, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill
	; CHECK-NEXT: buffer_store_dword v6, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v6, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill
	; CHECK-NEXT: buffer_store_dword v7, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v7, off, s[0:3], s32 offset:40 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b32 s8, 0			; CHECK-NEXT: s_mov_b32 s8, 0
	; CHECK-NEXT: s_mov_b32 s4, s8			; CHECK-NEXT: s_mov_b32 s4, s8
	; CHECK-NEXT: s_mov_b32 s5, s8			; CHECK-NEXT: s_mov_b32 s5, s8
	; CHECK-NEXT: s_mov_b32 s6, s8			; CHECK-NEXT: s_mov_b32 s6, s8
	; CHECK-NEXT: s_mov_b32 s7, s8			; CHECK-NEXT: s_mov_b32 s7, s8
	; CHECK-NEXT: v_writelane_b32 v8, s4, 0			; CHECK-NEXT: ; implicit-def: $vgpr0
	; CHECK-NEXT: v_writelane_b32 v8, s5, 1			; CHECK-NEXT: v_writelane_b32 v0, s4, 0
	; CHECK-NEXT: v_writelane_b32 v8, s6, 2			; CHECK-NEXT: v_writelane_b32 v0, s5, 1
	; CHECK-NEXT: v_writelane_b32 v8, s7, 3			; CHECK-NEXT: v_writelane_b32 v0, s6, 2
				; CHECK-NEXT: v_writelane_b32 v0, s7, 3
	; CHECK-NEXT: s_mov_b32 s6, 0			; CHECK-NEXT: s_mov_b32 s6, 0
	; CHECK-NEXT: s_mov_b32 s4, s6			; CHECK-NEXT: s_mov_b32 s4, s6
	; CHECK-NEXT: s_mov_b32 s5, s6			; CHECK-NEXT: s_mov_b32 s5, s6
	; CHECK-NEXT: v_mov_b32_e32 v0, s4			; CHECK-NEXT: v_mov_b32_e32 v1, s4
	; CHECK-NEXT: v_mov_b32_e32 v1, s5			; CHECK-NEXT: v_mov_b32_e32 v2, s5
	; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
	; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b32 s4, exec_lo			; CHECK-NEXT: s_mov_b32 s4, exec_lo
	; CHECK-NEXT: v_writelane_b32 v8, s4, 4			; CHECK-NEXT: v_writelane_b32 v0, s4, 4
				; CHECK-NEXT: s_or_saveexec_b32 s21, -1
				; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
				; CHECK-NEXT: s_mov_b32 exec_lo, s21
	; CHECK-NEXT: .LBB0_1: ; =>This Inner Loop Header: Depth=1			; CHECK-NEXT: .LBB0_1: ; =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: buffer_load_dword v9, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; CHECK-NEXT: s_or_saveexec_b32 s21, -1
	; CHECK-NEXT: buffer_load_dword v10, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload
	; CHECK-NEXT: buffer_load_dword v11, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload			; CHECK-NEXT: s_mov_b32 exec_lo, s21
	; CHECK-NEXT: buffer_load_dword v12, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v9, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	; CHECK-NEXT: buffer_load_dword v13, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v10, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
	; CHECK-NEXT: buffer_load_dword v14, off, s[0:3], s32 offset:28 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v11, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
	; CHECK-NEXT: buffer_load_dword v15, off, s[0:3], s32 offset:32 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v12, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
	; CHECK-NEXT: buffer_load_dword v16, off, s[0:3], s32 offset:36 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v13, off, s[0:3], s32 offset:28 ; 4-byte Folded Reload
				; CHECK-NEXT: buffer_load_dword v14, off, s[0:3], s32 offset:32 ; 4-byte Folded Reload
				; CHECK-NEXT: buffer_load_dword v15, off, s[0:3], s32 offset:36 ; 4-byte Folded Reload
				; CHECK-NEXT: buffer_load_dword v16, off, s[0:3], s32 offset:40 ; 4-byte Folded Reload
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: v_mov_b32_e32 v7, v9			; CHECK-NEXT: v_mov_b32_e32 v8, v9
	; CHECK-NEXT: v_mov_b32_e32 v6, v10			; CHECK-NEXT: v_mov_b32_e32 v7, v10
	; CHECK-NEXT: v_mov_b32_e32 v5, v11			; CHECK-NEXT: v_mov_b32_e32 v6, v11
	; CHECK-NEXT: v_mov_b32_e32 v4, v12			; CHECK-NEXT: v_mov_b32_e32 v5, v12
	; CHECK-NEXT: v_mov_b32_e32 v3, v13			; CHECK-NEXT: v_mov_b32_e32 v4, v13
	; CHECK-NEXT: v_mov_b32_e32 v2, v14			; CHECK-NEXT: v_mov_b32_e32 v3, v14
	; CHECK-NEXT: v_mov_b32_e32 v1, v15			; CHECK-NEXT: v_mov_b32_e32 v2, v15
	; CHECK-NEXT: v_mov_b32_e32 v0, v16			; CHECK-NEXT: v_mov_b32_e32 v1, v16
	; CHECK-NEXT: v_readfirstlane_b32 s12, v7			; CHECK-NEXT: v_readfirstlane_b32 s12, v8
	; CHECK-NEXT: v_readfirstlane_b32 s10, v6			; CHECK-NEXT: v_readfirstlane_b32 s10, v7
	; CHECK-NEXT: v_readfirstlane_b32 s9, v5			; CHECK-NEXT: v_readfirstlane_b32 s9, v6
	; CHECK-NEXT: v_readfirstlane_b32 s8, v4			; CHECK-NEXT: v_readfirstlane_b32 s8, v5
	; CHECK-NEXT: v_readfirstlane_b32 s7, v3			; CHECK-NEXT: v_readfirstlane_b32 s7, v4
	; CHECK-NEXT: v_readfirstlane_b32 s6, v2			; CHECK-NEXT: v_readfirstlane_b32 s6, v3
	; CHECK-NEXT: v_readfirstlane_b32 s5, v1			; CHECK-NEXT: v_readfirstlane_b32 s5, v2
	; CHECK-NEXT: v_readfirstlane_b32 s4, v0			; CHECK-NEXT: v_readfirstlane_b32 s4, v1
	; CHECK-NEXT: ; kill: def $sgpr12 killed $sgpr12 def $sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19			; CHECK-NEXT: ; kill: def $sgpr12 killed $sgpr12 def $sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19
	; CHECK-NEXT: s_mov_b32 s13, s10			; CHECK-NEXT: s_mov_b32 s13, s10
	; CHECK-NEXT: s_mov_b32 s14, s9			; CHECK-NEXT: s_mov_b32 s14, s9
	; CHECK-NEXT: s_mov_b32 s15, s8			; CHECK-NEXT: s_mov_b32 s15, s8
	; CHECK-NEXT: s_mov_b32 s16, s7			; CHECK-NEXT: s_mov_b32 s16, s7
	; CHECK-NEXT: s_mov_b32 s17, s6			; CHECK-NEXT: s_mov_b32 s17, s6
	; CHECK-NEXT: s_mov_b32 s18, s5			; CHECK-NEXT: s_mov_b32 s18, s5
	; CHECK-NEXT: s_mov_b32 s19, s4			; CHECK-NEXT: s_mov_b32 s19, s4
	; CHECK-NEXT: v_writelane_b32 v8, s12, 5			; CHECK-NEXT: v_writelane_b32 v0, s12, 5
	; CHECK-NEXT: v_writelane_b32 v8, s13, 6			; CHECK-NEXT: v_writelane_b32 v0, s13, 6
	; CHECK-NEXT: v_writelane_b32 v8, s14, 7			; CHECK-NEXT: v_writelane_b32 v0, s14, 7
	; CHECK-NEXT: v_writelane_b32 v8, s15, 8			; CHECK-NEXT: v_writelane_b32 v0, s15, 8
	; CHECK-NEXT: v_writelane_b32 v8, s16, 9			; CHECK-NEXT: v_writelane_b32 v0, s16, 9
	; CHECK-NEXT: v_writelane_b32 v8, s17, 10			; CHECK-NEXT: v_writelane_b32 v0, s17, 10
	; CHECK-NEXT: v_writelane_b32 v8, s18, 11			; CHECK-NEXT: v_writelane_b32 v0, s18, 11
	; CHECK-NEXT: v_writelane_b32 v8, s19, 12			; CHECK-NEXT: v_writelane_b32 v0, s19, 12
	; CHECK-NEXT: v_mov_b32_e32 v6, v9			; CHECK-NEXT: v_mov_b32_e32 v7, v9
	; CHECK-NEXT: v_mov_b32_e32 v7, v10			; CHECK-NEXT: v_mov_b32_e32 v8, v10
	; CHECK-NEXT: v_mov_b32_e32 v4, v11			; CHECK-NEXT: v_mov_b32_e32 v5, v11
	; CHECK-NEXT: v_mov_b32_e32 v5, v12			; CHECK-NEXT: v_mov_b32_e32 v6, v12
	; CHECK-NEXT: v_mov_b32_e32 v2, v13			; CHECK-NEXT: v_mov_b32_e32 v3, v13
	; CHECK-NEXT: v_mov_b32_e32 v3, v14			; CHECK-NEXT: v_mov_b32_e32 v4, v14
	; CHECK-NEXT: v_mov_b32_e32 v0, v15			; CHECK-NEXT: v_mov_b32_e32 v1, v15
	; CHECK-NEXT: v_mov_b32_e32 v1, v16			; CHECK-NEXT: v_mov_b32_e32 v2, v16
	; CHECK-NEXT: s_mov_b64 s[4:5], s[12:13]			; CHECK-NEXT: s_mov_b64 s[4:5], s[12:13]
	; CHECK-NEXT: s_mov_b64 s[10:11], s[14:15]			; CHECK-NEXT: s_mov_b64 s[10:11], s[14:15]
	; CHECK-NEXT: s_mov_b64 s[8:9], s[16:17]			; CHECK-NEXT: s_mov_b64 s[8:9], s[16:17]
	; CHECK-NEXT: s_mov_b64 s[6:7], s[18:19]			; CHECK-NEXT: s_mov_b64 s[6:7], s[18:19]
	; CHECK-NEXT: v_cmp_eq_u64_e64 s4, s[4:5], v[6:7]			; CHECK-NEXT: v_cmp_eq_u64_e64 s4, s[4:5], v[7:8]
	; CHECK-NEXT: v_cmp_eq_u64_e64 s5, s[10:11], v[4:5]			; CHECK-NEXT: v_cmp_eq_u64_e64 s5, s[10:11], v[5:6]
	; CHECK-NEXT: s_and_b32 s4, s4, s5			; CHECK-NEXT: s_and_b32 s4, s4, s5
	; CHECK-NEXT: v_cmp_eq_u64_e64 s5, s[8:9], v[2:3]			; CHECK-NEXT: v_cmp_eq_u64_e64 s5, s[8:9], v[3:4]
	; CHECK-NEXT: s_and_b32 s4, s4, s5			; CHECK-NEXT: s_and_b32 s4, s4, s5
	; CHECK-NEXT: v_cmp_eq_u64_e64 s5, s[6:7], v[0:1]			; CHECK-NEXT: v_cmp_eq_u64_e64 s5, s[6:7], v[1:2]
	; CHECK-NEXT: s_and_b32 s4, s4, s5			; CHECK-NEXT: s_and_b32 s4, s4, s5
	; CHECK-NEXT: s_and_saveexec_b32 s4, s4			; CHECK-NEXT: s_and_saveexec_b32 s4, s4
	; CHECK-NEXT: v_writelane_b32 v8, s4, 13			; CHECK-NEXT: v_writelane_b32 v0, s4, 13
				; CHECK-NEXT: s_or_saveexec_b32 s21, -1
				; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
				; CHECK-NEXT: s_mov_b32 exec_lo, s21
	; CHECK-NEXT: ; %bb.2: ; in Loop: Header=BB0_1 Depth=1			; CHECK-NEXT: ; %bb.2: ; in Loop: Header=BB0_1 Depth=1
	; CHECK-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v0, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; CHECK-NEXT: v_readlane_b32 s4, v8, 13			; CHECK-NEXT: s_or_saveexec_b32 s21, -1
	; CHECK-NEXT: v_readlane_b32 s8, v8, 5			; CHECK-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload
	; CHECK-NEXT: v_readlane_b32 s9, v8, 6			; CHECK-NEXT: s_mov_b32 exec_lo, s21
	; CHECK-NEXT: v_readlane_b32 s10, v8, 7
	; CHECK-NEXT: v_readlane_b32 s11, v8, 8
	; CHECK-NEXT: v_readlane_b32 s12, v8, 9
	; CHECK-NEXT: v_readlane_b32 s13, v8, 10
	; CHECK-NEXT: v_readlane_b32 s14, v8, 11
	; CHECK-NEXT: v_readlane_b32 s15, v8, 12
	; CHECK-NEXT: v_readlane_b32 s16, v8, 0
	; CHECK-NEXT: v_readlane_b32 s17, v8, 1
	; CHECK-NEXT: v_readlane_b32 s18, v8, 2
	; CHECK-NEXT: v_readlane_b32 s19, v8, 3
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
				; CHECK-NEXT: v_readlane_b32 s4, v2, 13
				; CHECK-NEXT: v_readlane_b32 s8, v2, 5
				; CHECK-NEXT: v_readlane_b32 s9, v2, 6
				; CHECK-NEXT: v_readlane_b32 s10, v2, 7
				; CHECK-NEXT: v_readlane_b32 s11, v2, 8
				; CHECK-NEXT: v_readlane_b32 s12, v2, 9
				; CHECK-NEXT: v_readlane_b32 s13, v2, 10
				; CHECK-NEXT: v_readlane_b32 s14, v2, 11
				; CHECK-NEXT: v_readlane_b32 s15, v2, 12
				; CHECK-NEXT: v_readlane_b32 s16, v2, 0
				; CHECK-NEXT: v_readlane_b32 s17, v2, 1
				; CHECK-NEXT: v_readlane_b32 s18, v2, 2
				; CHECK-NEXT: v_readlane_b32 s19, v2, 3
	; CHECK-NEXT: image_sample v0, v[0:1], s[8:15], s[16:19] dmask:0x1 dim:SQ_RSRC_IMG_2D			; CHECK-NEXT: image_sample v0, v[0:1], s[8:15], s[16:19] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:40 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:44 ; 4-byte Folded Spill
	; CHECK-NEXT: s_xor_b32 exec_lo, exec_lo, s4			; CHECK-NEXT: s_xor_b32 exec_lo, exec_lo, s4
	; CHECK-NEXT: s_cbranch_execnz .LBB0_1			; CHECK-NEXT: s_cbranch_execnz .LBB0_1
	; CHECK-NEXT: ; %bb.3:			; CHECK-NEXT: ; %bb.3:
	; CHECK-NEXT: v_readlane_b32 s4, v8, 4			; CHECK-NEXT: s_or_saveexec_b32 s21, -1
				; CHECK-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload
				; CHECK-NEXT: s_mov_b32 exec_lo, s21
				; CHECK-NEXT: s_waitcnt vmcnt(0)
				; CHECK-NEXT: v_readlane_b32 s4, v0, 4
	; CHECK-NEXT: s_mov_b32 exec_lo, s4			; CHECK-NEXT: s_mov_b32 exec_lo, s4
	; CHECK-NEXT: ; %bb.4:			; CHECK-NEXT: ; %bb.4:
	; CHECK-NEXT: buffer_load_dword v0, off, s[0:3], s32 offset:40 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v0, off, s[0:3], s32 offset:44 ; 4-byte Folded Reload
	; CHECK-NEXT: ; implicit-def: $sgpr4			; CHECK-NEXT: ; implicit-def: $sgpr4
	; CHECK-NEXT: v_mov_b32_e32 v1, s4			; CHECK-NEXT: v_mov_b32_e32 v1, s4
	; CHECK-NEXT: v_mov_b32_e32 v2, s4			; CHECK-NEXT: v_mov_b32_e32 v2, s4
	; CHECK-NEXT: v_mov_b32_e32 v3, s4			; CHECK-NEXT: v_mov_b32_e32 v3, s4
	; CHECK-NEXT: s_xor_saveexec_b32 s4, -1			; CHECK-NEXT: s_xor_saveexec_b32 s4, -1
	; CHECK-NEXT: buffer_load_dword v8, off, s[0:3], s32 offset:44 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v0, off, s[0:3], s32 offset:48 ; 4-byte Folded Reload
				; CHECK-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:52 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b32 exec_lo, s4			; CHECK-NEXT: s_mov_b32 exec_lo, s4
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_waitcnt_vscnt null, 0x0			; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	%ret = tail call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 1, float 0.000000e+00, float 0.000000e+00, <8 x i32> %vgpr_srd, <4 x i32> zeroinitializer, i1 false, i32 0, i32 0)			%ret = tail call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 1, float 0.000000e+00, float 0.000000e+00, <8 x i32> %vgpr_srd, <4 x i32> zeroinitializer, i1 false, i32 0, i32 0)
	ret <4 x float> %ret			ret <4 x float> %ret
	}			}

	declare <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 immarg, float, float, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0			declare <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 immarg, float, float, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0

	attributes #0 = { nounwind readonly willreturn }			attributes #0 = { nounwind readonly willreturn }

llvm/test/CodeGen/AMDGPU/GlobalISel/localizer.ll

	(236 lines not shown)
	; GFX9-NEXT: s_or_saveexec_b64 s[18:19], -1			; GFX9-NEXT: s_or_saveexec_b64 s[18:19], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[18:19]			; GFX9-NEXT: s_mov_b64 exec, s[18:19]
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: global_load_dword v0, v[0:1], off glc			; GFX9-NEXT: global_load_dword v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: ; implicit-def: $vgpr40
	; GFX9-NEXT: v_writelane_b32 v41, s16, 0			; GFX9-NEXT: v_writelane_b32 v41, s16, 0
				; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], 0			; GFX9-NEXT: s_swappc_b64 s[30:31], 0
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s4, v41, 0			; GFX9-NEXT: v_readlane_b32 s4, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	(14 lines not shown)

llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll

	(17 lines not shown)
	; FIXEDABI: ; %bb.0:			; FIXEDABI: ; %bb.0:
	; FIXEDABI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; FIXEDABI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; FIXEDABI-NEXT: s_mov_b32 s16, s33			; FIXEDABI-NEXT: s_mov_b32 s16, s33
	; FIXEDABI-NEXT: s_mov_b32 s33, s32			; FIXEDABI-NEXT: s_mov_b32 s33, s32
	; FIXEDABI-NEXT: s_or_saveexec_b64 s[18:19], -1			; FIXEDABI-NEXT: s_or_saveexec_b64 s[18:19], -1
	; FIXEDABI-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; FIXEDABI-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; FIXEDABI-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; FIXEDABI-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; FIXEDABI-NEXT: s_mov_b64 exec, s[18:19]			; FIXEDABI-NEXT: s_mov_b64 exec, s[18:19]
				; FIXEDABI-NEXT: ; implicit-def: $vgpr40
	; FIXEDABI-NEXT: s_addk_i32 s32, 0x400			; FIXEDABI-NEXT: s_addk_i32 s32, 0x400
	; FIXEDABI-NEXT: v_writelane_b32 v40, s30, 0			; FIXEDABI-NEXT: v_writelane_b32 v40, s30, 0
	; FIXEDABI-NEXT: v_writelane_b32 v41, s16, 0			; FIXEDABI-NEXT: v_writelane_b32 v41, s16, 0
	; FIXEDABI-NEXT: v_writelane_b32 v40, s31, 1			; FIXEDABI-NEXT: v_writelane_b32 v40, s31, 1
	; FIXEDABI-NEXT: s_getpc_b64 s[16:17]			; FIXEDABI-NEXT: s_getpc_b64 s[16:17]
	; FIXEDABI-NEXT: s_add_u32 s16, s16, requires_all_inputs@rel32@lo+4			; FIXEDABI-NEXT: s_add_u32 s16, s16, requires_all_inputs@rel32@lo+4
	; FIXEDABI-NEXT: s_addc_u32 s17, s17, requires_all_inputs@rel32@hi+12			; FIXEDABI-NEXT: s_addc_u32 s17, s17, requires_all_inputs@rel32@hi+12
	; FIXEDABI-NEXT: s_swappc_b64 s[30:31], s[16:17]			; FIXEDABI-NEXT: s_swappc_b64 s[30:31], s[16:17]
	(361 lines not shown)

llvm/test/CodeGen/AMDGPU/branch-relax-spill.ll

	(896 lines not shown)
	define void @spill_func(i32 addrspace(1)* %arg) #0 {			define void @spill_func(i32 addrspace(1)* %arg) #0 {
	; CHECK-LABEL: spill_func:			; CHECK-LABEL: spill_func:
	; CHECK: ; %bb.0: ; %entry			; CHECK: ; %bb.0: ; %entry
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
	; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
				; CHECK-NEXT: ; implicit-def: $vgpr0
				; CHECK-NEXT: ; implicit-def: $vgpr1
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: s_mov_b32 s0, 0
				; CHECK-NEXT: ;;#ASMEND
	; CHECK-NEXT: s_waitcnt expcnt(1)			; CHECK-NEXT: s_waitcnt expcnt(1)
	; CHECK-NEXT: v_writelane_b32 v0, s30, 0			; CHECK-NEXT: v_writelane_b32 v0, s30, 0
	; CHECK-NEXT: v_writelane_b32 v0, s31, 1			; CHECK-NEXT: v_writelane_b32 v0, s31, 1
	; CHECK-NEXT: v_writelane_b32 v0, s33, 2			; CHECK-NEXT: v_writelane_b32 v0, s33, 2
	; CHECK-NEXT: v_writelane_b32 v0, s34, 3			; CHECK-NEXT: v_writelane_b32 v0, s34, 3
	; CHECK-NEXT: v_writelane_b32 v0, s35, 4			; CHECK-NEXT: v_writelane_b32 v0, s35, 4
	; CHECK-NEXT: v_writelane_b32 v0, s36, 5			; CHECK-NEXT: v_writelane_b32 v0, s36, 5
	; CHECK-NEXT: v_writelane_b32 v0, s37, 6			; CHECK-NEXT: v_writelane_b32 v0, s37, 6
	(60 lines not shown)
	; CHECK-NEXT: v_writelane_b32 v1, s99, 4			; CHECK-NEXT: v_writelane_b32 v1, s99, 4
	; CHECK-NEXT: v_writelane_b32 v0, s93, 62			; CHECK-NEXT: v_writelane_b32 v0, s93, 62
	; CHECK-NEXT: v_writelane_b32 v1, s100, 5			; CHECK-NEXT: v_writelane_b32 v1, s100, 5
	; CHECK-NEXT: s_mov_b32 s31, s12			; CHECK-NEXT: s_mov_b32 s31, s12
	; CHECK-NEXT: v_writelane_b32 v0, s94, 63			; CHECK-NEXT: v_writelane_b32 v0, s94, 63
	; CHECK-NEXT: v_writelane_b32 v1, s101, 6			; CHECK-NEXT: v_writelane_b32 v1, s101, 6
	; CHECK-NEXT: s_cmp_eq_u32 s31, 0			; CHECK-NEXT: s_cmp_eq_u32 s31, 0
	; CHECK-NEXT: ;;#ASMSTART			; CHECK-NEXT: ;;#ASMSTART
	; CHECK-NEXT: s_mov_b32 s0, 0
	; CHECK-NEXT: ;;#ASMEND
	; CHECK-NEXT: ;;#ASMSTART
	; CHECK-NEXT: s_mov_b32 s1, 0			; CHECK-NEXT: s_mov_b32 s1, 0
	; CHECK-NEXT: ;;#ASMEND			; CHECK-NEXT: ;;#ASMEND
	; CHECK-NEXT: ;;#ASMSTART			; CHECK-NEXT: ;;#ASMSTART
	; CHECK-NEXT: s_mov_b32 s2, 0			; CHECK-NEXT: s_mov_b32 s2, 0
	; CHECK-NEXT: ;;#ASMEND			; CHECK-NEXT: ;;#ASMEND
	; CHECK-NEXT: ;;#ASMSTART			; CHECK-NEXT: ;;#ASMSTART
	; CHECK-NEXT: s_mov_b32 s3, 0			; CHECK-NEXT: s_mov_b32 s3, 0
	; CHECK-NEXT: ;;#ASMEND			; CHECK-NEXT: ;;#ASMEND
	(951 lines not shown)

llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll

	; RUN: llc -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 < %s \| FileCheck -check-prefixes=ALL,GFX908 %s			; RUN: llc -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 < %s \| FileCheck -check-prefixes=ALL,GFX908 %s
	; RUN: llc -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a < %s \| FileCheck -check-prefixes=ALL,GFX90A %s			; RUN: llc -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a < %s \| FileCheck -check-prefixes=ALL,GFX90A %s

	; CallGraphAnalysis, which CodeGenSCC order depends on, does not look			; CallGraphAnalysis, which CodeGenSCC order depends on, does not look
	; through aliases. If GlobalOpt is never run, we do not see direct			; through aliases. If GlobalOpt is never run, we do not see direct
	; calls,			; calls,

	@alias = hidden alias void (), void ()* @aliasee_default			@alias = hidden alias void (), void ()* @aliasee_default

	; ALL-LABEL: {{^}}kernel:			; ALL-LABEL: {{^}}kernel:
	; GFX908: .amdhsa_next_free_vgpr 41			; GFX908: .amdhsa_next_free_vgpr 32
	; GFX908-NEXT: .amdhsa_next_free_sgpr 33			; GFX908-NEXT: .amdhsa_next_free_sgpr 33

	; GFX90A: .amdhsa_next_free_vgpr 71			; GFX90A: .amdhsa_next_free_vgpr 59
	; GFX90A-NEXT: .amdhsa_next_free_sgpr 33			; GFX90A-NEXT: .amdhsa_next_free_sgpr 33
	; GFX90A-NEXT: .amdhsa_accum_offset 44			; GFX90A-NEXT: .amdhsa_accum_offset 32
	define amdgpu_kernel void @kernel() #0 {			define amdgpu_kernel void @kernel() #0 {
	bb:			bb:
	call void @alias() #2			call void @alias() #2
	ret void			ret void
	}			}

	define internal void @aliasee_default() #1 {			define internal void @aliasee_default() #1 {
	bb:			bb:
	call void asm sideeffect "; clobber a26 ", "~{a26}"()			call void asm sideeffect "; clobber a26 ", "~{a26}"()
	ret void			ret void
	}			}

	attributes #0 = { noinline norecurse nounwind optnone }			attributes #0 = { noinline norecurse nounwind optnone }
	attributes #1 = { noinline norecurse nounwind readnone willreturn }			attributes #1 = { noinline norecurse nounwind readnone willreturn }
	attributes #2 = { nounwind readnone willreturn }			attributes #2 = { nounwind readnone willreturn }

llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll

	; RUN: llc -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s \| FileCheck %s			; RUN: llc -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s \| FileCheck %s

	; CallGraphAnalysis, which CodeGenSCC order depends on, does not look			; CallGraphAnalysis, which CodeGenSCC order depends on, does not look
	; through aliases. If GlobalOpt is never run, we do not see direct			; through aliases. If GlobalOpt is never run, we do not see direct
	; calls,			; calls,

	@alias1 = hidden alias void (), void ()* @aliasee_vgpr32_sgpr76			@alias1 = hidden alias void (), void ()* @aliasee_vgpr32_sgpr76

	; The parent kernel has a higher VGPR usage than the possible callees.			; The parent kernel has a higher VGPR usage than the possible callees.

	; CHECK-LABEL: {{^}}kernel1:			; CHECK-LABEL: {{^}}kernel1:
	; CHECK: .amdhsa_next_free_vgpr 42			; CHECK: .amdhsa_next_free_vgpr 41
	; CHECK-NEXT: .amdhsa_next_free_sgpr 33			; CHECK-NEXT: .amdhsa_next_free_sgpr 33
	define amdgpu_kernel void @kernel1() #0 {			define amdgpu_kernel void @kernel1() #0 {
	bb:			bb:
	call void asm sideeffect "; clobber v40 ", "~{v40}"()			call void asm sideeffect "; clobber v40 ", "~{v40}"()
	call void @alias1() #2			call void @alias1() #2
	ret void			ret void
	}			}

	(9 lines not shown)

llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll

(172 lines not shown)
}		}

declare hidden void @external_void_func_void() #0		declare hidden void @external_void_func_void() #0

; Make sure if a CSR vgpr is used for SGPR spilling, it is saved and		; Make sure if a CSR vgpr is used for SGPR spilling, it is saved and
; restored. No FP is required.		; restored. No FP is required.
;		;
; GCN-LABEL: {{^}}callee_func_sgpr_spill_no_calls:		; GCN-LABEL: {{^}}callee_func_sgpr_spill_no_calls:
; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN: s_xor_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN: v_writelane_b32 [[CSR_VGPR]], s		; GCN: v_writelane_b32 [[CSR_VGPR]], s
; GCN: v_writelane_b32 [[CSR_VGPR]], s		; GCN: v_writelane_b32 [[CSR_VGPR]], s

; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN: v_readlane_b32 s{{[0-9]+}}, [[CSR_VGPR]]		; GCN: v_readlane_b32 s{{[0-9]+}}, [[CSR_VGPR]]
; GCN: v_readlane_b32 s{{[0-9]+}}, [[CSR_VGPR]]		; GCN: v_readlane_b32 s{{[0-9]+}}, [[CSR_VGPR]]

; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN: s_xor_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @callee_func_sgpr_spill_no_calls(i32 %in) #0 {		define void @callee_func_sgpr_spill_no_calls(i32 %in) #0 {
call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}"() #0		call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}"() #0
call void asm sideeffect "", "~{v8},~{v9},~{v10},~{v11},~{v12},~{v13},~{v14},~{v15}"() #0		call void asm sideeffect "", "~{v8},~{v9},~{v10},~{v11},~{v12},~{v13},~{v14},~{v15}"() #0
(21 lines not shown)
; enable all lanes and restore.		; enable all lanes and restore.

; GCN-LABEL: {{^}}spill_only_csr_sgpr:		; GCN-LABEL: {{^}}spill_only_csr_sgpr:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_xor_saveexec_b64		; GCN-NEXT: s_xor_saveexec_b64
; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, v0, s32 ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, v0, s32 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec,		; GCN-NEXT: s_mov_b64 exec,
		; GCN-NEXT: ; implicit-def: $vgpr0
; GCN-NEXT: v_writelane_b32 v0, s42, 0		; GCN-NEXT: v_writelane_b32 v0, s42, 0
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; clobber s42		; GCN-NEXT: ; clobber s42
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_readlane_b32 s42, v0, 0		; GCN-NEXT: v_readlane_b32 s42, v0, 0
; GCN-NEXT: s_xor_saveexec_b64		; GCN-NEXT: s_xor_saveexec_b64
; MUBUF-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword v0, off, s32 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword v0, off, s32 ; 4-byte Folded Reload
(157 lines not shown)
; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp:		; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_mov_b32 vcc_lo, s33		; GCN-NEXT: s_mov_b32 vcc_lo, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_xor_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_xor_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s33 offset:4 ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s33 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN: v_writelane_b32 [[CSR_VGPR]], s30, 0		; GCN-NEXT: ; implicit-def: $vgpr0
; GCN: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0		; GCN: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0
		; GCN: v_writelane_b32 [[CSR_VGPR]], s30, 0
; GCN: v_writelane_b32 [[CSR_VGPR]], s31, 1		; GCN: v_writelane_b32 [[CSR_VGPR]], s31, 1
; MUBUF: buffer_store_dword [[ZERO]], off, s[0:3], s33{{$}}		; MUBUF: buffer_store_dword [[ZERO]], off, s[0:3], s33{{$}}
; FLATSCR: scratch_store_dword off, [[ZERO]], s33{{$}}		; FLATSCR: scratch_store_dword off, [[ZERO]], s33{{$}}
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; MUBUF: s_addk_i32 s32, 0x300		; MUBUF: s_addk_i32 s32, 0x300
; FLATSCR: s_add_i32 s32, s32, 12		; FLATSCR: s_add_i32 s32, s32, 12
; GCN: v_readlane_b32 s31, [[CSR_VGPR]], 1		; GCN: v_readlane_b32 s31, [[CSR_VGPR]], 1
(21 lines not shown)	define void @no_unused_non_csr_sgpr_for_fp() #1 {
ret void		ret void
}		}

; Need a new CSR VGPR to satisfy the FP spill.		; Need a new CSR VGPR to satisfy the FP spill.
; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr:		; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_mov_b32 vcc_lo, s33		; GCN-NEXT: s_mov_b32 vcc_lo, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_xor_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s33 offset:4 ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s33 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
		; GCN-NEXT: ; implicit-def: $vgpr48

; MUBUF-DAG: buffer_store_dword		; MUBUF-DAG: buffer_store_dword
; FLATSCR-DAG: scratch_store_dword		; FLATSCR-DAG: scratch_store_dword
; MUBUF: s_addk_i32 s32, 0x300{{$}}		; MUBUF: s_addk_i32 s32, 0x300{{$}}
; FLATSCR: s_add_i32 s32, s32, 12{{$}}		; FLATSCR: s_add_i32 s32, s32, 12{{$}}

; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN: s_xor_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s33 offset:4 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s33 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; MUBUF: s_addk_i32 s32, 0xfd00{{$}}		; MUBUF: s_addk_i32 s32, 0xfd00{{$}}
; FLATSCR: s_add_i32 s32, s32, -12{{$}}		; FLATSCR: s_add_i32 s32, s32, -12{{$}}
; GCN-NEXT: s_mov_b32 s33, vcc_lo		; GCN-NEXT: s_mov_b32 s33, vcc_lo
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
(18 lines not shown)
}		}

; The byval argument exceeds the MUBUF constant offset, so a scratch		; The byval argument exceeds the MUBUF constant offset, so a scratch
; register is needed to access the CSR VGPR slot.		; register is needed to access the CSR VGPR slot.
; GCN-LABEL: {{^}}scratch_reg_needed_mubuf_offset:		; GCN-LABEL: {{^}}scratch_reg_needed_mubuf_offset:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_mov_b32 vcc_lo, s33		; GCN-NEXT: s_mov_b32 vcc_lo, s33
; GCN-DAG: s_mov_b32 s33, s32		; GCN-DAG: s_mov_b32 s33, s32
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_xor_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s33, 0x40100		; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s33, 0x40100
; FLATSCR-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s33, 0x1004		; FLATSCR-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s33, 0x1004
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], [[SCRATCH_SGPR]] ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], [[SCRATCH_SGPR]] ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; MUBUF-DAG: s_add_i32 s32, s32, 0x40300{{$}}		; MUBUF-DAG: s_add_i32 s32, s32, 0x40300{{$}}
; FLATSCR-DAG: s_addk_i32 s32, 0x100c{{$}}		; FLATSCR-DAG: s_addk_i32 s32, 0x100c{{$}}
; MUBUF-DAG: buffer_store_dword		; MUBUF-DAG: buffer_store_dword
; FLATSCR-DAG: scratch_store_dword		; FLATSCR-DAG: scratch_store_dword

; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN: s_xor_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s33, 0x40100		; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s33, 0x40100
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Reload
; FLATSCR-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s33, 0x1004		; FLATSCR-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s33, 0x1004
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, [[SCRATCH_SGPR]] ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, [[SCRATCH_SGPR]] ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; MUBUF: s_add_i32 s32, s32, 0xfffbfd00{{$}}		; MUBUF: s_add_i32 s32, s32, 0xfffbfd00{{$}}
; FLATSCR: s_addk_i32 s32, 0xeff4{{$}}		; FLATSCR: s_addk_i32 s32, 0xeff4{{$}}
; GCN-NEXT: s_mov_b32 s33, vcc_lo		; GCN-NEXT: s_mov_b32 s33, vcc_lo
[77 lines not shown]
define void @callee_need_to_spill_fp_to_memory() #3 {
ret void		ret void
}		}
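
The stack-pointer and spill-slot constants in the checks above can be cross-checked with a little arithmetic. The sketch below is Python, not part of the test. It assumes that the MUBUF offsets are the FLATSCR byte offsets scaled by a wave size of 64, and that the epilogue immediates are two's-complement negations of the prologue ones. Both are readings of the numbers in the CHECK lines, not statements taken from the patch.

```python
WAVE_SIZE = 64  # assumption: wave64, so MUBUF offsets are scaled per-wave bytes


def neg(value, bits):
    """Two's-complement negation of value in a field of `bits` bits."""
    return (-value) & ((1 << bits) - 1)


# FLATSCR offset -> MUBUF offset, as the constants appear in the CHECK lines
scaled_spill_slot = 0x1004 * WAVE_SIZE   # s_add_i32 ..., s33, 0x40100
scaled_frame_size = 0x100c * WAVE_SIZE   # s_add_i32 s32, s32, 0x40300

# Epilogue immediates that undo the prologue stack-pointer increment
mubuf_restore = neg(0x40300, 32)     # s_add_i32 s32, s32, 0xfffbfd00
flatscr_restore = neg(0x100c, 16)    # s_addk_i32 s32, 0xeff4

print(hex(scaled_spill_slot), hex(scaled_frame_size))
print(hex(mubuf_restore), hex(flatscr_restore))
```
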

; If we have a reserved VGPR that can be used for SGPR spills, we may still		; If we have a reserved VGPR that can be used for SGPR spills, we may still
; need to spill the FP to memory if there are no free lanes in the reserved		; need to spill the FP to memory if there are no free lanes in the reserved
; VGPR.		; VGPR.
; GCN-LABEL: {{^}}callee_need_to_spill_fp_to_memory_full_reserved_vgpr:		; GCN-LABEL: {{^}}callee_need_to_spill_fp_to_memory_full_reserved_vgpr:
; MUBUF: s_mov_b32 [[FP_SCRATCH_COPY:s[0-9]+]], s33		; MUBUF: s_mov_b32 [[FP_SCRATCH_COPY:s[0-9]+]], s33
; FLATSCR: s_mov_b32 s33, s0		; FLATSCR: s_mov_b32 s33, s2
; MUBUF: s_mov_b32 s33, s32		; MUBUF: s_mov_b32 s33, s32
; MUBUF: s_xor_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; MUBUF: s_xor_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF: s_mov_b64 exec, [[COPY_EXEC1]]		; MUBUF: s_mov_b64 exec, [[COPY_EXEC1]]
; MUBUF: v_mov_b32_e32 [[TMP_VGPR1:v[0-9]+]], [[FP_SCRATCH_COPY]]		; MUBUF: v_mov_b32_e32 [[TMP_VGPR1:v[0-9]+]], [[FP_SCRATCH_COPY]]
; MUBUF: buffer_store_dword [[TMP_VGPR1]], off, s[0:3], s33 offset:[[OFF:[0-9]+]]		; MUBUF: buffer_store_dword [[TMP_VGPR1]], off, s[0:3], s33 offset:[[OFF:[0-9]+]]
; GCN-NOT: v_writelane_b32 v40, s33		; GCN-NOT: v_writelane_b32 v40, s33
; GCN-NOT: v_readlane_b32 s33, v40		; GCN-NOT: v_readlane_b32 s33, v40
; GCN-NOT: v_readlane_b32 s33, v40		; GCN-NOT: v_readlane_b32 s33, v40
[24 lines not shown]
define void @callee_need_to_spill_fp_to_memory_full_reserved_vgpr() #3 {
ret void		ret void
}		}
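
Several checks on the right-hand side use `s_xor_saveexec_b64` where the left-hand side used `s_or_saveexec_b64`. A minimal Python model of the two instructions shows how the enabled lanes differ when the source operand is -1. It assumes the usual ISA semantics (the destination receives the old exec, and exec becomes `src OP exec`) and a 64-lane wave. The function names and the starting exec value are illustrative only.

```python
MASK64 = (1 << 64) - 1


def s_or_saveexec_b64(exec_mask, src):
    """Return (saved_exec, new_exec) with new_exec = src | exec."""
    return exec_mask, (src | exec_mask) & MASK64


def s_xor_saveexec_b64(exec_mask, src):
    """Return (saved_exec, new_exec) with new_exec = src ^ exec."""
    return exec_mask, (src ^ exec_mask) & MASK64


exec_mask = 0x00000000FFFF0000   # hypothetical: lanes 16..31 active
all_ones = -1 & MASK64           # the "-1" operand in the CHECK lines

saved_or, exec_or = s_or_saveexec_b64(exec_mask, all_ones)
saved_xor, exec_xor = s_xor_saveexec_b64(exec_mask, all_ones)

# OR enables every lane; XOR enables only the lanes that were inactive.
print(hex(exec_or))
print(hex(exec_xor))

# "s_mov_b64 exec, <saved>" restores the original mask in both cases.
exec_mask_restored = saved_xor
```
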

; When flat-scratch is enabled, we save the FP to s0. At the same time,		; When flat-scratch is enabled, we save the FP to s0. At the same time,
; the exec register is saved to s0 when saving CSR in the function prolog.		; the exec register is saved to s0 when saving CSR in the function prolog.
; Make sure that the FP save happens after restoring exec from the same		; Make sure that the FP save happens after restoring exec from the same
; register.		; register.
; GCN-LABEL: {{^}}callee_need_to_spill_fp_to_reg:		; GCN-LABEL: {{^}}callee_need_to_spill_fp_to_reg:
; FLATSCR: s_mov_b32 s0, s33		; FLATSCR: s_mov_b32 [[FP_SCRATCH_COPY:s[0-9]+]], s33
; FLATSCR: s_mov_b32 s33, s32		; FLATSCR: s_mov_b32 s33, s32
; GCN-NOT: v_writelane_b32 v40, s33		; GCN-NOT: v_writelane_b32 v40, s33
; FLATSCR: s_or_saveexec_b64 s[2:3], -1		; FLATSCR: s_xor_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; FLATSCR: s_mov_b64 exec, s[2:3]		; FLATSCR: s_mov_b64 exec, [[COPY_EXEC0]]
; FLATSCR: s_or_saveexec_b64 s[2:3], -1		; FLATSCR: s_xor_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; GCN-NOT: v_readlane_b32 s33, v40		; GCN-NOT: v_readlane_b32 s33, v40
; FLATSCR: s_mov_b32 s33, s0		; FLATSCR: s_mov_b32 s33, [[FP_SCRATCH_COPY]]
; GCN: s_setpc_b64		; GCN: s_setpc_b64
define void @callee_need_to_spill_fp_to_reg() #1 {		define void @callee_need_to_spill_fp_to_reg() #1 {
call void asm sideeffect "; clobber nonpreserved SGPRs and 64 CSRs",		call void asm sideeffect "; clobber nonpreserved SGPRs and 64 CSRs",
"~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}		"~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}
,~{s10},~{s11},~{s12},~{s13},~{s14},~{s15},~{s16},~{s17},~{s18},~{s19}		,~{s10},~{s11},~{s12},~{s13},~{s14},~{s15},~{s16},~{s17},~{s18},~{s19}
,~{s20},~{s21},~{s22},~{s23},~{s24},~{s25},~{s26},~{s27},~{s28},~{s29}		,~{s20},~{s21},~{s22},~{s23},~{s24},~{s25},~{s26},~{s27},~{s28},~{s29}
,~{s40},~{s41},~{s42},~{s43},~{s44},~{s45},~{s46},~{s47},~{s48},~{s49}		,~{s40},~{s41},~{s42},~{s43},~{s44},~{s45},~{s46},~{s47},~{s48},~{s49}
,~{s50},~{s51},~{s52},~{s53},~{s54},~{s55},~{s56},~{s57},~{s58},~{s59}		,~{s50},~{s51},~{s52},~{s53},~{s54},~{s55},~{s56},~{s57},~{s58},~{s59}
[16 lines not shown]
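
The right-hand side of the `callee_need_to_spill_fp_to_reg` checks replaces hard-coded registers such as `s0` and `s[2:3]` with FileCheck pattern variables. As a rough illustration of why that makes the test less sensitive to register allocation, the Python sketch below imitates the two forms involved, `[[NAME:regex]]` (define) and `[[NAME]]` (use), on a made-up instruction stream. `match_checks` and the sample assembly are hypothetical. This is not how FileCheck is implemented, and it ignores `-NEXT`, `-NOT`, `-DAG` and `{{...}}` blocks.

```python
import re

# [[NAME:regex]] defines a variable, [[NAME]] uses one.
VAR = re.compile(r"\[\[(\w+)(?::(.*?))?\]\](?!\])")


def match_checks(checks, lines):
    """Match each check, in order, against a later line; return the captures."""
    variables = {}
    pos = 0
    for check in checks:
        regex, last = "", 0
        for m in VAR.finditer(check):
            regex += re.escape(check[last:m.start()])
            name, pattern = m.group(1), m.group(2)
            if pattern is None:
                regex += re.escape(variables[name])      # use: literal text
            else:
                regex += "(?P<%s>%s)" % (name, pattern)  # define: capture
            last = m.end()
        regex += re.escape(check[last:])
        for i in range(pos, len(lines)):
            found = re.search(regex, lines[i])
            if found:
                variables.update(found.groupdict())
                pos = i + 1
                break
        else:
            raise AssertionError("no match for: " + check)
    return variables


checks = [
    r"s_mov_b32 [[FP_SCRATCH_COPY:s[0-9]+]], s33",
    r"s_mov_b32 s33, s32",
    r"s_xor_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1",
    r"s_mov_b64 exec, [[COPY_EXEC0]]",
    r"s_mov_b32 s33, [[FP_SCRATCH_COPY]]",
]
# Hypothetical output in which the register allocator picked s4 and s[6:7].
asm = [
    "s_mov_b32 s4, s33",
    "s_mov_b32 s33, s32",
    "s_xor_saveexec_b64 s[6:7], -1",
    "s_mov_b64 exec, s[6:7]",
    "s_mov_b32 s33, s4",
]
captured = match_checks(checks, asm)
print(captured)
```

The same check list keeps matching if the registers change, as long as each use agrees with its definition; a restore from a different register than the one captured fails the match.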
; GCN-LABEL: {{^}}spill_fp_to_memory_scratch_reg_needed_mubuf_offset		; GCN-LABEL: {{^}}spill_fp_to_memory_scratch_reg_needed_mubuf_offset
; MUBUF: s_mov_b32 [[FP_SCRATCH_COPY:s[0-9]+]], s33		; MUBUF: s_mov_b32 [[FP_SCRATCH_COPY:s[0-9]+]], s33
; MUBUF-NEXT: s_mov_b32 s33, s32		; MUBUF-NEXT: s_mov_b32 s33, s32
; MUBUF-NEXT: s_xor_saveexec_b64 s[6:7], -1		; MUBUF-NEXT: s_xor_saveexec_b64 s[6:7], -1
; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s33, 0x40100		; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s33, 0x40100
; MUBUF-NEXT: buffer_store_dword v39, off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword v39, off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill
; MUBUF: v_mov_b32_e32 v0, [[FP_SCRATCH_COPY]]		; MUBUF: v_mov_b32_e32 v0, [[FP_SCRATCH_COPY]]
; GCN-NOT: v_mov_b32_e32 v0, 0x100c		; GCN-NOT: v_mov_b32_e32 v0, 0x100c
; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s33, 0x40200		; MUBUF: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s33, 0x40200
; MUBUF: buffer_store_dword v0, off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill		; MUBUF: buffer_store_dword v0, off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill
; FLATSCR: v_mov_b32_e32 v0, 0		; FLATSCR: v_mov_b32_e32 v0, 0
; FLATSCR: s_add_i32 [[SOFF:s[0-9]+]], s33, 0x1000		; FLATSCR: s_add_i32 [[SOFF:s[0-9]+]], s33, 0x1000
; FLATSCR: scratch_store_dword off, v0, [[SOFF]]		; FLATSCR: scratch_store_dword off, v0, [[SOFF]]
define void @spill_fp_to_memory_scratch_reg_needed_mubuf_offset([4096 x i8] addrspace(5)* byval([4096 x i8]) align 4 %arg) #3 {		define void @spill_fp_to_memory_scratch_reg_needed_mubuf_offset([4096 x i8] addrspace(5)* byval([4096 x i8]) align 4 %arg) #3 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca

[24 lines not shown]

llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll

	[24 lines not shown]
	; GCN-NEXT: s_add_i32 s0, s0, 4			; GCN-NEXT: s_add_i32 s0, s0, 4
	; GCN-NEXT: s_mov_b64 vcc, vcc			; GCN-NEXT: s_mov_b64 vcc, vcc
	; GCN-NEXT: s_cbranch_vccnz .LBB0_2			; GCN-NEXT: s_cbranch_vccnz .LBB0_2
	; GCN-NEXT: .LBB0_3: ; %for.exit			; GCN-NEXT: .LBB0_3: ; %for.exit
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	;			;
	; GCN_DBG-LABEL: test_loop:			; GCN_DBG-LABEL: test_loop:
	; GCN_DBG: ; %bb.0: ; %entry			; GCN_DBG: ; %bb.0: ; %entry
				; GCN_DBG-NEXT: s_mov_b32 s8, SCRATCH_RSRC_DWORD0
				; GCN_DBG-NEXT: s_mov_b32 s9, SCRATCH_RSRC_DWORD1
				; GCN_DBG-NEXT: s_mov_b32 s10, -1
				; GCN_DBG-NEXT: s_mov_b32 s11, 0xe8f000
				; GCN_DBG-NEXT: s_add_u32 s8, s8, s3
				; GCN_DBG-NEXT: s_addc_u32 s9, s9, 0
	; GCN_DBG-NEXT: s_load_dword s2, s[0:1], 0x9			; GCN_DBG-NEXT: s_load_dword s2, s[0:1], 0x9
				; GCN_DBG-NEXT: ; implicit-def: $vgpr0
	; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)			; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)
	; GCN_DBG-NEXT: v_writelane_b32 v0, s2, 0			; GCN_DBG-NEXT: v_writelane_b32 v0, s2, 0
	; GCN_DBG-NEXT: s_load_dword s1, s[0:1], 0xa			; GCN_DBG-NEXT: s_load_dword s1, s[0:1], 0xa
	; GCN_DBG-NEXT: s_mov_b32 s0, 0			; GCN_DBG-NEXT: s_mov_b32 s0, 0
	; GCN_DBG-NEXT: s_mov_b32 s2, -1			; GCN_DBG-NEXT: s_mov_b32 s2, -1
	; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)			; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)
	; GCN_DBG-NEXT: s_cmp_lg_u32 s1, s2			; GCN_DBG-NEXT: s_cmp_lg_u32 s1, s2
	; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1			; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1
				; GCN_DBG-NEXT: s_mov_b64 s[4:5], exec
				; GCN_DBG-NEXT: s_mov_b64 exec, -1
				; GCN_DBG-NEXT: buffer_store_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Spill
				; GCN_DBG-NEXT: s_mov_b64 exec, s[4:5]
	; GCN_DBG-NEXT: s_cbranch_scc1 .LBB0_2			; GCN_DBG-NEXT: s_cbranch_scc1 .LBB0_2
	; GCN_DBG-NEXT: ; %bb.1: ; %for.exit			; GCN_DBG-NEXT: ; %bb.1: ; %for.exit
	; GCN_DBG-NEXT: s_endpgm			; GCN_DBG-NEXT: s_endpgm
	; GCN_DBG-NEXT: .LBB0_2: ; %for.body			; GCN_DBG-NEXT: .LBB0_2: ; %for.body
	; GCN_DBG-NEXT: ; =>This Inner Loop Header: Depth=1			; GCN_DBG-NEXT: ; =>This Inner Loop Header: Depth=1
				; GCN_DBG-NEXT: s_or_saveexec_b64 s[4:5], -1
				; GCN_DBG-NEXT: s_waitcnt expcnt(0)
				; GCN_DBG-NEXT: buffer_load_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Reload
				; GCN_DBG-NEXT: s_mov_b64 exec, s[4:5]
				; GCN_DBG-NEXT: s_waitcnt vmcnt(0)
	; GCN_DBG-NEXT: v_readlane_b32 s0, v0, 1			; GCN_DBG-NEXT: v_readlane_b32 s0, v0, 1
	; GCN_DBG-NEXT: v_readlane_b32 s2, v0, 0			; GCN_DBG-NEXT: v_readlane_b32 s2, v0, 0
	; GCN_DBG-NEXT: s_mov_b32 s1, 2			; GCN_DBG-NEXT: s_mov_b32 s1, 2
	; GCN_DBG-NEXT: s_lshl_b32 s1, s0, s1			; GCN_DBG-NEXT: s_lshl_b32 s1, s0, s1
	; GCN_DBG-NEXT: s_add_i32 s1, s1, s2			; GCN_DBG-NEXT: s_add_i32 s1, s1, s2
	; GCN_DBG-NEXT: s_mov_b32 s2, 0x80			; GCN_DBG-NEXT: s_mov_b32 s2, 0x80
	; GCN_DBG-NEXT: s_add_i32 s1, s1, s2			; GCN_DBG-NEXT: s_add_i32 s1, s1, s2
	; GCN_DBG-NEXT: s_mov_b32 m0, -1			; GCN_DBG-NEXT: s_mov_b32 m0, -1
	; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1			; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1
	; GCN_DBG-NEXT: ds_read_b32 v1, v1			; GCN_DBG-NEXT: ds_read_b32 v1, v1
	; GCN_DBG-NEXT: s_mov_b32 s2, 1.0			; GCN_DBG-NEXT: s_mov_b32 s2, 1.0
	; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)			; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)
	; GCN_DBG-NEXT: v_add_f32_e64 v2, v1, s2			; GCN_DBG-NEXT: v_add_f32_e64 v2, v1, s2
	; GCN_DBG-NEXT: s_mov_b32 m0, -1			; GCN_DBG-NEXT: s_mov_b32 m0, -1
	; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1			; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1
	; GCN_DBG-NEXT: ds_write_b32 v1, v2			; GCN_DBG-NEXT: ds_write_b32 v1, v2
	; GCN_DBG-NEXT: s_mov_b32 s1, 1			; GCN_DBG-NEXT: s_mov_b32 s1, 1
	; GCN_DBG-NEXT: s_add_i32 s0, s0, s1			; GCN_DBG-NEXT: s_add_i32 s0, s0, s1
	; GCN_DBG-NEXT: s_mov_b64 s[2:3], -1			; GCN_DBG-NEXT: s_mov_b64 s[2:3], -1
	; GCN_DBG-NEXT: s_and_b64 vcc, exec, s[2:3]			; GCN_DBG-NEXT: s_and_b64 vcc, exec, s[2:3]
	; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1			; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1
				; GCN_DBG-NEXT: s_or_saveexec_b64 s[4:5], -1
				; GCN_DBG-NEXT: buffer_store_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Spill
				; GCN_DBG-NEXT: s_mov_b64 exec, s[4:5]
	; GCN_DBG-NEXT: s_cbranch_vccnz .LBB0_2			; GCN_DBG-NEXT: s_cbranch_vccnz .LBB0_2
	; GCN_DBG-NEXT: ; %bb.3: ; %DummyReturnBlock			; GCN_DBG-NEXT: ; %bb.3: ; %DummyReturnBlock
	; GCN_DBG-NEXT: s_endpgm			; GCN_DBG-NEXT: s_endpgm
	entry:			entry:
	%cmp = icmp eq i32 %n, -1			%cmp = icmp eq i32 %n, -1
	br i1 %cmp, label %for.exit, label %for.body			br i1 %cmp, label %for.exit, label %for.body

	for.exit:			for.exit:
	[24 lines not shown]
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1			; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
	; GCN-NEXT: ds_write_b32 v0, v1			; GCN-NEXT: ds_write_b32 v0, v1
	; GCN-NEXT: s_add_i32 s0, s0, 4			; GCN-NEXT: s_add_i32 s0, s0, 4
	; GCN-NEXT: s_branch .LBB1_1			; GCN-NEXT: s_branch .LBB1_1
	;			;
	; GCN_DBG-LABEL: loop_const_true:			; GCN_DBG-LABEL: loop_const_true:
	; GCN_DBG: ; %bb.0: ; %entry			; GCN_DBG: ; %bb.0: ; %entry
				; GCN_DBG-NEXT: s_mov_b32 s8, SCRATCH_RSRC_DWORD0
				; GCN_DBG-NEXT: s_mov_b32 s9, SCRATCH_RSRC_DWORD1
				; GCN_DBG-NEXT: s_mov_b32 s10, -1
				; GCN_DBG-NEXT: s_mov_b32 s11, 0xe8f000
				; GCN_DBG-NEXT: s_add_u32 s8, s8, s3
				; GCN_DBG-NEXT: s_addc_u32 s9, s9, 0
	; GCN_DBG-NEXT: s_load_dword s0, s[0:1], 0x9			; GCN_DBG-NEXT: s_load_dword s0, s[0:1], 0x9
				; GCN_DBG-NEXT: ; implicit-def: $vgpr0
	; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)			; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)
	; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 0			; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 0
	; GCN_DBG-NEXT: s_mov_b32 s0, 0			; GCN_DBG-NEXT: s_mov_b32 s0, 0
	; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1			; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1
				; GCN_DBG-NEXT: s_or_saveexec_b64 s[4:5], -1
				; GCN_DBG-NEXT: buffer_store_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Spill
				; GCN_DBG-NEXT: s_mov_b64 exec, s[4:5]
	; GCN_DBG-NEXT: s_branch .LBB1_2			; GCN_DBG-NEXT: s_branch .LBB1_2
	; GCN_DBG-NEXT: .LBB1_1: ; %for.exit			; GCN_DBG-NEXT: .LBB1_1: ; %for.exit
	; GCN_DBG-NEXT: s_endpgm			; GCN_DBG-NEXT: s_endpgm
	; GCN_DBG-NEXT: .LBB1_2: ; %for.body			; GCN_DBG-NEXT: .LBB1_2: ; %for.body
	; GCN_DBG-NEXT: ; =>This Inner Loop Header: Depth=1			; GCN_DBG-NEXT: ; =>This Inner Loop Header: Depth=1
				; GCN_DBG-NEXT: s_or_saveexec_b64 s[4:5], -1
				; GCN_DBG-NEXT: s_waitcnt expcnt(0)
				; GCN_DBG-NEXT: buffer_load_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Reload
				; GCN_DBG-NEXT: s_mov_b64 exec, s[4:5]
				; GCN_DBG-NEXT: s_waitcnt vmcnt(0)
	; GCN_DBG-NEXT: v_readlane_b32 s0, v0, 1			; GCN_DBG-NEXT: v_readlane_b32 s0, v0, 1
	; GCN_DBG-NEXT: v_readlane_b32 s2, v0, 0			; GCN_DBG-NEXT: v_readlane_b32 s2, v0, 0
	; GCN_DBG-NEXT: s_mov_b32 s1, 2			; GCN_DBG-NEXT: s_mov_b32 s1, 2
	; GCN_DBG-NEXT: s_lshl_b32 s1, s0, s1			; GCN_DBG-NEXT: s_lshl_b32 s1, s0, s1
	; GCN_DBG-NEXT: s_add_i32 s1, s1, s2			; GCN_DBG-NEXT: s_add_i32 s1, s1, s2
	; GCN_DBG-NEXT: s_mov_b32 s2, 0x80			; GCN_DBG-NEXT: s_mov_b32 s2, 0x80
	; GCN_DBG-NEXT: s_add_i32 s1, s1, s2			; GCN_DBG-NEXT: s_add_i32 s1, s1, s2
	; GCN_DBG-NEXT: s_mov_b32 m0, -1			; GCN_DBG-NEXT: s_mov_b32 m0, -1
	; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1			; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1
	; GCN_DBG-NEXT: ds_read_b32 v1, v1			; GCN_DBG-NEXT: ds_read_b32 v1, v1
	; GCN_DBG-NEXT: s_mov_b32 s2, 1.0			; GCN_DBG-NEXT: s_mov_b32 s2, 1.0
	; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)			; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)
	; GCN_DBG-NEXT: v_add_f32_e64 v2, v1, s2			; GCN_DBG-NEXT: v_add_f32_e64 v2, v1, s2
	; GCN_DBG-NEXT: s_mov_b32 m0, -1			; GCN_DBG-NEXT: s_mov_b32 m0, -1
	; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1			; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1
	; GCN_DBG-NEXT: ds_write_b32 v1, v2			; GCN_DBG-NEXT: ds_write_b32 v1, v2
	; GCN_DBG-NEXT: s_mov_b32 s1, 1			; GCN_DBG-NEXT: s_mov_b32 s1, 1
	; GCN_DBG-NEXT: s_add_i32 s0, s0, s1			; GCN_DBG-NEXT: s_add_i32 s0, s0, s1
	; GCN_DBG-NEXT: s_mov_b64 s[2:3], 0			; GCN_DBG-NEXT: s_mov_b64 s[2:3], 0
	; GCN_DBG-NEXT: s_and_b64 vcc, exec, s[2:3]			; GCN_DBG-NEXT: s_and_b64 vcc, exec, s[2:3]
	; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1			; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1
				; GCN_DBG-NEXT: s_or_saveexec_b64 s[4:5], -1
				; GCN_DBG-NEXT: buffer_store_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Spill
				; GCN_DBG-NEXT: s_mov_b64 exec, s[4:5]
	; GCN_DBG-NEXT: s_cbranch_vccnz .LBB1_1			; GCN_DBG-NEXT: s_cbranch_vccnz .LBB1_1
	; GCN_DBG-NEXT: s_branch .LBB1_2			; GCN_DBG-NEXT: s_branch .LBB1_2
	entry:			entry:
	br label %for.body			br label %for.body

	for.exit:			for.exit:
	ret void			ret void

	[18 lines not shown]
	; GCN-NEXT: ds_read_b32 v1, v0 offset:128			; GCN-NEXT: ds_read_b32 v1, v0 offset:128
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1			; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
	; GCN-NEXT: ds_write_b32 v0, v1 offset:128			; GCN-NEXT: ds_write_b32 v0, v1 offset:128
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	;			;
	; GCN_DBG-LABEL: loop_const_false:			; GCN_DBG-LABEL: loop_const_false:
	; GCN_DBG: ; %bb.0: ; %entry			; GCN_DBG: ; %bb.0: ; %entry
				; GCN_DBG-NEXT: s_mov_b32 s8, SCRATCH_RSRC_DWORD0
				; GCN_DBG-NEXT: s_mov_b32 s9, SCRATCH_RSRC_DWORD1
				; GCN_DBG-NEXT: s_mov_b32 s10, -1
				; GCN_DBG-NEXT: s_mov_b32 s11, 0xe8f000
				; GCN_DBG-NEXT: s_add_u32 s8, s8, s3
				; GCN_DBG-NEXT: s_addc_u32 s9, s9, 0
	; GCN_DBG-NEXT: s_load_dword s0, s[0:1], 0x9			; GCN_DBG-NEXT: s_load_dword s0, s[0:1], 0x9
				; GCN_DBG-NEXT: ; implicit-def: $vgpr0
	; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)			; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)
	; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 0			; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 0
	; GCN_DBG-NEXT: s_mov_b32 s0, 0			; GCN_DBG-NEXT: s_mov_b32 s0, 0
	; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1			; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1
				; GCN_DBG-NEXT: s_or_saveexec_b64 s[4:5], -1
				; GCN_DBG-NEXT: buffer_store_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Spill
				; GCN_DBG-NEXT: s_mov_b64 exec, s[4:5]
	; GCN_DBG-NEXT: s_branch .LBB2_2			; GCN_DBG-NEXT: s_branch .LBB2_2
	; GCN_DBG-NEXT: .LBB2_1: ; %for.exit			; GCN_DBG-NEXT: .LBB2_1: ; %for.exit
	; GCN_DBG-NEXT: s_endpgm			; GCN_DBG-NEXT: s_endpgm
	; GCN_DBG-NEXT: .LBB2_2: ; %for.body			; GCN_DBG-NEXT: .LBB2_2: ; %for.body
	; GCN_DBG-NEXT: ; =>This Inner Loop Header: Depth=1			; GCN_DBG-NEXT: ; =>This Inner Loop Header: Depth=1
				; GCN_DBG-NEXT: s_or_saveexec_b64 s[4:5], -1
				; GCN_DBG-NEXT: s_waitcnt expcnt(0)
				; GCN_DBG-NEXT: buffer_load_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Reload
				; GCN_DBG-NEXT: s_mov_b64 exec, s[4:5]
				; GCN_DBG-NEXT: s_waitcnt vmcnt(0)
	; GCN_DBG-NEXT: v_readlane_b32 s0, v0, 1			; GCN_DBG-NEXT: v_readlane_b32 s0, v0, 1
	; GCN_DBG-NEXT: v_readlane_b32 s2, v0, 0			; GCN_DBG-NEXT: v_readlane_b32 s2, v0, 0
	; GCN_DBG-NEXT: s_mov_b32 s1, 2			; GCN_DBG-NEXT: s_mov_b32 s1, 2
	; GCN_DBG-NEXT: s_lshl_b32 s1, s0, s1			; GCN_DBG-NEXT: s_lshl_b32 s1, s0, s1
	; GCN_DBG-NEXT: s_add_i32 s1, s1, s2			; GCN_DBG-NEXT: s_add_i32 s1, s1, s2
	; GCN_DBG-NEXT: s_mov_b32 s2, 0x80			; GCN_DBG-NEXT: s_mov_b32 s2, 0x80
	; GCN_DBG-NEXT: s_add_i32 s1, s1, s2			; GCN_DBG-NEXT: s_add_i32 s1, s1, s2
	; GCN_DBG-NEXT: s_mov_b32 m0, -1			; GCN_DBG-NEXT: s_mov_b32 m0, -1
	; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1			; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1
	; GCN_DBG-NEXT: ds_read_b32 v1, v1			; GCN_DBG-NEXT: ds_read_b32 v1, v1
	; GCN_DBG-NEXT: s_mov_b32 s2, 1.0			; GCN_DBG-NEXT: s_mov_b32 s2, 1.0
	; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)			; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)
	; GCN_DBG-NEXT: v_add_f32_e64 v2, v1, s2			; GCN_DBG-NEXT: v_add_f32_e64 v2, v1, s2
	; GCN_DBG-NEXT: s_mov_b32 m0, -1			; GCN_DBG-NEXT: s_mov_b32 m0, -1
	; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1			; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1
	; GCN_DBG-NEXT: ds_write_b32 v1, v2			; GCN_DBG-NEXT: ds_write_b32 v1, v2
	; GCN_DBG-NEXT: s_mov_b32 s1, 1			; GCN_DBG-NEXT: s_mov_b32 s1, 1
	; GCN_DBG-NEXT: s_add_i32 s0, s0, s1			; GCN_DBG-NEXT: s_add_i32 s0, s0, s1
	; GCN_DBG-NEXT: s_mov_b64 s[2:3], -1			; GCN_DBG-NEXT: s_mov_b64 s[2:3], -1
	; GCN_DBG-NEXT: s_and_b64 vcc, exec, s[2:3]			; GCN_DBG-NEXT: s_and_b64 vcc, exec, s[2:3]
	; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1			; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1
				; GCN_DBG-NEXT: s_or_saveexec_b64 s[4:5], -1
				; GCN_DBG-NEXT: buffer_store_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Spill
				; GCN_DBG-NEXT: s_mov_b64 exec, s[4:5]
	; GCN_DBG-NEXT: s_cbranch_vccnz .LBB2_1			; GCN_DBG-NEXT: s_cbranch_vccnz .LBB2_1
	; GCN_DBG-NEXT: s_branch .LBB2_2			; GCN_DBG-NEXT: s_branch .LBB2_2
	entry:			entry:
	br label %for.body			br label %for.body

	for.exit:			for.exit:
	ret void			ret void

	[19 lines not shown]
	; GCN-NEXT: ds_read_b32 v1, v0 offset:128			; GCN-NEXT: ds_read_b32 v1, v0 offset:128
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1			; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
	; GCN-NEXT: ds_write_b32 v0, v1 offset:128			; GCN-NEXT: ds_write_b32 v0, v1 offset:128
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	;			;
	; GCN_DBG-LABEL: loop_const_undef:			; GCN_DBG-LABEL: loop_const_undef:
	; GCN_DBG: ; %bb.0: ; %entry			; GCN_DBG: ; %bb.0: ; %entry
				; GCN_DBG-NEXT: s_mov_b32 s8, SCRATCH_RSRC_DWORD0
				; GCN_DBG-NEXT: s_mov_b32 s9, SCRATCH_RSRC_DWORD1
				; GCN_DBG-NEXT: s_mov_b32 s10, -1
				; GCN_DBG-NEXT: s_mov_b32 s11, 0xe8f000
				; GCN_DBG-NEXT: s_add_u32 s8, s8, s3
				; GCN_DBG-NEXT: s_addc_u32 s9, s9, 0
	; GCN_DBG-NEXT: s_load_dword s0, s[0:1], 0x9			; GCN_DBG-NEXT: s_load_dword s0, s[0:1], 0x9
				; GCN_DBG-NEXT: ; implicit-def: $vgpr0
	; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)			; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)
	; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 0			; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 0
	; GCN_DBG-NEXT: s_mov_b32 s0, 0			; GCN_DBG-NEXT: s_mov_b32 s0, 0
	; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1			; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1
				; GCN_DBG-NEXT: s_or_saveexec_b64 s[4:5], -1
				; GCN_DBG-NEXT: buffer_store_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Spill
				; GCN_DBG-NEXT: s_mov_b64 exec, s[4:5]
	; GCN_DBG-NEXT: s_branch .LBB3_2			; GCN_DBG-NEXT: s_branch .LBB3_2
	; GCN_DBG-NEXT: .LBB3_1: ; %for.exit			; GCN_DBG-NEXT: .LBB3_1: ; %for.exit
	; GCN_DBG-NEXT: s_endpgm			; GCN_DBG-NEXT: s_endpgm
	; GCN_DBG-NEXT: .LBB3_2: ; %for.body			; GCN_DBG-NEXT: .LBB3_2: ; %for.body
	; GCN_DBG-NEXT: ; =>This Inner Loop Header: Depth=1			; GCN_DBG-NEXT: ; =>This Inner Loop Header: Depth=1
				; GCN_DBG-NEXT: s_or_saveexec_b64 s[4:5], -1
				; GCN_DBG-NEXT: s_waitcnt expcnt(0)
				; GCN_DBG-NEXT: buffer_load_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Reload
				; GCN_DBG-NEXT: s_mov_b64 exec, s[4:5]
				; GCN_DBG-NEXT: s_waitcnt vmcnt(0)
	; GCN_DBG-NEXT: v_readlane_b32 s0, v0, 1			; GCN_DBG-NEXT: v_readlane_b32 s0, v0, 1
	; GCN_DBG-NEXT: v_readlane_b32 s2, v0, 0			; GCN_DBG-NEXT: v_readlane_b32 s2, v0, 0
	; GCN_DBG-NEXT: s_mov_b32 s1, 2			; GCN_DBG-NEXT: s_mov_b32 s1, 2
	; GCN_DBG-NEXT: s_lshl_b32 s1, s0, s1			; GCN_DBG-NEXT: s_lshl_b32 s1, s0, s1
	; GCN_DBG-NEXT: s_add_i32 s1, s1, s2			; GCN_DBG-NEXT: s_add_i32 s1, s1, s2
	; GCN_DBG-NEXT: s_mov_b32 s2, 0x80			; GCN_DBG-NEXT: s_mov_b32 s2, 0x80
	; GCN_DBG-NEXT: s_add_i32 s1, s1, s2			; GCN_DBG-NEXT: s_add_i32 s1, s1, s2
	; GCN_DBG-NEXT: s_mov_b32 m0, -1			; GCN_DBG-NEXT: s_mov_b32 m0, -1
	; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1			; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1
	; GCN_DBG-NEXT: ds_read_b32 v1, v1			; GCN_DBG-NEXT: ds_read_b32 v1, v1
	; GCN_DBG-NEXT: s_mov_b32 s2, 1.0			; GCN_DBG-NEXT: s_mov_b32 s2, 1.0
	; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)			; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)
	; GCN_DBG-NEXT: v_add_f32_e64 v2, v1, s2			; GCN_DBG-NEXT: v_add_f32_e64 v2, v1, s2
	; GCN_DBG-NEXT: s_mov_b32 m0, -1			; GCN_DBG-NEXT: s_mov_b32 m0, -1
	; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1			; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1
	; GCN_DBG-NEXT: ds_write_b32 v1, v2			; GCN_DBG-NEXT: ds_write_b32 v1, v2
	; GCN_DBG-NEXT: s_mov_b32 s1, 1			; GCN_DBG-NEXT: s_mov_b32 s1, 1
	; GCN_DBG-NEXT: s_add_i32 s0, s0, s1			; GCN_DBG-NEXT: s_add_i32 s0, s0, s1
	; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1			; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1
				; GCN_DBG-NEXT: s_or_saveexec_b64 s[4:5], -1
				; GCN_DBG-NEXT: buffer_store_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Spill
				; GCN_DBG-NEXT: s_mov_b64 exec, s[4:5]
	; GCN_DBG-NEXT: s_cbranch_scc1 .LBB3_1			; GCN_DBG-NEXT: s_cbranch_scc1 .LBB3_1
	; GCN_DBG-NEXT: s_branch .LBB3_2			; GCN_DBG-NEXT: s_branch .LBB3_2
	entry:			entry:
	br label %for.body			br label %for.body

	for.exit:			for.exit:
	ret void			ret void

	[33 lines not shown]
	; GCN-NEXT: s_add_i32 s0, s0, 4			; GCN-NEXT: s_add_i32 s0, s0, 4
	; GCN-NEXT: s_mov_b64 vcc, vcc			; GCN-NEXT: s_mov_b64 vcc, vcc
	; GCN-NEXT: s_cbranch_vccz .LBB4_1			; GCN-NEXT: s_cbranch_vccz .LBB4_1
	; GCN-NEXT: ; %bb.2: ; %for.exit			; GCN-NEXT: ; %bb.2: ; %for.exit
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	;			;
	; GCN_DBG-LABEL: loop_arg_0:			; GCN_DBG-LABEL: loop_arg_0:
	; GCN_DBG: ; %bb.0: ; %entry			; GCN_DBG: ; %bb.0: ; %entry
				; GCN_DBG-NEXT: s_mov_b32 s8, SCRATCH_RSRC_DWORD0
				; GCN_DBG-NEXT: s_mov_b32 s9, SCRATCH_RSRC_DWORD1
				; GCN_DBG-NEXT: s_mov_b32 s10, -1
				; GCN_DBG-NEXT: s_mov_b32 s11, 0xe8f000
				; GCN_DBG-NEXT: s_add_u32 s8, s8, s3
				; GCN_DBG-NEXT: s_addc_u32 s9, s9, 0
	; GCN_DBG-NEXT: s_load_dword s0, s[0:1], 0x9			; GCN_DBG-NEXT: s_load_dword s0, s[0:1], 0x9
				; GCN_DBG-NEXT: ; implicit-def: $vgpr0
	; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)			; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)
	; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 0			; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 0
	; GCN_DBG-NEXT: v_mov_b32_e32 v1, 0			; GCN_DBG-NEXT: v_mov_b32_e32 v1, 0
	; GCN_DBG-NEXT: s_mov_b32 m0, -1			; GCN_DBG-NEXT: s_mov_b32 m0, -1
	; GCN_DBG-NEXT: ds_read_u8 v1, v1			; GCN_DBG-NEXT: ds_read_u8 v1, v1
	; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)			; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)
	; GCN_DBG-NEXT: v_readfirstlane_b32 s0, v1			; GCN_DBG-NEXT: v_readfirstlane_b32 s0, v1
	; GCN_DBG-NEXT: s_and_b32 s0, 1, s0			; GCN_DBG-NEXT: s_and_b32 s0, 1, s0
	; GCN_DBG-NEXT: s_cmp_eq_u32 s0, 1			; GCN_DBG-NEXT: s_cmp_eq_u32 s0, 1
	; GCN_DBG-NEXT: s_cselect_b64 s[0:1], -1, 0			; GCN_DBG-NEXT: s_cselect_b64 s[0:1], -1, 0
	; GCN_DBG-NEXT: s_mov_b64 s[2:3], -1			; GCN_DBG-NEXT: s_mov_b64 s[2:3], -1
	; GCN_DBG-NEXT: s_xor_b64 s[0:1], s[0:1], s[2:3]			; GCN_DBG-NEXT: s_xor_b64 s[0:1], s[0:1], s[2:3]
	; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1			; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1
	; GCN_DBG-NEXT: v_writelane_b32 v0, s1, 2			; GCN_DBG-NEXT: v_writelane_b32 v0, s1, 2
	; GCN_DBG-NEXT: s_mov_b32 s0, 0			; GCN_DBG-NEXT: s_mov_b32 s0, 0
	; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 3			; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 3
				; GCN_DBG-NEXT: s_or_saveexec_b64 s[6:7], -1
				; GCN_DBG-NEXT: buffer_store_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Spill
				; GCN_DBG-NEXT: s_mov_b64 exec, s[6:7]
	; GCN_DBG-NEXT: s_branch .LBB4_2			; GCN_DBG-NEXT: s_branch .LBB4_2
	; GCN_DBG-NEXT: .LBB4_1: ; %for.exit			; GCN_DBG-NEXT: .LBB4_1: ; %for.exit
	; GCN_DBG-NEXT: s_endpgm			; GCN_DBG-NEXT: s_endpgm
	; GCN_DBG-NEXT: .LBB4_2: ; %for.body			; GCN_DBG-NEXT: .LBB4_2: ; %for.body
	; GCN_DBG-NEXT: ; =>This Inner Loop Header: Depth=1			; GCN_DBG-NEXT: ; =>This Inner Loop Header: Depth=1
				; GCN_DBG-NEXT: s_or_saveexec_b64 s[6:7], -1
				; GCN_DBG-NEXT: s_waitcnt expcnt(0)
				; GCN_DBG-NEXT: buffer_load_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Reload
				; GCN_DBG-NEXT: s_mov_b64 exec, s[6:7]
				; GCN_DBG-NEXT: s_waitcnt vmcnt(0)
	; GCN_DBG-NEXT: v_readlane_b32 s0, v0, 3			; GCN_DBG-NEXT: v_readlane_b32 s0, v0, 3
	; GCN_DBG-NEXT: v_readlane_b32 s2, v0, 1			; GCN_DBG-NEXT: v_readlane_b32 s2, v0, 1
	; GCN_DBG-NEXT: v_readlane_b32 s3, v0, 2			; GCN_DBG-NEXT: v_readlane_b32 s3, v0, 2
	; GCN_DBG-NEXT: v_readlane_b32 s4, v0, 0			; GCN_DBG-NEXT: v_readlane_b32 s4, v0, 0
	; GCN_DBG-NEXT: s_mov_b32 s1, 2			; GCN_DBG-NEXT: s_mov_b32 s1, 2
	; GCN_DBG-NEXT: s_lshl_b32 s1, s0, s1			; GCN_DBG-NEXT: s_lshl_b32 s1, s0, s1
	; GCN_DBG-NEXT: s_add_i32 s1, s1, s4			; GCN_DBG-NEXT: s_add_i32 s1, s1, s4
	; GCN_DBG-NEXT: s_mov_b32 s4, 0x80			; GCN_DBG-NEXT: s_mov_b32 s4, 0x80
	; GCN_DBG-NEXT: s_add_i32 s1, s1, s4			; GCN_DBG-NEXT: s_add_i32 s1, s1, s4
	; GCN_DBG-NEXT: s_mov_b32 m0, -1			; GCN_DBG-NEXT: s_mov_b32 m0, -1
	; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1			; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1
	; GCN_DBG-NEXT: ds_read_b32 v1, v1			; GCN_DBG-NEXT: ds_read_b32 v1, v1
	; GCN_DBG-NEXT: s_mov_b32 s4, 1.0			; GCN_DBG-NEXT: s_mov_b32 s4, 1.0
	; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)			; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)
	; GCN_DBG-NEXT: v_add_f32_e64 v2, v1, s4			; GCN_DBG-NEXT: v_add_f32_e64 v2, v1, s4
	; GCN_DBG-NEXT: s_mov_b32 m0, -1			; GCN_DBG-NEXT: s_mov_b32 m0, -1
	; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1			; GCN_DBG-NEXT: v_mov_b32_e32 v1, s1
	; GCN_DBG-NEXT: ds_write_b32 v1, v2			; GCN_DBG-NEXT: ds_write_b32 v1, v2
	; GCN_DBG-NEXT: s_mov_b32 s1, 1			; GCN_DBG-NEXT: s_mov_b32 s1, 1
	; GCN_DBG-NEXT: s_add_i32 s0, s0, s1			; GCN_DBG-NEXT: s_add_i32 s0, s0, s1
	; GCN_DBG-NEXT: s_and_b64 vcc, exec, s[2:3]			; GCN_DBG-NEXT: s_and_b64 vcc, exec, s[2:3]
	; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 3			; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 3
				; GCN_DBG-NEXT: s_or_saveexec_b64 s[6:7], -1
				; GCN_DBG-NEXT: buffer_store_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Spill
				; GCN_DBG-NEXT: s_mov_b64 exec, s[6:7]
	; GCN_DBG-NEXT: s_cbranch_vccnz .LBB4_1			; GCN_DBG-NEXT: s_cbranch_vccnz .LBB4_1
	; GCN_DBG-NEXT: s_branch .LBB4_2			; GCN_DBG-NEXT: s_branch .LBB4_2
	entry:			entry:
	%cond = load volatile i1, i1 addrspace(3)* null			%cond = load volatile i1, i1 addrspace(3)* null
	br label %for.body			br label %for.body

	for.exit:			for.exit:
	ret void			ret void
	[11 lines not shown]
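
The GCN_DBG checks in this file pack several SGPR values into lanes of one VGPR with `v_writelane_b32`, store the whole VGPR to scratch with every lane enabled, and read the values back with `v_readlane_b32` after the reload. The Python sketch below models that round trip using the lane assignments from the `loop_arg_0` checks. The `Vgpr` class, the dictionary standing in for scratch memory, and the concrete SGPR values are all hypothetical.

```python
WAVE_SIZE = 64


class Vgpr:
    """One 32-bit value per lane."""

    def __init__(self):
        self.lanes = [0] * WAVE_SIZE

    def writelane(self, value, lane):   # v_writelane_b32 v, s, lane
        self.lanes[lane] = value & 0xFFFFFFFF

    def readlane(self, lane):           # v_readlane_b32 s, v, lane
        return self.lanes[lane]


scratch = {}                            # offset -> list of per-lane dwords


def spill(vgpr, offset):
    """Model of buffer_store_dword with exec = -1: every lane is stored."""
    scratch[offset] = list(vgpr.lanes)


def reload(offset):
    """Model of buffer_load_dword with exec = -1: every lane is loaded."""
    v = Vgpr()
    v.lanes = list(scratch[offset])
    return v


v0 = Vgpr()
v0.writelane(0x1234, 0)         # lane 0: value loaded from s[0:1]
v0.writelane(0xFFFFFFFF, 1)     # lane 1: low half of the s[0:1] mask
v0.writelane(0xFFFFFFFF, 2)     # lane 2: high half of the s[0:1] mask
v0.writelane(0, 3)              # lane 3: s0 after "s_mov_b32 s0, 0"
spill(v0, 4)                    # offset:4 ; 4-byte Folded Spill

v0 = reload(4)                  # offset:4 ; 4-byte Folded Reload
counter = v0.readlane(3)
cond = (v0.readlane(2) << 32) | v0.readlane(1)
base = v0.readlane(0)
```
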

llvm/test/CodeGen/AMDGPU/collapse-endcf.ll

	[13 lines not shown]
	; GCN-NEXT: s_or_b64 exec, exec, [[SAVEEXEC]]			; GCN-NEXT: s_or_b64 exec, exec, [[SAVEEXEC]]
	; GCN: ds_write_b32			; GCN: ds_write_b32
	; GCN: s_endpgm			; GCN: s_endpgm
	;			;
	; GCN-O0-LABEL: {{^}}simple_nested_if:			; GCN-O0-LABEL: {{^}}simple_nested_if:
	; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec			; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec
	; GCN-O0-DAG: v_writelane_b32 [[VGPR:v[0-9]+]], s{{[0-9]+}}, [[OUTER_SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR:v[0-9]+]], s{{[0-9]+}}, [[OUTER_SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[OUTER_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[OUTER_SPILL_LANE_1:[0-9]+]]
	; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]			; GCN-O0-DAG: s_and_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_cbranch_execz [[ENDIF_OUTER:.LBB[0-9_]+]]			; GCN-O0-NEXT: s_cbranch_execz [[ENDIF_OUTER:.LBB[0-9_]+]]
	; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:			; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:
	; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec			; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_SPILL_LANE_1:[0-9]+]]
	; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]			; GCN-O0-DAG: s_and_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_cbranch_execz [[ENDIF_INNER:.LBB[0-9_]+]]			; GCN-O0-NEXT: s_cbranch_execz [[ENDIF_INNER:.LBB[0-9_]+]]
	; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:			; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:
	; GCN-O0: store_dword			; GCN-O0: store_dword
	; GCN-O0-NEXT: {{^}}[[ENDIF_INNER]]:			; GCN-O0-NEXT: {{^}}[[ENDIF_INNER]]:
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[INNER_SPILL_LANE_0]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[INNER_SPILL_LANE_0]]
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[INNER_SPILL_LANE_1]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[INNER_SPILL_LANE_1]]
	; GCN-O0-NEXT: s_or_b64 exec, exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_or_b64 exec, exec, s[{{[0-9:]+}}]
	[41 lines not shown]
	; GCN-NEXT: s_or_b64 exec, exec, [[SAVEEXEC_OUTER]]			; GCN-NEXT: s_or_b64 exec, exec, [[SAVEEXEC_OUTER]]
	; GCN: ds_write_b32			; GCN: ds_write_b32
	; GCN: s_endpgm			; GCN: s_endpgm
	;			;
	; GCN-O0-LABEL: {{^}}uncollapsable_nested_if:			; GCN-O0-LABEL: {{^}}uncollapsable_nested_if:
	; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec			; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec
	; GCN-O0-DAG: v_writelane_b32 [[VGPR:v[0-9]+]], s{{[0-9]+}}, [[OUTER_SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR:v[0-9]+]], s{{[0-9]+}}, [[OUTER_SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[OUTER_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[OUTER_SPILL_LANE_1:[0-9]+]]
	; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}]			; GCN-O0-DAG: s_and_b64 s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_cbranch_execz [[ENDIF_OUTER:.LBB[0-9_]+]]			; GCN-O0-NEXT: s_cbranch_execz [[ENDIF_OUTER:.LBB[0-9_]+]]
	; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:			; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:
	; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec			; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_SPILL_LANE_1:[0-9]+]]
	; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}]			; GCN-O0-DAG: s_and_b64 s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_cbranch_execz [[ENDIF_INNER:.LBB[0-9_]+]]			; GCN-O0-NEXT: s_cbranch_execz [[ENDIF_INNER:.LBB[0-9_]+]]
	; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:			; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:
	; GCN-O0: store_dword			; GCN-O0: store_dword
	; GCN-O0-NEXT: s_branch [[ENDIF_INNER]]			; GCN-O0-NEXT: s_branch [[ENDIF_INNER]]
	; GCN-O0-NEXT: {{^}}[[ENDIF_OUTER]]:			; GCN-O0-NEXT: {{^}}[[ENDIF_OUTER]]:
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[OUTER_SPILL_LANE_0]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[OUTER_SPILL_LANE_0]]
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[OUTER_SPILL_LANE_1]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[OUTER_SPILL_LANE_1]]
	; GCN-NEXT: s_or_b64 exec, exec, [[SAVEEXEC_OUTER]]			; GCN-NEXT: s_or_b64 exec, exec, [[SAVEEXEC_OUTER]]
	; GCN: ds_write_b32			; GCN: ds_write_b32
	; GCN: s_endpgm			; GCN: s_endpgm
	;			;
	; GCN-O0-LABEL: {{^}}nested_if_if_else:			; GCN-O0-LABEL: {{^}}nested_if_if_else:
	; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec			; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec
	; GCN-O0-DAG: v_writelane_b32 [[VGPR:v[0-9]+]], s{{[0-9]+}}, [[OUTER_SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR:v[0-9]+]], s{{[0-9]+}}, [[OUTER_SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[OUTER_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[OUTER_SPILL_LANE_1:[0-9]+]]
	; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}]			; GCN-O0-DAG: s_and_b64 s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_cbranch_execz [[ENDIF_OUTER:.LBB[0-9_]+]]			; GCN-O0-NEXT: s_cbranch_execz [[ENDIF_OUTER:.LBB[0-9_]+]]
	; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:			; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:
	; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec			; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec
	; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_xor_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_xor_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[THEN_SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[THEN_SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[THEN_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[THEN_SPILL_LANE_1:[0-9]+]]
				; GCN-O0-NEXT: s_or_saveexec_b64 [[EXEC_COPY:s\[[0-9]+:[0-9]+\]]], -1
				; GCN-O0-NEXT: buffer_store_dword
				; GCN-O0-NEXT: s_mov_b64 exec, [[EXEC_COPY]]
	; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_cbranch_execz [[THEN_INNER:.LBB[0-9_]+]]			; GCN-O0-NEXT: s_cbranch_execz [[THEN_INNER:.LBB[0-9_]+]]
	; GCN-O0-NEXT: s_branch [[TEMP_BB:.LBB[0-9_]+]]			; GCN-O0-NEXT: s_branch [[TEMP_BB:.LBB[0-9_]+]]
	; GCN-O0-NEXT: {{^}}[[THEN_INNER]]:			; GCN-O0-NEXT: {{^}}[[THEN_INNER]]:
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[THEN_SPILL_LANE_0]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[THEN_SPILL_LANE_0]]
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[THEN_SPILL_LANE_1]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[THEN_SPILL_LANE_1]]
	; GCN-O0-NEXT: s_or_saveexec_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_or_saveexec_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}], exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}], exec, s[{{[0-9:]+}}]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_SPILL_LANE_1:[0-9]+]]
	; GCN-O0-NEXT: s_xor_b64 exec, exec, s[{{[0-9:]+}}]			; GCN-O0-DAG: s_xor_b64 exec, exec, s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_cbranch_execz [[ENDIF_INNER:.LBB[0-9_]+]]			; GCN-O0-NEXT: s_cbranch_execz [[ENDIF_INNER:.LBB[0-9_]+]]
	; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:			; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:
	; GCN-O0: store_dword			; GCN-O0: store_dword
	; GCN-O0-NEXT: s_branch [[ENDIF_INNER]]			; GCN-O0-NEXT: s_branch [[ENDIF_INNER]]
	; GCN-O0-NEXT: {{^}}[[TEMP_BB]]:			; GCN-O0-NEXT: {{^}}[[TEMP_BB]]:
	; GCN-O0: s_branch [[THEN_INNER]]			; GCN-O0: s_branch [[THEN_INNER]]
	; GCN-O0-NEXT: {{^}}[[ENDIF_INNER]]:			; GCN-O0-NEXT: {{^}}[[ENDIF_INNER]]:
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[INNER_SPILL_LANE_0]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[INNER_SPILL_LANE_0]]
	; GCN: s_endpgm			; GCN: s_endpgm
	;			;
	; GCN-O0-LABEL: {{^}}nested_if_else_if:			; GCN-O0-LABEL: {{^}}nested_if_else_if:
	; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec			; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec
	; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_xor_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_xor_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR:v[0-9]+]], s{{[0-9]+}}, [[OUTER_SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR:v[0-9]+]], s{{[0-9]+}}, [[OUTER_SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[OUTER_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[OUTER_SPILL_LANE_1:[0-9]+]]
				; GCN-O0-NEXT: s_or_saveexec_b64 [[EXEC_COPY:s\[[0-9]+:[0-9]+\]]], -1
				; GCN-O0-NEXT: buffer_store_dword [[VGPR]]
				; GCN-O0-NEXT: s_mov_b64 exec, [[EXEC_COPY]]
	; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_cbranch_execz [[THEN_OUTER:.LBB[0-9_]+]]			; GCN-O0-NEXT: s_cbranch_execz [[THEN_OUTER:.LBB[0-9_]+]]
	; GCN-O0-NEXT: s_branch [[INNER_IF_OUTER_ELSE:.LBB[0-9_]+]]			; GCN-O0-NEXT: s_branch [[INNER_IF_OUTER_ELSE:.LBB[0-9_]+]]
	; GCN-O0-NEXT: {{^}}[[THEN_OUTER]]:			; GCN-O0-NEXT: {{^}}[[THEN_OUTER]]:
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[OUTER_SPILL_LANE_0]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[OUTER_SPILL_LANE_0]]
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[OUTER_SPILL_LANE_1]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[OUTER_SPILL_LANE_1]]
	; GCN-O0-NEXT: s_or_saveexec_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_or_saveexec_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}], exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}], exec, s[{{[0-9:]+}}]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[OUTER_2_SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[OUTER_2_SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[OUTER_2_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[OUTER_2_SPILL_LANE_1:[0-9]+]]
	; GCN-O0-NEXT: s_xor_b64 exec, exec, s[{{[0-9:]+}}]			; GCN-O0-DAG: s_xor_b64 exec, exec, s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_cbranch_execz [[ENDIF_OUTER:.LBB[0-9_]+]]			; GCN-O0-NEXT: s_cbranch_execz [[ENDIF_OUTER:.LBB[0-9_]+]]
	; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:			; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:
	; GCN-O0: store_dword			; GCN-O0: store_dword
	; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec			; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[ELSE_SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[ELSE_SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[ELSE_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[ELSE_SPILL_LANE_1:[0-9]+]]
	; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]			; GCN-O0-DAG: s_and_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_cbranch_execz [[FLOW1:.LBB[0-9_]+]]			; GCN-O0-NEXT: s_cbranch_execz [[FLOW1:.LBB[0-9_]+]]
	; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:			; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:
	; GCN-O0: store_dword			; GCN-O0: store_dword
	; GCN-O0-NEXT: s_branch [[FLOW1]]			; GCN-O0-NEXT: s_branch [[FLOW1]]
	; GCN-O0-NEXT: {{^}}[[INNER_IF_OUTER_ELSE]]			; GCN-O0-NEXT: {{^}}[[INNER_IF_OUTER_ELSE]]
	; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec			; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_IF_OUTER_ELSE_SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_IF_OUTER_ELSE_SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_IF_OUTER_ELSE_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_IF_OUTER_ELSE_SPILL_LANE_1:[0-9]+]]
	; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]			; GCN-O0-DAG: s_and_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_cbranch_execz [[THEN_OUTER_FLOW:.LBB[0-9_]+]]			; GCN-O0-NEXT: s_cbranch_execz [[THEN_OUTER_FLOW:.LBB[0-9_]+]]
	; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:			; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:
	; GCN-O0: store_dword			; GCN-O0: store_dword
	; GCN-O0-NEXT: {{^}}[[THEN_OUTER_FLOW]]			; GCN-O0-NEXT: {{^}}[[THEN_OUTER_FLOW]]
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[INNER_IF_OUTER_ELSE_SPILL_LANE_0]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[INNER_IF_OUTER_ELSE_SPILL_LANE_0]]
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[INNER_IF_OUTER_ELSE_SPILL_LANE_1]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[INNER_IF_OUTER_ELSE_SPILL_LANE_1]]
	; GCN-O0-NEXT: s_or_b64 exec, exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_or_b64 exec, exec, s[{{[0-9:]+}}]
	; GCN-NEXT: s_or_b64 exec, exec, [[SAVEEXEC]]			; GCN-NEXT: s_or_b64 exec, exec, [[SAVEEXEC]]
	; GCN: s_barrier			; GCN: s_barrier
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	;			;
	; GCN-O0-LABEL: {{^}}s_endpgm_unsafe_barrier:			; GCN-O0-LABEL: {{^}}s_endpgm_unsafe_barrier:
	; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec			; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec
	; GCN-O0-DAG: v_writelane_b32 [[VGPR:v[0-9]+]], s{{[0-9]+}}, [[SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR:v[0-9]+]], s{{[0-9]+}}, [[SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[SPILL_LANE_1:[0-9]+]]
	; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}]			; GCN-O0-DAG: s_and_b64 s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_cbranch_execz [[ENDIF:.LBB[0-9_]+]]			; GCN-O0-NEXT: s_cbranch_execz [[ENDIF:.LBB[0-9_]+]]
	; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:			; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:
	; GCN-O0: store_dword			; GCN-O0: store_dword
	; GCN-O0-NEXT: {{^}}[[ENDIF]]:			; GCN-O0-NEXT: {{^}}[[ENDIF]]:
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[SPILL_LANE_0]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[SPILL_LANE_0]]
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[SPILL_LANE_1]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[SPILL_LANE_1]]
	; GCN-O0-NEXT: s_or_b64 exec, exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_or_b64 exec, exec, s[{{[0-9:]+}}]
	; GCN: s_setpc_b64			; GCN: s_setpc_b64
	;			;
	; GCN-O0-LABEL: {{^}}scc_liveness:			; GCN-O0-LABEL: {{^}}scc_liveness:
	; GCN-O0-COUNT-2: buffer_store_dword			; GCN-O0-COUNT-2: buffer_store_dword
	; GCN-O0-DAG: v_writelane_b32 [[VGPR:v[0-9]+]], s{{[0-9]+}}, [[INNER_LOOP_IN_EXEC_SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR:v[0-9]+]], s{{[0-9]+}}, [[INNER_LOOP_IN_EXEC_SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_LOOP_IN_EXEC_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_LOOP_IN_EXEC_SPILL_LANE_1:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_LOOP_BACK_EDGE_EXEC_SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_LOOP_BACK_EDGE_EXEC_SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_LOOP_BACK_EDGE_EXEC_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_LOOP_BACK_EDGE_EXEC_SPILL_LANE_1:[0-9]+]]
				; GCN-O0-NEXT: s_or_saveexec_b64 [[EXEC_COPY:s\[[0-9]+:[0-9]+\]]], -1
				; GCN-O0-NEXT: buffer_store_dword [[VGPR]], off, s{{\[[0-9]+:[0-9]+\]}}, s32 ; 4-byte Folded Spill
				; GCN-O0-NEXT: s_mov_b64 exec, [[EXEC_COPY]]
	; GCN-O0: [[INNER_LOOP:.LBB[0-9]+_[0-9]+]]:			; GCN-O0: [[INNER_LOOP:.LBB[0-9]+_[0-9]+]]:
	; GCN-O0: buffer_load_dword			; GCN-O0: buffer_load_dword [[RESTORED_VGPR:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s32 ; 4-byte Folded Reload
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[INNER_LOOP_BACK_EDGE_EXEC_SPILL_LANE_0]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[RESTORED_VGPR]], [[INNER_LOOP_BACK_EDGE_EXEC_SPILL_LANE_0]]
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[INNER_LOOP_BACK_EDGE_EXEC_SPILL_LANE_1]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[RESTORED_VGPR]], [[INNER_LOOP_BACK_EDGE_EXEC_SPILL_LANE_1]]
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[INNER_LOOP_IN_EXEC_SPILL_LANE_0]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[RESTORED_VGPR]], [[INNER_LOOP_IN_EXEC_SPILL_LANE_0]]
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[INNER_LOOP_IN_EXEC_SPILL_LANE_1]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[RESTORED_VGPR]], [[INNER_LOOP_IN_EXEC_SPILL_LANE_1]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[OUTER_LOOP_EXEC_SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[RESTORED_VGPR]], s{{[0-9]+}}, [[OUTER_LOOP_EXEC_SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[OUTER_LOOP_EXEC_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[RESTORED_VGPR]], s{{[0-9]+}}, [[OUTER_LOOP_EXEC_SPILL_LANE_1:[0-9]+]]
	; GCN-O0: s_or_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]			; GCN-O0: s_or_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_LOOP_OUT_EXEC_SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[RESTORED_VGPR]], s{{[0-9]+}}, [[INNER_LOOP_OUT_EXEC_SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_LOOP_OUT_EXEC_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[RESTORED_VGPR]], s{{[0-9]+}}, [[INNER_LOOP_OUT_EXEC_SPILL_LANE_1:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_LOOP_IN_EXEC_SPILL_LANE_0]]			; GCN-O0-DAG: v_writelane_b32 [[RESTORED_VGPR]], s{{[0-9]+}}, [[INNER_LOOP_IN_EXEC_SPILL_LANE_0]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_LOOP_IN_EXEC_SPILL_LANE_1]]			; GCN-O0-DAG: v_writelane_b32 [[RESTORED_VGPR]], s{{[0-9]+}}, [[INNER_LOOP_IN_EXEC_SPILL_LANE_1]]
	; GCN-O0-NEXT: s_mov_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_mov_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_LOOP_BACK_EDGE_EXEC_SPILL_LANE_0]]			; GCN-O0-DAG: v_writelane_b32 [[RESTORED_VGPR]], s{{[0-9]+}}, [[INNER_LOOP_BACK_EDGE_EXEC_SPILL_LANE_0]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_LOOP_BACK_EDGE_EXEC_SPILL_LANE_1]]			; GCN-O0-DAG: v_writelane_b32 [[RESTORED_VGPR]], s{{[0-9]+}}, [[INNER_LOOP_BACK_EDGE_EXEC_SPILL_LANE_1]]
				; GCN-O0-NEXT: s_or_saveexec_b64 [[EXEC_COPY:s\[[0-9]+:[0-9]+\]]], -1
				; GCN-O0-NEXT: buffer_store_dword [[RESTORED_VGPR]], off, s{{\[[0-9]+:[0-9]+\]}}, s32 ; 4-byte Folded Spill
				; GCN-O0-NEXT: s_mov_b64 exec, [[EXEC_COPY]]
	; GCN-O0-NEXT: s_andn2_b64 exec, exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_andn2_b64 exec, exec, s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_cbranch_execnz [[INNER_LOOP]]			; GCN-O0-NEXT: s_cbranch_execnz [[INNER_LOOP]]
	; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:			; GCN-O0-NEXT: ; %bb.{{[0-9]+}}:
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[INNER_LOOP_OUT_EXEC_SPILL_LANE_0]]			; GCN-O0: buffer_load_dword [[RESTORED_1_VGPR:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s32 ; 4-byte Folded Reload
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[INNER_LOOP_OUT_EXEC_SPILL_LANE_1]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[RESTORED_1_VGPR]], [[INNER_LOOP_OUT_EXEC_SPILL_LANE_0]]
				; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[RESTORED_1_VGPR]], [[INNER_LOOP_OUT_EXEC_SPILL_LANE_1]]
	; GCN-O0-NEXT: s_or_b64 exec, exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_or_b64 exec, exec, s[{{[0-9:]+}}]
	; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec			; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[FLOW2_IN_EXEC_SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[RESTORED_1_VGPR]], s{{[0-9]+}}, [[FLOW2_IN_EXEC_SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[FLOW2_IN_EXEC_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[RESTORED_1_VGPR]], s{{[0-9]+}}, [[FLOW2_IN_EXEC_SPILL_LANE_1:[0-9]+]]
				; GCN-O0-NEXT: s_or_saveexec_b64 [[EXEC_COPY:s\[[0-9]+:[0-9]+\]]], -1
				; GCN-O0-NEXT: buffer_store_dword [[RESTORED_1_VGPR]], off, s{{\[[0-9]+:[0-9]+\]}}, s32 ; 4-byte Folded Spill
				; GCN-O0-NEXT: s_mov_b64 exec, [[EXEC_COPY]]
	; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_cbranch_execz [[FLOW2:.LBB[0-9_]+]]			; GCN-O0-NEXT: s_cbranch_execz [[FLOW2:.LBB[0-9_]+]]
	; GCN-O0: {{^}}[[FLOW2]]:			; GCN-O0: {{^}}[[FLOW2]]:
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[FLOW2_IN_EXEC_SPILL_LANE_0]]			; GCN-O0: buffer_load_dword [[RESTORED_2_VGPR:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s32 ; 4-byte Folded Reload
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[FLOW2_IN_EXEC_SPILL_LANE_1]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[RESTORED_2_VGPR]], [[FLOW2_IN_EXEC_SPILL_LANE_0]]
				; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[RESTORED_2_VGPR]], [[FLOW2_IN_EXEC_SPILL_LANE_1]]
	; GCN-O0: s_branch [[FLOW:.LBB[0-9_]+]]			; GCN-O0: s_branch [[FLOW:.LBB[0-9_]+]]
	; GCN-O0: {{^}}[[FLOW]]:			; GCN-O0: {{^}}[[FLOW]]:
				; GCN-O0: buffer_load_dword [[RESTORED_3_VGPR:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s32 ; 4-byte Folded Reload
	; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec			; GCN-O0: s_mov_b64 s[{{[0-9:]+}}], exec
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[FLOW3_IN_EXEC_SPILL_LANE_0:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[RESTORED_3_VGPR]], s{{[0-9]+}}, [[FLOW3_IN_EXEC_SPILL_LANE_0:[0-9]+]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[FLOW3_IN_EXEC_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[RESTORED_3_VGPR]], s{{[0-9]+}}, [[FLOW3_IN_EXEC_SPILL_LANE_1:[0-9]+]]
				; GCN-O0-NEXT: s_or_saveexec_b64 [[EXEC_COPY:s\[[0-9]+:[0-9]+\]]], -1
				; GCN-O0-NEXT: buffer_store_dword [[RESTORED_3_VGPR]], off, s{{\[[0-9]+:[0-9]+\]}}, s32 ; 4-byte Folded Spill
				; GCN-O0-NEXT: s_mov_b64 exec, [[EXEC_COPY]]
	; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_and_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_mov_b64 exec, s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_cbranch_execz [[FLOW3:.LBB[0-9_]+]]			; GCN-O0-NEXT: s_cbranch_execz [[FLOW3:.LBB[0-9_]+]]
	; GCN-O0: ; %bb.{{[0-9]+}}:			; GCN-O0: ; %bb.{{[0-9]+}}:
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[FLOW1_OUT_EXEC_SPILL_LANE_0:[0-9]+]]			; GCN-O0: buffer_load_dword [[RESTORED_4_VGPR:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s32 ; 4-byte Folded Reload
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[FLOW1_OUT_EXEC_SPILL_LANE_1:[0-9]+]]			; GCN-O0-DAG: v_writelane_b32 [[RESTORED_4_VGPR]], s{{[0-9]+}}, [[FLOW1_OUT_EXEC_SPILL_LANE_0:[0-9]+]]
				; GCN-O0-DAG: v_writelane_b32 [[RESTORED_4_VGPR]], s{{[0-9]+}}, [[FLOW1_OUT_EXEC_SPILL_LANE_1:[0-9]+]]
				; GCN-O0-NEXT: s_or_saveexec_b64 [[EXEC_COPY:s\[[0-9]+:[0-9]+\]]], -1
				; GCN-O0-NEXT: buffer_store_dword [[RESTORED_4_VGPR]], off, s{{\[[0-9]+:[0-9]+\]}}, s32 ; 4-byte Folded Spill
				; GCN-O0-NEXT: s_mov_b64 exec, [[EXEC_COPY]]
	; GCN-O0: {{^}}[[FLOW3]]:			; GCN-O0: {{^}}[[FLOW3]]:
	; GCN-O0-COUNT-4: buffer_load_dword			; GCN-O0-COUNT-4: buffer_load_dword
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[OUTER_LOOP_EXEC_SPILL_LANE_0]]			; GCN-O0: buffer_load_dword [[RESTORED_5_VGPR:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s32 ; 4-byte Folded Reload
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[OUTER_LOOP_EXEC_SPILL_LANE_1]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[RESTORED_5_VGPR]], [[OUTER_LOOP_EXEC_SPILL_LANE_0]]
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[FLOW1_OUT_EXEC_SPILL_LANE_0]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[RESTORED_5_VGPR]], [[OUTER_LOOP_EXEC_SPILL_LANE_1]]
	; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[VGPR]], [[FLOW1_OUT_EXEC_SPILL_LANE_1]]			; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[RESTORED_5_VGPR]], [[FLOW1_OUT_EXEC_SPILL_LANE_0]]
				; GCN-O0-DAG: v_readlane_b32 s{{[0-9]+}}, [[RESTORED_5_VGPR]], [[FLOW1_OUT_EXEC_SPILL_LANE_1]]
	; GCN-O0: s_and_b64 s[{{[0-9:]+}}], exec, s[{{[0-9:]+}}]			; GCN-O0: s_and_b64 s[{{[0-9:]+}}], exec, s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_or_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]			; GCN-O0-NEXT: s_or_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]
	; GCN-O0-COUNT-2: s_mov_b64			; GCN-O0-COUNT-2: s_mov_b64
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_LOOP_IN_EXEC_SPILL_LANE_0]]			; GCN-O0-DAG: v_writelane_b32 [[RESTORED_5_VGPR]], s{{[0-9]+}}, [[INNER_LOOP_IN_EXEC_SPILL_LANE_0]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_LOOP_IN_EXEC_SPILL_LANE_1]]			; GCN-O0-DAG: v_writelane_b32 [[RESTORED_5_VGPR]], s{{[0-9]+}}, [[INNER_LOOP_IN_EXEC_SPILL_LANE_1]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_LOOP_BACK_EDGE_EXEC_SPILL_LANE_0]]			; GCN-O0-DAG: v_writelane_b32 [[RESTORED_5_VGPR]], s{{[0-9]+}}, [[INNER_LOOP_BACK_EDGE_EXEC_SPILL_LANE_0]]
	; GCN-O0-DAG: v_writelane_b32 [[VGPR]], s{{[0-9]+}}, [[INNER_LOOP_BACK_EDGE_EXEC_SPILL_LANE_1]]			; GCN-O0-DAG: v_writelane_b32 [[RESTORED_5_VGPR]], s{{[0-9]+}}, [[INNER_LOOP_BACK_EDGE_EXEC_SPILL_LANE_1]]
	; GCN-O0-COUNT-4: buffer_store_dword			; GCN-O0-COUNT-4: buffer_store_dword
	; GCN-O0: s_andn2_b64 exec, exec, s[{{[0-9:]+}}]			; GCN-O0: s_andn2_b64 exec, exec, s[{{[0-9:]+}}]
	; GCN-O0-NEXT: s_cbranch_execnz [[INNER_LOOP]]			; GCN-O0-NEXT: s_cbranch_execnz [[INNER_LOOP]]
	; GCN-O0: ; %bb.{{[0-9]+}}:			; GCN-O0: ; %bb.{{[0-9]+}}:
	; GCN-O0-COUNT-4: buffer_store_dword			; GCN-O0-COUNT-4: buffer_store_dword
	; GCN-O0: s_setpc_b64			; GCN-O0: s_setpc_b64
	;			;
	define void @scc_liveness(i32 %arg) local_unnamed_addr #0 {			define void @scc_liveness(i32 %arg) local_unnamed_addr #0 {

llvm/test/CodeGen/AMDGPU/control-flow-fastregalloc.ll

	; RUN: llc -O0 -mtriple=amdgcn--amdhsa -march=amdgcn --amdhsa-code-object-version=2 -amdgpu-spill-sgpr-to-vgpr=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=VMEM -check-prefix=GCN %s			; RUN: llc -O0 -mtriple=amdgcn--amdhsa -march=amdgcn --amdhsa-code-object-version=2 -amdgpu-spill-sgpr-to-vgpr=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=VMEM -check-prefix=GCN %s
	; RUN: llc -O0 -mtriple=amdgcn--amdhsa -march=amdgcn --amdhsa-code-object-version=2 -amdgpu-spill-sgpr-to-vgpr=1 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=VGPR -check-prefix=GCN %s			; RUN: llc -O0 -mtriple=amdgcn--amdhsa -march=amdgcn --amdhsa-code-object-version=2 -amdgpu-spill-sgpr-to-vgpr=1 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=VGPR -check-prefix=GCN %s

	; Verify registers used for tracking exec mask changes when all			; Verify registers used for tracking exec mask changes when all
	; registers are spilled at the end of the block. The SGPR spill			; registers are spilled at the end of the block. The SGPR spill
	; placement relative to the exec modifications are important.			; placement relative to the exec modifications are important.

	; FIXME: This checks with SGPR to VGPR spilling disabled, but this may			; FIXME: This checks with SGPR to VGPR spilling disabled, but this may
	; not work correctly in cases where no workitems take a branch.			; not work correctly in cases where no workitems take a branch.


	; GCN-LABEL: {{^}}divergent_if_endif:			; GCN-LABEL: {{^}}divergent_if_endif:
	; VGPR: workitem_private_segment_byte_size = 12{{$}}			; VGPR: workitem_private_segment_byte_size = 16{{$}}


	; GCN: {{^}}; %bb.0:			; GCN: {{^}}; %bb.0:
	; GCN: s_mov_b32 m0, -1			; GCN: s_mov_b32 m0, -1
	; GCN: ds_read_b32 [[LOAD0:v[0-9]+]]			; GCN: ds_read_b32 [[LOAD0:v[0-9]+]]

	; Spill load			; Spill load
	; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], 0 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], 0 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill
	; GCN: v_cmp_eq_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], v0, s{{[0-9]+}}			; GCN: v_cmp_eq_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], v{{[0-9]+}}, s{{[0-9]+}}

	; Spill saved exec			; Spill saved exec
	; GCN: s_mov_b64 s[[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]], exec			; GCN: s_mov_b64 s[[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]], exec
	; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]
	; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]

	; VMEM: v_writelane_b32 v[[V_SAVEEXEC:[0-9]+]], s[[SAVEEXEC_LO]], 0			; VMEM: v_writelane_b32 v[[V_SAVEEXEC:[0-9]+]], s[[SAVEEXEC_LO]], 0
	; VMEM: v_writelane_b32 v[[V_SAVEEXEC]], s[[SAVEEXEC_HI]], 1			; VMEM: v_writelane_b32 v[[V_SAVEEXEC]], s[[SAVEEXEC_HI]], 1

	endif:			endif:
	%tmp4 = phi i32 [ %val, %if ], [ 0, %entry ]			%tmp4 = phi i32 [ %val, %if ], [ 0, %entry ]
	store i32 %tmp4, i32 addrspace(1)* %out			store i32 %tmp4, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}divergent_loop:			; GCN-LABEL: {{^}}divergent_loop:
	; VGPR: workitem_private_segment_byte_size = 16{{$}}			; VGPR: workitem_private_segment_byte_size = 20{{$}}

	; GCN: {{^}}; %bb.0:			; GCN: {{^}}; %bb.0:
	; GCN-DAG: s_mov_b32 m0, -1			; GCN-DAG: s_mov_b32 m0, -1
	; GCN-DAG: v_mov_b32_e32 [[PTR0:v[0-9]+]], 0{{$}}			; GCN-DAG: v_mov_b32_e32 [[PTR0:v[0-9]+]], 0{{$}}
	; GCN: ds_read_b32 [[LOAD0:v[0-9]+]], [[PTR0]]			; GCN: ds_read_b32 [[LOAD0:v[0-9]+]], [[PTR0]]
	; GCN: v_cmp_eq_u32_e64 [[CMP0:s\[[0-9]+:[0-9]+\]]], v0, s{{[0-9]+}}			; GCN: v_cmp_eq_u32_e64 [[CMP0:s\[[0-9]+:[0-9]+\]]], v{{[0-9]+}}, s{{[0-9]+}}

	; Spill load			; Spill load
	; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], 0 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], 0 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill
	; GCN: s_mov_b64 s[[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]], exec			; GCN: s_mov_b64 s[[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]], exec

	; Spill saved exec			; Spill saved exec
	; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]
	; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]
	; GCN-DAG: s_mov_b32 m0, -1			; GCN-DAG: s_mov_b32 m0, -1
	; GCN-DAG: v_mov_b32_e32 [[PTR0:v[0-9]+]], 0{{$}}			; GCN-DAG: v_mov_b32_e32 [[PTR0:v[0-9]+]], 0{{$}}
	; GCN: ds_read_b32 [[LOAD0:v[0-9]+]], [[PTR0]]			; GCN: ds_read_b32 [[LOAD0:v[0-9]+]], [[PTR0]]

	; Spill load			; Spill load
	; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], 0 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], 0 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill

	; GCN: s_mov_b32 [[ZERO:s[0-9]+]], 0			; GCN: s_mov_b32 [[ZERO:s[0-9]+]], 0
	; GCN: v_cmp_ne_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], v0, [[ZERO]]			; GCN: v_cmp_ne_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], v{{[0-9]+}}, [[ZERO]]

	; GCN: s_mov_b64 s[[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]], exec			; GCN: s_mov_b64 s[[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]], exec
	; GCN: s_and_b64 s[[[ANDEXEC_LO:[0-9]+]]:[[ANDEXEC_HI:[0-9]+]]], s[[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]], [[CMP0]]			; GCN: s_and_b64 s[[[ANDEXEC_LO:[0-9]+]]:[[ANDEXEC_HI:[0-9]+]]], s[[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]], [[CMP0]]
	; GCN: s_xor_b64 s[[[SAVEEXEC_LO]]:[[SAVEEXEC_HI]]], s[[[ANDEXEC_LO]]:[[ANDEXEC_HI]]], s[[[SAVEEXEC_LO]]:[[SAVEEXEC_HI]]]			; GCN: s_xor_b64 s[[[SAVEEXEC_LO]]:[[SAVEEXEC_HI]]], s[[[ANDEXEC_LO]]:[[ANDEXEC_HI]]], s[[[SAVEEXEC_LO]]:[[SAVEEXEC_HI]]]

	; Spill saved exec			; Spill saved exec
	; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]
	; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]

	; VMEM: v_writelane_b32 v[[V_SAVEEXEC:[0-9]+]], s[[SAVEEXEC_LO]], 0			; VMEM: v_writelane_b32 v[[V_SAVEEXEC:[0-9]+]], s[[SAVEEXEC_LO]], 0
	; VMEM: v_writelane_b32 v[[V_SAVEEXEC]], s[[SAVEEXEC_HI]], 1			; VMEM: v_writelane_b32 v[[V_SAVEEXEC]], s[[SAVEEXEC_HI]], 1
	; VMEM: buffer_store_dword v[[V_SAVEEXEC]], off, s[0:3], 0 offset:[[SAVEEXEC_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; VMEM: buffer_store_dword v[[V_SAVEEXEC]], off, s[0:3], 0 offset:[[SAVEEXEC_OFFSET:[0-9]+]] ; 4-byte Folded Spill

	; GCN: s_mov_b64 exec, [[CMP0]]			; GCN: s_mov_b64 exec, [[CMP0]]

	; FIXME: It makes no sense to put this skip here			; FIXME: It makes no sense to put this skip here
	; GCN: s_cbranch_execz [[FLOW:.LBB[0-9]+_[0-9]+]]			; GCN: s_cbranch_execz [[FLOW:.LBB[0-9]+_[0-9]+]]
	; GCN-NEXT: s_branch [[ELSE:.LBB[0-9]+_[0-9]+]]			; GCN-NEXT: s_branch [[ELSE:.LBB[0-9]+_[0-9]+]]

	; GCN: [[FLOW]]: ; %Flow			; GCN: [[FLOW]]: ; %Flow
				; VGPR: buffer_load_dword
	; GCN: buffer_load_dword [[FLOW_VAL:v[0-9]+]], off, s[0:3], 0 offset:[[FLOW_VAL_OFFSET:[0-9]+]] ; 4-byte Folded Reload			; GCN: buffer_load_dword [[FLOW_VAL:v[0-9]+]], off, s[0:3], 0 offset:[[FLOW_VAL_OFFSET:[0-9]+]] ; 4-byte Folded Reload
	; VGPR: v_readlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_LO_LANE]]			; VGPR: v_readlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_LO_LANE]]
	; VGPR: v_readlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_HI_LANE]]			; VGPR: v_readlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_HI_LANE]]

	; VMEM: buffer_load_dword v[[FLOW_V_RELOAD_SAVEEXEC:[0-9]+]], off, s[0:3], 0 offset:[[SAVEEXEC_OFFSET]]			; VMEM: buffer_load_dword v[[FLOW_V_RELOAD_SAVEEXEC:[0-9]+]], off, s[0:3], 0 offset:[[SAVEEXEC_OFFSET]]
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)
	; VMEM: v_readlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[FLOW_V_RELOAD_SAVEEXEC]], 0			; VMEM: v_readlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[FLOW_V_RELOAD_SAVEEXEC]], 0
	; VMEM: v_readlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[FLOW_V_RELOAD_SAVEEXEC]], 1			; VMEM: v_readlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[FLOW_V_RELOAD_SAVEEXEC]], 1

llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll

	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_mov_b32 s16, s33			; GCN-NEXT: s_mov_b32 s16, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1			; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[18:19]			; GCN-NEXT: s_mov_b64 exec, s[18:19]
				; GCN-NEXT: ; implicit-def: $vgpr40
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v41, s16, 0			; GCN-NEXT: v_writelane_b32 v41, s16, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: s_getpc_b64 s[16:17]			; GCN-NEXT: s_getpc_b64 s[16:17]
	; GCN-NEXT: s_add_u32 s16, s16, func_v2f32@rel32@lo+4			; GCN-NEXT: s_add_u32 s16, s16, func_v2f32@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s17, s17, func_v2f32@rel32@hi+12			; GCN-NEXT: s_addc_u32 s17, s17, func_v2f32@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_mov_b32 s16, s33			; GCN-NEXT: s_mov_b32 s16, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1			; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[18:19]			; GCN-NEXT: s_mov_b64 exec, s[18:19]
				; GCN-NEXT: ; implicit-def: $vgpr40
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v41, s16, 0			; GCN-NEXT: v_writelane_b32 v41, s16, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: s_getpc_b64 s[16:17]			; GCN-NEXT: s_getpc_b64 s[16:17]
	; GCN-NEXT: s_add_u32 s16, s16, func_v3f32@rel32@lo+4			; GCN-NEXT: s_add_u32 s16, s16, func_v3f32@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s17, s17, func_v3f32@rel32@hi+12			; GCN-NEXT: s_addc_u32 s17, s17, func_v3f32@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_mov_b32 s16, s33			; GCN-NEXT: s_mov_b32 s16, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1			; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[18:19]			; GCN-NEXT: s_mov_b64 exec, s[18:19]
				; GCN-NEXT: ; implicit-def: $vgpr40
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v41, s16, 0			; GCN-NEXT: v_writelane_b32 v41, s16, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: s_getpc_b64 s[16:17]			; GCN-NEXT: s_getpc_b64 s[16:17]
	; GCN-NEXT: s_add_u32 s16, s16, func_v4f16@rel32@lo+4			; GCN-NEXT: s_add_u32 s16, s16, func_v4f16@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s17, s17, func_v4f16@rel32@hi+12			; GCN-NEXT: s_addc_u32 s17, s17, func_v4f16@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_mov_b32 s16, s33			; GCN-NEXT: s_mov_b32 s16, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1			; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[18:19]			; GCN-NEXT: s_mov_b64 exec, s[18:19]
				; GCN-NEXT: ; implicit-def: $vgpr40
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v41, s16, 0			; GCN-NEXT: v_writelane_b32 v41, s16, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: s_getpc_b64 s[16:17]			; GCN-NEXT: s_getpc_b64 s[16:17]
	; GCN-NEXT: s_add_u32 s16, s16, func_struct@rel32@lo+4			; GCN-NEXT: s_add_u32 s16, s16, func_struct@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s17, s17, func_struct@rel32@hi+12			; GCN-NEXT: s_addc_u32 s17, s17, func_struct@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]

llvm/test/CodeGen/AMDGPU/csr-sgpr-spill-live-ins.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -march=amdgcn -mcpu=gfx906 -start-before=si-lower-sgpr-spills -stop-after=prologepilog -verify-machineinstrs -o - %s | FileCheck %s			# RUN: llc -march=amdgcn -mcpu=gfx906 -start-before=si-lower-sgpr-spills -stop-after=prologepilog -verify-machineinstrs -o - %s | FileCheck %s

	# Make sure the modified CSR VGPRs are added as live-in to the entry			# Make sure the modified CSR VGPRs are added as live-in to the entry
	# block.			# block.

	---			---
	name: def_csr_sgpr			name: def_csr_sgpr
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	body: |			body: |
	; CHECK-LABEL: name: def_csr_sgpr			; CHECK-LABEL: name: def_csr_sgpr
	; CHECK: bb.0:			; CHECK: bb.0:
	; CHECK-NEXT: successors: %bb.1(0x80000000)			; CHECK-NEXT: successors: %bb.1(0x80000000)
	; CHECK-NEXT: liveins: $sgpr42, $sgpr43, $sgpr46, $sgpr47, $vgpr0			; CHECK-NEXT: liveins: $sgpr42, $sgpr43, $sgpr46, $sgpr47, $vgpr0
	; CHECK-NEXT: {{ $}}			; CHECK-NEXT: {{ $}}
	; CHECK-NEXT: $sgpr4_sgpr5 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec			; CHECK-NEXT: $sgpr4_sgpr5 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
				arsenm: Can you precommit a change to add the -NEXTs here
				cdevadas (author): Will do.
	; CHECK-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (store (s32) into %stack.4, addrspace 5)			; CHECK-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (store (s32) into %stack.4, addrspace 5)
	; CHECK-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; CHECK-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr42, 0, $vgpr0			; CHECK-NEXT: renamable $vgpr0 = IMPLICIT_DEF
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr43, 1, $vgpr0			; CHECK-NEXT: renamable $vgpr0 = V_WRITELANE_B32 $sgpr42, 0, killed $vgpr0
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr46, 2, $vgpr0			; CHECK-NEXT: renamable $vgpr0 = V_WRITELANE_B32 $sgpr43, 1, killed $vgpr0
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr47, 3, $vgpr0			; CHECK-NEXT: renamable $vgpr0 = V_WRITELANE_B32 $sgpr46, 2, killed $vgpr0
				; CHECK-NEXT: dead renamable $vgpr0 = V_WRITELANE_B32 $sgpr47, 3, killed $vgpr0
	; CHECK-NEXT: S_NOP 0			; CHECK-NEXT: S_NOP 0
	; CHECK-NEXT: {{ $}}			; CHECK-NEXT: {{ $}}
	; CHECK-NEXT: bb.1:			; CHECK-NEXT: bb.1:
	; CHECK-NEXT: liveins: $vgpr0			; CHECK-NEXT: liveins: $vgpr0
	; CHECK-NEXT: {{ $}}			; CHECK-NEXT: {{ $}}
	; CHECK-NEXT: $sgpr42 = S_MOV_B32 0			; CHECK-NEXT: $sgpr42 = S_MOV_B32 0
	; CHECK-NEXT: $sgpr43 = S_MOV_B32 1			; CHECK-NEXT: $sgpr43 = S_MOV_B32 1
	; CHECK-NEXT: $sgpr46_sgpr47 = S_MOV_B64 2			; CHECK-NEXT: $sgpr46_sgpr47 = S_MOV_B64 2
	bb.0:			bb.0:
	S_NOP 0			S_NOP 0

	bb.1:			bb.1:
	$sgpr42 = S_MOV_B32 0			$sgpr42 = S_MOV_B32 0
	$sgpr43 = S_MOV_B32 1			$sgpr43 = S_MOV_B32 1
	$sgpr46_sgpr47 = S_MOV_B64 2			$sgpr46_sgpr47 = S_MOV_B64 2
	...			...

llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll

	; CHECK-NEXT: .loc 1 288 0 ; dummy:288:0			; CHECK-NEXT: .loc 1 288 0 ; dummy:288:0
	; CHECK-NEXT: .cfi_sections .debug_frame			; CHECK-NEXT: .cfi_sections .debug_frame
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: ; %bb.0:			; CHECK-NEXT: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_mov_b32 s16, s33			; CHECK-NEXT: s_mov_b32 s16, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_or_saveexec_b64 s[18:19], -1			; CHECK-NEXT: s_or_saveexec_b64 s[18:19], -1
	; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; CHECK-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[18:19]			; CHECK-NEXT: s_mov_b64 exec, s[18:19]
	; CHECK-NEXT: v_writelane_b32 v40, s30, 0			; CHECK-NEXT: ; implicit-def: $vgpr41
	; CHECK-NEXT: v_writelane_b32 v40, s31, 1
	; CHECK-NEXT: v_writelane_b32 v40, s34, 2
	; CHECK-NEXT: v_writelane_b32 v40, s35, 3
	; CHECK-NEXT: v_writelane_b32 v40, s36, 4
	; CHECK-NEXT: v_writelane_b32 v40, s37, 5
	; CHECK-NEXT: v_writelane_b32 v40, s38, 6
	; CHECK-NEXT: v_writelane_b32 v40, s39, 7
	; CHECK-NEXT: v_writelane_b32 v40, s40, 8
	; CHECK-NEXT: v_writelane_b32 v40, s41, 9
	; CHECK-NEXT: v_writelane_b32 v40, s42, 10
	; CHECK-NEXT: v_writelane_b32 v40, s43, 11
	; CHECK-NEXT: v_writelane_b32 v40, s44, 12
	; CHECK-NEXT: s_addk_i32 s32, 0x400			; CHECK-NEXT: s_addk_i32 s32, 0x400
	; CHECK-NEXT: v_writelane_b32 v40, s45, 13			; CHECK-NEXT: v_writelane_b32 v41, s30, 0
	; CHECK-NEXT: v_writelane_b32 v40, s46, 14			; CHECK-NEXT: v_writelane_b32 v41, s31, 1
				; CHECK-NEXT: v_writelane_b32 v41, s34, 2
				; CHECK-NEXT: v_writelane_b32 v41, s35, 3
				; CHECK-NEXT: v_writelane_b32 v41, s36, 4
				; CHECK-NEXT: v_writelane_b32 v41, s37, 5
				; CHECK-NEXT: v_writelane_b32 v41, s38, 6
				; CHECK-NEXT: v_writelane_b32 v41, s39, 7
				; CHECK-NEXT: v_writelane_b32 v41, s40, 8
				; CHECK-NEXT: v_writelane_b32 v41, s41, 9
				; CHECK-NEXT: v_writelane_b32 v41, s42, 10
				; CHECK-NEXT: v_writelane_b32 v41, s43, 11
				; CHECK-NEXT: v_writelane_b32 v41, s44, 12
				; CHECK-NEXT: v_writelane_b32 v41, s45, 13
				; CHECK-NEXT: v_writelane_b32 v41, s46, 14
	; CHECK-NEXT: s_mov_b64 s[40:41], s[4:5]			; CHECK-NEXT: s_mov_b64 s[40:41], s[4:5]
	; CHECK-NEXT: ;DEBUG_VALUE: dummy:dummy <- undef			; CHECK-NEXT: ;DEBUG_VALUE: dummy:dummy <- undef
	; CHECK-NEXT: .Ltmp0:			; CHECK-NEXT: .Ltmp0:
	; CHECK-NEXT: .loc 1 49 9 prologue_end ; dummy:49:9			; CHECK-NEXT: .loc 1 49 9 prologue_end ; dummy:49:9
	; CHECK-NEXT: s_getpc_b64 s[4:5]			; CHECK-NEXT: s_getpc_b64 s[4:5]
	; CHECK-NEXT: s_add_u32 s4, s4, __kmpc_alloc_shared@gotpcrel32@lo+4			; CHECK-NEXT: s_add_u32 s4, s4, __kmpc_alloc_shared@gotpcrel32@lo+4
	; CHECK-NEXT: s_addc_u32 s5, s5, __kmpc_alloc_shared@gotpcrel32@hi+12			; CHECK-NEXT: s_addc_u32 s5, s5, __kmpc_alloc_shared@gotpcrel32@hi+12
	; CHECK-NEXT: v_writelane_b32 v40, s47, 15			; CHECK-NEXT: v_writelane_b32 v41, s47, 15
	; CHECK-NEXT: s_load_dwordx2 s[46:47], s[4:5], 0x0			; CHECK-NEXT: s_load_dwordx2 s[46:47], s[4:5], 0x0
	; CHECK-NEXT: s_mov_b64 s[4:5], s[40:41]			; CHECK-NEXT: s_mov_b64 s[4:5], s[40:41]
	; CHECK-NEXT: v_writelane_b32 v42, s16, 0			; CHECK-NEXT: v_writelane_b32 v42, s16, 0
	; CHECK-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; CHECK-NEXT: v_mov_b32_e32 v41, v31			; CHECK-NEXT: v_mov_b32_e32 v40, v31
	; CHECK-NEXT: s_mov_b32 s42, s15			; CHECK-NEXT: s_mov_b32 s42, s15
	; CHECK-NEXT: s_mov_b32 s43, s14			; CHECK-NEXT: s_mov_b32 s43, s14
	; CHECK-NEXT: s_mov_b32 s44, s13			; CHECK-NEXT: s_mov_b32 s44, s13
	; CHECK-NEXT: s_mov_b32 s45, s12			; CHECK-NEXT: s_mov_b32 s45, s12
	; CHECK-NEXT: s_mov_b64 s[34:35], s[10:11]			; CHECK-NEXT: s_mov_b64 s[34:35], s[10:11]
	; CHECK-NEXT: s_mov_b64 s[36:37], s[8:9]			; CHECK-NEXT: s_mov_b64 s[36:37], s[8:9]
	; CHECK-NEXT: s_mov_b64 s[38:39], s[6:7]			; CHECK-NEXT: s_mov_b64 s[38:39], s[6:7]
	; CHECK-NEXT: s_waitcnt lgkmcnt(0)			; CHECK-NEXT: s_waitcnt lgkmcnt(0)
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[46:47]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[46:47]
	; CHECK-NEXT: s_mov_b64 s[4:5], s[40:41]			; CHECK-NEXT: s_mov_b64 s[4:5], s[40:41]
	; CHECK-NEXT: s_mov_b64 s[6:7], s[38:39]			; CHECK-NEXT: s_mov_b64 s[6:7], s[38:39]
	; CHECK-NEXT: s_mov_b64 s[8:9], s[36:37]			; CHECK-NEXT: s_mov_b64 s[8:9], s[36:37]
	; CHECK-NEXT: s_mov_b64 s[10:11], s[34:35]			; CHECK-NEXT: s_mov_b64 s[10:11], s[34:35]
	; CHECK-NEXT: s_mov_b32 s12, s45			; CHECK-NEXT: s_mov_b32 s12, s45
	; CHECK-NEXT: s_mov_b32 s13, s44			; CHECK-NEXT: s_mov_b32 s13, s44
	; CHECK-NEXT: s_mov_b32 s14, s43			; CHECK-NEXT: s_mov_b32 s14, s43
	; CHECK-NEXT: s_mov_b32 s15, s42			; CHECK-NEXT: s_mov_b32 s15, s42
	; CHECK-NEXT: v_mov_b32_e32 v31, v41			; CHECK-NEXT: v_mov_b32_e32 v31, v40
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[46:47]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[46:47]
	; CHECK-NEXT: .Ltmp1:			; CHECK-NEXT: .Ltmp1:
	; CHECK-NEXT: ;DEBUG_VALUE: dummy:dummy <- [$vgpr0_vgpr1+0]			; CHECK-NEXT: ;DEBUG_VALUE: dummy:dummy <- [$vgpr0_vgpr1+0]
	; CHECK-NEXT: .loc 1 0 9 is_stmt 0 ; dummy:0:9			; CHECK-NEXT: .loc 1 0 9 is_stmt 0 ; dummy:0:9
	; CHECK-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; CHECK-NEXT: v_mov_b32_e32 v2, 0			; CHECK-NEXT: v_mov_b32_e32 v2, 0
	; CHECK-NEXT: flat_store_dword v[0:1], v2			; CHECK-NEXT: flat_store_dword v[0:1], v2
	; CHECK-NEXT: v_readlane_b32 s47, v40, 15			; CHECK-NEXT: v_readlane_b32 s47, v41, 15
	; CHECK-NEXT: v_readlane_b32 s46, v40, 14			; CHECK-NEXT: v_readlane_b32 s46, v41, 14
	; CHECK-NEXT: v_readlane_b32 s45, v40, 13			; CHECK-NEXT: v_readlane_b32 s45, v41, 13
	; CHECK-NEXT: v_readlane_b32 s44, v40, 12			; CHECK-NEXT: v_readlane_b32 s44, v41, 12
	; CHECK-NEXT: v_readlane_b32 s43, v40, 11			; CHECK-NEXT: v_readlane_b32 s43, v41, 11
	; CHECK-NEXT: v_readlane_b32 s42, v40, 10			; CHECK-NEXT: v_readlane_b32 s42, v41, 10
	; CHECK-NEXT: v_readlane_b32 s41, v40, 9			; CHECK-NEXT: v_readlane_b32 s41, v41, 9
	; CHECK-NEXT: v_readlane_b32 s40, v40, 8			; CHECK-NEXT: v_readlane_b32 s40, v41, 8
	; CHECK-NEXT: v_readlane_b32 s39, v40, 7			; CHECK-NEXT: v_readlane_b32 s39, v41, 7
	; CHECK-NEXT: v_readlane_b32 s38, v40, 6			; CHECK-NEXT: v_readlane_b32 s38, v41, 6
	; CHECK-NEXT: v_readlane_b32 s37, v40, 5			; CHECK-NEXT: v_readlane_b32 s37, v41, 5
	; CHECK-NEXT: v_readlane_b32 s36, v40, 4			; CHECK-NEXT: v_readlane_b32 s36, v41, 4
	; CHECK-NEXT: v_readlane_b32 s35, v40, 3			; CHECK-NEXT: v_readlane_b32 s35, v41, 3
	; CHECK-NEXT: v_readlane_b32 s34, v40, 2			; CHECK-NEXT: v_readlane_b32 s34, v41, 2
	; CHECK-NEXT: v_readlane_b32 s31, v40, 1			; CHECK-NEXT: v_readlane_b32 s31, v41, 1
	; CHECK-NEXT: v_readlane_b32 s30, v40, 0			; CHECK-NEXT: v_readlane_b32 s30, v41, 0
	; CHECK-NEXT: v_readlane_b32 s4, v42, 0			; CHECK-NEXT: v_readlane_b32 s4, v42, 0
	; CHECK-NEXT: s_or_saveexec_b64 s[6:7], -1			; CHECK-NEXT: s_or_saveexec_b64 s[6:7], -1
	; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; CHECK-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[6:7]			; CHECK-NEXT: s_mov_b64 exec, s[6:7]
	; CHECK-NEXT: s_addk_i32 s32, 0xfc00			; CHECK-NEXT: s_addk_i32 s32, 0xfc00
	; CHECK-NEXT: s_mov_b32 s33, s4			; CHECK-NEXT: s_mov_b32 s33, s4
	; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	; CHECK-NEXT: .Ltmp2:			; CHECK-NEXT: .Ltmp2:
	%2 = call ptr @__kmpc_alloc_shared(), !dbg !43			%2 = call ptr @__kmpc_alloc_shared(), !dbg !43

llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll

	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:100 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:100 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[18:19]			; GCN-NEXT: s_mov_b64 exec, s[18:19]
	; GCN-NEXT: s_addk_i32 s32, 0x3000			; GCN-NEXT: s_addk_i32 s32, 0x3000
	; GCN-NEXT: v_writelane_b32 v43, s16, 0			; GCN-NEXT: v_writelane_b32 v43, s16, 0
	; GCN-NEXT: s_getpc_b64 s[16:17]			; GCN-NEXT: s_getpc_b64 s[16:17]
	; GCN-NEXT: s_add_u32 s16, s16, extern_func@gotpcrel32@lo+4			; GCN-NEXT: s_add_u32 s16, s16, extern_func@gotpcrel32@lo+4
	; GCN-NEXT: s_addc_u32 s17, s17, extern_func@gotpcrel32@hi+12			; GCN-NEXT: s_addc_u32 s17, s17, extern_func@gotpcrel32@hi+12
	; GCN-NEXT: s_load_dwordx2 s[16:17], s[16:17], 0x0			; GCN-NEXT: s_load_dwordx2 s[16:17], s[16:17], 0x0
				; GCN-NEXT: ; implicit-def: $vgpr42
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: v_writelane_b32 v42, s30, 0			; GCN-NEXT: v_writelane_b32 v42, s30, 0
	; GCN-NEXT: buffer_store_dword v7, off, s[0:3], s33 offset:92			; GCN-NEXT: buffer_store_dword v7, off, s[0:3], s33 offset:92
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v6, off, s[0:3], s33 offset:88			; GCN-NEXT: buffer_store_dword v6, off, s[0:3], s33 offset:88
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:84			; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:84

llvm/test/CodeGen/AMDGPU/flat-scratch-init.ll

	define amdgpu_kernel void @test(i32 addrspace(1)* %out, i32 %in) {			define amdgpu_kernel void @test(i32 addrspace(1)* %out, i32 %in) {
	; FLAT_SCR_OPT-LABEL: test:			; FLAT_SCR_OPT-LABEL: test:
	; FLAT_SCR_OPT: ; %bb.0:			; FLAT_SCR_OPT: ; %bb.0:
	; FLAT_SCR_OPT-NEXT: s_add_u32 s2, s2, s5			; FLAT_SCR_OPT-NEXT: s_add_u32 s2, s2, s5
	; FLAT_SCR_OPT-NEXT: s_addc_u32 s3, s3, 0			; FLAT_SCR_OPT-NEXT: s_addc_u32 s3, s3, 0
	; FLAT_SCR_OPT-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; FLAT_SCR_OPT-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; FLAT_SCR_OPT-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; FLAT_SCR_OPT-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; FLAT_SCR_OPT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0			; FLAT_SCR_OPT-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
	; FLAT_SCR_OPT-NEXT: s_mov_b32 s104, exec_lo			; FLAT_SCR_OPT-NEXT: ; implicit-def: $vgpr0
	; FLAT_SCR_OPT-NEXT: s_mov_b32 exec_lo, 3
	; FLAT_SCR_OPT-NEXT: s_mov_b32 s4, 0
	; FLAT_SCR_OPT-NEXT: scratch_store_dword off, v72, s4
	; FLAT_SCR_OPT-NEXT: s_waitcnt lgkmcnt(0)			; FLAT_SCR_OPT-NEXT: s_waitcnt lgkmcnt(0)
	; FLAT_SCR_OPT-NEXT: v_writelane_b32 v72, s2, 0			; FLAT_SCR_OPT-NEXT: v_writelane_b32 v0, s2, 0
	; FLAT_SCR_OPT-NEXT: s_mov_b32 s4, 4			; FLAT_SCR_OPT-NEXT: v_writelane_b32 v0, s3, 1
	; FLAT_SCR_OPT-NEXT: v_writelane_b32 v72, s3, 1			; FLAT_SCR_OPT-NEXT: s_or_saveexec_b32 s105, -1
	; FLAT_SCR_OPT-NEXT: scratch_store_dword off, v72, s4 ; 4-byte Folded Spill			; FLAT_SCR_OPT-NEXT: s_mov_b32 s2, 4
				; FLAT_SCR_OPT-NEXT: scratch_store_dword off, v0, s2 ; 4-byte Folded Spill
	; FLAT_SCR_OPT-NEXT: s_waitcnt_depctr 0xffe3			; FLAT_SCR_OPT-NEXT: s_waitcnt_depctr 0xffe3
	; FLAT_SCR_OPT-NEXT: s_mov_b32 s4, 0			; FLAT_SCR_OPT-NEXT: s_mov_b32 exec_lo, s105
	; FLAT_SCR_OPT-NEXT: scratch_load_dword v72, off, s4
	; FLAT_SCR_OPT-NEXT: s_waitcnt vmcnt(0)
	; FLAT_SCR_OPT-NEXT: s_waitcnt_depctr 0xffe3
	; FLAT_SCR_OPT-NEXT: s_mov_b32 exec_lo, s104
	; FLAT_SCR_OPT-NEXT: s_load_dword vcc_lo, s[0:1], 0x8			; FLAT_SCR_OPT-NEXT: s_load_dword vcc_lo, s[0:1], 0x8
	; FLAT_SCR_OPT-NEXT: ; kill: killed $sgpr0_sgpr1			; FLAT_SCR_OPT-NEXT: ; kill: killed $sgpr0_sgpr1
	; FLAT_SCR_OPT-NEXT: ;;#ASMSTART			; FLAT_SCR_OPT-NEXT: ;;#ASMSTART
	; FLAT_SCR_OPT-NEXT: ;;#ASMEND			; FLAT_SCR_OPT-NEXT: ;;#ASMEND
	; FLAT_SCR_OPT-NEXT: ;;#ASMSTART			; FLAT_SCR_OPT-NEXT: ;;#ASMSTART
	; FLAT_SCR_OPT-NEXT: ;;#ASMEND			; FLAT_SCR_OPT-NEXT: ;;#ASMEND
	; FLAT_SCR_OPT-NEXT: ;;#ASMSTART			; FLAT_SCR_OPT-NEXT: ;;#ASMSTART
	; FLAT_SCR_OPT-NEXT: ;;#ASMEND			; FLAT_SCR_OPT-NEXT: ;;#ASMEND
	; FLAT_SCR_OPT-NEXT: ;;#ASMSTART			; FLAT_SCR_OPT-NEXT: ;;#ASMSTART
	; FLAT_SCR_OPT-NEXT: ;;#ASMEND			; FLAT_SCR_OPT-NEXT: ;;#ASMEND
	; FLAT_SCR_OPT-NEXT: ;;#ASMSTART			; FLAT_SCR_OPT-NEXT: ;;#ASMSTART
	; FLAT_SCR_OPT-NEXT: ;;#ASMEND			; FLAT_SCR_OPT-NEXT: ;;#ASMEND
	; FLAT_SCR_OPT-NEXT: ;;#ASMSTART			; FLAT_SCR_OPT-NEXT: ;;#ASMSTART
	; FLAT_SCR_OPT-NEXT: ;;#ASMEND			; FLAT_SCR_OPT-NEXT: ;;#ASMEND
	; FLAT_SCR_OPT-NEXT: ;;#ASMSTART			; FLAT_SCR_OPT-NEXT: ;;#ASMSTART
	; FLAT_SCR_OPT-NEXT: ;;#ASMEND			; FLAT_SCR_OPT-NEXT: ;;#ASMEND
	; FLAT_SCR_OPT-NEXT: s_mov_b32 s2, exec_lo			; FLAT_SCR_OPT-NEXT: s_or_saveexec_b32 s105, -1
	; FLAT_SCR_OPT-NEXT: s_mov_b32 exec_lo, 3			; FLAT_SCR_OPT-NEXT: s_mov_b32 s0, 4
	; FLAT_SCR_OPT-NEXT: s_mov_b32 s3, 0			; FLAT_SCR_OPT-NEXT: scratch_load_dword v1, off, s0 ; 4-byte Folded Reload
	; FLAT_SCR_OPT-NEXT: scratch_store_dword off, v2, s3
	; FLAT_SCR_OPT-NEXT: s_waitcnt_depctr 0xffe3
	; FLAT_SCR_OPT-NEXT: s_mov_b32 s3, 4
	; FLAT_SCR_OPT-NEXT: scratch_load_dword v2, off, s3 ; 4-byte Folded Reload
	; FLAT_SCR_OPT-NEXT: s_waitcnt_depctr 0xffe3			; FLAT_SCR_OPT-NEXT: s_waitcnt_depctr 0xffe3
	; FLAT_SCR_OPT-NEXT: s_mov_b32 s3, 0			; FLAT_SCR_OPT-NEXT: s_mov_b32 exec_lo, s105
	; FLAT_SCR_OPT-NEXT: s_waitcnt vmcnt(0)			; FLAT_SCR_OPT-NEXT: s_waitcnt vmcnt(0)
	; FLAT_SCR_OPT-NEXT: v_readlane_b32 s0, v2, 0			; FLAT_SCR_OPT-NEXT: v_readlane_b32 s0, v1, 0
	; FLAT_SCR_OPT-NEXT: v_readlane_b32 s1, v2, 1			; FLAT_SCR_OPT-NEXT: v_readlane_b32 s1, v1, 1
	; FLAT_SCR_OPT-NEXT: scratch_load_dword v2, off, s3
	; FLAT_SCR_OPT-NEXT: s_waitcnt vmcnt(0)
	; FLAT_SCR_OPT-NEXT: s_waitcnt_depctr 0xffe3
	; FLAT_SCR_OPT-NEXT: s_mov_b32 exec_lo, s2
	; FLAT_SCR_OPT-NEXT: v_mov_b32_e32 v1, 0			; FLAT_SCR_OPT-NEXT: v_mov_b32_e32 v1, 0
	; FLAT_SCR_OPT-NEXT: global_store_dword v1, v0, s[0:1]			; FLAT_SCR_OPT-NEXT: global_store_dword v1, v0, s[0:1]
	; FLAT_SCR_OPT-NEXT: s_endpgm			; FLAT_SCR_OPT-NEXT: s_endpgm
	;			;
	; FLAT_SCR_ARCH-LABEL: test:			; FLAT_SCR_ARCH-LABEL: test:
	; FLAT_SCR_ARCH: ; %bb.0:			; FLAT_SCR_ARCH: ; %bb.0:
	; FLAT_SCR_ARCH-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0			; FLAT_SCR_ARCH-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x0
	; FLAT_SCR_ARCH-NEXT: s_mov_b32 s104, exec_lo			; FLAT_SCR_ARCH-NEXT: ; implicit-def: $vgpr0
	; FLAT_SCR_ARCH-NEXT: s_mov_b32 exec_lo, 3
	; FLAT_SCR_ARCH-NEXT: s_mov_b32 s4, 0
	; FLAT_SCR_ARCH-NEXT: scratch_store_dword off, v72, s4
	; FLAT_SCR_ARCH-NEXT: s_waitcnt lgkmcnt(0)			; FLAT_SCR_ARCH-NEXT: s_waitcnt lgkmcnt(0)
	; FLAT_SCR_ARCH-NEXT: v_writelane_b32 v72, s2, 0			; FLAT_SCR_ARCH-NEXT: v_writelane_b32 v0, s2, 0
	; FLAT_SCR_ARCH-NEXT: s_mov_b32 s4, 4			; FLAT_SCR_ARCH-NEXT: v_writelane_b32 v0, s3, 1
	; FLAT_SCR_ARCH-NEXT: v_writelane_b32 v72, s3, 1			; FLAT_SCR_ARCH-NEXT: s_or_saveexec_b32 s105, -1
	; FLAT_SCR_ARCH-NEXT: scratch_store_dword off, v72, s4 ; 4-byte Folded Spill			; FLAT_SCR_ARCH-NEXT: s_mov_b32 s2, 4
				; FLAT_SCR_ARCH-NEXT: scratch_store_dword off, v0, s2 ; 4-byte Folded Spill
	; FLAT_SCR_ARCH-NEXT: s_waitcnt_depctr 0xffe3			; FLAT_SCR_ARCH-NEXT: s_waitcnt_depctr 0xffe3
	; FLAT_SCR_ARCH-NEXT: s_mov_b32 s4, 0			; FLAT_SCR_ARCH-NEXT: s_mov_b32 exec_lo, s105
	; FLAT_SCR_ARCH-NEXT: scratch_load_dword v72, off, s4
	; FLAT_SCR_ARCH-NEXT: s_waitcnt vmcnt(0)
	; FLAT_SCR_ARCH-NEXT: s_waitcnt_depctr 0xffe3
	; FLAT_SCR_ARCH-NEXT: s_mov_b32 exec_lo, s104
	; FLAT_SCR_ARCH-NEXT: s_load_dword vcc_lo, s[0:1], 0x8			; FLAT_SCR_ARCH-NEXT: s_load_dword vcc_lo, s[0:1], 0x8
	; FLAT_SCR_ARCH-NEXT: ; kill: killed $sgpr0_sgpr1			; FLAT_SCR_ARCH-NEXT: ; kill: killed $sgpr0_sgpr1
	; FLAT_SCR_ARCH-NEXT: ;;#ASMSTART			; FLAT_SCR_ARCH-NEXT: ;;#ASMSTART
	; FLAT_SCR_ARCH-NEXT: ;;#ASMEND			; FLAT_SCR_ARCH-NEXT: ;;#ASMEND
	; FLAT_SCR_ARCH-NEXT: ;;#ASMSTART			; FLAT_SCR_ARCH-NEXT: ;;#ASMSTART
	; FLAT_SCR_ARCH-NEXT: ;;#ASMEND			; FLAT_SCR_ARCH-NEXT: ;;#ASMEND
	; FLAT_SCR_ARCH-NEXT: ;;#ASMSTART			; FLAT_SCR_ARCH-NEXT: ;;#ASMSTART
	; FLAT_SCR_ARCH-NEXT: ;;#ASMEND			; FLAT_SCR_ARCH-NEXT: ;;#ASMEND
	; FLAT_SCR_ARCH-NEXT: ;;#ASMSTART			; FLAT_SCR_ARCH-NEXT: ;;#ASMSTART
	; FLAT_SCR_ARCH-NEXT: ;;#ASMEND			; FLAT_SCR_ARCH-NEXT: ;;#ASMEND
	; FLAT_SCR_ARCH-NEXT: ;;#ASMSTART			; FLAT_SCR_ARCH-NEXT: ;;#ASMSTART
	; FLAT_SCR_ARCH-NEXT: ;;#ASMEND			; FLAT_SCR_ARCH-NEXT: ;;#ASMEND
	; FLAT_SCR_ARCH-NEXT: ;;#ASMSTART			; FLAT_SCR_ARCH-NEXT: ;;#ASMSTART
	; FLAT_SCR_ARCH-NEXT: ;;#ASMEND			; FLAT_SCR_ARCH-NEXT: ;;#ASMEND
	; FLAT_SCR_ARCH-NEXT: ;;#ASMSTART			; FLAT_SCR_ARCH-NEXT: ;;#ASMSTART
	; FLAT_SCR_ARCH-NEXT: ;;#ASMEND			; FLAT_SCR_ARCH-NEXT: ;;#ASMEND
	; FLAT_SCR_ARCH-NEXT: s_mov_b32 s2, exec_lo			; FLAT_SCR_ARCH-NEXT: s_or_saveexec_b32 s105, -1
	; FLAT_SCR_ARCH-NEXT: s_mov_b32 exec_lo, 3			; FLAT_SCR_ARCH-NEXT: s_mov_b32 s0, 4
	; FLAT_SCR_ARCH-NEXT: s_mov_b32 s3, 0			; FLAT_SCR_ARCH-NEXT: scratch_load_dword v1, off, s0 ; 4-byte Folded Reload
	; FLAT_SCR_ARCH-NEXT: scratch_store_dword off, v2, s3
	; FLAT_SCR_ARCH-NEXT: s_waitcnt_depctr 0xffe3
	; FLAT_SCR_ARCH-NEXT: s_mov_b32 s3, 4
	; FLAT_SCR_ARCH-NEXT: scratch_load_dword v2, off, s3 ; 4-byte Folded Reload
	; FLAT_SCR_ARCH-NEXT: s_waitcnt_depctr 0xffe3			; FLAT_SCR_ARCH-NEXT: s_waitcnt_depctr 0xffe3
	; FLAT_SCR_ARCH-NEXT: s_mov_b32 s3, 0			; FLAT_SCR_ARCH-NEXT: s_mov_b32 exec_lo, s105
	; FLAT_SCR_ARCH-NEXT: s_waitcnt vmcnt(0)			; FLAT_SCR_ARCH-NEXT: s_waitcnt vmcnt(0)
	; FLAT_SCR_ARCH-NEXT: v_readlane_b32 s0, v2, 0			; FLAT_SCR_ARCH-NEXT: v_readlane_b32 s0, v1, 0
	; FLAT_SCR_ARCH-NEXT: v_readlane_b32 s1, v2, 1			; FLAT_SCR_ARCH-NEXT: v_readlane_b32 s1, v1, 1
	; FLAT_SCR_ARCH-NEXT: scratch_load_dword v2, off, s3
	; FLAT_SCR_ARCH-NEXT: s_waitcnt vmcnt(0)
	; FLAT_SCR_ARCH-NEXT: s_waitcnt_depctr 0xffe3
	; FLAT_SCR_ARCH-NEXT: s_mov_b32 exec_lo, s2
	; FLAT_SCR_ARCH-NEXT: v_mov_b32_e32 v1, 0			; FLAT_SCR_ARCH-NEXT: v_mov_b32_e32 v1, 0
	; FLAT_SCR_ARCH-NEXT: global_store_dword v1, v0, s[0:1]			; FLAT_SCR_ARCH-NEXT: global_store_dword v1, v0, s[0:1]
	; FLAT_SCR_ARCH-NEXT: s_endpgm			; FLAT_SCR_ARCH-NEXT: s_endpgm
	call void asm sideeffect "", "~{s[0:7]}" ()			call void asm sideeffect "", "~{s[0:7]}" ()
	call void asm sideeffect "", "~{s[8:15]}" ()			call void asm sideeffect "", "~{s[8:15]}" ()
	call void asm sideeffect "", "~{s[16:23]}" ()			call void asm sideeffect "", "~{s[16:23]}" ()
	call void asm sideeffect "", "~{s[24:31]}" ()			call void asm sideeffect "", "~{s[24:31]}" ()
	call void asm sideeffect "", "~{s[32:39]}" ()			call void asm sideeffect "", "~{s[32:39]}" ()

llvm/test/CodeGen/AMDGPU/fold-reload-into-exec.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -stress-regalloc=2 -start-before=greedy -stop-after=virtregmap -o - %s | FileCheck %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -stress-regalloc=2 -start-before=greedy -stop-after=virtregmap -o - %s | FileCheck %s

	# Test that a spill of a copy of exec is not folded to be a spill of exec directly.			# Test that a spill of a copy of exec is not folded to be a spill of exec directly.

	---			---

	name: merge_sgpr_spill_into_copy_from_exec_lo			name: merge_sgpr_spill_into_copy_from_exec_lo
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	body: |			body: |
	bb.0:			bb.0:
	; CHECK-LABEL: name: merge_sgpr_spill_into_copy_from_exec_lo			; CHECK-LABEL: name: merge_sgpr_spill_into_copy_from_exec_lo
	; CHECK: liveins: $vgpr0			; CHECK: S_NOP 0, implicit-def $exec_lo
	; CHECK-NEXT: {{ $}}
	; CHECK-NEXT: S_NOP 0, implicit-def $exec_lo
	; CHECK-NEXT: $sgpr0 = S_MOV_B32 $exec_lo			; CHECK-NEXT: $sgpr0 = S_MOV_B32 $exec_lo
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 killed $sgpr0, 0, $vgpr0			; CHECK-NEXT: renamable $vgpr0 = IMPLICIT_DEF
				; CHECK-NEXT: renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr0, 0, killed $vgpr0
	; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0			; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0
	; CHECK-NEXT: S_NOP 0, implicit-def dead renamable $sgpr1, implicit-def dead renamable $sgpr0, implicit killed renamable $sgpr0			; CHECK-NEXT: S_NOP 0, implicit-def dead renamable $sgpr1, implicit-def dead renamable $sgpr0, implicit killed renamable $sgpr0
	; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0			; CHECK-NEXT: $sgpr0 = V_READLANE_B32 killed $vgpr0, 0
	; CHECK-NEXT: $exec_lo = S_MOV_B32 killed $sgpr0			; CHECK-NEXT: $exec_lo = S_MOV_B32 killed $sgpr0
	; CHECK-NEXT: S_SENDMSG 0, implicit $m0, implicit $exec			; CHECK-NEXT: S_SENDMSG 0, implicit $m0, implicit $exec
	S_NOP 0, implicit-def $exec_lo			S_NOP 0, implicit-def $exec_lo
	%0:sreg_32 = COPY $exec_lo			%0:sreg_32 = COPY $exec_lo
	S_NOP 0, implicit-def %1:sreg_32, implicit-def %2:sreg_32, implicit %0			S_NOP 0, implicit-def %1:sreg_32, implicit-def %2:sreg_32, implicit %0
	$exec_lo = COPY %0			$exec_lo = COPY %0
	S_SENDMSG 0, implicit $m0, implicit $exec			S_SENDMSG 0, implicit $m0, implicit $exec

	...			...
	---			---

	name: merge_sgpr_spill_into_copy_from_exec_hi			name: merge_sgpr_spill_into_copy_from_exec_hi
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	body: |			body: |
	bb.0:			bb.0:
	; CHECK-LABEL: name: merge_sgpr_spill_into_copy_from_exec_hi			; CHECK-LABEL: name: merge_sgpr_spill_into_copy_from_exec_hi
	; CHECK: liveins: $vgpr0			; CHECK: S_NOP 0, implicit-def $exec_hi
	; CHECK-NEXT: {{ $}}
	; CHECK-NEXT: S_NOP 0, implicit-def $exec_hi
	; CHECK-NEXT: $sgpr0 = S_MOV_B32 $exec_hi			; CHECK-NEXT: $sgpr0 = S_MOV_B32 $exec_hi
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 killed $sgpr0, 0, $vgpr0			; CHECK-NEXT: renamable $vgpr0 = IMPLICIT_DEF
				; CHECK-NEXT: renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr0, 0, killed $vgpr0
	; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0			; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0
	; CHECK-NEXT: S_NOP 0, implicit-def dead renamable $sgpr1, implicit-def dead renamable $sgpr0, implicit killed renamable $sgpr0			; CHECK-NEXT: S_NOP 0, implicit-def dead renamable $sgpr1, implicit-def dead renamable $sgpr0, implicit killed renamable $sgpr0
	; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0			; CHECK-NEXT: $sgpr0 = V_READLANE_B32 killed $vgpr0, 0
	; CHECK-NEXT: $exec_hi = S_MOV_B32 killed $sgpr0			; CHECK-NEXT: $exec_hi = S_MOV_B32 killed $sgpr0
	; CHECK-NEXT: S_SENDMSG 0, implicit $m0, implicit $exec			; CHECK-NEXT: S_SENDMSG 0, implicit $m0, implicit $exec
	S_NOP 0, implicit-def $exec_hi			S_NOP 0, implicit-def $exec_hi
	%0:sreg_32 = COPY $exec_hi			%0:sreg_32 = COPY $exec_hi
	S_NOP 0, implicit-def %1:sreg_32, implicit-def %2:sreg_32, implicit %0			S_NOP 0, implicit-def %1:sreg_32, implicit-def %2:sreg_32, implicit %0
	$exec_hi = COPY %0			$exec_hi = COPY %0
	S_SENDMSG 0, implicit $m0, implicit $exec			S_SENDMSG 0, implicit $m0, implicit $exec

	...			...
	---			---

	name: merge_sgpr_spill_into_copy_from_exec			name: merge_sgpr_spill_into_copy_from_exec
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	body: |			body: |
	bb.0:			bb.0:
	; CHECK-LABEL: name: merge_sgpr_spill_into_copy_from_exec			; CHECK-LABEL: name: merge_sgpr_spill_into_copy_from_exec
	; CHECK: liveins: $vgpr0			; CHECK: S_NOP 0, implicit-def $exec
	; CHECK-NEXT: {{ $}}
	; CHECK-NEXT: S_NOP 0, implicit-def $exec
	; CHECK-NEXT: $sgpr0_sgpr1 = S_MOV_B64 $exec			; CHECK-NEXT: $sgpr0_sgpr1 = S_MOV_B64 $exec
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 killed $sgpr0, 0, $vgpr0, implicit-def $sgpr0_sgpr1, implicit $sgpr0_sgpr1			; CHECK-NEXT: renamable $vgpr0 = IMPLICIT_DEF
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 killed $sgpr1, 1, $vgpr0, implicit $sgpr0_sgpr1			; CHECK-NEXT: renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr0, 0, killed $vgpr0, implicit-def $sgpr0_sgpr1, implicit $sgpr0_sgpr1
				; CHECK-NEXT: renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr1, 1, killed $vgpr0, implicit $sgpr0_sgpr1
	; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0, implicit-def $sgpr0_sgpr1			; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0, implicit-def $sgpr0_sgpr1
	; CHECK-NEXT: $sgpr1 = V_READLANE_B32 $vgpr0, 1			; CHECK-NEXT: $sgpr1 = V_READLANE_B32 $vgpr0, 1
	; CHECK-NEXT: S_NOP 0, implicit-def dead renamable $sgpr2_sgpr3, implicit-def dead renamable $sgpr0_sgpr1, implicit killed renamable $sgpr0_sgpr1			; CHECK-NEXT: S_NOP 0, implicit-def dead renamable $sgpr2_sgpr3, implicit-def dead renamable $sgpr0_sgpr1, implicit killed renamable $sgpr0_sgpr1
	; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0, implicit-def $sgpr0_sgpr1			; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0, implicit-def $sgpr0_sgpr1
	; CHECK-NEXT: $sgpr1 = V_READLANE_B32 $vgpr0, 1			; CHECK-NEXT: $sgpr1 = V_READLANE_B32 killed $vgpr0, 1
	; CHECK-NEXT: $exec = S_MOV_B64 killed $sgpr0_sgpr1			; CHECK-NEXT: $exec = S_MOV_B64 killed $sgpr0_sgpr1
	; CHECK-NEXT: S_SENDMSG 0, implicit $m0, implicit $exec			; CHECK-NEXT: S_SENDMSG 0, implicit $m0, implicit $exec
	S_NOP 0, implicit-def $exec			S_NOP 0, implicit-def $exec
	%0:sreg_64 = COPY $exec			%0:sreg_64 = COPY $exec
	S_NOP 0, implicit-def %1:sreg_64, implicit-def %2:sreg_64, implicit %0			S_NOP 0, implicit-def %1:sreg_64, implicit-def %2:sreg_64, implicit %0
	$exec = COPY %0			$exec = COPY %0
	S_SENDMSG 0, implicit $m0, implicit $exec			S_SENDMSG 0, implicit $m0, implicit $exec

	...			...

	# Test that a reload into a copy of exec is not folded to be a reload of exec directly.			# Test that a reload into a copy of exec is not folded to be a reload of exec directly.

	---			---

	name: reload_sgpr_spill_into_copy_to_exec_lo			name: reload_sgpr_spill_into_copy_to_exec_lo
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	body: |			body: |
	bb.0:			bb.0:
	; CHECK-LABEL: name: reload_sgpr_spill_into_copy_to_exec_lo			; CHECK-LABEL: name: reload_sgpr_spill_into_copy_to_exec_lo
	; CHECK: liveins: $vgpr0			; CHECK: S_NOP 0, implicit-def renamable $sgpr0, implicit-def dead renamable $sgpr1, implicit-def $exec_lo
	; CHECK-NEXT: {{ $}}			; CHECK-NEXT: renamable $vgpr0 = IMPLICIT_DEF
	; CHECK-NEXT: S_NOP 0, implicit-def renamable $sgpr0, implicit-def dead renamable $sgpr1, implicit-def $exec_lo			; CHECK-NEXT: renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr0, 0, killed $vgpr0
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 killed $sgpr0, 0, $vgpr0
	; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0			; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0
	; CHECK-NEXT: S_NOP 0, implicit killed renamable $sgpr0, implicit-def dead renamable $sgpr1, implicit-def dead renamable $sgpr0			; CHECK-NEXT: S_NOP 0, implicit killed renamable $sgpr0, implicit-def dead renamable $sgpr1, implicit-def dead renamable $sgpr0
	; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0			; CHECK-NEXT: $sgpr0 = V_READLANE_B32 killed $vgpr0, 0
	; CHECK-NEXT: $exec_lo = S_MOV_B32 killed $sgpr0			; CHECK-NEXT: $exec_lo = S_MOV_B32 killed $sgpr0
	; CHECK-NEXT: S_SENDMSG 0, implicit $m0, implicit $exec			; CHECK-NEXT: S_SENDMSG 0, implicit $m0, implicit $exec
	S_NOP 0, implicit-def %0:sreg_32, implicit-def %1:sreg_32, implicit-def $exec_lo			S_NOP 0, implicit-def %0:sreg_32, implicit-def %1:sreg_32, implicit-def $exec_lo
	S_NOP 0, implicit %0, implicit-def %3:sreg_32, implicit-def %4:sreg_32			S_NOP 0, implicit %0, implicit-def %3:sreg_32, implicit-def %4:sreg_32
	$exec_lo = COPY %0			$exec_lo = COPY %0
	S_SENDMSG 0, implicit $m0, implicit $exec			S_SENDMSG 0, implicit $m0, implicit $exec

	...			...
	---			---

	name: reload_sgpr_spill_into_copy_to_exec_hi			name: reload_sgpr_spill_into_copy_to_exec_hi
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	body: |			body: |
	bb.0:			bb.0:
	; CHECK-LABEL: name: reload_sgpr_spill_into_copy_to_exec_hi			; CHECK-LABEL: name: reload_sgpr_spill_into_copy_to_exec_hi
	; CHECK: liveins: $vgpr0			; CHECK: S_NOP 0, implicit-def renamable $sgpr0, implicit-def dead renamable $sgpr1, implicit-def $exec_hi
	; CHECK-NEXT: {{ $}}			; CHECK-NEXT: renamable $vgpr0 = IMPLICIT_DEF
	; CHECK-NEXT: S_NOP 0, implicit-def renamable $sgpr0, implicit-def dead renamable $sgpr1, implicit-def $exec_hi			; CHECK-NEXT: renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr0, 0, killed $vgpr0
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 killed $sgpr0, 0, $vgpr0
	; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0			; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0
	; CHECK-NEXT: S_NOP 0, implicit killed renamable $sgpr0, implicit-def dead renamable $sgpr1, implicit-def dead renamable $sgpr0			; CHECK-NEXT: S_NOP 0, implicit killed renamable $sgpr0, implicit-def dead renamable $sgpr1, implicit-def dead renamable $sgpr0
	; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0			; CHECK-NEXT: $sgpr0 = V_READLANE_B32 killed $vgpr0, 0
	; CHECK-NEXT: $exec_hi = S_MOV_B32 killed $sgpr0			; CHECK-NEXT: $exec_hi = S_MOV_B32 killed $sgpr0
	; CHECK-NEXT: S_SENDMSG 0, implicit $m0, implicit $exec			; CHECK-NEXT: S_SENDMSG 0, implicit $m0, implicit $exec
	S_NOP 0, implicit-def %0:sreg_32, implicit-def %1:sreg_32, implicit-def $exec_hi			S_NOP 0, implicit-def %0:sreg_32, implicit-def %1:sreg_32, implicit-def $exec_hi
	S_NOP 0, implicit %0, implicit-def %3:sreg_32, implicit-def %4:sreg_32			S_NOP 0, implicit %0, implicit-def %3:sreg_32, implicit-def %4:sreg_32
	$exec_hi = COPY %0			$exec_hi = COPY %0
	S_SENDMSG 0, implicit $m0, implicit $exec			S_SENDMSG 0, implicit $m0, implicit $exec

	...			...
	---			---

	name: reload_sgpr_spill_into_copy_to_exec			name: reload_sgpr_spill_into_copy_to_exec
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	body: |			body: |
	bb.0:			bb.0:
	; CHECK-LABEL: name: reload_sgpr_spill_into_copy_to_exec			; CHECK-LABEL: name: reload_sgpr_spill_into_copy_to_exec
	; CHECK: liveins: $vgpr0			; CHECK: S_NOP 0, implicit-def renamable $sgpr0_sgpr1, implicit-def dead renamable $sgpr2_sgpr3, implicit-def $exec
	; CHECK-NEXT: {{ $}}			; CHECK-NEXT: renamable $vgpr0 = IMPLICIT_DEF
	; CHECK-NEXT: S_NOP 0, implicit-def renamable $sgpr0_sgpr1, implicit-def dead renamable $sgpr2_sgpr3, implicit-def $exec			; CHECK-NEXT: renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr0, 0, killed $vgpr0, implicit-def $sgpr0_sgpr1, implicit $sgpr0_sgpr1
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 killed $sgpr0, 0, $vgpr0, implicit-def $sgpr0_sgpr1, implicit $sgpr0_sgpr1			; CHECK-NEXT: renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr1, 1, killed $vgpr0, implicit $sgpr0_sgpr1
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 killed $sgpr1, 1, $vgpr0, implicit $sgpr0_sgpr1
	; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0, implicit-def $sgpr0_sgpr1			; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0, implicit-def $sgpr0_sgpr1
	; CHECK-NEXT: $sgpr1 = V_READLANE_B32 $vgpr0, 1			; CHECK-NEXT: $sgpr1 = V_READLANE_B32 $vgpr0, 1
	; CHECK-NEXT: S_NOP 0, implicit killed renamable $sgpr0_sgpr1, implicit-def dead renamable $sgpr2_sgpr3, implicit-def dead renamable $sgpr0_sgpr1			; CHECK-NEXT: S_NOP 0, implicit killed renamable $sgpr0_sgpr1, implicit-def dead renamable $sgpr2_sgpr3, implicit-def dead renamable $sgpr0_sgpr1
	; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0, implicit-def $sgpr0_sgpr1			; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0, implicit-def $sgpr0_sgpr1
	; CHECK-NEXT: $sgpr1 = V_READLANE_B32 $vgpr0, 1			; CHECK-NEXT: $sgpr1 = V_READLANE_B32 killed $vgpr0, 1
	; CHECK-NEXT: $exec = S_MOV_B64 killed $sgpr0_sgpr1			; CHECK-NEXT: $exec = S_MOV_B64 killed $sgpr0_sgpr1
	; CHECK-NEXT: S_SENDMSG 0, implicit $m0, implicit $exec			; CHECK-NEXT: S_SENDMSG 0, implicit $m0, implicit $exec
	S_NOP 0, implicit-def %0:sreg_64, implicit-def %1:sreg_64, implicit-def $exec			S_NOP 0, implicit-def %0:sreg_64, implicit-def %1:sreg_64, implicit-def $exec
	S_NOP 0, implicit %0, implicit-def %3:sreg_64, implicit-def %4:sreg_64			S_NOP 0, implicit %0, implicit-def %3:sreg_64, implicit-def %4:sreg_64
	$exec = COPY %0			$exec = COPY %0
	S_SENDMSG 0, implicit $m0, implicit $exec			S_SENDMSG 0, implicit $m0, implicit $exec

	...			...
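
The CHECK lines in the tests above expect `$exec` to be copied into an SGPR pair first and only then written to VGPR lanes, one 32-bit half per lane, with matching `V_READLANE_B32` reloads. A minimal Python sketch of that round trip follows. It is a toy model, not LLVM code: the function names are hypothetical, a 64-lane wave is assumed, and a zero-filled list stands in for the `IMPLICIT_DEF` VGPR, whose real contents are undefined.

```python
# Toy model of spilling a 64-bit SGPR pair into two lanes of one VGPR.
# All names here are illustrative; none of them are LLVM APIs.
WAVE_SIZE = 64  # assumption: wave64
MASK32 = 0xFFFFFFFF


def v_writelane(vgpr, sgpr_value, lane):
    """Return a copy of `vgpr` with one lane overwritten by a 32-bit value."""
    out = list(vgpr)
    out[lane] = sgpr_value & MASK32
    return out


def v_readlane(vgpr, lane):
    """Read a single lane of `vgpr` back as a scalar value."""
    return vgpr[lane]


def spill_pair(value64):
    """Write the low half to lane 0 and the high half to lane 1."""
    vgpr = [0] * WAVE_SIZE  # placeholder for an undefined (IMPLICIT_DEF) VGPR
    vgpr = v_writelane(vgpr, value64 & MASK32, 0)
    vgpr = v_writelane(vgpr, (value64 >> 32) & MASK32, 1)
    return vgpr


def reload_pair(vgpr):
    """Rebuild the 64-bit value from lanes 0 and 1."""
    lo = v_readlane(vgpr, 0)
    hi = v_readlane(vgpr, 1)
    return (hi << 32) | lo


saved_exec = 0xDEADBEEF00000001  # arbitrary example mask
spill_vgpr = spill_pair(saved_exec)
restored_exec = reload_pair(spill_vgpr)
```

In this model the spill touches only lanes 0 and 1, so the remaining lanes of the VGPR stay available for other spilled SGPRs.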

llvm/test/CodeGen/AMDGPU/fold-reload-into-m0.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -stress-regalloc=2 -start-before=greedy -stop-after=virtregmap -o - %s | FileCheck %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -stress-regalloc=2 -start-before=greedy -stop-after=virtregmap -o - %s | FileCheck %s

	# Test that a spill of a copy of m0 is not folded to be a spill of m0 directly.			# Test that a spill of a copy of m0 is not folded to be a spill of m0 directly.

	---			---

	name: merge_sgpr_spill_into_copy_from_m0			name: merge_sgpr_spill_into_copy_from_m0
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	body: |			body: |
	bb.0:			bb.0:

	; CHECK-LABEL: name: merge_sgpr_spill_into_copy_from_m0			; CHECK-LABEL: name: merge_sgpr_spill_into_copy_from_m0
	; CHECK: liveins: $vgpr0			; CHECK: S_NOP 0, implicit-def $m0
	; CHECK-NEXT: {{ $}}
	; CHECK-NEXT: S_NOP 0, implicit-def $m0
	; CHECK-NEXT: $sgpr0 = S_MOV_B32 $m0			; CHECK-NEXT: $sgpr0 = S_MOV_B32 $m0
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 killed $sgpr0, 0, $vgpr0			; CHECK-NEXT: renamable $vgpr0 = IMPLICIT_DEF
				; CHECK-NEXT: renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr0, 0, killed $vgpr0
	; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0			; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0
	; CHECK-NEXT: S_NOP 0, implicit-def dead renamable $sgpr1, implicit-def dead renamable $sgpr0, implicit killed renamable $sgpr0			; CHECK-NEXT: S_NOP 0, implicit-def dead renamable $sgpr1, implicit-def dead renamable $sgpr0, implicit killed renamable $sgpr0
	; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0			; CHECK-NEXT: $sgpr0 = V_READLANE_B32 killed $vgpr0, 0
	; CHECK-NEXT: $m0 = S_MOV_B32 killed $sgpr0			; CHECK-NEXT: $m0 = S_MOV_B32 killed $sgpr0
	; CHECK-NEXT: S_NOP 0			; CHECK-NEXT: S_NOP 0
	; CHECK-NEXT: S_SENDMSG 0, implicit $m0, implicit $exec			; CHECK-NEXT: S_SENDMSG 0, implicit $m0, implicit $exec
	S_NOP 0, implicit-def $m0			S_NOP 0, implicit-def $m0
	%0:sreg_32 = COPY $m0			%0:sreg_32 = COPY $m0
	S_NOP 0, implicit-def %1:sreg_32, implicit-def %2:sreg_32, implicit %0			S_NOP 0, implicit-def %1:sreg_32, implicit-def %2:sreg_32, implicit %0
	$m0 = COPY %0			$m0 = COPY %0
	S_SENDMSG 0, implicit $m0, implicit $exec			S_SENDMSG 0, implicit $m0, implicit $exec

	...			...

	# Test that a reload into a copy of m0 is not folded to be a reload of m0 directly.			# Test that a reload into a copy of m0 is not folded to be a reload of m0 directly.

	---			---

	name: reload_sgpr_spill_into_copy_to_m0			name: reload_sgpr_spill_into_copy_to_m0
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	body: |			body: |
	bb.0:			bb.0:

	; CHECK-LABEL: name: reload_sgpr_spill_into_copy_to_m0			; CHECK-LABEL: name: reload_sgpr_spill_into_copy_to_m0
	; CHECK: liveins: $vgpr0			; CHECK: renamable $vgpr0 = IMPLICIT_DEF
	; CHECK-NEXT: {{ $}}
	; CHECK-NEXT: S_NOP 0, implicit-def renamable $sgpr0, implicit-def dead renamable $sgpr1, implicit-def $m0			; CHECK-NEXT: S_NOP 0, implicit-def renamable $sgpr0, implicit-def dead renamable $sgpr1, implicit-def $m0
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 killed $sgpr0, 0, $vgpr0			; CHECK-NEXT: renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr0, 0, killed $vgpr0
	; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0			; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0
	; CHECK-NEXT: S_NOP 0, implicit killed renamable $sgpr0, implicit-def dead renamable $sgpr1, implicit-def dead renamable $sgpr0			; CHECK-NEXT: S_NOP 0, implicit killed renamable $sgpr0, implicit-def dead renamable $sgpr1, implicit-def dead renamable $sgpr0
	; CHECK-NEXT: $sgpr0 = V_READLANE_B32 $vgpr0, 0			; CHECK-NEXT: $sgpr0 = V_READLANE_B32 killed $vgpr0, 0
	; CHECK-NEXT: $m0 = S_MOV_B32 killed $sgpr0			; CHECK-NEXT: $m0 = S_MOV_B32 killed $sgpr0
	; CHECK-NEXT: S_NOP 0			; CHECK-NEXT: S_NOP 0
	; CHECK-NEXT: S_SENDMSG 0, implicit $m0, implicit $exec			; CHECK-NEXT: S_SENDMSG 0, implicit $m0, implicit $exec
	S_NOP 0, implicit-def %0:sreg_32, implicit-def %1:sreg_32, implicit-def $m0			S_NOP 0, implicit-def %0:sreg_32, implicit-def %1:sreg_32, implicit-def $m0
	S_NOP 0, implicit %0, implicit-def %3:sreg_32, implicit-def %4:sreg_32			S_NOP 0, implicit %0, implicit-def %3:sreg_32, implicit-def %4:sreg_32
	$m0 = COPY %0			$m0 = COPY %0
	S_SENDMSG 0, implicit $m0, implicit $exec			S_SENDMSG 0, implicit $m0, implicit $exec

	...			...

llvm/test/CodeGen/AMDGPU/frame-setup-without-sgpr-to-vgpr-spills.ll

	; SPILL-TO-VGPR: ; %bb.0:			; SPILL-TO-VGPR: ; %bb.0:
	; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; SPILL-TO-VGPR-NEXT: s_mov_b32 s4, s33			; SPILL-TO-VGPR-NEXT: s_mov_b32 s4, s33
	; SPILL-TO-VGPR-NEXT: s_mov_b32 s33, s32			; SPILL-TO-VGPR-NEXT: s_mov_b32 s33, s32
	; SPILL-TO-VGPR-NEXT: s_or_saveexec_b64 s[8:9], -1			; SPILL-TO-VGPR-NEXT: s_or_saveexec_b64 s[8:9], -1
	; SPILL-TO-VGPR-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; SPILL-TO-VGPR-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; SPILL-TO-VGPR-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; SPILL-TO-VGPR-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[8:9]			; SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[8:9]
				; SPILL-TO-VGPR-NEXT: ; implicit-def: $vgpr40
	; SPILL-TO-VGPR-NEXT: s_addk_i32 s32, 0x400			; SPILL-TO-VGPR-NEXT: s_addk_i32 s32, 0x400
	; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s30, 0			; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s30, 0
	; SPILL-TO-VGPR-NEXT: v_mov_b32_e32 v0, 0			; SPILL-TO-VGPR-NEXT: v_mov_b32_e32 v0, 0
	; SPILL-TO-VGPR-NEXT: v_writelane_b32 v41, s4, 0			; SPILL-TO-VGPR-NEXT: v_writelane_b32 v41, s4, 0
	; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s31, 1			; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s31, 1
	; SPILL-TO-VGPR-NEXT: buffer_store_dword v0, off, s[0:3], s33			; SPILL-TO-VGPR-NEXT: buffer_store_dword v0, off, s[0:3], s33
	; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)			; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)
	; SPILL-TO-VGPR-NEXT: s_getpc_b64 s[4:5]			; SPILL-TO-VGPR-NEXT: s_getpc_b64 s[4:5]

llvm/test/CodeGen/AMDGPU/gfx-call-non-gfx-func.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=SDAG -enable-var-scope %s			; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=SDAG -enable-var-scope %s
	; RUN: llc -global-isel -mtriple=amdgcn--amdpal -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=GISEL -enable-var-scope %s			; RUN: llc -global-isel -mtriple=amdgcn--amdpal -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=GISEL -enable-var-scope %s

	declare void @extern_c_func()			declare void @extern_c_func()

	define amdgpu_gfx void @gfx_func() {			define amdgpu_gfx void @gfx_func() {
	; SDAG-LABEL: gfx_func:			; SDAG-LABEL: gfx_func:
	; SDAG: ; %bb.0:			; SDAG: ; %bb.0:
	; SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; SDAG-NEXT: s_mov_b32 s36, s33			; SDAG-NEXT: s_mov_b32 s38, s33
	; SDAG-NEXT: s_mov_b32 s33, s32			; SDAG-NEXT: s_mov_b32 s33, s32
	; SDAG-NEXT: s_or_saveexec_b64 s[34:35], -1			; SDAG-NEXT: s_or_saveexec_b64 s[34:35], -1
	; SDAG-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; SDAG-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; SDAG-NEXT: s_mov_b64 exec, s[34:35]			; SDAG-NEXT: s_mov_b64 exec, s[34:35]
				; SDAG-NEXT: ; implicit-def: $vgpr40
				; SDAG-NEXT: s_addk_i32 s32, 0x400
	; SDAG-NEXT: v_writelane_b32 v40, s4, 0			; SDAG-NEXT: v_writelane_b32 v40, s4, 0
	; SDAG-NEXT: v_writelane_b32 v40, s5, 1			; SDAG-NEXT: v_writelane_b32 v40, s5, 1
	; SDAG-NEXT: v_writelane_b32 v40, s6, 2			; SDAG-NEXT: v_writelane_b32 v40, s6, 2
	; SDAG-NEXT: v_writelane_b32 v40, s7, 3			; SDAG-NEXT: v_writelane_b32 v40, s7, 3
	; SDAG-NEXT: v_writelane_b32 v40, s8, 4			; SDAG-NEXT: v_writelane_b32 v40, s8, 4
	; SDAG-NEXT: v_writelane_b32 v40, s9, 5			; SDAG-NEXT: v_writelane_b32 v40, s9, 5
	; SDAG-NEXT: v_writelane_b32 v40, s10, 6			; SDAG-NEXT: v_writelane_b32 v40, s10, 6
	; SDAG-NEXT: v_writelane_b32 v40, s11, 7			; SDAG-NEXT: v_writelane_b32 v40, s11, 7
	; SDAG-NEXT: v_writelane_b32 v40, s12, 8			; SDAG-NEXT: v_writelane_b32 v40, s12, 8
	; SDAG-NEXT: v_writelane_b32 v40, s13, 9			; SDAG-NEXT: v_writelane_b32 v40, s13, 9
	; SDAG-NEXT: v_writelane_b32 v40, s14, 10			; SDAG-NEXT: v_writelane_b32 v40, s14, 10
	; SDAG-NEXT: v_writelane_b32 v40, s15, 11			; SDAG-NEXT: v_writelane_b32 v40, s15, 11
	; SDAG-NEXT: v_writelane_b32 v40, s16, 12			; SDAG-NEXT: v_writelane_b32 v40, s16, 12
	; SDAG-NEXT: v_writelane_b32 v40, s17, 13			; SDAG-NEXT: v_writelane_b32 v40, s17, 13
	; SDAG-NEXT: v_writelane_b32 v40, s18, 14			; SDAG-NEXT: v_writelane_b32 v40, s18, 14
	; SDAG-NEXT: v_writelane_b32 v40, s19, 15			; SDAG-NEXT: v_writelane_b32 v40, s19, 15
	; SDAG-NEXT: v_writelane_b32 v40, s20, 16			; SDAG-NEXT: v_writelane_b32 v40, s20, 16
	; SDAG-NEXT: v_writelane_b32 v40, s21, 17			; SDAG-NEXT: v_writelane_b32 v40, s21, 17
	; SDAG-NEXT: v_writelane_b32 v40, s22, 18			; SDAG-NEXT: v_writelane_b32 v40, s22, 18
	; SDAG-NEXT: v_writelane_b32 v40, s23, 19			; SDAG-NEXT: v_writelane_b32 v40, s23, 19
	; SDAG-NEXT: s_addk_i32 s32, 0x400
	; SDAG-NEXT: v_writelane_b32 v40, s24, 20			; SDAG-NEXT: v_writelane_b32 v40, s24, 20
	; SDAG-NEXT: v_writelane_b32 v40, s25, 21			; SDAG-NEXT: v_writelane_b32 v40, s25, 21
	; SDAG-NEXT: s_getpc_b64 s[34:35]			; SDAG-NEXT: s_getpc_b64 s[34:35]
	; SDAG-NEXT: s_add_u32 s34, s34, extern_c_func@gotpcrel32@lo+4			; SDAG-NEXT: s_add_u32 s34, s34, extern_c_func@gotpcrel32@lo+4
	; SDAG-NEXT: s_addc_u32 s35, s35, extern_c_func@gotpcrel32@hi+12			; SDAG-NEXT: s_addc_u32 s35, s35, extern_c_func@gotpcrel32@hi+12
	; SDAG-NEXT: v_writelane_b32 v40, s26, 22			; SDAG-NEXT: v_writelane_b32 v40, s26, 22
	; SDAG-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; SDAG-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; SDAG-NEXT: v_writelane_b32 v40, s27, 23			; SDAG-NEXT: v_writelane_b32 v40, s27, 23
	[31 lines not shown]
	; SDAG-NEXT: v_readlane_b32 s7, v40, 3			; SDAG-NEXT: v_readlane_b32 s7, v40, 3
	; SDAG-NEXT: v_readlane_b32 s6, v40, 2			; SDAG-NEXT: v_readlane_b32 s6, v40, 2
	; SDAG-NEXT: v_readlane_b32 s5, v40, 1			; SDAG-NEXT: v_readlane_b32 s5, v40, 1
	; SDAG-NEXT: v_readlane_b32 s4, v40, 0			; SDAG-NEXT: v_readlane_b32 s4, v40, 0
	; SDAG-NEXT: s_or_saveexec_b64 s[34:35], -1			; SDAG-NEXT: s_or_saveexec_b64 s[34:35], -1
	; SDAG-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; SDAG-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; SDAG-NEXT: s_mov_b64 exec, s[34:35]			; SDAG-NEXT: s_mov_b64 exec, s[34:35]
	; SDAG-NEXT: s_addk_i32 s32, 0xfc00			; SDAG-NEXT: s_addk_i32 s32, 0xfc00
	; SDAG-NEXT: s_mov_b32 s33, s36			; SDAG-NEXT: s_mov_b32 s33, s38
	; SDAG-NEXT: s_waitcnt vmcnt(0)			; SDAG-NEXT: s_waitcnt vmcnt(0)
	; SDAG-NEXT: s_setpc_b64 s[30:31]			; SDAG-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: gfx_func:			; GISEL-LABEL: gfx_func:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_mov_b32 s36, s33			; GISEL-NEXT: s_mov_b32 s38, s33
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_or_saveexec_b64 s[34:35], -1			; GISEL-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[34:35]			; GISEL-NEXT: s_mov_b64 exec, s[34:35]
				; GISEL-NEXT: ; implicit-def: $vgpr40
				; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s4, 0			; GISEL-NEXT: v_writelane_b32 v40, s4, 0
	; GISEL-NEXT: v_writelane_b32 v40, s5, 1			; GISEL-NEXT: v_writelane_b32 v40, s5, 1
	; GISEL-NEXT: v_writelane_b32 v40, s6, 2			; GISEL-NEXT: v_writelane_b32 v40, s6, 2
	; GISEL-NEXT: v_writelane_b32 v40, s7, 3			; GISEL-NEXT: v_writelane_b32 v40, s7, 3
	; GISEL-NEXT: v_writelane_b32 v40, s8, 4			; GISEL-NEXT: v_writelane_b32 v40, s8, 4
	; GISEL-NEXT: v_writelane_b32 v40, s9, 5			; GISEL-NEXT: v_writelane_b32 v40, s9, 5
	; GISEL-NEXT: v_writelane_b32 v40, s10, 6			; GISEL-NEXT: v_writelane_b32 v40, s10, 6
	; GISEL-NEXT: v_writelane_b32 v40, s11, 7			; GISEL-NEXT: v_writelane_b32 v40, s11, 7
	; GISEL-NEXT: v_writelane_b32 v40, s12, 8			; GISEL-NEXT: v_writelane_b32 v40, s12, 8
	; GISEL-NEXT: v_writelane_b32 v40, s13, 9			; GISEL-NEXT: v_writelane_b32 v40, s13, 9
	; GISEL-NEXT: v_writelane_b32 v40, s14, 10			; GISEL-NEXT: v_writelane_b32 v40, s14, 10
	; GISEL-NEXT: v_writelane_b32 v40, s15, 11			; GISEL-NEXT: v_writelane_b32 v40, s15, 11
	; GISEL-NEXT: v_writelane_b32 v40, s16, 12			; GISEL-NEXT: v_writelane_b32 v40, s16, 12
	; GISEL-NEXT: v_writelane_b32 v40, s17, 13			; GISEL-NEXT: v_writelane_b32 v40, s17, 13
	; GISEL-NEXT: v_writelane_b32 v40, s18, 14			; GISEL-NEXT: v_writelane_b32 v40, s18, 14
	; GISEL-NEXT: v_writelane_b32 v40, s19, 15			; GISEL-NEXT: v_writelane_b32 v40, s19, 15
	; GISEL-NEXT: v_writelane_b32 v40, s20, 16			; GISEL-NEXT: v_writelane_b32 v40, s20, 16
	; GISEL-NEXT: v_writelane_b32 v40, s21, 17			; GISEL-NEXT: v_writelane_b32 v40, s21, 17
	; GISEL-NEXT: v_writelane_b32 v40, s22, 18			; GISEL-NEXT: v_writelane_b32 v40, s22, 18
	; GISEL-NEXT: v_writelane_b32 v40, s23, 19			; GISEL-NEXT: v_writelane_b32 v40, s23, 19
	; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s24, 20			; GISEL-NEXT: v_writelane_b32 v40, s24, 20
	; GISEL-NEXT: v_writelane_b32 v40, s25, 21			; GISEL-NEXT: v_writelane_b32 v40, s25, 21
	; GISEL-NEXT: s_getpc_b64 s[34:35]			; GISEL-NEXT: s_getpc_b64 s[34:35]
	; GISEL-NEXT: s_add_u32 s34, s34, extern_c_func@gotpcrel32@lo+4			; GISEL-NEXT: s_add_u32 s34, s34, extern_c_func@gotpcrel32@lo+4
	; GISEL-NEXT: s_addc_u32 s35, s35, extern_c_func@gotpcrel32@hi+12			; GISEL-NEXT: s_addc_u32 s35, s35, extern_c_func@gotpcrel32@hi+12
	; GISEL-NEXT: v_writelane_b32 v40, s26, 22			; GISEL-NEXT: v_writelane_b32 v40, s26, 22
	; GISEL-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GISEL-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GISEL-NEXT: v_writelane_b32 v40, s27, 23			; GISEL-NEXT: v_writelane_b32 v40, s27, 23
	[31 lines not shown]
	; GISEL-NEXT: v_readlane_b32 s7, v40, 3			; GISEL-NEXT: v_readlane_b32 s7, v40, 3
	; GISEL-NEXT: v_readlane_b32 s6, v40, 2			; GISEL-NEXT: v_readlane_b32 s6, v40, 2
	; GISEL-NEXT: v_readlane_b32 s5, v40, 1			; GISEL-NEXT: v_readlane_b32 s5, v40, 1
	; GISEL-NEXT: v_readlane_b32 s4, v40, 0			; GISEL-NEXT: v_readlane_b32 s4, v40, 0
	; GISEL-NEXT: s_or_saveexec_b64 s[34:35], -1			; GISEL-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[34:35]			; GISEL-NEXT: s_mov_b64 exec, s[34:35]
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: s_mov_b32 s33, s36			; GISEL-NEXT: s_mov_b32 s33, s38
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	call void @extern_c_func()			call void @extern_c_func()
	ret void			ret void
	}			}
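
In the checks for `gfx_func` above, the prologue adjusts the stack pointer with `s_addk_i32 s32, 0x400` and the epilogue undoes it with `s_addk_i32 s32, 0xfc00`. The second immediate is the 16-bit two's-complement encoding of -0x400, because `s_addk_i32` sign-extends its 16-bit immediate before adding. A small Python sketch of that arithmetic follows. The helper names and the starting stack pointer value are made up for illustration.

```python
def sext16(imm):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def s_addk_i32(value, simm16):
    """Model a 32-bit add of a sign-extended 16-bit immediate, with wraparound."""
    return (value + sext16(simm16)) & 0xFFFFFFFF


sp_on_entry = 0x1000  # hypothetical starting value of s32
sp_in_body = s_addk_i32(sp_on_entry, 0x400)   # prologue: allocate 0x400 bytes
sp_on_exit = s_addk_i32(sp_in_body, 0xFC00)   # epilogue: add -0x400
```

With these definitions `sext16(0xFC00)` is -1024, so the epilogue returns the modeled stack pointer to its value on entry.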

llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll

; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 1		; GFX9-NEXT: v_mov_b32_e32 v0, 1
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1@rel32@hi+12
[17 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 1		; GFX10-NEXT: v_mov_b32_e32 v0, 1
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32		; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32
		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-NEXT: v_readlane_b32 s34, v41, 0		; GFX10-NEXT: v_readlane_b32 s34, v41, 0
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33
; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4		; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4
[... 10 lines not shown ...]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_mov_b32_e32 v0, 1		; GFX11-NEXT: v_mov_b32_e32 v0, 1
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i1@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i1@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i1@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i1@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: scratch_store_b8 off, v0, s32		; GFX11-NEXT: scratch_store_b8 off, v0, s32
		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: v_readlane_b32 s0, v41, 0		; GFX11-NEXT: v_readlane_b32 s0, v41, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b32 v40, off, s33		; GFX11-NEXT: scratch_load_b32 v40, off, s33
; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4		; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: s_add_i32 s32, s32, -16		; GFX11-NEXT: s_add_i32 s32, s32, -16
; GFX11-NEXT: s_mov_b32 s33, s0		; GFX11-NEXT: s_mov_b32 s33, s0
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm:		; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm:
; GFX10-SCRATCH: ; %bb.0:		; GFX10-SCRATCH: ; %bb.0:
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32		; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33		; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33
; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4		; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4
[... 14 lines not shown ...]
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc		; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_signext@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_signext@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_signext@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_signext@rel32@hi+12
		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_and_b32_e32 v0, 1, v0		; GFX9-NEXT: v_and_b32_e32 v0, 1, v0
; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32		; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX9-NEXT: v_readlane_b32 s31, v40, 1		; GFX9-NEXT: v_readlane_b32 s31, v40, 1
; GFX9-NEXT: v_readlane_b32 s30, v40, 0		; GFX9-NEXT: v_readlane_b32 s30, v40, 0
; GFX9-NEXT: v_readlane_b32 s34, v41, 0		; GFX9-NEXT: v_readlane_b32 s34, v41, 0
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
[... 12 lines not shown ...]
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc		; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_signext@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_signext@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_signext@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_signext@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: v_and_b32_e32 v0, 1, v0		; GFX10-NEXT: v_and_b32_e32 v0, 1, v0
; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32		; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-NEXT: v_readlane_b32 s34, v41, 0		; GFX10-NEXT: v_readlane_b32 s34, v41, 0
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
[... 14 lines not shown ...]
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: global_load_u8 v0, v[0:1], off glc dlc		; GFX11-NEXT: global_load_u8 v0, v[0:1], off glc dlc
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i1_signext@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i1_signext@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i1_signext@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i1_signext@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: v_and_b32_e32 v0, 1, v0		; GFX11-NEXT: v_and_b32_e32 v0, 1, v0
; GFX11-NEXT: scratch_store_b8 off, v0, s32		; GFX11-NEXT: scratch_store_b8 off, v0, s32
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: v_readlane_b32 s0, v41, 0		; GFX11-NEXT: v_readlane_b32 s0, v41, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
[... 13 lines not shown ...]
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc		; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_signext@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_signext@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_signext@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_signext@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
; GFX10-SCRATCH-NEXT: v_and_b32_e32 v0, 1, v0		; GFX10-SCRATCH-NEXT: v_and_b32_e32 v0, 1, v0
; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32		; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
[... 17 lines not shown ...]
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc		; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_zeroext@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_zeroext@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_zeroext@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_zeroext@rel32@hi+12
		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_and_b32_e32 v0, 1, v0		; GFX9-NEXT: v_and_b32_e32 v0, 1, v0
; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32		; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX9-NEXT: v_readlane_b32 s31, v40, 1		; GFX9-NEXT: v_readlane_b32 s31, v40, 1
; GFX9-NEXT: v_readlane_b32 s30, v40, 0		; GFX9-NEXT: v_readlane_b32 s30, v40, 0
; GFX9-NEXT: v_readlane_b32 s34, v41, 0		; GFX9-NEXT: v_readlane_b32 s34, v41, 0
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
[... 12 lines not shown ...]
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc		; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_zeroext@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_zeroext@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_zeroext@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_zeroext@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: v_and_b32_e32 v0, 1, v0		; GFX10-NEXT: v_and_b32_e32 v0, 1, v0
; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32		; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-NEXT: v_readlane_b32 s34, v41, 0		; GFX10-NEXT: v_readlane_b32 s34, v41, 0
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
[... 14 lines not shown ...]
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: global_load_u8 v0, v[0:1], off glc dlc		; GFX11-NEXT: global_load_u8 v0, v[0:1], off glc dlc
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i1_zeroext@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i1_zeroext@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i1_zeroext@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i1_zeroext@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: v_and_b32_e32 v0, 1, v0		; GFX11-NEXT: v_and_b32_e32 v0, 1, v0
; GFX11-NEXT: scratch_store_b8 off, v0, s32		; GFX11-NEXT: scratch_store_b8 off, v0, s32
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: v_readlane_b32 s0, v41, 0		; GFX11-NEXT: v_readlane_b32 s0, v41, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
[... 13 lines not shown ...]
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc		; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_zeroext@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_zeroext@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_zeroext@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_zeroext@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
; GFX10-SCRATCH-NEXT: v_and_b32_e32 v0, 1, v0		; GFX10-SCRATCH-NEXT: v_and_b32_e32 v0, 1, v0
; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32		; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
[... 15 lines not shown ...]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b		; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8@rel32@hi+12
[... 16 lines not shown ...]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b		; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
[... 16 lines not shown ...]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_mov_b32_e32 v0, 0x7b		; GFX11-NEXT: v_mov_b32_e32 v0, 0x7b
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i8@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i8@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i8@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i8@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
[... 16 lines not shown ...]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
[... 20 lines not shown ...]
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: global_load_sbyte v0, v[0:1], off glc		; GFX9-NEXT: global_load_sbyte v0, v[0:1], off glc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_signext@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_signext@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_signext@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_signext@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: global_load_sbyte v0, v[0:1], off glc dlc		; GFX10-NEXT: global_load_sbyte v0, v[0:1], off glc dlc
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_signext@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_signext@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_signext@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_signext@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: global_load_i8 v0, v[0:1], off glc dlc		; GFX11-NEXT: global_load_i8 v0, v[0:1], off glc dlc
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i8_signext@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i8_signext@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i8_signext@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i8_signext@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: global_load_sbyte v0, v[0:1], off glc dlc		; GFX10-SCRATCH-NEXT: global_load_sbyte v0, v[0:1], off glc dlc
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_signext@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_signext@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_signext@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_signext@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc		; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_zeroext@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_zeroext@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_zeroext@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_zeroext@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc		; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_zeroext@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_zeroext@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_zeroext@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_zeroext@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: global_load_u8 v0, v[0:1], off glc dlc		; GFX11-NEXT: global_load_u8 v0, v[0:1], off glc dlc
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i8_zeroext@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i8_zeroext@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i8_zeroext@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i8_zeroext@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc		; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_zeroext@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_zeroext@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_zeroext@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_zeroext@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b		; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16@rel32@hi+12
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b		; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_mov_b32_e32 v0, 0x7b		; GFX11-NEXT: v_mov_b32_e32 v0, 0x7b
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i16@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i16@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i16@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i16@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: global_load_ushort v0, v[0:1], off glc		; GFX9-NEXT: global_load_ushort v0, v[0:1], off glc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_signext@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_signext@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_signext@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_signext@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: global_load_ushort v0, v[0:1], off glc dlc		; GFX10-NEXT: global_load_ushort v0, v[0:1], off glc dlc
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_signext@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_signext@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_signext@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_signext@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: global_load_u16 v0, v[0:1], off glc dlc		; GFX11-NEXT: global_load_u16 v0, v[0:1], off glc dlc
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i16_signext@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i16_signext@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i16_signext@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i16_signext@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: global_load_ushort v0, v[0:1], off glc dlc		; GFX10-SCRATCH-NEXT: global_load_ushort v0, v[0:1], off glc dlc
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_signext@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_signext@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_signext@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_signext@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: global_load_ushort v0, v[0:1], off glc		; GFX9-NEXT: global_load_ushort v0, v[0:1], off glc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_zeroext@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_zeroext@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_zeroext@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_zeroext@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: global_load_ushort v0, v[0:1], off glc dlc		; GFX10-NEXT: global_load_ushort v0, v[0:1], off glc dlc
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_zeroext@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_zeroext@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_zeroext@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_zeroext@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: global_load_u16 v0, v[0:1], off glc dlc		; GFX11-NEXT: global_load_u16 v0, v[0:1], off glc dlc
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i16_zeroext@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i16_zeroext@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i16_zeroext@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i16_zeroext@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
[17 lines not shown]
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: global_load_ushort v0, v[0:1], off glc dlc		; GFX10-SCRATCH-NEXT: global_load_ushort v0, v[0:1], off glc dlc
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_zeroext@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_zeroext@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_zeroext@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_zeroext@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
[18 lines not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 42		; GFX9-NEXT: v_mov_b32_e32 v0, 42
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i32@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i32@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i32@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i32@rel32@hi+12
[16 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 42		; GFX10-NEXT: v_mov_b32_e32 v0, 42
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i32@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i32@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i32@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i32@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
[16 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_mov_b32_e32 v0, 42		; GFX11-NEXT: v_mov_b32_e32 v0, 42
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i32@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i32@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i32@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
[16 lines not shown]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 42		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 42
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i32@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i32@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i32@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i32@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
[18 lines not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b		; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i64@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i64@rel32@lo+4
[17 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b		; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 0		; GFX10-NEXT: v_mov_b32_e32 v1, 0
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i64@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i64@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i64@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i64@rel32@hi+12
		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-NEXT: v_readlane_b32 s34, v41, 0		; GFX10-NEXT: v_readlane_b32 s34, v41, 0
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33
; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4		; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4
[10 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_dual_mov_b32 v0, 0x7b :: v_dual_mov_b32 v1, 0		; GFX11-NEXT: v_dual_mov_b32 v0, 0x7b :: v_dual_mov_b32 v1, 0
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i64@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i64@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i64@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i64@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: v_readlane_b32 s0, v41, 0		; GFX11-NEXT: v_readlane_b32 s0, v41, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b32 v40, off, s33		; GFX11-NEXT: scratch_load_b32 v40, off, s33
; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4		; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: s_add_i32 s32, s32, -16		; GFX11-NEXT: s_add_i32 s32, s32, -16
; GFX11-NEXT: s_mov_b32 s33, s0		; GFX11-NEXT: s_mov_b32 s33, s0
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm:		; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm:
; GFX10-SCRATCH: ; %bb.0:		; GFX10-SCRATCH: ; %bb.0:
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i64@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i64@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i64@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i64@rel32@hi+12
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33		; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33
; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4		; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4
[15 lines not shown]
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_mov_b32_e32 v0, 0		; GFX9-NEXT: v_mov_b32_e32 v0, 0
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off		; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
[17 lines not shown]
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_mov_b32_e32 v0, 0		; GFX10-NEXT: v_mov_b32_e32 v0, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 0		; GFX10-NEXT: v_mov_b32_e32 v1, 0
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12
; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off		; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
[18 lines not shown]
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_mov_b32_e32 v0, 0		; GFX11-NEXT: v_mov_b32_e32 v0, 0
; GFX11-NEXT: v_mov_b32_e32 v1, 0		; GFX11-NEXT: v_mov_b32_e32 v1, 0
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12
; GFX11-NEXT: global_load_b128 v[0:3], v[0:1], off		; GFX11-NEXT: global_load_b128 v[0:3], v[0:1], off
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
[18 lines not shown]
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
[19 lines not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 1		; GFX9-NEXT: v_mov_b32_e32 v0, 1
; GFX9-NEXT: v_mov_b32_e32 v1, 2		; GFX9-NEXT: v_mov_b32_e32 v1, 2
; GFX9-NEXT: v_mov_b32_e32 v2, 3		; GFX9-NEXT: v_mov_b32_e32 v2, 3
; GFX9-NEXT: v_mov_b32_e32 v3, 4		; GFX9-NEXT: v_mov_b32_e32 v3, 4
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
[19 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 1		; GFX10-NEXT: v_mov_b32_e32 v0, 1
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 2		; GFX10-NEXT: v_mov_b32_e32 v1, 2
; GFX10-NEXT: v_mov_b32_e32 v2, 3		; GFX10-NEXT: v_mov_b32_e32 v2, 3
; GFX10-NEXT: v_mov_b32_e32 v3, 4		; GFX10-NEXT: v_mov_b32_e32 v3, 4
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4
[19 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2		; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_dual_mov_b32 v2, 3 :: v_dual_mov_b32 v3, 4		; GFX11-NEXT: v_dual_mov_b32 v2, 3 :: v_dual_mov_b32 v3, 4
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
[17 lines not shown]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4
[... 24 lines not shown ...]
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_mov_b32_e32 v0, 0		; GFX9-NEXT: v_mov_b32_e32 v0, 0
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off		; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v4, 1		; GFX9-NEXT: v_mov_b32_e32 v4, 1
; GFX9-NEXT: v_mov_b32_e32 v5, 2		; GFX9-NEXT: v_mov_b32_e32 v5, 2
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i64@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i64@rel32@lo+4
[... 19 lines not shown ...]
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_mov_b32_e32 v0, 0		; GFX10-NEXT: v_mov_b32_e32 v0, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 0		; GFX10-NEXT: v_mov_b32_e32 v1, 0
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v4, 1		; GFX10-NEXT: v_mov_b32_e32 v4, 1
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v5, 2		; GFX10-NEXT: v_mov_b32_e32 v5, 2
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off		; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i64@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i64@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64@rel32@hi+12
[... 20 lines not shown ...]
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v5, 2		; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v5, 2
; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v4, 1		; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v4, 1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: global_load_b128 v[0:3], v[0:1], off		; GFX11-NEXT: global_load_b128 v[0:3], v[0:1], off
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i64@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i64@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
[... 18 lines not shown ...]
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 2		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 2
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i64@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i64@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64@rel32@hi+12
[... 26 lines not shown ...]
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_mov_b32_e32 v0, 0		; GFX9-NEXT: v_mov_b32_e32 v0, 0
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off		; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v4, 1		; GFX9-NEXT: v_mov_b32_e32 v4, 1
; GFX9-NEXT: v_mov_b32_e32 v5, 2		; GFX9-NEXT: v_mov_b32_e32 v5, 2
; GFX9-NEXT: v_mov_b32_e32 v6, 3		; GFX9-NEXT: v_mov_b32_e32 v6, 3
; GFX9-NEXT: v_mov_b32_e32 v7, 4		; GFX9-NEXT: v_mov_b32_e32 v7, 4
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
[... 21 lines not shown ...]
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_mov_b32_e32 v0, 0		; GFX10-NEXT: v_mov_b32_e32 v0, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 0		; GFX10-NEXT: v_mov_b32_e32 v1, 0
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v4, 1		; GFX10-NEXT: v_mov_b32_e32 v4, 1
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v5, 2		; GFX10-NEXT: v_mov_b32_e32 v5, 2
; GFX10-NEXT: v_mov_b32_e32 v6, 3		; GFX10-NEXT: v_mov_b32_e32 v6, 3
; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off		; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
; GFX10-NEXT: v_mov_b32_e32 v7, 4		; GFX10-NEXT: v_mov_b32_e32 v7, 4
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
[... 22 lines not shown ...]
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v5, 2		; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v5, 2
; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v4, 1		; GFX11-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v4, 1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_dual_mov_b32 v6, 3 :: v_dual_mov_b32 v7, 4		; GFX11-NEXT: v_dual_mov_b32 v6, 3 :: v_dual_mov_b32 v7, 4
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: global_load_b128 v[0:3], v[0:1], off		; GFX11-NEXT: global_load_b128 v[0:3], v[0:1], off
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i64@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i64@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: v_readlane_b32 s0, v41, 0		; GFX11-NEXT: v_readlane_b32 s0, v41, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b32 v40, off, s33		; GFX11-NEXT: scratch_load_b32 v40, off, s33
; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4		; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
[... 10 lines not shown ...]
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 2		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 2
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 3		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 3
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 4		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 4
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
[... 24 lines not shown ...]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 0x4400		; GFX9-NEXT: v_mov_b32_e32 v0, 0x4400
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f16@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f16@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f16@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f16@rel32@hi+12
[... 16 lines not shown ...]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 0x4400		; GFX10-NEXT: v_mov_b32_e32 v0, 0x4400
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f16@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f16@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f16@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f16@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
[... 16 lines not shown ...]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_mov_b32_e32 v0, 0x4400		; GFX11-NEXT: v_mov_b32_e32 v0, 0x4400
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f16@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f16@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f16@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f16@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
[... 16 lines not shown ...]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x4400		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x4400
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f16@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f16@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f16@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f16@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
[... 18 lines not shown ...]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 4.0		; GFX9-NEXT: v_mov_b32_e32 v0, 4.0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f32@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f32@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f32@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f32@rel32@hi+12
[... 16 lines not shown ...]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 4.0		; GFX10-NEXT: v_mov_b32_e32 v0, 4.0
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f32@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f32@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f32@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f32@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
[... 16 lines not shown ...]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_mov_b32_e32 v0, 4.0		; GFX11-NEXT: v_mov_b32_e32 v0, 4.0
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f32@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f32@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f32@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
[... 16 lines not shown ...]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 4.0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 4.0
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f32@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f32@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f32@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f32@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
[... 18 lines not shown ...]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 1.0		; GFX9-NEXT: v_mov_b32_e32 v0, 1.0
; GFX9-NEXT: v_mov_b32_e32 v1, 2.0		; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f32@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f32@rel32@lo+4
[... 17 lines not shown ...]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 1.0		; GFX10-NEXT: v_mov_b32_e32 v0, 1.0
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 2.0		; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f32@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f32@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32@rel32@hi+12
		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-NEXT: v_readlane_b32 s34, v41, 0		; GFX10-NEXT: v_readlane_b32 s34, v41, 0
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33
; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4		; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_dual_mov_b32 v0, 1.0 :: v_dual_mov_b32 v1, 2.0		; GFX11-NEXT: v_dual_mov_b32 v0, 1.0 :: v_dual_mov_b32 v1, 2.0
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f32@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: v_readlane_b32 s0, v41, 0		; GFX11-NEXT: v_readlane_b32 s0, v41, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b32 v40, off, s33		; GFX11-NEXT: scratch_load_b32 v40, off, s33
; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4		; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: s_add_i32 s32, s32, -16		; GFX11-NEXT: s_add_i32 s32, s32, -16
; GFX11-NEXT: s_mov_b32 s33, s0		; GFX11-NEXT: s_mov_b32 s33, s0
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm:		; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm:
; GFX10-SCRATCH: ; %bb.0:		; GFX10-SCRATCH: ; %bb.0:
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f32@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f32@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32@rel32@hi+12
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33		; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33
; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4		; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 1.0		; GFX9-NEXT: v_mov_b32_e32 v0, 1.0
; GFX9-NEXT: v_mov_b32_e32 v1, 2.0		; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
; GFX9-NEXT: v_mov_b32_e32 v2, 4.0		; GFX9-NEXT: v_mov_b32_e32 v2, 4.0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 1.0		; GFX10-NEXT: v_mov_b32_e32 v0, 1.0
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 2.0		; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
; GFX10-NEXT: v_mov_b32_e32 v2, 4.0		; GFX10-NEXT: v_mov_b32_e32 v2, 4.0
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f32@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f32@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32@rel32@hi+12
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_dual_mov_b32 v0, 1.0 :: v_dual_mov_b32 v1, 2.0		; GFX11-NEXT: v_dual_mov_b32 v0, 1.0 :: v_dual_mov_b32 v1, 2.0
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_mov_b32_e32 v2, 4.0		; GFX11-NEXT: v_mov_b32_e32 v2, 4.0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f32@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: v_readlane_b32 s0, v41, 0		; GFX11-NEXT: v_readlane_b32 s0, v41, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b32 v40, off, s33		; GFX11-NEXT: scratch_load_b32 v40, off, s33
; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4		; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: s_add_i32 s32, s32, -16		; GFX11-NEXT: s_add_i32 s32, s32, -16
; GFX11-NEXT: s_mov_b32 s33, s0		; GFX11-NEXT: s_mov_b32 s33, s0
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm:		; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm:
; GFX10-SCRATCH: ; %bb.0:		; GFX10-SCRATCH: ; %bb.0:
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 4.0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 4.0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f32@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f32@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32@rel32@hi+12
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 1.0		; GFX9-NEXT: v_mov_b32_e32 v0, 1.0
; GFX9-NEXT: v_mov_b32_e32 v1, 2.0		; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
; GFX9-NEXT: v_mov_b32_e32 v2, 4.0		; GFX9-NEXT: v_mov_b32_e32 v2, 4.0
; GFX9-NEXT: v_mov_b32_e32 v3, -1.0		; GFX9-NEXT: v_mov_b32_e32 v3, -1.0
; GFX9-NEXT: v_mov_b32_e32 v4, 0.5		; GFX9-NEXT: v_mov_b32_e32 v4, 0.5
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 1.0		; GFX10-NEXT: v_mov_b32_e32 v0, 1.0
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 2.0		; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
; GFX10-NEXT: v_mov_b32_e32 v2, 4.0		; GFX10-NEXT: v_mov_b32_e32 v2, 4.0
; GFX10-NEXT: v_mov_b32_e32 v3, -1.0		; GFX10-NEXT: v_mov_b32_e32 v3, -1.0
; GFX10-NEXT: v_mov_b32_e32 v4, 0.5		; GFX10-NEXT: v_mov_b32_e32 v4, 0.5
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_dual_mov_b32 v0, 1.0 :: v_dual_mov_b32 v1, 2.0		; GFX11-NEXT: v_dual_mov_b32 v0, 1.0 :: v_dual_mov_b32 v1, 2.0
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_dual_mov_b32 v2, 4.0 :: v_dual_mov_b32 v3, -1.0		; GFX11-NEXT: v_dual_mov_b32 v2, 4.0 :: v_dual_mov_b32 v3, -1.0
; GFX11-NEXT: v_mov_b32_e32 v4, 0.5		; GFX11-NEXT: v_mov_b32_e32 v4, 0.5
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v5f32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v5f32@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32@rel32@hi+12
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 4.0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 4.0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, -1.0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, -1.0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0.5		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0.5
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 0		; GFX9-NEXT: v_mov_b32_e32 v0, 0
; GFX9-NEXT: v_mov_b32_e32 v1, 0x40100000		; GFX9-NEXT: v_mov_b32_e32 v1, 0x40100000
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f64@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f64@rel32@lo+4
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 0		; GFX10-NEXT: v_mov_b32_e32 v0, 0
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 0x40100000		; GFX10-NEXT: v_mov_b32_e32 v1, 0x40100000
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f64@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f64@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f64@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f64@rel32@hi+12
		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-NEXT: v_readlane_b32 s34, v41, 0		; GFX10-NEXT: v_readlane_b32 s34, v41, 0
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33
; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4		; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0x40100000		; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0x40100000
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f64@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f64@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f64@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f64@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: v_readlane_b32 s0, v41, 0		; GFX11-NEXT: v_readlane_b32 s0, v41, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b32 v40, off, s33		; GFX11-NEXT: scratch_load_b32 v40, off, s33
; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4		; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: s_add_i32 s32, s32, -16		; GFX11-NEXT: s_add_i32 s32, s32, -16
; GFX11-NEXT: s_mov_b32 s33, s0		; GFX11-NEXT: s_mov_b32 s33, s0
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm:		; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm:
; GFX10-SCRATCH: ; %bb.0:		; GFX10-SCRATCH: ; %bb.0:
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x40100000		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x40100000
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f64@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f64@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f64@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f64@rel32@hi+12
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33		; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33
; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4		; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 0		; GFX9-NEXT: v_mov_b32_e32 v0, 0
; GFX9-NEXT: v_mov_b32_e32 v1, 2.0		; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
; GFX9-NEXT: v_mov_b32_e32 v2, 0		; GFX9-NEXT: v_mov_b32_e32 v2, 0
; GFX9-NEXT: v_mov_b32_e32 v3, 0x40100000		; GFX9-NEXT: v_mov_b32_e32 v3, 0x40100000
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 0		; GFX10-NEXT: v_mov_b32_e32 v0, 0
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 2.0		; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
; GFX10-NEXT: v_mov_b32_e32 v2, 0		; GFX10-NEXT: v_mov_b32_e32 v2, 0
; GFX10-NEXT: v_mov_b32_e32 v3, 0x40100000		; GFX10-NEXT: v_mov_b32_e32 v3, 0x40100000
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f64@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f64@rel32@lo+4
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 2.0		; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 2.0
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 0x40100000		; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 0x40100000
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f64@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f64@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x40100000		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x40100000
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f64@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f64@rel32@lo+4
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 0		; GFX9-NEXT: v_mov_b32_e32 v0, 0
; GFX9-NEXT: v_mov_b32_e32 v1, 2.0		; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
; GFX9-NEXT: v_mov_b32_e32 v2, 0		; GFX9-NEXT: v_mov_b32_e32 v2, 0
; GFX9-NEXT: v_mov_b32_e32 v3, 0x40100000		; GFX9-NEXT: v_mov_b32_e32 v3, 0x40100000
; GFX9-NEXT: v_mov_b32_e32 v4, 0		; GFX9-NEXT: v_mov_b32_e32 v4, 0
; GFX9-NEXT: v_mov_b32_e32 v5, 0x40200000		; GFX9-NEXT: v_mov_b32_e32 v5, 0x40200000
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 0		; GFX10-NEXT: v_mov_b32_e32 v0, 0
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 2.0		; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
; GFX10-NEXT: v_mov_b32_e32 v2, 0		; GFX10-NEXT: v_mov_b32_e32 v2, 0
; GFX10-NEXT: v_mov_b32_e32 v3, 0x40100000		; GFX10-NEXT: v_mov_b32_e32 v3, 0x40100000
; GFX10-NEXT: v_mov_b32_e32 v4, 0		; GFX10-NEXT: v_mov_b32_e32 v4, 0
; GFX10-NEXT: v_mov_b32_e32 v5, 0x40200000		; GFX10-NEXT: v_mov_b32_e32 v5, 0x40200000
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 2.0		; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 2.0
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 0x40100000		; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 0x40100000
; GFX11-NEXT: v_dual_mov_b32 v4, 0 :: v_dual_mov_b32 v5, 0x40200000		; GFX11-NEXT: v_dual_mov_b32 v4, 0 :: v_dual_mov_b32 v5, 0x40200000
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f64@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f64@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64@rel32@hi+12
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x40100000		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x40100000
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 0x40200000		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 0x40200000
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: global_load_dword v0, v[0:1], off		; GFX9-NEXT: global_load_dword v0, v[0:1], off
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i16@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i16@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: global_load_dword v0, v[0:1], off		; GFX10-NEXT: global_load_dword v0, v[0:1], off
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i16@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i16@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: global_load_b32 v0, v[0:1], off		; GFX11-NEXT: global_load_b32 v0, v[0:1], off
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i16@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i16@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: global_load_dword v0, v[0:1], off		; GFX10-SCRATCH-NEXT: global_load_dword v0, v[0:1], off
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i16@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i16@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: global_load_b64 v[0:1], v[0:1], off		; GFX11-NEXT: global_load_b64 v[0:1], v[0:1], off
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: global_load_b64 v[0:1], v[0:1], off		; GFX11-NEXT: global_load_b64 v[0:1], v[0:1], off
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001		; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001
; GFX9-NEXT: v_mov_b32_e32 v1, 3		; GFX9-NEXT: v_mov_b32_e32 v1, 3
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 0x20001		; GFX10-NEXT: v_mov_b32_e32 v0, 0x20001
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 3		; GFX10-NEXT: v_mov_b32_e32 v1, 3
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12
		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-NEXT: v_readlane_b32 s34, v41, 0		; GFX10-NEXT: v_readlane_b32 s34, v41, 0
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33
; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4		; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_dual_mov_b32 v0, 0x20001 :: v_dual_mov_b32 v1, 3		; GFX11-NEXT: v_dual_mov_b32 v0, 0x20001 :: v_dual_mov_b32 v1, 3
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: v_readlane_b32 s0, v41, 0		; GFX11-NEXT: v_readlane_b32 s0, v41, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b32 v40, off, s33		; GFX11-NEXT: scratch_load_b32 v40, off, s33
; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4		; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: s_add_i32 s32, s32, -16		; GFX11-NEXT: s_add_i32 s32, s32, -16
; GFX11-NEXT: s_mov_b32 s33, s0		; GFX11-NEXT: s_mov_b32 s33, s0
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm:		; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm:
; GFX10-SCRATCH: ; %bb.0:		; GFX10-SCRATCH: ; %bb.0:
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x20001		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x20001
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 3		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 3
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33		; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33
; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4		; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 0x40003c00		; GFX9-NEXT: v_mov_b32_e32 v0, 0x40003c00
; GFX9-NEXT: v_mov_b32_e32 v1, 0x4400		; GFX9-NEXT: v_mov_b32_e32 v1, 0x4400
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 0x40003c00		; GFX10-NEXT: v_mov_b32_e32 v0, 0x40003c00
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 0x4400		; GFX10-NEXT: v_mov_b32_e32 v1, 0x4400
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12
		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-NEXT: v_readlane_b32 s34, v41, 0		; GFX10-NEXT: v_readlane_b32 s34, v41, 0
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33
; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4		; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_mov_b32_e32 v0, 0x40003c00		; GFX11-NEXT: v_mov_b32_e32 v0, 0x40003c00
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_mov_b32_e32 v1, 0x4400		; GFX11-NEXT: v_mov_b32_e32 v1, 0x4400
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: v_readlane_b32 s0, v41, 0		; GFX11-NEXT: v_readlane_b32 s0, v41, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b32 v40, off, s33		; GFX11-NEXT: scratch_load_b32 v40, off, s33
; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4		; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: s_add_i32 s32, s32, -16		; GFX11-NEXT: s_add_i32 s32, s32, -16
; GFX11-NEXT: s_mov_b32 s33, s0		; GFX11-NEXT: s_mov_b32 s33, s0
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm:		; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm:
; GFX10-SCRATCH: ; %bb.0:		; GFX10-SCRATCH: ; %bb.0:
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x40003c00		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x40003c00
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x4400		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x4400
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33		; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33
; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4		; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: global_load_b64 v[0:1], v[0:1], off		; GFX11-NEXT: global_load_b64 v[0:1], v[0:1], off
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001		; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001
; GFX9-NEXT: v_mov_b32_e32 v1, 0x40003		; GFX9-NEXT: v_mov_b32_e32 v1, 0x40003
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 0x20001		; GFX10-NEXT: v_mov_b32_e32 v0, 0x20001
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 0x40003		; GFX10-NEXT: v_mov_b32_e32 v1, 0x40003
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12
		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-NEXT: v_readlane_b32 s34, v41, 0		; GFX10-NEXT: v_readlane_b32 s34, v41, 0
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33
; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4		; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_mov_b32_e32 v0, 0x20001		; GFX11-NEXT: v_mov_b32_e32 v0, 0x20001
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_mov_b32_e32 v1, 0x40003		; GFX11-NEXT: v_mov_b32_e32 v1, 0x40003
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: v_readlane_b32 s0, v41, 0		; GFX11-NEXT: v_readlane_b32 s0, v41, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b32 v40, off, s33		; GFX11-NEXT: scratch_load_b32 v40, off, s33
; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4		; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: s_add_i32 s32, s32, -16		; GFX11-NEXT: s_add_i32 s32, s32, -16
; GFX11-NEXT: s_mov_b32 s33, s0		; GFX11-NEXT: s_mov_b32 s33, s0
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm:		; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm:
; GFX10-SCRATCH: ; %bb.0:		; GFX10-SCRATCH: ; %bb.0:
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x20001		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x20001
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x40003		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x40003
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33		; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33
; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4		; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4
[... 13 unchanged lines not shown ...]
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: global_load_dword v0, v[0:1], off		; GFX9-NEXT: global_load_dword v0, v[0:1], off
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f16@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f16@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
[... 16 unchanged lines not shown ...]
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: global_load_dword v0, v[0:1], off		; GFX10-NEXT: global_load_dword v0, v[0:1], off
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f16@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f16@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
[... 16 unchanged lines not shown ...]
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: global_load_b32 v0, v[0:1], off		; GFX11-NEXT: global_load_b32 v0, v[0:1], off
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f16@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f16@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
[... 16 unchanged lines not shown ...]
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: global_load_dword v0, v[0:1], off		; GFX10-SCRATCH-NEXT: global_load_dword v0, v[0:1], off
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f16@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f16@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
[... 19 unchanged lines not shown ...]
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
[... 16 unchanged lines not shown ...]
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
[... 16 unchanged lines not shown ...]
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: global_load_b64 v[0:1], v[0:1], off		; GFX11-NEXT: global_load_b64 v[0:1], v[0:1], off
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
[... 16 unchanged lines not shown ...]
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
[... 18 unchanged lines not shown ...]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 1		; GFX9-NEXT: v_mov_b32_e32 v0, 1
; GFX9-NEXT: v_mov_b32_e32 v1, 2		; GFX9-NEXT: v_mov_b32_e32 v1, 2
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4
[... 17 unchanged lines not shown ...]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 1		; GFX10-NEXT: v_mov_b32_e32 v0, 1
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 2		; GFX10-NEXT: v_mov_b32_e32 v1, 2
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12
		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-NEXT: v_readlane_b32 s34, v41, 0		; GFX10-NEXT: v_readlane_b32 s34, v41, 0
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33
; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4		; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4
[... 10 unchanged lines not shown ...]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2		; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: v_readlane_b32 s0, v41, 0		; GFX11-NEXT: v_readlane_b32 s0, v41, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b32 v40, off, s33		; GFX11-NEXT: scratch_load_b32 v40, off, s33
; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4		; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: s_add_i32 s32, s32, -16		; GFX11-NEXT: s_add_i32 s32, s32, -16
; GFX11-NEXT: s_mov_b32 s33, s0		; GFX11-NEXT: s_mov_b32 s33, s0
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm:		; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm:
; GFX10-SCRATCH: ; %bb.0:		; GFX10-SCRATCH: ; %bb.0:
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33		; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33
; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4		; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4
[... 12 unchanged lines not shown ...]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 3		; GFX9-NEXT: v_mov_b32_e32 v0, 3
; GFX9-NEXT: v_mov_b32_e32 v1, 4		; GFX9-NEXT: v_mov_b32_e32 v1, 4
; GFX9-NEXT: v_mov_b32_e32 v2, 5		; GFX9-NEXT: v_mov_b32_e32 v2, 5
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
[... 18 unchanged lines not shown ...]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 3		; GFX10-NEXT: v_mov_b32_e32 v0, 3
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 4		; GFX10-NEXT: v_mov_b32_e32 v1, 4
; GFX10-NEXT: v_mov_b32_e32 v2, 5		; GFX10-NEXT: v_mov_b32_e32 v2, 5
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32@rel32@hi+12
(18 unchanged lines hidden)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_dual_mov_b32 v0, 3 :: v_dual_mov_b32 v1, 4		; GFX11-NEXT: v_dual_mov_b32 v0, 3 :: v_dual_mov_b32 v1, 4
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_mov_b32_e32 v2, 5		; GFX11-NEXT: v_mov_b32_e32 v2, 5
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i32@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: v_readlane_b32 s0, v41, 0		; GFX11-NEXT: v_readlane_b32 s0, v41, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b32 v40, off, s33		; GFX11-NEXT: scratch_load_b32 v40, off, s33
; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4		; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: s_add_i32 s32, s32, -16		; GFX11-NEXT: s_add_i32 s32, s32, -16
; GFX11-NEXT: s_mov_b32 s33, s0		; GFX11-NEXT: s_mov_b32 s33, s0
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm:		; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm:
; GFX10-SCRATCH: ; %bb.0:		; GFX10-SCRATCH: ; %bb.0:
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 4		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 4
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 5		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 5
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32@rel32@hi+12
(20 unchanged lines hidden)
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 3		; GFX9-NEXT: v_mov_b32_e32 v0, 3
; GFX9-NEXT: v_mov_b32_e32 v1, 4		; GFX9-NEXT: v_mov_b32_e32 v1, 4
; GFX9-NEXT: v_mov_b32_e32 v2, 5		; GFX9-NEXT: v_mov_b32_e32 v2, 5
; GFX9-NEXT: v_mov_b32_e32 v3, 6		; GFX9-NEXT: v_mov_b32_e32 v3, 6
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
(19 unchanged lines hidden)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 3		; GFX10-NEXT: v_mov_b32_e32 v0, 3
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 4		; GFX10-NEXT: v_mov_b32_e32 v1, 4
; GFX10-NEXT: v_mov_b32_e32 v2, 5		; GFX10-NEXT: v_mov_b32_e32 v2, 5
; GFX10-NEXT: v_mov_b32_e32 v3, 6		; GFX10-NEXT: v_mov_b32_e32 v3, 6
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32@rel32@lo+4
(19 unchanged lines hidden)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_dual_mov_b32 v0, 3 :: v_dual_mov_b32 v1, 4		; GFX11-NEXT: v_dual_mov_b32 v0, 3 :: v_dual_mov_b32 v1, 4
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_dual_mov_b32 v2, 5 :: v_dual_mov_b32 v3, 6		; GFX11-NEXT: v_dual_mov_b32 v2, 5 :: v_dual_mov_b32 v3, 6
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
(17 unchanged lines hidden)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 4		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 4
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 5		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 5
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 6		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 6
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32@rel32@lo+4
(22 unchanged lines hidden)
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off		; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
(16 unchanged lines hidden)
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off		; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
(16 unchanged lines hidden)
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: global_load_b128 v[0:3], v[0:1], off		; GFX11-NEXT: global_load_b128 v[0:3], v[0:1], off
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
(16 unchanged lines hidden)
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
(18 unchanged lines hidden)
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 1		; GFX9-NEXT: v_mov_b32_e32 v0, 1
; GFX9-NEXT: v_mov_b32_e32 v1, 2		; GFX9-NEXT: v_mov_b32_e32 v1, 2
; GFX9-NEXT: v_mov_b32_e32 v2, 3		; GFX9-NEXT: v_mov_b32_e32 v2, 3
; GFX9-NEXT: v_mov_b32_e32 v3, 4		; GFX9-NEXT: v_mov_b32_e32 v3, 4
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
(19 unchanged lines hidden)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 1		; GFX10-NEXT: v_mov_b32_e32 v0, 1
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 2		; GFX10-NEXT: v_mov_b32_e32 v1, 2
; GFX10-NEXT: v_mov_b32_e32 v2, 3		; GFX10-NEXT: v_mov_b32_e32 v2, 3
; GFX10-NEXT: v_mov_b32_e32 v3, 4		; GFX10-NEXT: v_mov_b32_e32 v3, 4
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4
(19 unchanged lines hidden)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2		; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_dual_mov_b32 v2, 3 :: v_dual_mov_b32 v3, 4		; GFX11-NEXT: v_dual_mov_b32 v2, 3 :: v_dual_mov_b32 v3, 4
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
(17 unchanged lines hidden)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4
(21 unchanged lines hidden)
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 1		; GFX9-NEXT: v_mov_b32_e32 v0, 1
; GFX9-NEXT: v_mov_b32_e32 v1, 2		; GFX9-NEXT: v_mov_b32_e32 v1, 2
; GFX9-NEXT: v_mov_b32_e32 v2, 3		; GFX9-NEXT: v_mov_b32_e32 v2, 3
; GFX9-NEXT: v_mov_b32_e32 v3, 4		; GFX9-NEXT: v_mov_b32_e32 v3, 4
; GFX9-NEXT: v_mov_b32_e32 v4, 5		; GFX9-NEXT: v_mov_b32_e32 v4, 5
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
(20 unchanged lines hidden)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 1		; GFX10-NEXT: v_mov_b32_e32 v0, 1
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 2		; GFX10-NEXT: v_mov_b32_e32 v1, 2
; GFX10-NEXT: v_mov_b32_e32 v2, 3		; GFX10-NEXT: v_mov_b32_e32 v2, 3
; GFX10-NEXT: v_mov_b32_e32 v3, 4		; GFX10-NEXT: v_mov_b32_e32 v3, 4
; GFX10-NEXT: v_mov_b32_e32 v4, 5		; GFX10-NEXT: v_mov_b32_e32 v4, 5
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
(20 unchanged lines hidden)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2		; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_dual_mov_b32 v2, 3 :: v_dual_mov_b32 v3, 4		; GFX11-NEXT: v_dual_mov_b32 v2, 3 :: v_dual_mov_b32 v3, 4
; GFX11-NEXT: v_mov_b32_e32 v4, 5		; GFX11-NEXT: v_mov_b32_e32 v4, 5
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v5i32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v5i32@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32@rel32@hi+12
(18 unchanged lines hidden)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 5		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 5
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
[25 lines not shown]
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX9-NEXT: v_mov_b32_e32 v8, 0		; GFX9-NEXT: v_mov_b32_e32 v8, 0
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: global_load_dwordx4 v[0:3], v8, s[34:35]		; GFX9-NEXT: global_load_dwordx4 v[0:3], v8, s[34:35]
; GFX9-NEXT: global_load_dwordx4 v[4:7], v8, s[34:35] offset:16		; GFX9-NEXT: global_load_dwordx4 v[4:7], v8, s[34:35] offset:16
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4
[20 lines not shown]
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX10-NEXT: v_mov_b32_e32 v8, 0		; GFX10-NEXT: v_mov_b32_e32 v8, 0
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: global_load_dwordx4 v[0:3], v8, s[34:35]		; GFX10-NEXT: global_load_dwordx4 v[0:3], v8, s[34:35]
; GFX10-NEXT: global_load_dwordx4 v[4:7], v8, s[34:35] offset:16		; GFX10-NEXT: global_load_dwordx4 v[4:7], v8, s[34:35] offset:16
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12
[21 lines not shown]
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0		; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
; GFX11-NEXT: v_mov_b32_e32 v4, 0		; GFX11-NEXT: v_mov_b32_e32 v4, 0
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: s_waitcnt lgkmcnt(0)		; GFX11-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: global_load_b128 v[0:3], v4, s[0:1]		; GFX11-NEXT: global_load_b128 v[0:3], v4, s[0:1]
; GFX11-NEXT: global_load_b128 v[4:7], v4, s[0:1] offset:16		; GFX11-NEXT: global_load_b128 v[4:7], v4, s[0:1] offset:16
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12
[21 lines not shown]
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v8, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v8, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v8, s[0:1]		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v8, s[0:1]
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v8, s[0:1] offset:16		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v8, s[0:1] offset:16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12
[22 lines not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 1		; GFX9-NEXT: v_mov_b32_e32 v0, 1
; GFX9-NEXT: v_mov_b32_e32 v1, 2		; GFX9-NEXT: v_mov_b32_e32 v1, 2
; GFX9-NEXT: v_mov_b32_e32 v2, 3		; GFX9-NEXT: v_mov_b32_e32 v2, 3
; GFX9-NEXT: v_mov_b32_e32 v3, 4		; GFX9-NEXT: v_mov_b32_e32 v3, 4
; GFX9-NEXT: v_mov_b32_e32 v4, 5		; GFX9-NEXT: v_mov_b32_e32 v4, 5
; GFX9-NEXT: v_mov_b32_e32 v5, 6		; GFX9-NEXT: v_mov_b32_e32 v5, 6
[23 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 1		; GFX10-NEXT: v_mov_b32_e32 v0, 1
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 2		; GFX10-NEXT: v_mov_b32_e32 v1, 2
; GFX10-NEXT: v_mov_b32_e32 v2, 3		; GFX10-NEXT: v_mov_b32_e32 v2, 3
; GFX10-NEXT: v_mov_b32_e32 v3, 4		; GFX10-NEXT: v_mov_b32_e32 v3, 4
; GFX10-NEXT: v_mov_b32_e32 v4, 5		; GFX10-NEXT: v_mov_b32_e32 v4, 5
; GFX10-NEXT: v_mov_b32_e32 v5, 6		; GFX10-NEXT: v_mov_b32_e32 v5, 6
; GFX10-NEXT: v_mov_b32_e32 v6, 7		; GFX10-NEXT: v_mov_b32_e32 v6, 7
; GFX10-NEXT: v_mov_b32_e32 v7, 8		; GFX10-NEXT: v_mov_b32_e32 v7, 8
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
[23 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2		; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_dual_mov_b32 v2, 3 :: v_dual_mov_b32 v3, 4		; GFX11-NEXT: v_dual_mov_b32 v2, 3 :: v_dual_mov_b32 v3, 4
; GFX11-NEXT: v_dual_mov_b32 v4, 5 :: v_dual_mov_b32 v5, 6		; GFX11-NEXT: v_dual_mov_b32 v4, 5 :: v_dual_mov_b32 v5, 6
; GFX11-NEXT: v_dual_mov_b32 v6, 7 :: v_dual_mov_b32 v7, 8		; GFX11-NEXT: v_dual_mov_b32 v6, 7 :: v_dual_mov_b32 v7, 8
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4
[19 lines not shown]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 5		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 5
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 6		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 6
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 7		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 7
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 8		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 8
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
[28 lines not shown]
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX9-NEXT: v_mov_b32_e32 v16, 0		; GFX9-NEXT: v_mov_b32_e32 v16, 0
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: global_load_dwordx4 v[0:3], v16, s[34:35]		; GFX9-NEXT: global_load_dwordx4 v[0:3], v16, s[34:35]
; GFX9-NEXT: global_load_dwordx4 v[4:7], v16, s[34:35] offset:16		; GFX9-NEXT: global_load_dwordx4 v[4:7], v16, s[34:35] offset:16
; GFX9-NEXT: global_load_dwordx4 v[8:11], v16, s[34:35] offset:32		; GFX9-NEXT: global_load_dwordx4 v[8:11], v16, s[34:35] offset:32
; GFX9-NEXT: global_load_dwordx4 v[12:15], v16, s[34:35] offset:48		; GFX9-NEXT: global_load_dwordx4 v[12:15], v16, s[34:35] offset:48
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
[22 lines not shown]
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX10-NEXT: v_mov_b32_e32 v16, 0		; GFX10-NEXT: v_mov_b32_e32 v16, 0
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: s_clause 0x3		; GFX10-NEXT: s_clause 0x3
; GFX10-NEXT: global_load_dwordx4 v[0:3], v16, s[34:35]		; GFX10-NEXT: global_load_dwordx4 v[0:3], v16, s[34:35]
; GFX10-NEXT: global_load_dwordx4 v[4:7], v16, s[34:35] offset:16		; GFX10-NEXT: global_load_dwordx4 v[4:7], v16, s[34:35] offset:16
; GFX10-NEXT: global_load_dwordx4 v[8:11], v16, s[34:35] offset:32		; GFX10-NEXT: global_load_dwordx4 v[8:11], v16, s[34:35] offset:32
; GFX10-NEXT: global_load_dwordx4 v[12:15], v16, s[34:35] offset:48		; GFX10-NEXT: global_load_dwordx4 v[12:15], v16, s[34:35] offset:48
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
[23 lines not shown]
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0		; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
; GFX11-NEXT: v_mov_b32_e32 v12, 0		; GFX11-NEXT: v_mov_b32_e32 v12, 0
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: s_waitcnt lgkmcnt(0)		; GFX11-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-NEXT: s_clause 0x3		; GFX11-NEXT: s_clause 0x3
; GFX11-NEXT: global_load_b128 v[0:3], v12, s[0:1]		; GFX11-NEXT: global_load_b128 v[0:3], v12, s[0:1]
; GFX11-NEXT: global_load_b128 v[4:7], v12, s[0:1] offset:16		; GFX11-NEXT: global_load_b128 v[4:7], v12, s[0:1] offset:16
; GFX11-NEXT: global_load_b128 v[8:11], v12, s[0:1] offset:32		; GFX11-NEXT: global_load_b128 v[8:11], v12, s[0:1] offset:32
; GFX11-NEXT: global_load_b128 v[12:15], v12, s[0:1] offset:48		; GFX11-NEXT: global_load_b128 v[12:15], v12, s[0:1] offset:48
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
[23 lines not shown]
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v16, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v16, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_clause 0x3		; GFX10-SCRATCH-NEXT: s_clause 0x3
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v16, s[0:1]		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v16, s[0:1]
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v16, s[0:1] offset:16		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v16, s[0:1] offset:16
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v16, s[0:1] offset:32		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v16, s[0:1] offset:32
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v16, s[0:1] offset:48		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v16, s[0:1] offset:48
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
[27 lines not shown]
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX9-NEXT: v_mov_b32_e32 v28, 0		; GFX9-NEXT: v_mov_b32_e32 v28, 0
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: global_load_dwordx4 v[0:3], v28, s[34:35]		; GFX9-NEXT: global_load_dwordx4 v[0:3], v28, s[34:35]
; GFX9-NEXT: global_load_dwordx4 v[4:7], v28, s[34:35] offset:16		; GFX9-NEXT: global_load_dwordx4 v[4:7], v28, s[34:35] offset:16
; GFX9-NEXT: global_load_dwordx4 v[8:11], v28, s[34:35] offset:32		; GFX9-NEXT: global_load_dwordx4 v[8:11], v28, s[34:35] offset:32
; GFX9-NEXT: global_load_dwordx4 v[12:15], v28, s[34:35] offset:48		; GFX9-NEXT: global_load_dwordx4 v[12:15], v28, s[34:35] offset:48
; GFX9-NEXT: global_load_dwordx4 v[16:19], v28, s[34:35] offset:64		; GFX9-NEXT: global_load_dwordx4 v[16:19], v28, s[34:35] offset:64
[27 lines not shown]
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX10-NEXT: v_mov_b32_e32 v32, 0		; GFX10-NEXT: v_mov_b32_e32 v32, 0
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: s_clause 0x7		; GFX10-NEXT: s_clause 0x7
; GFX10-NEXT: global_load_dwordx4 v[0:3], v32, s[34:35]		; GFX10-NEXT: global_load_dwordx4 v[0:3], v32, s[34:35]
; GFX10-NEXT: global_load_dwordx4 v[4:7], v32, s[34:35] offset:16		; GFX10-NEXT: global_load_dwordx4 v[4:7], v32, s[34:35] offset:16
; GFX10-NEXT: global_load_dwordx4 v[8:11], v32, s[34:35] offset:32		; GFX10-NEXT: global_load_dwordx4 v[8:11], v32, s[34:35] offset:32
; GFX10-NEXT: global_load_dwordx4 v[12:15], v32, s[34:35] offset:48		; GFX10-NEXT: global_load_dwordx4 v[12:15], v32, s[34:35] offset:48
; GFX10-NEXT: global_load_dwordx4 v[16:19], v32, s[34:35] offset:64		; GFX10-NEXT: global_load_dwordx4 v[16:19], v32, s[34:35] offset:64
; GFX10-NEXT: global_load_dwordx4 v[20:23], v32, s[34:35] offset:80		; GFX10-NEXT: global_load_dwordx4 v[20:23], v32, s[34:35] offset:80
[27 lines not shown]
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0		; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
; GFX11-NEXT: v_mov_b32_e32 v28, 0		; GFX11-NEXT: v_mov_b32_e32 v28, 0
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: s_waitcnt lgkmcnt(0)		; GFX11-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-NEXT: s_clause 0x7		; GFX11-NEXT: s_clause 0x7
; GFX11-NEXT: global_load_b128 v[0:3], v28, s[0:1]		; GFX11-NEXT: global_load_b128 v[0:3], v28, s[0:1]
; GFX11-NEXT: global_load_b128 v[4:7], v28, s[0:1] offset:16		; GFX11-NEXT: global_load_b128 v[4:7], v28, s[0:1] offset:16
; GFX11-NEXT: global_load_b128 v[8:11], v28, s[0:1] offset:32		; GFX11-NEXT: global_load_b128 v[8:11], v28, s[0:1] offset:32
; GFX11-NEXT: global_load_b128 v[12:15], v28, s[0:1] offset:48		; GFX11-NEXT: global_load_b128 v[12:15], v28, s[0:1] offset:48
; GFX11-NEXT: global_load_b128 v[16:19], v28, s[0:1] offset:64		; GFX11-NEXT: global_load_b128 v[16:19], v28, s[0:1] offset:64
; GFX11-NEXT: global_load_b128 v[20:23], v28, s[0:1] offset:80		; GFX11-NEXT: global_load_b128 v[20:23], v28, s[0:1] offset:80
[27 lines not shown]
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v32, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v32, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_clause 0x7		; GFX10-SCRATCH-NEXT: s_clause 0x7
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v32, s[0:1]		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v32, s[0:1]
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v32, s[0:1] offset:16		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v32, s[0:1] offset:16
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v32, s[0:1] offset:32		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v32, s[0:1] offset:32
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v32, s[0:1] offset:48		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v32, s[0:1] offset:48
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[16:19], v32, s[0:1] offset:64		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[16:19], v32, s[0:1] offset:64
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[20:23], v32, s[0:1] offset:80		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[20:23], v32, s[0:1] offset:80
[32 lines not shown]
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX9-NEXT: v_mov_b32_e32 v28, 0		; GFX9-NEXT: v_mov_b32_e32 v28, 0
; GFX9-NEXT: global_load_dword v32, v[0:1], off		; GFX9-NEXT: global_load_dword v32, v[0:1], off
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: global_load_dwordx4 v[0:3], v28, s[34:35]		; GFX9-NEXT: global_load_dwordx4 v[0:3], v28, s[34:35]
; GFX9-NEXT: global_load_dwordx4 v[4:7], v28, s[34:35] offset:16		; GFX9-NEXT: global_load_dwordx4 v[4:7], v28, s[34:35] offset:16
; GFX9-NEXT: global_load_dwordx4 v[8:11], v28, s[34:35] offset:32		; GFX9-NEXT: global_load_dwordx4 v[8:11], v28, s[34:35] offset:32
; GFX9-NEXT: global_load_dwordx4 v[12:15], v28, s[34:35] offset:48		; GFX9-NEXT: global_load_dwordx4 v[12:15], v28, s[34:35] offset:48
; GFX9-NEXT: global_load_dwordx4 v[16:19], v28, s[34:35] offset:64		; GFX9-NEXT: global_load_dwordx4 v[16:19], v28, s[34:35] offset:64
; GFX9-NEXT: global_load_dwordx4 v[20:23], v28, s[34:35] offset:80		; GFX9-NEXT: global_load_dwordx4 v[20:23], v28, s[34:35] offset:80
[29 lines not shown]
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX10-NEXT: v_mov_b32_e32 v32, 0		; GFX10-NEXT: v_mov_b32_e32 v32, 0
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: global_load_dword v33, v[0:1], off		; GFX10-NEXT: global_load_dword v33, v[0:1], off
; GFX10-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: s_clause 0x7		; GFX10-NEXT: s_clause 0x7
; GFX10-NEXT: global_load_dwordx4 v[0:3], v32, s[34:35]		; GFX10-NEXT: global_load_dwordx4 v[0:3], v32, s[34:35]
; GFX10-NEXT: global_load_dwordx4 v[4:7], v32, s[34:35] offset:16		; GFX10-NEXT: global_load_dwordx4 v[4:7], v32, s[34:35] offset:16
; GFX10-NEXT: global_load_dwordx4 v[8:11], v32, s[34:35] offset:32		; GFX10-NEXT: global_load_dwordx4 v[8:11], v32, s[34:35] offset:32
; GFX10-NEXT: global_load_dwordx4 v[12:15], v32, s[34:35] offset:48		; GFX10-NEXT: global_load_dwordx4 v[12:15], v32, s[34:35] offset:48
; GFX10-NEXT: global_load_dwordx4 v[16:19], v32, s[34:35] offset:64		; GFX10-NEXT: global_load_dwordx4 v[16:19], v32, s[34:35] offset:64
[30 lines of context not shown]
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0		; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
; GFX11-NEXT: v_mov_b32_e32 v28, 0		; GFX11-NEXT: v_mov_b32_e32 v28, 0
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: global_load_b32 v32, v[0:1], off		; GFX11-NEXT: global_load_b32 v32, v[0:1], off
; GFX11-NEXT: s_waitcnt lgkmcnt(0)		; GFX11-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-NEXT: s_clause 0x7		; GFX11-NEXT: s_clause 0x7
; GFX11-NEXT: global_load_b128 v[0:3], v28, s[0:1]		; GFX11-NEXT: global_load_b128 v[0:3], v28, s[0:1]
; GFX11-NEXT: global_load_b128 v[4:7], v28, s[0:1] offset:16		; GFX11-NEXT: global_load_b128 v[4:7], v28, s[0:1] offset:16
; GFX11-NEXT: global_load_b128 v[8:11], v28, s[0:1] offset:32		; GFX11-NEXT: global_load_b128 v[8:11], v28, s[0:1] offset:32
; GFX11-NEXT: global_load_b128 v[12:15], v28, s[0:1] offset:48		; GFX11-NEXT: global_load_b128 v[12:15], v28, s[0:1] offset:48
; GFX11-NEXT: global_load_b128 v[16:19], v28, s[0:1] offset:64		; GFX11-NEXT: global_load_b128 v[16:19], v28, s[0:1] offset:64
[29 lines of context not shown]
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v32, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v32, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: global_load_dword v33, v[0:1], off		; GFX10-SCRATCH-NEXT: global_load_dword v33, v[0:1], off
; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_clause 0x7		; GFX10-SCRATCH-NEXT: s_clause 0x7
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v32, s[0:1]		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v32, s[0:1]
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v32, s[0:1] offset:16		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v32, s[0:1] offset:16
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v32, s[0:1] offset:32		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v32, s[0:1] offset:32
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v32, s[0:1] offset:48		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v32, s[0:1] offset:48
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[16:19], v32, s[0:1] offset:64		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[16:19], v32, s[0:1] offset:64
[29 lines of context not shown]

define amdgpu_gfx void @test_call_external_i32_func_i32_imm(i32 addrspace(1)* %out) #0 {		define amdgpu_gfx void @test_call_external_i32_func_i32_imm(i32 addrspace(1)* %out) #0 {
; GFX9-LABEL: test_call_external_i32_func_i32_imm:		; GFX9-LABEL: test_call_external_i32_func_i32_imm:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr42
; GFX9-NEXT: s_addk_i32 s32, 0x800		; GFX9-NEXT: s_addk_i32 s32, 0x800
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v42, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v41, v0		; GFX9-NEXT: v_mov_b32_e32 v40, v0
; GFX9-NEXT: v_mov_b32_e32 v0, 42		; GFX9-NEXT: v_mov_b32_e32 v0, 42
; GFX9-NEXT: v_writelane_b32 v43, s34, 0		; GFX9-NEXT: v_writelane_b32 v43, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v42, s31, 1
; GFX9-NEXT: v_mov_b32_e32 v42, v1		; GFX9-NEXT: v_mov_b32_e32 v41, v1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_i32_func_i32@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_i32_func_i32@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_i32_func_i32@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_i32_func_i32@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX9-NEXT: global_store_dword v[41:42], v0, off		; GFX9-NEXT: global_store_dword v[40:41], v0, off
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GFX9-NEXT: v_readlane_b32 s31, v40, 1		; GFX9-NEXT: v_readlane_b32 s31, v42, 1
; GFX9-NEXT: v_readlane_b32 s30, v40, 0		; GFX9-NEXT: v_readlane_b32 s30, v42, 0
; GFX9-NEXT: v_readlane_b32 s34, v43, 0		; GFX9-NEXT: v_readlane_b32 s34, v43, 0
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: s_addk_i32 s32, 0xf800		; GFX9-NEXT: s_addk_i32 s32, 0xf800
; GFX9-NEXT: s_mov_b32 s33, s34		; GFX9-NEXT: s_mov_b32 s33, s34
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: test_call_external_i32_func_i32_imm:		; GFX10-LABEL: test_call_external_i32_func_i32_imm:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: ; implicit-def: $vgpr42
; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: v_mov_b32_e32 v41, v0		; GFX10-NEXT: v_writelane_b32 v42, s30, 0
		; GFX10-NEXT: v_mov_b32_e32 v40, v0
; GFX10-NEXT: v_mov_b32_e32 v0, 42		; GFX10-NEXT: v_mov_b32_e32 v0, 42
; GFX10-NEXT: s_addk_i32 s32, 0x400		; GFX10-NEXT: s_addk_i32 s32, 0x400
; GFX10-NEXT: v_writelane_b32 v43, s34, 0		; GFX10-NEXT: v_writelane_b32 v43, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v42, s31, 1
; GFX10-NEXT: v_mov_b32_e32 v42, v1		; GFX10-NEXT: v_mov_b32_e32 v41, v1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_i32_func_i32@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_i32_func_i32@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_i32_func_i32@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_i32_func_i32@rel32@hi+12
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: global_store_dword v[41:42], v0, off		; GFX10-NEXT: global_store_dword v[40:41], v0, off
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33		; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33
; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v42, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v42, 0
; GFX10-NEXT: v_readlane_b32 s34, v43, 0		; GFX10-NEXT: v_readlane_b32 s34, v43, 0
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8		; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8
; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:12		; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:12
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: s_addk_i32 s32, 0xfc00		; GFX10-NEXT: s_addk_i32 s32, 0xfc00
; GFX10-NEXT: s_mov_b32 s33, s34		; GFX10-NEXT: s_mov_b32 s33, s34
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: test_call_external_i32_func_i32_imm:		; GFX11-LABEL: test_call_external_i32_func_i32_imm:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33 offset:8		; GFX11-NEXT: scratch_store_b32 off, v42, s33 offset:8
; GFX11-NEXT: scratch_store_b32 off, v43, s33 offset:12		; GFX11-NEXT: scratch_store_b32 off, v43, s33 offset:12
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr42
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v40, s33 offset:4
; GFX11-NEXT: scratch_store_b32 off, v42, s33		; GFX11-NEXT: scratch_store_b32 off, v41, s33
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: v_writelane_b32 v42, s30, 0
; GFX11-NEXT: v_dual_mov_b32 v42, v1 :: v_dual_mov_b32 v41, v0		; GFX11-NEXT: v_dual_mov_b32 v41, v1 :: v_dual_mov_b32 v40, v0
; GFX11-NEXT: v_mov_b32_e32 v0, 42		; GFX11-NEXT: v_mov_b32_e32 v0, 42
; GFX11-NEXT: s_add_i32 s32, s32, 32		; GFX11-NEXT: s_add_i32 s32, s32, 32
; GFX11-NEXT: v_writelane_b32 v43, s0, 0		; GFX11-NEXT: v_writelane_b32 v43, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v42, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_i32_func_i32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_i32_func_i32@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_i32_func_i32@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_i32_func_i32@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: global_store_b32 v[41:42], v0, off dlc		; GFX11-NEXT: global_store_b32 v[40:41], v0, off dlc
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b32 v42, off, s33		; GFX11-NEXT: scratch_load_b32 v41, off, s33
; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4		; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:4
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v42, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v42, 0
; GFX11-NEXT: v_readlane_b32 s0, v43, 0		; GFX11-NEXT: v_readlane_b32 s0, v43, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:8		; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:8
; GFX11-NEXT: scratch_load_b32 v43, off, s33 offset:12		; GFX11-NEXT: scratch_load_b32 v43, off, s33 offset:12
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: s_addk_i32 s32, 0xffe0		; GFX11-NEXT: s_addk_i32 s32, 0xffe0
; GFX11-NEXT: s_mov_b32 s33, s0		; GFX11-NEXT: s_mov_b32 s33, s0
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-SCRATCH-LABEL: test_call_external_i32_func_i32_imm:		; GFX10-SCRATCH-LABEL: test_call_external_i32_func_i32_imm:
; GFX10-SCRATCH: ; %bb.0:		; GFX10-SCRATCH: ; %bb.0:
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 offset:8 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v42, s33 offset:8 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v43, s33 offset:12 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v43, s33 offset:12 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr42
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v42, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v41, v0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v42, s30, 0
		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v40, v0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 42		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 42
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32
; GFX10-SCRATCH-NEXT: v_writelane_b32 v43, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v43, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v42, s31, 1
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v42, v1		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v41, v1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_i32_func_i32@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_i32_func_i32@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_i32_func_i32@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_i32_func_i32@rel32@hi+12
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: global_store_dword v[41:42], v0, off		; GFX10-SCRATCH-NEXT: global_store_dword v[40:41], v0, off
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
; GFX10-SCRATCH-NEXT: scratch_load_dword v42, off, s33		; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33
; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4		; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:4
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v42, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v42, 0
; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v43, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v43, 0
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:8		; GFX10-SCRATCH-NEXT: scratch_load_dword v42, off, s33 offset:8
; GFX10-SCRATCH-NEXT: scratch_load_dword v43, off, s33 offset:12		; GFX10-SCRATCH-NEXT: scratch_load_dword v43, off, s33 offset:12
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: s_addk_i32 s32, 0xffe0		; GFX10-SCRATCH-NEXT: s_addk_i32 s32, 0xffe0
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]		; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
%val = call amdgpu_gfx i32 @external_i32_func_i32(i32 42)		%val = call amdgpu_gfx i32 @external_i32_func_i32(i32 42)
[9 lines of context not shown]
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX9-NEXT: v_mov_b32_e32 v2, 0		; GFX9-NEXT: v_mov_b32_e32 v2, 0
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: global_load_ubyte v0, v2, s[34:35]		; GFX9-NEXT: global_load_ubyte v0, v2, s[34:35]
; GFX9-NEXT: global_load_dword v1, v2, s[34:35] offset:4		; GFX9-NEXT: global_load_dword v1, v2, s[34:35] offset:4
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_struct_i8_i32@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_struct_i8_i32@rel32@lo+4
[20 lines of context not shown]
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX10-NEXT: v_mov_b32_e32 v2, 0		; GFX10-NEXT: v_mov_b32_e32 v2, 0
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: global_load_ubyte v0, v2, s[34:35]		; GFX10-NEXT: global_load_ubyte v0, v2, s[34:35]
; GFX10-NEXT: global_load_dword v1, v2, s[34:35] offset:4		; GFX10-NEXT: global_load_dword v1, v2, s[34:35] offset:4
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_struct_i8_i32@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_struct_i8_i32@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_struct_i8_i32@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_struct_i8_i32@rel32@hi+12
[21 lines of context not shown]
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0		; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
; GFX11-NEXT: v_mov_b32_e32 v1, 0		; GFX11-NEXT: v_mov_b32_e32 v1, 0
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: s_waitcnt lgkmcnt(0)		; GFX11-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: global_load_u8 v0, v1, s[0:1]		; GFX11-NEXT: global_load_u8 v0, v1, s[0:1]
; GFX11-NEXT: global_load_b32 v1, v1, s[0:1] offset:4		; GFX11-NEXT: global_load_b32 v1, v1, s[0:1] offset:4
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_struct_i8_i32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_struct_i8_i32@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_struct_i8_i32@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_struct_i8_i32@rel32@hi+12
[21 lines of context not shown]
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v2, s[0:1]		; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v2, s[0:1]
; GFX10-SCRATCH-NEXT: global_load_dword v1, v2, s[0:1] offset:4		; GFX10-SCRATCH-NEXT: global_load_dword v1, v2, s[0:1] offset:4
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_struct_i8_i32@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_struct_i8_i32@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_struct_i8_i32@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_struct_i8_i32@rel32@hi+12
[23 lines of context not shown]
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_mov_b32_e32 v0, 3		; GFX9-NEXT: v_mov_b32_e32 v0, 3
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s33		; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s33
; GFX9-NEXT: v_mov_b32_e32 v0, 8		; GFX9-NEXT: v_mov_b32_e32 v0, 8
; GFX9-NEXT: s_addk_i32 s32, 0x800		; GFX9-NEXT: s_addk_i32 s32, 0x800
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4
; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33		; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
[21 lines of context not shown]
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_mov_b32_e32 v0, 3		; GFX10-NEXT: v_mov_b32_e32 v0, 3
; GFX10-NEXT: v_mov_b32_e32 v1, 8		; GFX10-NEXT: v_mov_b32_e32 v1, 8
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x400		; GFX10-NEXT: s_addk_i32 s32, 0x400
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s33		; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s33
; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4		; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4
; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33		; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33
; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_byval_struct_i8_i32@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_byval_struct_i8_i32@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_byval_struct_i8_i32@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_byval_struct_i8_i32@rel32@hi+12
		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-NEXT: v_readlane_b32 s34, v41, 0		; GFX10-NEXT: v_readlane_b32 s34, v41, 0
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8
; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12		; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12
[11 lines of context not shown]
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33 offset:8		; GFX11-NEXT: scratch_store_b32 off, v40, s33 offset:8
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:12		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:12
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_dual_mov_b32 v0, 3 :: v_dual_mov_b32 v1, 8		; GFX11-NEXT: v_dual_mov_b32 v0, 3 :: v_dual_mov_b32 v1, 8
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 32		; GFX11-NEXT: s_add_i32 s32, s32, 32
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b8 off, v0, s33		; GFX11-NEXT: scratch_store_b8 off, v0, s33
; GFX11-NEXT: scratch_store_b32 off, v1, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v1, s33 offset:4
; GFX11-NEXT: v_mov_b32_e32 v0, s33		; GFX11-NEXT: v_mov_b32_e32 v0, s33
; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_byval_struct_i8_i32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_byval_struct_i8_i32@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_byval_struct_i8_i32@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_byval_struct_i8_i32@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: v_readlane_b32 s0, v41, 0		; GFX11-NEXT: v_readlane_b32 s0, v41, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:8		; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:8
; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:12		; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:12
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
[... 10 lines not shown ...]
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 offset:8 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 offset:8 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:12 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:12 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 8		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 8
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s33		; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s33
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v1, s33 offset:4		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v1, s33 offset:4
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, s33		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, s33
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_byval_struct_i8_i32@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_byval_struct_i8_i32@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_byval_struct_i8_i32@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_byval_struct_i8_i32@rel32@hi+12
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:8		; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:8
; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:12		; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:12
[... 20 lines not shown ...]
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_mov_b32_e32 v0, 3		; GFX9-NEXT: v_mov_b32_e32 v0, 3
; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s33		; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s33
; GFX9-NEXT: v_mov_b32_e32 v0, 8		; GFX9-NEXT: v_mov_b32_e32 v0, 8
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4
; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33		; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33
; GFX9-NEXT: s_addk_i32 s32, 0x800		; GFX9-NEXT: s_addk_i32 s32, 0x800
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_add_u32_e32 v0, 8, v0		; GFX9-NEXT: v_add_u32_e32 v0, 8, v0
; GFX9-NEXT: v_lshrrev_b32_e64 v1, 6, s33		; GFX9-NEXT: v_lshrrev_b32_e64 v1, 6, s33
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
[... 28 lines not shown ...]
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_mov_b32_e32 v0, 3		; GFX10-NEXT: v_mov_b32_e32 v0, 3
; GFX10-NEXT: v_mov_b32_e32 v1, 8		; GFX10-NEXT: v_mov_b32_e32 v1, 8
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x400		; GFX10-NEXT: s_addk_i32 s32, 0x400
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s33		; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s33
; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4		; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4
; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33		; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33
; GFX10-NEXT: v_lshrrev_b32_e64 v1, 5, s33		; GFX10-NEXT: v_lshrrev_b32_e64 v1, 5, s33
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4
[... 35 lines not shown ...]
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_dual_mov_b32 v0, 3 :: v_dual_mov_b32 v1, 8		; GFX11-NEXT: v_dual_mov_b32 v0, 3 :: v_dual_mov_b32 v1, 8
; GFX11-NEXT: s_add_i32 s32, s32, 32		; GFX11-NEXT: s_add_i32 s32, s32, 32
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12
; GFX11-NEXT: s_add_i32 vcc_lo, s33, 8		; GFX11-NEXT: s_add_i32 vcc_lo, s33, 8
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b8 off, v0, s33		; GFX11-NEXT: scratch_store_b8 off, v0, s33
; GFX11-NEXT: scratch_store_b32 off, v1, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v1, s33 offset:4
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_dual_mov_b32 v0, vcc_lo :: v_dual_mov_b32 v1, s33		; GFX11-NEXT: v_dual_mov_b32 v0, vcc_lo :: v_dual_mov_b32 v1, s33
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_u8 v0, off, s33 offset:8		; GFX11-NEXT: scratch_load_u8 v0, off, s33 offset:8
; GFX11-NEXT: scratch_load_b32 v1, off, s33 offset:12		; GFX11-NEXT: scratch_load_b32 v1, off, s33 offset:12
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v40, 0
[... 27 lines not shown ...]
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 8		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 8
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12
; GFX10-SCRATCH-NEXT: s_add_i32 vcc_lo, s33, 8		; GFX10-SCRATCH-NEXT: s_add_i32 vcc_lo, s33, 8
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s33		; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s33
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v1, s33 offset:4		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v1, s33 offset:4
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, vcc_lo		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, vcc_lo
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, s33		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, s33
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
; GFX10-SCRATCH-NEXT: scratch_load_ubyte v0, off, s33 offset:8		; GFX10-SCRATCH-NEXT: scratch_load_ubyte v0, off, s33 offset:8
; GFX10-SCRATCH-NEXT: scratch_load_dword v1, off, s33 offset:12		; GFX10-SCRATCH-NEXT: scratch_load_dword v1, off, s33 offset:12
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
[... 39 lines not shown ...]
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX9-NEXT: v_mov_b32_e32 v0, 0		; GFX9-NEXT: v_mov_b32_e32 v0, 0
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: global_load_dwordx4 v[0:3], v0, s[34:35]		; GFX9-NEXT: global_load_dwordx4 v[0:3], v0, s[34:35]
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v16i8@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v16i8@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v16i8@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v16i8@rel32@hi+12
[... 38 lines not shown ...]
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX10-NEXT: v_mov_b32_e32 v0, 0		; GFX10-NEXT: v_mov_b32_e32 v0, 0
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: global_load_dwordx4 v[0:3], v0, s[34:35]		; GFX10-NEXT: global_load_dwordx4 v[0:3], v0, s[34:35]
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v16i8@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v16i8@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v16i8@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v16i8@rel32@hi+12
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
[... 39 lines not shown ...]
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0		; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
; GFX11-NEXT: v_mov_b32_e32 v0, 0		; GFX11-NEXT: v_mov_b32_e32 v0, 0
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_waitcnt lgkmcnt(0)		; GFX11-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-NEXT: global_load_b128 v[0:3], v0, s[0:1]		; GFX11-NEXT: global_load_b128 v[0:3], v0, s[0:1]
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v16i8@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v16i8@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v16i8@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v16i8@rel32@hi+12
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: v_lshrrev_b32_e32 v16, 8, v0		; GFX11-NEXT: v_lshrrev_b32_e32 v16, 8, v0
[... 35 lines not shown ...]
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v0, s[0:1]		; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v0, s[0:1]
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v16i8@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v16i8@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v16i8@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v16i8@rel32@hi+12
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
[... 34 lines not shown ...]	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
call amdgpu_gfx void @external_void_func_v16i8(<16 x i8> %val)		call amdgpu_gfx void @external_void_func_v16i8(<16 x i8> %val)
ret void		ret void
}		}

define void @tail_call_byval_align16(<32 x i32> %val, double %tmp) #0 {		define void @tail_call_byval_align16(<32 x i32> %val, double %tmp) #0 {
; GFX9-LABEL: tail_call_byval_align16:		; GFX9-LABEL: tail_call_byval_align16:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s6, s33		; GFX9-NEXT: s_mov_b32 s8, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[4:5]		; GFX9-NEXT: s_mov_b64 exec, s[4:5]
; GFX9-NEXT: buffer_load_dword v32, off, s[0:3], s33 offset:20		; GFX9-NEXT: buffer_load_dword v32, off, s[0:3], s33 offset:20
; GFX9-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:16		; GFX9-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:16
; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s33		; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s33
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x800
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: v_writelane_b32 v40, s34, 2		; GFX9-NEXT: v_writelane_b32 v40, s34, 2
; GFX9-NEXT: v_writelane_b32 v40, s35, 3		; GFX9-NEXT: v_writelane_b32 v40, s35, 3
; GFX9-NEXT: v_writelane_b32 v40, s36, 4		; GFX9-NEXT: v_writelane_b32 v40, s36, 4
; GFX9-NEXT: v_writelane_b32 v40, s37, 5		; GFX9-NEXT: v_writelane_b32 v40, s37, 5
; GFX9-NEXT: v_writelane_b32 v40, s38, 6		; GFX9-NEXT: v_writelane_b32 v40, s38, 6
; GFX9-NEXT: v_writelane_b32 v40, s39, 7		; GFX9-NEXT: v_writelane_b32 v40, s39, 7
[... 14 lines not shown ...]
; GFX9-NEXT: v_writelane_b32 v40, s54, 22		; GFX9-NEXT: v_writelane_b32 v40, s54, 22
; GFX9-NEXT: v_writelane_b32 v40, s55, 23		; GFX9-NEXT: v_writelane_b32 v40, s55, 23
; GFX9-NEXT: v_writelane_b32 v40, s56, 24		; GFX9-NEXT: v_writelane_b32 v40, s56, 24
; GFX9-NEXT: v_writelane_b32 v40, s57, 25		; GFX9-NEXT: v_writelane_b32 v40, s57, 25
; GFX9-NEXT: v_writelane_b32 v40, s58, 26		; GFX9-NEXT: v_writelane_b32 v40, s58, 26
; GFX9-NEXT: v_writelane_b32 v40, s59, 27		; GFX9-NEXT: v_writelane_b32 v40, s59, 27
; GFX9-NEXT: v_writelane_b32 v40, s60, 28		; GFX9-NEXT: v_writelane_b32 v40, s60, 28
; GFX9-NEXT: v_writelane_b32 v40, s61, 29		; GFX9-NEXT: v_writelane_b32 v40, s61, 29
; GFX9-NEXT: s_addk_i32 s32, 0x800
; GFX9-NEXT: v_writelane_b32 v40, s62, 30		; GFX9-NEXT: v_writelane_b32 v40, s62, 30
; GFX9-NEXT: v_writelane_b32 v40, s63, 31		; GFX9-NEXT: v_writelane_b32 v40, s63, 31
; GFX9-NEXT: s_getpc_b64 s[4:5]		; GFX9-NEXT: s_getpc_b64 s[4:5]
; GFX9-NEXT: s_add_u32 s4, s4, byval_align16_f64_arg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s4, s4, byval_align16_f64_arg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s5, s5, byval_align16_f64_arg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s5, s5, byval_align16_f64_arg@rel32@hi+12
; GFX9-NEXT: s_waitcnt vmcnt(2)		; GFX9-NEXT: s_waitcnt vmcnt(2)
; GFX9-NEXT: buffer_store_dword v32, off, s[0:3], s32 offset:4		; GFX9-NEXT: buffer_store_dword v32, off, s[0:3], s32 offset:4
; GFX9-NEXT: s_waitcnt vmcnt(2)		; GFX9-NEXT: s_waitcnt vmcnt(2)
[... 30 lines not shown ...]
; GFX9-NEXT: v_readlane_b32 s35, v40, 3		; GFX9-NEXT: v_readlane_b32 s35, v40, 3
; GFX9-NEXT: v_readlane_b32 s34, v40, 2		; GFX9-NEXT: v_readlane_b32 s34, v40, 2
; GFX9-NEXT: v_readlane_b32 s31, v40, 1		; GFX9-NEXT: v_readlane_b32 s31, v40, 1
; GFX9-NEXT: v_readlane_b32 s30, v40, 0		; GFX9-NEXT: v_readlane_b32 s30, v40, 0
; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload
; GFX9-NEXT: s_mov_b64 exec, s[4:5]		; GFX9-NEXT: s_mov_b64 exec, s[4:5]
; GFX9-NEXT: s_addk_i32 s32, 0xf800		; GFX9-NEXT: s_addk_i32 s32, 0xf800
; GFX9-NEXT: s_mov_b32 s33, s6		; GFX9-NEXT: s_mov_b32 s33, s8
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: tail_call_byval_align16:		; GFX10-LABEL: tail_call_byval_align16:
; GFX10: ; %bb.0: ; %entry		; GFX10: ; %bb.0: ; %entry
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s6, s33		; GFX10-NEXT: s_mov_b32 s7, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s4, -1		; GFX10-NEXT: s_or_saveexec_b32 s4, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s4		; GFX10-NEXT: s_mov_b32 exec_lo, s4
; GFX10-NEXT: s_clause 0x2		; GFX10-NEXT: s_clause 0x2
; GFX10-NEXT: buffer_load_dword v32, off, s[0:3], s33 offset:20		; GFX10-NEXT: buffer_load_dword v32, off, s[0:3], s33 offset:20
; GFX10-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:16		; GFX10-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:16
; GFX10-NEXT: buffer_load_dword v31, off, s[0:3], s33		; GFX10-NEXT: buffer_load_dword v31, off, s[0:3], s33
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x400		; GFX10-NEXT: s_addk_i32 s32, 0x400
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: s_getpc_b64 s[4:5]		; GFX10-NEXT: s_getpc_b64 s[4:5]
; GFX10-NEXT: s_add_u32 s4, s4, byval_align16_f64_arg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s4, s4, byval_align16_f64_arg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s5, s5, byval_align16_f64_arg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s5, s5, byval_align16_f64_arg@rel32@hi+12
; GFX10-NEXT: s_waitcnt vmcnt(2)		; GFX10-NEXT: s_waitcnt vmcnt(2)
; GFX10-NEXT: buffer_store_dword v32, off, s[0:3], s32 offset:4		; GFX10-NEXT: buffer_store_dword v32, off, s[0:3], s32 offset:4
; GFX10-NEXT: s_waitcnt vmcnt(1)		; GFX10-NEXT: s_waitcnt vmcnt(1)
; GFX10-NEXT: buffer_store_dword v33, off, s[0:3], s32		; GFX10-NEXT: buffer_store_dword v33, off, s[0:3], s32
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
[... 60 lines not shown ...]
; GFX10-NEXT: v_readlane_b32 s34, v40, 2		; GFX10-NEXT: v_readlane_b32 s34, v40, 2
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-NEXT: s_or_saveexec_b32 s4, -1		; GFX10-NEXT: s_or_saveexec_b32 s4, -1
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s4		; GFX10-NEXT: s_mov_b32 exec_lo, s4
; GFX10-NEXT: s_addk_i32 s32, 0xfc00		; GFX10-NEXT: s_addk_i32 s32, 0xfc00
; GFX10-NEXT: s_mov_b32 s33, s6		; GFX10-NEXT: s_mov_b32 s33, s7
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: tail_call_byval_align16:		; GFX11-LABEL: tail_call_byval_align16:
; GFX11: ; %bb.0: ; %entry		; GFX11: ; %bb.0: ; %entry
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s4, s33		; GFX11-NEXT: s_mov_b32 s5, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s0, -1		; GFX11-NEXT: s_or_saveexec_b32 s0, -1
; GFX11-NEXT: scratch_store_b32 off, v40, s33 offset:24 ; 4-byte Folded Spill		; GFX11-NEXT: scratch_store_b32 off, v40, s33 offset:24 ; 4-byte Folded Spill
; GFX11-NEXT: s_mov_b32 exec_lo, s0		; GFX11-NEXT: s_mov_b32 exec_lo, s0
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b64 v[32:33], off, s33 offset:16		; GFX11-NEXT: scratch_load_b64 v[32:33], off, s33 offset:16
; GFX11-NEXT: scratch_load_b32 v31, off, s33		; GFX11-NEXT: scratch_load_b32 v31, off, s33
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 32		; GFX11-NEXT: s_add_i32 s32, s32, 32
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, byval_align16_f64_arg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, byval_align16_f64_arg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, byval_align16_f64_arg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, byval_align16_f64_arg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: v_writelane_b32 v40, s34, 2		; GFX11-NEXT: v_writelane_b32 v40, s34, 2
; GFX11-NEXT: v_writelane_b32 v40, s35, 3		; GFX11-NEXT: v_writelane_b32 v40, s35, 3
; GFX11-NEXT: v_writelane_b32 v40, s36, 4		; GFX11-NEXT: v_writelane_b32 v40, s36, 4
; GFX11-NEXT: v_writelane_b32 v40, s37, 5		; GFX11-NEXT: v_writelane_b32 v40, s37, 5
[... 57 lines not shown ...]
; GFX11-NEXT: v_readlane_b32 s35, v40, 3		; GFX11-NEXT: v_readlane_b32 s35, v40, 3
; GFX11-NEXT: v_readlane_b32 s34, v40, 2		; GFX11-NEXT: v_readlane_b32 s34, v40, 2
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: s_or_saveexec_b32 s0, -1		; GFX11-NEXT: s_or_saveexec_b32 s0, -1
; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:24 ; 4-byte Folded Reload		; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:24 ; 4-byte Folded Reload
; GFX11-NEXT: s_mov_b32 exec_lo, s0		; GFX11-NEXT: s_mov_b32 exec_lo, s0
; GFX11-NEXT: s_addk_i32 s32, 0xffe0		; GFX11-NEXT: s_addk_i32 s32, 0xffe0
; GFX11-NEXT: s_mov_b32 s33, s4		; GFX11-NEXT: s_mov_b32 s33, s5
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-SCRATCH-LABEL: tail_call_byval_align16:		; GFX10-SCRATCH-LABEL: tail_call_byval_align16:
; GFX10-SCRATCH: ; %bb.0: ; %entry		; GFX10-SCRATCH: ; %bb.0: ; %entry
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s5, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 offset:24 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 offset:24 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
; GFX10-SCRATCH-NEXT: scratch_load_dwordx2 v[32:33], off, s33 offset:16		; GFX10-SCRATCH-NEXT: scratch_load_dwordx2 v[32:33], off, s33 offset:16
; GFX10-SCRATCH-NEXT: scratch_load_dword v31, off, s33		; GFX10-SCRATCH-NEXT: scratch_load_dword v31, off, s33
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, byval_align16_f64_arg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, byval_align16_f64_arg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, byval_align16_f64_arg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, byval_align16_f64_arg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s34, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s34, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s35, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s35, 3
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s36, 4		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s36, 4
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s37, 5		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s37, 5
[... 58 lines of unchanged context collapsed ...]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s34, v40, 2		; GFX10-SCRATCH-NEXT: v_readlane_b32 s34, v40, 2
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:24 ; 4-byte Folded Reload		; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:24 ; 4-byte Folded Reload
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
; GFX10-SCRATCH-NEXT: s_addk_i32 s32, 0xffe0		; GFX10-SCRATCH-NEXT: s_addk_i32 s32, 0xffe0
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s4		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s5
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]		; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%alloca = alloca double, align 8, addrspace(5)		%alloca = alloca double, align 8, addrspace(5)
tail call amdgpu_gfx void @byval_align16_f64_arg(<32 x i32> %val, double addrspace(5)* byval(double) align 16 %alloca)		tail call amdgpu_gfx void @byval_align16_f64_arg(<32 x i32> %val, double addrspace(5)* byval(double) align 16 %alloca)
ret void		ret void
}		}

; inreg arguments are put in sgprs		; inreg arguments are put in sgprs
define amdgpu_gfx void @test_call_external_void_func_i1_imm_inreg() #0 {		define amdgpu_gfx void @test_call_external_void_func_i1_imm_inreg() #0 {
; GFX9-LABEL: test_call_external_void_func_i1_imm_inreg:		; GFX9-LABEL: test_call_external_void_func_i1_imm_inreg:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_mov_b32_e32 v0, 1		; GFX9-NEXT: v_mov_b32_e32 v0, 1
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_inreg@rel32@hi+12
[... 17 lines of unchanged context collapsed ...]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v0, 1		; GFX10-NEXT: v_mov_b32_e32 v0, 1
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32		; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32
		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-NEXT: v_readlane_b32 s34, v41, 0		; GFX10-NEXT: v_readlane_b32 s34, v41, 0
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33
; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4		; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4
[... 10 lines of unchanged context collapsed ...]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_mov_b32_e32 v0, 1		; GFX11-NEXT: v_mov_b32_e32 v0, 1
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i1_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i1_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i1_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i1_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: scratch_store_b8 off, v0, s32		; GFX11-NEXT: scratch_store_b8 off, v0, s32
		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v40, 0
; GFX11-NEXT: v_readlane_b32 s0, v41, 0		; GFX11-NEXT: v_readlane_b32 s0, v41, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b32 v40, off, s33		; GFX11-NEXT: scratch_load_b32 v40, off, s33
; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4		; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: s_add_i32 s32, s32, -16		; GFX11-NEXT: s_add_i32 s32, s32, -16
; GFX11-NEXT: s_mov_b32 s33, s0		; GFX11-NEXT: s_mov_b32 s33, s0
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm_inreg:		; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm_inreg:
; GFX10-SCRATCH: ; %bb.0:		; GFX10-SCRATCH: ; %bb.0:
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32		; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0		; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v41, 0
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: s_clause 0x1		; GFX10-SCRATCH-NEXT: s_clause 0x1
; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33		; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33
; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4		; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4
[... 12 lines of unchanged context collapsed ...]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s30, 1		; GFX9-NEXT: v_writelane_b32 v40, s30, 1
; GFX9-NEXT: s_movk_i32 s4, 0x7b		; GFX9-NEXT: s_movk_i32 s4, 0x7b
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 2		; GFX9-NEXT: v_writelane_b32 v40, s31, 2
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_inreg@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
[... 16 lines of unchanged context collapsed ...]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_movk_i32 s4, 0x7b		; GFX10-NEXT: s_movk_i32 s4, 0x7b
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s30, 1		; GFX10-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-NEXT: v_readlane_b32 s31, v40, 2
[... 17 lines of unchanged context collapsed ...]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_movk_i32 s4, 0x7b		; GFX11-NEXT: s_movk_i32 s4, 0x7b
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i8_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i8_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i8_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i8_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s30, 1		; GFX11-NEXT: v_writelane_b32 v40, s30, 1
; GFX11-NEXT: v_writelane_b32 v40, s31, 2		; GFX11-NEXT: v_writelane_b32 v40, s31, 2
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
[... 17 lines of unchanged context collapsed ...]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b		; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
[... 19 lines of unchanged context collapsed ...]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s30, 1		; GFX9-NEXT: v_writelane_b32 v40, s30, 1
; GFX9-NEXT: s_movk_i32 s4, 0x7b		; GFX9-NEXT: s_movk_i32 s4, 0x7b
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 2		; GFX9-NEXT: v_writelane_b32 v40, s31, 2
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_inreg@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
[... 16 lines of unchanged context collapsed ...]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_movk_i32 s4, 0x7b		; GFX10-NEXT: s_movk_i32 s4, 0x7b
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s30, 1		; GFX10-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-NEXT: v_readlane_b32 s31, v40, 2
[... 17 lines of unchanged context collapsed ...]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_movk_i32 s4, 0x7b		; GFX11-NEXT: s_movk_i32 s4, 0x7b
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i16_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i16_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i16_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i16_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s30, 1		; GFX11-NEXT: v_writelane_b32 v40, s30, 1
; GFX11-NEXT: v_writelane_b32 v40, s31, 2		; GFX11-NEXT: v_writelane_b32 v40, s31, 2
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
[... 17 lines of unchanged context collapsed ...]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b		; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
[... 19 lines of unchanged context collapsed ...]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s30, 1		; GFX9-NEXT: v_writelane_b32 v40, s30, 1
; GFX9-NEXT: s_mov_b32 s4, 42		; GFX9-NEXT: s_mov_b32 s4, 42
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 2		; GFX9-NEXT: v_writelane_b32 v40, s31, 2
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i32_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i32_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i32_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i32_inreg@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
[... 16 lines of unchanged context collapsed ...]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 42		; GFX10-NEXT: s_mov_b32 s4, 42
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i32_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i32_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i32_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i32_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s30, 1		; GFX10-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-NEXT: v_readlane_b32 s31, v40, 2
[... 17 lines of unchanged context collapsed ...]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 42		; GFX11-NEXT: s_mov_b32 s4, 42
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i32_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i32_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i32_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i32_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s30, 1		; GFX11-NEXT: v_writelane_b32 v40, s30, 1
; GFX11-NEXT: v_writelane_b32 v40, s31, 2		; GFX11-NEXT: v_writelane_b32 v40, s31, 2
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
[17 lines not shown]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 42		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 42
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i32_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i32_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i32_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i32_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
[19 lines not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 2		; GFX9-NEXT: v_writelane_b32 v40, s30, 2
; GFX9-NEXT: s_movk_i32 s4, 0x7b		; GFX9-NEXT: s_movk_i32 s4, 0x7b
; GFX9-NEXT: s_mov_b32 s5, 0		; GFX9-NEXT: s_mov_b32 s5, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 3		; GFX9-NEXT: v_writelane_b32 v40, s31, 3
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i64_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i64_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i64_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i64_inreg@rel32@hi+12
[18 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_movk_i32 s4, 0x7b		; GFX10-NEXT: s_movk_i32 s4, 0x7b
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i64_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i64_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i64_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i64_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_mov_b32 s5, 0		; GFX10-NEXT: s_mov_b32 s5, 0
; GFX10-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-NEXT: v_writelane_b32 v40, s31, 3
[20 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_movk_i32 s4, 0x7b		; GFX11-NEXT: s_movk_i32 s4, 0x7b
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i64_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_i64_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i64_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_i64_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_mov_b32 s5, 0		; GFX11-NEXT: s_mov_b32 s5, 0
; GFX11-NEXT: v_writelane_b32 v40, s30, 2		; GFX11-NEXT: v_writelane_b32 v40, s30, 2
; GFX11-NEXT: v_writelane_b32 v40, s31, 3		; GFX11-NEXT: v_writelane_b32 v40, s31, 3
[20 lines not shown]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b		; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i64_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i64_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i64_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i64_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0		; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
[22 lines not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: s_mov_b64 s[34:35], 0		; GFX9-NEXT: s_mov_b64 s[34:35], 0
; GFX9-NEXT: v_writelane_b32 v40, s7, 3		; GFX9-NEXT: v_writelane_b32 v40, s7, 3
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 4		; GFX9-NEXT: v_writelane_b32 v40, s30, 4
; GFX9-NEXT: v_writelane_b32 v40, s31, 5		; GFX9-NEXT: v_writelane_b32 v40, s31, 5
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
[22 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b64 s[34:35], 0		; GFX10-NEXT: s_mov_b64 s[34:35], 0
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: v_writelane_b32 v40, s7, 3		; GFX10-NEXT: v_writelane_b32 v40, s7, 3
; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4
[25 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b64 s[0:1], 0		; GFX11-NEXT: s_mov_b64 s[0:1], 0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: v_writelane_b32 v40, s7, 3		; GFX11-NEXT: v_writelane_b32 v40, s7, 3
; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x0		; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4
[25 lines not shown]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0		; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4
[28 lines not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: v_writelane_b32 v40, s7, 3		; GFX9-NEXT: v_writelane_b32 v40, s7, 3
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 4		; GFX9-NEXT: v_writelane_b32 v40, s30, 4
; GFX9-NEXT: s_mov_b32 s4, 1		; GFX9-NEXT: s_mov_b32 s4, 1
; GFX9-NEXT: s_mov_b32 s5, 2		; GFX9-NEXT: s_mov_b32 s5, 2
; GFX9-NEXT: s_mov_b32 s6, 3		; GFX9-NEXT: s_mov_b32 s6, 3
; GFX9-NEXT: s_mov_b32 s7, 4		; GFX9-NEXT: s_mov_b32 s7, 4
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 5		; GFX9-NEXT: v_writelane_b32 v40, s31, 5
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
[22 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 1		; GFX10-NEXT: s_mov_b32 s4, 1
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_mov_b32 s5, 2		; GFX10-NEXT: s_mov_b32 s5, 2
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: s_mov_b32 s6, 3		; GFX10-NEXT: s_mov_b32 s6, 3
[26 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 1		; GFX11-NEXT: s_mov_b32 s4, 1
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_mov_b32 s5, 2		; GFX11-NEXT: s_mov_b32 s5, 2
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: s_mov_b32 s6, 3		; GFX11-NEXT: s_mov_b32 s6, 3
[26 lines not shown]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2		; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3		; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3
[28 lines not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: s_mov_b64 s[34:35], 0		; GFX9-NEXT: s_mov_b64 s[34:35], 0
; GFX9-NEXT: v_writelane_b32 v40, s7, 3		; GFX9-NEXT: v_writelane_b32 v40, s7, 3
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
; GFX9-NEXT: v_writelane_b32 v40, s8, 4		; GFX9-NEXT: v_writelane_b32 v40, s8, 4
; GFX9-NEXT: v_writelane_b32 v40, s9, 5		; GFX9-NEXT: v_writelane_b32 v40, s9, 5
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 6		; GFX9-NEXT: v_writelane_b32 v40, s30, 6
[28 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b64 s[34:35], 0		; GFX10-NEXT: s_mov_b64 s[34:35], 0
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: v_writelane_b32 v40, s7, 3		; GFX10-NEXT: v_writelane_b32 v40, s7, 3
; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i64_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i64_inreg@rel32@lo+4
[31 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b64 s[0:1], 0		; GFX11-NEXT: s_mov_b64 s[0:1], 0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: v_writelane_b32 v40, s7, 3		; GFX11-NEXT: v_writelane_b32 v40, s7, 3
; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x0		; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i64_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i64_inreg@rel32@lo+4
[31 lines not shown]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0		; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i64_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i64_inreg@rel32@lo+4
[36 lines not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s7, 3		; GFX9-NEXT: v_writelane_b32 v40, s7, 3
; GFX9-NEXT: s_mov_b64 s[34:35], 0		; GFX9-NEXT: s_mov_b64 s[34:35], 0
; GFX9-NEXT: v_writelane_b32 v40, s8, 4		; GFX9-NEXT: v_writelane_b32 v40, s8, 4
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
; GFX9-NEXT: v_writelane_b32 v40, s9, 5		; GFX9-NEXT: v_writelane_b32 v40, s9, 5
; GFX9-NEXT: v_writelane_b32 v40, s10, 6		; GFX9-NEXT: v_writelane_b32 v40, s10, 6
; GFX9-NEXT: v_writelane_b32 v40, s11, 7		; GFX9-NEXT: v_writelane_b32 v40, s11, 7
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b64 s[34:35], 0		; GFX10-NEXT: s_mov_b64 s[34:35], 0
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: v_writelane_b32 v40, s7, 3		; GFX10-NEXT: v_writelane_b32 v40, s7, 3
; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i64_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i64_inreg@rel32@lo+4
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b64 s[0:1], 0		; GFX11-NEXT: s_mov_b64 s[0:1], 0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: v_writelane_b32 v40, s7, 3		; GFX11-NEXT: v_writelane_b32 v40, s7, 3
; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x0		; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i64_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i64_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0		; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i64_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i64_inreg@rel32@lo+4
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s30, 1		; GFX9-NEXT: v_writelane_b32 v40, s30, 1
; GFX9-NEXT: s_movk_i32 s4, 0x4400		; GFX9-NEXT: s_movk_i32 s4, 0x4400
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 2		; GFX9-NEXT: v_writelane_b32 v40, s31, 2
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f16_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f16_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f16_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f16_inreg@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_movk_i32 s4, 0x4400		; GFX10-NEXT: s_movk_i32 s4, 0x4400
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f16_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f16_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f16_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f16_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s30, 1		; GFX10-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-NEXT: v_readlane_b32 s31, v40, 2
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_movk_i32 s4, 0x4400		; GFX11-NEXT: s_movk_i32 s4, 0x4400
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f16_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f16_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f16_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f16_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s30, 1		; GFX11-NEXT: v_writelane_b32 v40, s30, 1
; GFX11-NEXT: v_writelane_b32 v40, s31, 2		; GFX11-NEXT: v_writelane_b32 v40, s31, 2
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x4400		; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x4400
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f16_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f16_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f16_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f16_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s30, 1		; GFX9-NEXT: v_writelane_b32 v40, s30, 1
; GFX9-NEXT: s_mov_b32 s4, 4.0		; GFX9-NEXT: s_mov_b32 s4, 4.0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 2		; GFX9-NEXT: v_writelane_b32 v40, s31, 2
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f32_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f32_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f32_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f32_inreg@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 4.0		; GFX10-NEXT: s_mov_b32 s4, 4.0
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f32_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f32_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f32_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f32_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s30, 1		; GFX10-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-NEXT: v_readlane_b32 s31, v40, 2
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 4.0		; GFX11-NEXT: s_mov_b32 s4, 4.0
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f32_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f32_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f32_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f32_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s30, 1		; GFX11-NEXT: v_writelane_b32 v40, s30, 1
; GFX11-NEXT: v_writelane_b32 v40, s31, 2		; GFX11-NEXT: v_writelane_b32 v40, s31, 2
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 4.0		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 4.0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f32_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f32_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f32_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f32_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 2		; GFX9-NEXT: v_writelane_b32 v40, s30, 2
; GFX9-NEXT: s_mov_b32 s4, 1.0		; GFX9-NEXT: s_mov_b32 s4, 1.0
; GFX9-NEXT: s_mov_b32 s5, 2.0		; GFX9-NEXT: s_mov_b32 s5, 2.0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 3		; GFX9-NEXT: v_writelane_b32 v40, s31, 3
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f32_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f32_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32_inreg@rel32@hi+12
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 1.0		; GFX10-NEXT: s_mov_b32 s4, 1.0
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f32_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f32_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_mov_b32 s5, 2.0		; GFX10-NEXT: s_mov_b32 s5, 2.0
; GFX10-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-NEXT: v_writelane_b32 v40, s31, 3
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 1.0		; GFX11-NEXT: s_mov_b32 s4, 1.0
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f32_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f32_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_mov_b32 s5, 2.0		; GFX11-NEXT: s_mov_b32 s5, 2.0
; GFX11-NEXT: v_writelane_b32 v40, s30, 2		; GFX11-NEXT: v_writelane_b32 v40, s30, 2
; GFX11-NEXT: v_writelane_b32 v40, s31, 3		; GFX11-NEXT: v_writelane_b32 v40, s31, 3
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f32_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f32_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0		; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 3		; GFX9-NEXT: v_writelane_b32 v40, s30, 3
; GFX9-NEXT: s_mov_b32 s4, 1.0		; GFX9-NEXT: s_mov_b32 s4, 1.0
; GFX9-NEXT: s_mov_b32 s5, 2.0		; GFX9-NEXT: s_mov_b32 s5, 2.0
; GFX9-NEXT: s_mov_b32 s6, 4.0		; GFX9-NEXT: s_mov_b32 s6, 4.0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 4		; GFX9-NEXT: v_writelane_b32 v40, s31, 4
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f32_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f32_inreg@rel32@lo+4
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 1.0		; GFX10-NEXT: s_mov_b32 s4, 1.0
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f32_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f32_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_mov_b32 s5, 2.0		; GFX10-NEXT: s_mov_b32 s5, 2.0
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: s_mov_b32 s6, 4.0		; GFX10-NEXT: s_mov_b32 s6, 4.0
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 1.0		; GFX11-NEXT: s_mov_b32 s4, 1.0
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f32_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f32_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_mov_b32 s5, 2.0		; GFX11-NEXT: s_mov_b32 s5, 2.0
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: s_mov_b32 s6, 4.0		; GFX11-NEXT: s_mov_b32 s6, 4.0
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f32_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f32_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0		; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 4.0		; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 4.0
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: v_writelane_b32 v40, s7, 3		; GFX9-NEXT: v_writelane_b32 v40, s7, 3
; GFX9-NEXT: v_writelane_b32 v40, s8, 4		; GFX9-NEXT: v_writelane_b32 v40, s8, 4
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 5		; GFX9-NEXT: v_writelane_b32 v40, s30, 5
; GFX9-NEXT: s_mov_b32 s4, 1.0		; GFX9-NEXT: s_mov_b32 s4, 1.0
; GFX9-NEXT: s_mov_b32 s5, 2.0		; GFX9-NEXT: s_mov_b32 s5, 2.0
; GFX9-NEXT: s_mov_b32 s6, 4.0		; GFX9-NEXT: s_mov_b32 s6, 4.0
; GFX9-NEXT: s_mov_b32 s7, -1.0		; GFX9-NEXT: s_mov_b32 s7, -1.0
; GFX9-NEXT: s_mov_b32 s8, 0.5		; GFX9-NEXT: s_mov_b32 s8, 0.5
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 6		; GFX9-NEXT: v_writelane_b32 v40, s31, 6
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 1.0		; GFX10-NEXT: s_mov_b32 s4, 1.0
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5f32_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5f32_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_mov_b32 s5, 2.0		; GFX10-NEXT: s_mov_b32 s5, 2.0
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: s_mov_b32 s6, 4.0		; GFX10-NEXT: s_mov_b32 s6, 4.0
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 1.0		; GFX11-NEXT: s_mov_b32 s4, 1.0
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v5f32_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v5f32_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_mov_b32 s5, 2.0		; GFX11-NEXT: s_mov_b32 s5, 2.0
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: s_mov_b32 s6, 4.0		; GFX11-NEXT: s_mov_b32 s6, 4.0
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5f32_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5f32_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0		; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 4.0		; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 4.0
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 2		; GFX9-NEXT: v_writelane_b32 v40, s30, 2
; GFX9-NEXT: s_mov_b32 s4, 0		; GFX9-NEXT: s_mov_b32 s4, 0
; GFX9-NEXT: s_mov_b32 s5, 0x40100000		; GFX9-NEXT: s_mov_b32 s5, 0x40100000
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 3		; GFX9-NEXT: v_writelane_b32 v40, s31, 3
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f64_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f64_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f64_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f64_inreg@rel32@hi+12
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 0		; GFX10-NEXT: s_mov_b32 s4, 0
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f64_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f64_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f64_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f64_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_mov_b32 s5, 0x40100000		; GFX10-NEXT: s_mov_b32 s5, 0x40100000
; GFX10-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-NEXT: v_writelane_b32 v40, s31, 3
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 0		; GFX11-NEXT: s_mov_b32 s4, 0
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f64_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_f64_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f64_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_f64_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_mov_b32 s5, 0x40100000		; GFX11-NEXT: s_mov_b32 s5, 0x40100000
; GFX11-NEXT: v_writelane_b32 v40, s30, 2		; GFX11-NEXT: v_writelane_b32 v40, s30, 2
; GFX11-NEXT: v_writelane_b32 v40, s31, 3		; GFX11-NEXT: v_writelane_b32 v40, s31, 3
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f64_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f64_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f64_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f64_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0x40100000		; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0x40100000
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: v_writelane_b32 v40, s7, 3		; GFX9-NEXT: v_writelane_b32 v40, s7, 3
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 4		; GFX9-NEXT: v_writelane_b32 v40, s30, 4
; GFX9-NEXT: s_mov_b32 s4, 0		; GFX9-NEXT: s_mov_b32 s4, 0
; GFX9-NEXT: s_mov_b32 s5, 2.0		; GFX9-NEXT: s_mov_b32 s5, 2.0
; GFX9-NEXT: s_mov_b32 s6, 0		; GFX9-NEXT: s_mov_b32 s6, 0
; GFX9-NEXT: s_mov_b32 s7, 0x40100000		; GFX9-NEXT: s_mov_b32 s7, 0x40100000
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 5		; GFX9-NEXT: v_writelane_b32 v40, s31, 5
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 0		; GFX10-NEXT: s_mov_b32 s4, 0
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f64_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f64_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_mov_b32 s5, 2.0		; GFX10-NEXT: s_mov_b32 s5, 2.0
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: s_mov_b32 s6, 0		; GFX10-NEXT: s_mov_b32 s6, 0
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 0		; GFX11-NEXT: s_mov_b32 s4, 0
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f64_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f64_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_mov_b32 s5, 2.0		; GFX11-NEXT: s_mov_b32 s5, 2.0
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: s_mov_b32 s6, 0		; GFX11-NEXT: s_mov_b32 s6, 0
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f64_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f64_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0		; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 0		; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 0
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: v_writelane_b32 v40, s7, 3		; GFX9-NEXT: v_writelane_b32 v40, s7, 3
; GFX9-NEXT: v_writelane_b32 v40, s8, 4		; GFX9-NEXT: v_writelane_b32 v40, s8, 4
; GFX9-NEXT: v_writelane_b32 v40, s9, 5		; GFX9-NEXT: v_writelane_b32 v40, s9, 5
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 6		; GFX9-NEXT: v_writelane_b32 v40, s30, 6
; GFX9-NEXT: s_mov_b32 s4, 0		; GFX9-NEXT: s_mov_b32 s4, 0
; GFX9-NEXT: s_mov_b32 s5, 2.0		; GFX9-NEXT: s_mov_b32 s5, 2.0
; GFX9-NEXT: s_mov_b32 s6, 0		; GFX9-NEXT: s_mov_b32 s6, 0
; GFX9-NEXT: s_mov_b32 s7, 0x40100000		; GFX9-NEXT: s_mov_b32 s7, 0x40100000
; GFX9-NEXT: s_mov_b32 s8, 0		; GFX9-NEXT: s_mov_b32 s8, 0
; GFX9-NEXT: s_mov_b32 s9, 0x40200000		; GFX9-NEXT: s_mov_b32 s9, 0x40200000
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 0		; GFX10-NEXT: s_mov_b32 s4, 0
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f64_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f64_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_mov_b32 s5, 2.0		; GFX10-NEXT: s_mov_b32 s5, 2.0
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: s_mov_b32 s6, 0		; GFX10-NEXT: s_mov_b32 s6, 0
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 0		; GFX11-NEXT: s_mov_b32 s4, 0
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f64_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f64_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_mov_b32 s5, 2.0		; GFX11-NEXT: s_mov_b32 s5, 2.0
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: s_mov_b32 s6, 0		; GFX11-NEXT: s_mov_b32 s6, 0
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f64_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f64_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0		; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 0		; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 0
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: s_load_dword s4, s[34:35], 0x0		; GFX9-NEXT: s_load_dword s4, s[34:35], 0x0
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 1		; GFX9-NEXT: v_writelane_b32 v40, s30, 1
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 2		; GFX9-NEXT: v_writelane_b32 v40, s31, 2
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i16_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i16_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16_inreg@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX9-NEXT: v_readlane_b32 s31, v40, 2		; GFX9-NEXT: v_readlane_b32 s31, v40, 2
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_load_dword s4, s[34:35], 0x0		; GFX10-NEXT: s_load_dword s4, s[34:35], 0x0
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i16_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i16_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s30, 1		; GFX10-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-NEXT: v_readlane_b32 s31, v40, 2
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_load_b32 s4, s[0:1], 0x0		; GFX11-NEXT: s_load_b32 s4, s[0:1], 0x0
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i16_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i16_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s30, 1		; GFX11-NEXT: v_writelane_b32 v40, s30, 1
; GFX11-NEXT: v_writelane_b32 v40, s31, 2		; GFX11-NEXT: v_writelane_b32 v40, s31, 2
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_load_dword s4, s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dword s4, s[0:1], 0x0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i16_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i16_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 2		; GFX9-NEXT: v_writelane_b32 v40, s30, 2
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 3		; GFX9-NEXT: v_writelane_b32 v40, s31, 3
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX9-NEXT: v_readlane_b32 s31, v40, 3		; GFX9-NEXT: v_readlane_b32 s31, v40, 3
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-NEXT: v_writelane_b32 v40, s31, 3
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x0		; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s30, 2		; GFX11-NEXT: v_writelane_b32 v40, s30, 2
; GFX11-NEXT: v_writelane_b32 v40, s31, 3		; GFX11-NEXT: v_writelane_b32 v40, s31, 3
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 2		; GFX9-NEXT: v_writelane_b32 v40, s30, 2
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 3		; GFX9-NEXT: v_writelane_b32 v40, s31, 3
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX9-NEXT: v_readlane_b32 s31, v40, 3		; GFX9-NEXT: v_readlane_b32 s31, v40, 3
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-NEXT: v_writelane_b32 v40, s31, 3
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x0		; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s30, 2		; GFX11-NEXT: v_writelane_b32 v40, s30, 2
; GFX11-NEXT: v_writelane_b32 v40, s31, 3		; GFX11-NEXT: v_writelane_b32 v40, s31, 3
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 2		; GFX9-NEXT: v_writelane_b32 v40, s30, 2
; GFX9-NEXT: s_mov_b32 s4, 0x20001		; GFX9-NEXT: s_mov_b32 s4, 0x20001
; GFX9-NEXT: s_mov_b32 s5, 3		; GFX9-NEXT: s_mov_b32 s5, 3
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 3		; GFX9-NEXT: v_writelane_b32 v40, s31, 3
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 0x20001		; GFX10-NEXT: s_mov_b32 s4, 0x20001
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_mov_b32 s5, 3		; GFX10-NEXT: s_mov_b32 s5, 3
; GFX10-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-NEXT: v_writelane_b32 v40, s31, 3
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 0x20001		; GFX11-NEXT: s_mov_b32 s4, 0x20001
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_mov_b32 s5, 3		; GFX11-NEXT: s_mov_b32 s5, 3
; GFX11-NEXT: v_writelane_b32 v40, s30, 2		; GFX11-NEXT: v_writelane_b32 v40, s30, 2
; GFX11-NEXT: v_writelane_b32 v40, s31, 3		; GFX11-NEXT: v_writelane_b32 v40, s31, 3
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x20001		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x20001
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 3		; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 3
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 2		; GFX9-NEXT: v_writelane_b32 v40, s30, 2
; GFX9-NEXT: s_mov_b32 s4, 0x40003c00		; GFX9-NEXT: s_mov_b32 s4, 0x40003c00
; GFX9-NEXT: s_movk_i32 s5, 0x4400		; GFX9-NEXT: s_movk_i32 s5, 0x4400
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 3		; GFX9-NEXT: v_writelane_b32 v40, s31, 3
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12
[18 lines of context not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 0x40003c00		; GFX10-NEXT: s_mov_b32 s4, 0x40003c00
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_movk_i32 s5, 0x4400		; GFX10-NEXT: s_movk_i32 s5, 0x4400
; GFX10-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-NEXT: v_writelane_b32 v40, s31, 3
[20 lines of context not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 0x40003c00		; GFX11-NEXT: s_mov_b32 s4, 0x40003c00
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_movk_i32 s5, 0x4400		; GFX11-NEXT: s_movk_i32 s5, 0x4400
; GFX11-NEXT: v_writelane_b32 v40, s30, 2		; GFX11-NEXT: v_writelane_b32 v40, s30, 2
; GFX11-NEXT: v_writelane_b32 v40, s31, 3		; GFX11-NEXT: v_writelane_b32 v40, s31, 3
[20 lines of context not shown]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x40003c00		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x40003c00
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_movk_i32 s5, 0x4400		; GFX10-SCRATCH-NEXT: s_movk_i32 s5, 0x4400
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
[22 lines of context not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 2		; GFX9-NEXT: v_writelane_b32 v40, s30, 2
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 3		; GFX9-NEXT: v_writelane_b32 v40, s31, 3
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX9-NEXT: v_readlane_b32 s31, v40, 3		; GFX9-NEXT: v_readlane_b32 s31, v40, 3
[16 lines of context not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-NEXT: v_writelane_b32 v40, s31, 3
[20 lines of context not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x0		; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s30, 2		; GFX11-NEXT: v_writelane_b32 v40, s30, 2
; GFX11-NEXT: v_writelane_b32 v40, s31, 3		; GFX11-NEXT: v_writelane_b32 v40, s31, 3
[20 lines of context not shown]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
[23 lines of context not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 2		; GFX9-NEXT: v_writelane_b32 v40, s30, 2
; GFX9-NEXT: s_mov_b32 s4, 0x20001		; GFX9-NEXT: s_mov_b32 s4, 0x20001
; GFX9-NEXT: s_mov_b32 s5, 0x40003		; GFX9-NEXT: s_mov_b32 s5, 0x40003
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 3		; GFX9-NEXT: v_writelane_b32 v40, s31, 3
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12
[18 lines of context not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 0x20001		; GFX10-NEXT: s_mov_b32 s4, 0x20001
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_mov_b32 s5, 0x40003		; GFX10-NEXT: s_mov_b32 s5, 0x40003
; GFX10-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-NEXT: v_writelane_b32 v40, s31, 3
[20 lines of context not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 0x20001		; GFX11-NEXT: s_mov_b32 s4, 0x20001
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_mov_b32 s5, 0x40003		; GFX11-NEXT: s_mov_b32 s5, 0x40003
; GFX11-NEXT: v_writelane_b32 v40, s30, 2		; GFX11-NEXT: v_writelane_b32 v40, s30, 2
; GFX11-NEXT: v_writelane_b32 v40, s31, 3		; GFX11-NEXT: v_writelane_b32 v40, s31, 3
[20 lines of context not shown]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x20001		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x20001
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0x40003		; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0x40003
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
[22 lines of context not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: s_load_dword s4, s[34:35], 0x0		; GFX9-NEXT: s_load_dword s4, s[34:35], 0x0
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 1		; GFX9-NEXT: v_writelane_b32 v40, s30, 1
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 2		; GFX9-NEXT: v_writelane_b32 v40, s31, 2
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f16_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f16_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16_inreg@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX9-NEXT: v_readlane_b32 s31, v40, 2		; GFX9-NEXT: v_readlane_b32 s31, v40, 2
[15 lines of context not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_load_dword s4, s[34:35], 0x0		; GFX10-NEXT: s_load_dword s4, s[34:35], 0x0
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f16_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f16_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s30, 1		; GFX10-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-NEXT: v_readlane_b32 s31, v40, 2
[17 lines of context not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_load_b32 s4, s[0:1], 0x0		; GFX11-NEXT: s_load_b32 s4, s[0:1], 0x0
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f16_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2f16_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s30, 1		; GFX11-NEXT: v_writelane_b32 v40, s30, 1
; GFX11-NEXT: v_writelane_b32 v40, s31, 2		; GFX11-NEXT: v_writelane_b32 v40, s31, 2
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
[17 lines of context not shown]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_load_dword s4, s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dword s4, s[0:1], 0x0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f16_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f16_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
[20 lines of context not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 2		; GFX9-NEXT: v_writelane_b32 v40, s30, 2
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 3		; GFX9-NEXT: v_writelane_b32 v40, s31, 3
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX9-NEXT: v_readlane_b32 s31, v40, 3		; GFX9-NEXT: v_readlane_b32 s31, v40, 3
[16 lines of context not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-NEXT: v_writelane_b32 v40, s31, 3
(20 lines not shown)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x0		; GFX11-NEXT: s_load_b64 s[4:5], s[0:1], 0x0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s30, 2		; GFX11-NEXT: v_writelane_b32 v40, s30, 2
; GFX11-NEXT: v_writelane_b32 v40, s31, 3		; GFX11-NEXT: v_writelane_b32 v40, s31, 3
(20 lines not shown)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
(23 lines not shown)
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 2		; GFX9-NEXT: v_writelane_b32 v40, s30, 2
; GFX9-NEXT: s_mov_b32 s4, 1		; GFX9-NEXT: s_mov_b32 s4, 1
; GFX9-NEXT: s_mov_b32 s5, 2		; GFX9-NEXT: s_mov_b32 s5, 2
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 3		; GFX9-NEXT: v_writelane_b32 v40, s31, 3
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12
(18 lines not shown)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 1		; GFX10-NEXT: s_mov_b32 s4, 1
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_mov_b32 s5, 2		; GFX10-NEXT: s_mov_b32 s5, 2
; GFX10-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-NEXT: v_writelane_b32 v40, s31, 3
(20 lines not shown)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 1		; GFX11-NEXT: s_mov_b32 s4, 1
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_mov_b32 s5, 2		; GFX11-NEXT: s_mov_b32 s5, 2
; GFX11-NEXT: v_writelane_b32 v40, s30, 2		; GFX11-NEXT: v_writelane_b32 v40, s30, 2
; GFX11-NEXT: v_writelane_b32 v40, s31, 3		; GFX11-NEXT: v_writelane_b32 v40, s31, 3
(20 lines not shown)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2		; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
(22 lines not shown)
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 3		; GFX9-NEXT: v_writelane_b32 v40, s30, 3
; GFX9-NEXT: s_mov_b32 s4, 3		; GFX9-NEXT: s_mov_b32 s4, 3
; GFX9-NEXT: s_mov_b32 s5, 4		; GFX9-NEXT: s_mov_b32 s5, 4
; GFX9-NEXT: s_mov_b32 s6, 5		; GFX9-NEXT: s_mov_b32 s6, 5
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 4		; GFX9-NEXT: v_writelane_b32 v40, s31, 4
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_inreg@rel32@lo+4
(20 lines not shown)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 3		; GFX10-NEXT: s_mov_b32 s4, 3
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_mov_b32 s5, 4		; GFX10-NEXT: s_mov_b32 s5, 4
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: s_mov_b32 s6, 5		; GFX10-NEXT: s_mov_b32 s6, 5
(23 lines not shown)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 3		; GFX11-NEXT: s_mov_b32 s4, 3
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_mov_b32 s5, 4		; GFX11-NEXT: s_mov_b32 s5, 4
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: s_mov_b32 s6, 5		; GFX11-NEXT: s_mov_b32 s6, 5
(23 lines not shown)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 3		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 3
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 4		; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 4
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 5		; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 5
(25 lines not shown)
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: v_writelane_b32 v40, s7, 3		; GFX9-NEXT: v_writelane_b32 v40, s7, 3
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 4		; GFX9-NEXT: v_writelane_b32 v40, s30, 4
; GFX9-NEXT: s_mov_b32 s4, 3		; GFX9-NEXT: s_mov_b32 s4, 3
; GFX9-NEXT: s_mov_b32 s5, 4		; GFX9-NEXT: s_mov_b32 s5, 4
; GFX9-NEXT: s_mov_b32 s6, 5		; GFX9-NEXT: s_mov_b32 s6, 5
; GFX9-NEXT: s_mov_b32 s7, 6		; GFX9-NEXT: s_mov_b32 s7, 6
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 5		; GFX9-NEXT: v_writelane_b32 v40, s31, 5
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
(22 lines not shown)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 3		; GFX10-NEXT: s_mov_b32 s4, 3
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_mov_b32 s5, 4		; GFX10-NEXT: s_mov_b32 s5, 4
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: s_mov_b32 s6, 5		; GFX10-NEXT: s_mov_b32 s6, 5
(26 lines not shown)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 3		; GFX11-NEXT: s_mov_b32 s4, 3
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_mov_b32 s5, 4		; GFX11-NEXT: s_mov_b32 s5, 4
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: s_mov_b32 s6, 5		; GFX11-NEXT: s_mov_b32 s6, 5
(26 lines not shown)
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 3		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 3
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 4		; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 4
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 5		; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 5
(28 lines not shown)
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: v_writelane_b32 v40, s7, 3		; GFX9-NEXT: v_writelane_b32 v40, s7, 3
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 4		; GFX9-NEXT: v_writelane_b32 v40, s30, 4
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 5		; GFX9-NEXT: v_writelane_b32 v40, s31, 5
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX9-NEXT: v_readlane_b32 s31, v40, 5		; GFX9-NEXT: v_readlane_b32 s31, v40, 5
(18 lines not shown)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: v_writelane_b32 v40, s7, 3		; GFX10-NEXT: v_writelane_b32 v40, s7, 3
; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12
(24 lines not shown)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: v_writelane_b32 v40, s7, 3		; GFX11-NEXT: v_writelane_b32 v40, s7, 3
; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x0		; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12
[... 24 lines not shown ...]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12
[... 27 lines not shown ...]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: v_writelane_b32 v40, s7, 3		; GFX9-NEXT: v_writelane_b32 v40, s7, 3
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 4		; GFX9-NEXT: v_writelane_b32 v40, s30, 4
; GFX9-NEXT: s_mov_b32 s4, 1		; GFX9-NEXT: s_mov_b32 s4, 1
; GFX9-NEXT: s_mov_b32 s5, 2		; GFX9-NEXT: s_mov_b32 s5, 2
; GFX9-NEXT: s_mov_b32 s6, 3		; GFX9-NEXT: s_mov_b32 s6, 3
; GFX9-NEXT: s_mov_b32 s7, 4		; GFX9-NEXT: s_mov_b32 s7, 4
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 5		; GFX9-NEXT: v_writelane_b32 v40, s31, 5
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
[... 22 lines not shown ...]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 1		; GFX10-NEXT: s_mov_b32 s4, 1
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_mov_b32 s5, 2		; GFX10-NEXT: s_mov_b32 s5, 2
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: s_mov_b32 s6, 3		; GFX10-NEXT: s_mov_b32 s6, 3
[... 26 lines not shown ...]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 1		; GFX11-NEXT: s_mov_b32 s4, 1
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_mov_b32 s5, 2		; GFX11-NEXT: s_mov_b32 s5, 2
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: s_mov_b32 s6, 3		; GFX11-NEXT: s_mov_b32 s6, 3
[... 26 lines not shown ...]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2		; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3		; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3
[... 28 lines not shown ...]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: v_writelane_b32 v40, s7, 3		; GFX9-NEXT: v_writelane_b32 v40, s7, 3
; GFX9-NEXT: v_writelane_b32 v40, s8, 4		; GFX9-NEXT: v_writelane_b32 v40, s8, 4
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 5		; GFX9-NEXT: v_writelane_b32 v40, s30, 5
; GFX9-NEXT: s_mov_b32 s4, 1		; GFX9-NEXT: s_mov_b32 s4, 1
; GFX9-NEXT: s_mov_b32 s5, 2		; GFX9-NEXT: s_mov_b32 s5, 2
; GFX9-NEXT: s_mov_b32 s6, 3		; GFX9-NEXT: s_mov_b32 s6, 3
; GFX9-NEXT: s_mov_b32 s7, 4		; GFX9-NEXT: s_mov_b32 s7, 4
; GFX9-NEXT: s_mov_b32 s8, 5		; GFX9-NEXT: s_mov_b32 s8, 5
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 6		; GFX9-NEXT: v_writelane_b32 v40, s31, 6
[... 24 lines not shown ...]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 1		; GFX10-NEXT: s_mov_b32 s4, 1
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5i32_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5i32_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_mov_b32 s5, 2		; GFX10-NEXT: s_mov_b32 s5, 2
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: s_mov_b32 s6, 3		; GFX10-NEXT: s_mov_b32 s6, 3
[... 29 lines not shown ...]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 1		; GFX11-NEXT: s_mov_b32 s4, 1
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v5i32_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v5i32_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_mov_b32 s5, 2		; GFX11-NEXT: s_mov_b32 s5, 2
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: s_mov_b32 s6, 3		; GFX11-NEXT: s_mov_b32 s6, 3
[... 29 lines not shown ...]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5i32_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5i32_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2		; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3		; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3
[... 31 lines not shown ...]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX9-NEXT: v_writelane_b32 v40, s7, 3		; GFX9-NEXT: v_writelane_b32 v40, s7, 3
; GFX9-NEXT: v_writelane_b32 v40, s8, 4		; GFX9-NEXT: v_writelane_b32 v40, s8, 4
; GFX9-NEXT: v_writelane_b32 v40, s9, 5		; GFX9-NEXT: v_writelane_b32 v40, s9, 5
; GFX9-NEXT: v_writelane_b32 v40, s10, 6		; GFX9-NEXT: v_writelane_b32 v40, s10, 6
; GFX9-NEXT: v_writelane_b32 v40, s11, 7		; GFX9-NEXT: v_writelane_b32 v40, s11, 7
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
[... 31 lines not shown ...]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: v_writelane_b32 v40, s7, 3		; GFX10-NEXT: v_writelane_b32 v40, s7, 3
; GFX10-NEXT: v_writelane_b32 v40, s8, 4		; GFX10-NEXT: v_writelane_b32 v40, s8, 4
; GFX10-NEXT: v_writelane_b32 v40, s9, 5		; GFX10-NEXT: v_writelane_b32 v40, s9, 5
; GFX10-NEXT: v_writelane_b32 v40, s10, 6		; GFX10-NEXT: v_writelane_b32 v40, s10, 6
[... 34 lines not shown ...]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0		; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: v_writelane_b32 v40, s7, 3		; GFX11-NEXT: v_writelane_b32 v40, s7, 3
; GFX11-NEXT: v_writelane_b32 v40, s8, 4		; GFX11-NEXT: v_writelane_b32 v40, s8, 4
; GFX11-NEXT: v_writelane_b32 v40, s9, 5		; GFX11-NEXT: v_writelane_b32 v40, s9, 5
; GFX11-NEXT: v_writelane_b32 v40, s10, 6		; GFX11-NEXT: v_writelane_b32 v40, s10, 6
[... 34 lines not shown ...]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6
[... 38 lines not shown ...]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: v_writelane_b32 v40, s7, 3		; GFX9-NEXT: v_writelane_b32 v40, s7, 3
; GFX9-NEXT: v_writelane_b32 v40, s8, 4		; GFX9-NEXT: v_writelane_b32 v40, s8, 4
; GFX9-NEXT: v_writelane_b32 v40, s9, 5		; GFX9-NEXT: v_writelane_b32 v40, s9, 5
; GFX9-NEXT: v_writelane_b32 v40, s10, 6		; GFX9-NEXT: v_writelane_b32 v40, s10, 6
; GFX9-NEXT: v_writelane_b32 v40, s11, 7		; GFX9-NEXT: v_writelane_b32 v40, s11, 7
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 8		; GFX9-NEXT: v_writelane_b32 v40, s30, 8
; GFX9-NEXT: s_mov_b32 s4, 1		; GFX9-NEXT: s_mov_b32 s4, 1
; GFX9-NEXT: s_mov_b32 s5, 2		; GFX9-NEXT: s_mov_b32 s5, 2
; GFX9-NEXT: s_mov_b32 s6, 3		; GFX9-NEXT: s_mov_b32 s6, 3
; GFX9-NEXT: s_mov_b32 s7, 4		; GFX9-NEXT: s_mov_b32 s7, 4
; GFX9-NEXT: s_mov_b32 s8, 5		; GFX9-NEXT: s_mov_b32 s8, 5
; GFX9-NEXT: s_mov_b32 s9, 6		; GFX9-NEXT: s_mov_b32 s9, 6
; GFX9-NEXT: s_mov_b32 s10, 7		; GFX9-NEXT: s_mov_b32 s10, 7
[... 30 lines not shown ...]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
		; GFX10-NEXT: ; implicit-def: $vgpr40
		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_mov_b32 s4, 1		; GFX10-NEXT: s_mov_b32 s4, 1
; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v8i32_inreg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v8i32_inreg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32_inreg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32_inreg@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_mov_b32 s5, 2		; GFX10-NEXT: s_mov_b32 s5, 2
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: s_mov_b32 s6, 3		; GFX10-NEXT: s_mov_b32 s6, 3
[... 38 lines not shown ...]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr40
		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_mov_b32 s4, 1		; GFX11-NEXT: s_mov_b32 s4, 1
; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v8i32_inreg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_v8i32_inreg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32_inreg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32_inreg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_mov_b32 s5, 2		; GFX11-NEXT: s_mov_b32 s5, 2
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: s_mov_b32 s6, 3		; GFX11-NEXT: s_mov_b32 s6, 3
[... 38 lines not shown ...]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1		; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32_inreg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32_inreg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32_inreg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32_inreg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2		; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3		; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3
[... 40 lines not shown ...]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: v_writelane_b32 v40, s7, 3		; GFX9-NEXT: v_writelane_b32 v40, s7, 3
; GFX9-NEXT: v_writelane_b32 v40, s8, 4		; GFX9-NEXT: v_writelane_b32 v40, s8, 4
; GFX9-NEXT: v_writelane_b32 v40, s9, 5		; GFX9-NEXT: v_writelane_b32 v40, s9, 5
; GFX9-NEXT: v_writelane_b32 v40, s10, 6		; GFX9-NEXT: v_writelane_b32 v40, s10, 6
; GFX9-NEXT: v_writelane_b32 v40, s11, 7		; GFX9-NEXT: v_writelane_b32 v40, s11, 7
; GFX9-NEXT: v_writelane_b32 v40, s12, 8		; GFX9-NEXT: v_writelane_b32 v40, s12, 8
; GFX9-NEXT: v_writelane_b32 v40, s13, 9		; GFX9-NEXT: v_writelane_b32 v40, s13, 9
; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s14, 10		; GFX9-NEXT: v_writelane_b32 v40, s14, 10
; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX9-NEXT: v_writelane_b32 v40, s15, 11		; GFX9-NEXT: v_writelane_b32 v40, s15, 11
; GFX9-NEXT: v_writelane_b32 v40, s16, 12		; GFX9-NEXT: v_writelane_b32 v40, s16, 12
; GFX9-NEXT: v_writelane_b32 v40, s17, 13		; GFX9-NEXT: v_writelane_b32 v40, s17, 13
; GFX9-NEXT: v_writelane_b32 v40, s18, 14		; GFX9-NEXT: v_writelane_b32 v40, s18, 14
; GFX9-NEXT: v_writelane_b32 v40, s19, 15		; GFX9-NEXT: v_writelane_b32 v40, s19, 15
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
[39 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: v_writelane_b32 v40, s7, 3		; GFX10-NEXT: v_writelane_b32 v40, s7, 3
; GFX10-NEXT: v_writelane_b32 v40, s8, 4		; GFX10-NEXT: v_writelane_b32 v40, s8, 4
; GFX10-NEXT: v_writelane_b32 v40, s9, 5		; GFX10-NEXT: v_writelane_b32 v40, s9, 5
; GFX10-NEXT: v_writelane_b32 v40, s10, 6		; GFX10-NEXT: v_writelane_b32 v40, s10, 6
[50 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0		; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: v_writelane_b32 v40, s7, 3		; GFX11-NEXT: v_writelane_b32 v40, s7, 3
; GFX11-NEXT: v_writelane_b32 v40, s8, 4		; GFX11-NEXT: v_writelane_b32 v40, s8, 4
; GFX11-NEXT: v_writelane_b32 v40, s9, 5		; GFX11-NEXT: v_writelane_b32 v40, s9, 5
; GFX11-NEXT: v_writelane_b32 v40, s10, 6		; GFX11-NEXT: v_writelane_b32 v40, s10, 6
[50 lines not shown]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6
[54 lines not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: v_writelane_b32 v40, s7, 3		; GFX9-NEXT: v_writelane_b32 v40, s7, 3
; GFX9-NEXT: v_writelane_b32 v40, s8, 4		; GFX9-NEXT: v_writelane_b32 v40, s8, 4
; GFX9-NEXT: v_writelane_b32 v40, s9, 5		; GFX9-NEXT: v_writelane_b32 v40, s9, 5
; GFX9-NEXT: v_writelane_b32 v40, s10, 6		; GFX9-NEXT: v_writelane_b32 v40, s10, 6
; GFX9-NEXT: v_writelane_b32 v40, s11, 7		; GFX9-NEXT: v_writelane_b32 v40, s11, 7
; GFX9-NEXT: v_writelane_b32 v40, s12, 8		; GFX9-NEXT: v_writelane_b32 v40, s12, 8
; GFX9-NEXT: v_writelane_b32 v40, s13, 9		; GFX9-NEXT: v_writelane_b32 v40, s13, 9
; GFX9-NEXT: v_writelane_b32 v40, s14, 10		; GFX9-NEXT: v_writelane_b32 v40, s14, 10
; GFX9-NEXT: v_writelane_b32 v40, s15, 11		; GFX9-NEXT: v_writelane_b32 v40, s15, 11
; GFX9-NEXT: v_writelane_b32 v40, s16, 12		; GFX9-NEXT: v_writelane_b32 v40, s16, 12
; GFX9-NEXT: v_writelane_b32 v40, s17, 13		; GFX9-NEXT: v_writelane_b32 v40, s17, 13
; GFX9-NEXT: v_writelane_b32 v40, s18, 14		; GFX9-NEXT: v_writelane_b32 v40, s18, 14
; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s19, 15		; GFX9-NEXT: v_writelane_b32 v40, s19, 15
; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX9-NEXT: v_writelane_b32 v40, s20, 16		; GFX9-NEXT: v_writelane_b32 v40, s20, 16
; GFX9-NEXT: v_writelane_b32 v40, s21, 17		; GFX9-NEXT: v_writelane_b32 v40, s21, 17
; GFX9-NEXT: v_writelane_b32 v40, s22, 18		; GFX9-NEXT: v_writelane_b32 v40, s22, 18
; GFX9-NEXT: v_writelane_b32 v40, s23, 19		; GFX9-NEXT: v_writelane_b32 v40, s23, 19
; GFX9-NEXT: v_writelane_b32 v40, s24, 20		; GFX9-NEXT: v_writelane_b32 v40, s24, 20
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
[78 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: v_writelane_b32 v40, s7, 3		; GFX10-NEXT: v_writelane_b32 v40, s7, 3
; GFX10-NEXT: v_writelane_b32 v40, s8, 4		; GFX10-NEXT: v_writelane_b32 v40, s8, 4
; GFX10-NEXT: v_writelane_b32 v40, s9, 5		; GFX10-NEXT: v_writelane_b32 v40, s9, 5
; GFX10-NEXT: v_writelane_b32 v40, s10, 6		; GFX10-NEXT: v_writelane_b32 v40, s10, 6
[95 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0		; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: v_writelane_b32 v40, s7, 3		; GFX11-NEXT: v_writelane_b32 v40, s7, 3
; GFX11-NEXT: v_writelane_b32 v40, s8, 4		; GFX11-NEXT: v_writelane_b32 v40, s8, 4
; GFX11-NEXT: v_writelane_b32 v40, s9, 5		; GFX11-NEXT: v_writelane_b32 v40, s9, 5
; GFX11-NEXT: v_writelane_b32 v40, s10, 6		; GFX11-NEXT: v_writelane_b32 v40, s10, 6
[89 lines not shown]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6
[95 lines not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: v_writelane_b32 v40, s6, 2		; GFX9-NEXT: v_writelane_b32 v40, s6, 2
; GFX9-NEXT: v_writelane_b32 v40, s7, 3		; GFX9-NEXT: v_writelane_b32 v40, s7, 3
; GFX9-NEXT: v_writelane_b32 v40, s8, 4		; GFX9-NEXT: v_writelane_b32 v40, s8, 4
; GFX9-NEXT: v_writelane_b32 v40, s9, 5		; GFX9-NEXT: v_writelane_b32 v40, s9, 5
; GFX9-NEXT: v_writelane_b32 v40, s10, 6		; GFX9-NEXT: v_writelane_b32 v40, s10, 6
; GFX9-NEXT: v_writelane_b32 v40, s11, 7		; GFX9-NEXT: v_writelane_b32 v40, s11, 7
; GFX9-NEXT: v_writelane_b32 v40, s12, 8		; GFX9-NEXT: v_writelane_b32 v40, s12, 8
; GFX9-NEXT: v_writelane_b32 v40, s13, 9		; GFX9-NEXT: v_writelane_b32 v40, s13, 9
; GFX9-NEXT: v_writelane_b32 v40, s14, 10		; GFX9-NEXT: v_writelane_b32 v40, s14, 10
; GFX9-NEXT: v_writelane_b32 v40, s15, 11		; GFX9-NEXT: v_writelane_b32 v40, s15, 11
; GFX9-NEXT: v_writelane_b32 v40, s16, 12		; GFX9-NEXT: v_writelane_b32 v40, s16, 12
; GFX9-NEXT: v_writelane_b32 v40, s17, 13		; GFX9-NEXT: v_writelane_b32 v40, s17, 13
; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s18, 14		; GFX9-NEXT: v_writelane_b32 v40, s18, 14
; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX9-NEXT: v_writelane_b32 v40, s19, 15		; GFX9-NEXT: v_writelane_b32 v40, s19, 15
; GFX9-NEXT: v_writelane_b32 v40, s20, 16		; GFX9-NEXT: v_writelane_b32 v40, s20, 16
; GFX9-NEXT: v_writelane_b32 v40, s21, 17		; GFX9-NEXT: v_writelane_b32 v40, s21, 17
; GFX9-NEXT: v_writelane_b32 v40, s22, 18		; GFX9-NEXT: v_writelane_b32 v40, s22, 18
; GFX9-NEXT: v_writelane_b32 v40, s23, 19		; GFX9-NEXT: v_writelane_b32 v40, s23, 19
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
[84 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-NEXT: v_writelane_b32 v40, s7, 3		; GFX10-NEXT: v_writelane_b32 v40, s7, 3
; GFX10-NEXT: v_writelane_b32 v40, s8, 4		; GFX10-NEXT: v_writelane_b32 v40, s8, 4
; GFX10-NEXT: v_writelane_b32 v40, s9, 5		; GFX10-NEXT: v_writelane_b32 v40, s9, 5
; GFX10-NEXT: v_writelane_b32 v40, s10, 6		; GFX10-NEXT: v_writelane_b32 v40, s10, 6
[100 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0		; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: v_writelane_b32 v40, s6, 2		; GFX11-NEXT: v_writelane_b32 v40, s6, 2
; GFX11-NEXT: v_writelane_b32 v40, s7, 3		; GFX11-NEXT: v_writelane_b32 v40, s7, 3
; GFX11-NEXT: v_writelane_b32 v40, s8, 4		; GFX11-NEXT: v_writelane_b32 v40, s8, 4
; GFX11-NEXT: v_writelane_b32 v40, s9, 5		; GFX11-NEXT: v_writelane_b32 v40, s9, 5
; GFX11-NEXT: v_writelane_b32 v40, s10, 6		; GFX11-NEXT: v_writelane_b32 v40, s10, 6
[92 lines not shown]
; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0		; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6
[103 lines not shown]
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: buffer_load_dword v32, off, s[0:3], s33		; GFX9-NEXT: buffer_load_dword v32, off, s[0:3], s33
; GFX9-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:4		; GFX9-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:4
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x800		; GFX9-NEXT: s_addk_i32 s32, 0x800
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, stack_passed_f64_arg@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, stack_passed_f64_arg@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, stack_passed_f64_arg@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, stack_passed_f64_arg@rel32@hi+12
; GFX9-NEXT: s_waitcnt vmcnt(1)		; GFX9-NEXT: s_waitcnt vmcnt(1)
[22 lines not shown]
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: buffer_load_dword v32, off, s[0:3], s33		; GFX10-NEXT: buffer_load_dword v32, off, s[0:3], s33
; GFX10-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:4		; GFX10-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:4
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x400		; GFX10-NEXT: s_addk_i32 s32, 0x400
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, stack_passed_f64_arg@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, stack_passed_f64_arg@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, stack_passed_f64_arg@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, stack_passed_f64_arg@rel32@hi+12
; GFX10-NEXT: s_waitcnt vmcnt(1)		; GFX10-NEXT: s_waitcnt vmcnt(1)
; GFX10-NEXT: buffer_store_dword v32, off, s[0:3], s32		; GFX10-NEXT: buffer_store_dword v32, off, s[0:3], s32
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: buffer_store_dword v33, off, s[0:3], s32 offset:4		; GFX10-NEXT: buffer_store_dword v33, off, s[0:3], s32 offset:4
[20 lines not shown]
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33 offset:8		; GFX11-NEXT: scratch_store_b32 off, v40, s33 offset:8
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:12		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:12
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: scratch_load_b64 v[32:33], off, s33		; GFX11-NEXT: scratch_load_b64 v[32:33], off, s33
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 32		; GFX11-NEXT: s_add_i32 s32, s32, 32
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, stack_passed_f64_arg@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, stack_passed_f64_arg@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, stack_passed_f64_arg@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, stack_passed_f64_arg@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: scratch_store_b64 off, v[32:33], s32		; GFX11-NEXT: scratch_store_b64 off, v[32:33], s32
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
[17 lines not shown]
; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33		; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32		; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1		; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 offset:8 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 offset:8 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:12 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:12 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: scratch_load_dwordx2 v[32:33], off, s33		; GFX10-SCRATCH-NEXT: scratch_load_dwordx2 v[32:33], off, s33
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0		; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s0, 0
; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]		; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, stack_passed_f64_arg@rel32@lo+4		; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, stack_passed_f64_arg@rel32@lo+4
; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, stack_passed_f64_arg@rel32@hi+12		; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, stack_passed_f64_arg@rel32@hi+12
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)		; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
; GFX10-SCRATCH-NEXT: scratch_store_dwordx2 off, v[32:33], s32		; GFX10-SCRATCH-NEXT: scratch_store_dwordx2 off, v[32:33], s32
; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
[26 lines not shown]
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_mov_b32_e32 v0, 12		; GFX9-NEXT: v_mov_b32_e32 v0, 12
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32
; GFX9-NEXT: v_mov_b32_e32 v0, 13		; GFX9-NEXT: v_mov_b32_e32 v0, 13
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4
; GFX9-NEXT: v_mov_b32_e32 v0, 14		; GFX9-NEXT: v_mov_b32_e32 v0, 14
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8
; GFX9-NEXT: v_mov_b32_e32 v0, 15		; GFX9-NEXT: v_mov_b32_e32 v0, 15
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12
; GFX9-NEXT: v_mov_b32_e32 v0, 0		; GFX9-NEXT: v_mov_b32_e32 v0, 0
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: v_mov_b32_e32 v2, 0		; GFX9-NEXT: v_mov_b32_e32 v2, 0
; GFX9-NEXT: v_mov_b32_e32 v3, 1		; GFX9-NEXT: v_mov_b32_e32 v3, 1
[54 lines not shown]
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_mov_b32_e32 v0, 12		; GFX10-NEXT: v_mov_b32_e32 v0, 12
; GFX10-NEXT: v_mov_b32_e32 v1, 13		; GFX10-NEXT: v_mov_b32_e32 v1, 13
; GFX10-NEXT: v_mov_b32_e32 v2, 14		; GFX10-NEXT: v_mov_b32_e32 v2, 14
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_mov_b32_e32 v3, 15		; GFX10-NEXT: v_mov_b32_e32 v3, 15
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32		; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32
; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4		; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4
; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8		; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8
; GFX10-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:12		; GFX10-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:12
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v0, 0		; GFX10-NEXT: v_mov_b32_e32 v0, 0
; GFX10-NEXT: v_mov_b32_e32 v1, 0		; GFX10-NEXT: v_mov_b32_e32 v1, 0
; GFX10-NEXT: v_mov_b32_e32 v2, 0		; GFX10-NEXT: v_mov_b32_e32 v2, 0
; GFX10-NEXT: v_mov_b32_e32 v3, 1		; GFX10-NEXT: v_mov_b32_e32 v3, 1
; GFX10-NEXT: v_mov_b32_e32 v4, 1		; GFX10-NEXT: v_mov_b32_e32 v4, 1
; GFX10-NEXT: v_mov_b32_e32 v5, 1		; GFX10-NEXT: v_mov_b32_e32 v5, 1
; GFX10-NEXT: v_mov_b32_e32 v6, 2		; GFX10-NEXT: v_mov_b32_e32 v6, 2
; GFX10-NEXT: v_mov_b32_e32 v7, 2		; GFX10-NEXT: v_mov_b32_e32 v7, 2
(50 lines not shown)
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_dual_mov_b32 v0, 12 :: v_dual_mov_b32 v1, 13		; GFX11-NEXT: v_dual_mov_b32 v0, 12 :: v_dual_mov_b32 v1, 13
; GFX11-NEXT: v_dual_mov_b32 v2, 14 :: v_dual_mov_b32 v3, 15		; GFX11-NEXT: v_dual_mov_b32 v2, 14 :: v_dual_mov_b32 v3, 15
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: v_dual_mov_b32 v4, 1 :: v_dual_mov_b32 v5, 1		; GFX11-NEXT: v_dual_mov_b32 v4, 1 :: v_dual_mov_b32 v5, 1
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32		; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32
; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0		; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0
; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 1		; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 1
; GFX11-NEXT: v_dual_mov_b32 v6, 2 :: v_dual_mov_b32 v7, 2		; GFX11-NEXT: v_dual_mov_b32 v6, 2 :: v_dual_mov_b32 v7, 2
; GFX11-NEXT: v_dual_mov_b32 v8, 2 :: v_dual_mov_b32 v9, 3		; GFX11-NEXT: v_dual_mov_b32 v8, 2 :: v_dual_mov_b32 v9, 3
; GFX11-NEXT: v_dual_mov_b32 v10, 3 :: v_dual_mov_b32 v11, 3		; GFX11-NEXT: v_dual_mov_b32 v10, 3 :: v_dual_mov_b32 v11, 3
; GFX11-NEXT: v_dual_mov_b32 v12, 4 :: v_dual_mov_b32 v13, 4		; GFX11-NEXT: v_dual_mov_b32 v12, 4 :: v_dual_mov_b32 v13, 4
; GFX11-NEXT: v_dual_mov_b32 v14, 4 :: v_dual_mov_b32 v15, 5		; GFX11-NEXT: v_dual_mov_b32 v14, 4 :: v_dual_mov_b32 v15, 5
(36 lines not shown)
; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill		; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1		; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 12		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 12
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 13		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 13
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 14		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 14
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 15		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 15
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 1		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 1
; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32		; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 1		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 1
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 2		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 2
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 2		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 2
(76 lines not shown)
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8
; GFX9-NEXT: v_mov_b32_e32 v0, 11		; GFX9-NEXT: v_mov_b32_e32 v0, 11
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12
; GFX9-NEXT: v_mov_b32_e32 v0, 12		; GFX9-NEXT: v_mov_b32_e32 v0, 12
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:16		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:16
; GFX9-NEXT: v_mov_b32_e32 v0, 13		; GFX9-NEXT: v_mov_b32_e32 v0, 13
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:20		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:20
; GFX9-NEXT: v_mov_b32_e32 v0, 14		; GFX9-NEXT: v_mov_b32_e32 v0, 14
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:24		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:24
; GFX9-NEXT: v_mov_b32_e32 v0, 15		; GFX9-NEXT: v_mov_b32_e32 v0, 15
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:28		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:28
; GFX9-NEXT: v_mov_b32_e32 v0, 0		; GFX9-NEXT: v_mov_b32_e32 v0, 0
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: v_mov_b32_e32 v2, 0		; GFX9-NEXT: v_mov_b32_e32 v2, 0
; GFX9-NEXT: v_mov_b32_e32 v3, 0		; GFX9-NEXT: v_mov_b32_e32 v3, 0
(60 lines not shown)
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_mov_b32_e32 v3, 14		; GFX10-NEXT: v_mov_b32_e32 v3, 14
; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32		; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32
; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4		; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4
; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8		; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8
; GFX10-NEXT: v_mov_b32_e32 v0, 11		; GFX10-NEXT: v_mov_b32_e32 v0, 11
; GFX10-NEXT: v_mov_b32_e32 v1, 12		; GFX10-NEXT: v_mov_b32_e32 v1, 12
; GFX10-NEXT: v_mov_b32_e32 v2, 13		; GFX10-NEXT: v_mov_b32_e32 v2, 13
		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v4, 15		; GFX10-NEXT: v_mov_b32_e32 v4, 15
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12		; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12
; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:16		; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:16
; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:20		; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:20
; GFX10-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:24		; GFX10-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:24
; GFX10-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:28		; GFX10-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:28
; GFX10-NEXT: v_mov_b32_e32 v0, 0		; GFX10-NEXT: v_mov_b32_e32 v0, 0
(59 lines not shown)
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_dual_mov_b32 v0, 12 :: v_dual_mov_b32 v1, 13		; GFX11-NEXT: v_dual_mov_b32 v0, 12 :: v_dual_mov_b32 v1, 13
; GFX11-NEXT: v_dual_mov_b32 v2, 14 :: v_dual_mov_b32 v3, 15		; GFX11-NEXT: v_dual_mov_b32 v2, 14 :: v_dual_mov_b32 v3, 15
; GFX11-NEXT: v_dual_mov_b32 v4, 8 :: v_dual_mov_b32 v5, 9		; GFX11-NEXT: v_dual_mov_b32 v4, 8 :: v_dual_mov_b32 v5, 9
; GFX11-NEXT: v_dual_mov_b32 v6, 10 :: v_dual_mov_b32 v7, 11		; GFX11-NEXT: v_dual_mov_b32 v6, 10 :: v_dual_mov_b32 v7, 11
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:16		; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:16
; GFX11-NEXT: scratch_store_b128 off, v[4:7], s32		; GFX11-NEXT: scratch_store_b128 off, v[4:7], s32
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0		; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0
; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 0		; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 0
; GFX11-NEXT: v_dual_mov_b32 v4, 0 :: v_dual_mov_b32 v5, 1		; GFX11-NEXT: v_dual_mov_b32 v4, 0 :: v_dual_mov_b32 v5, 1
; GFX11-NEXT: v_dual_mov_b32 v6, 1 :: v_dual_mov_b32 v7, 1		; GFX11-NEXT: v_dual_mov_b32 v6, 1 :: v_dual_mov_b32 v7, 1
; GFX11-NEXT: v_dual_mov_b32 v8, 1 :: v_dual_mov_b32 v9, 1		; GFX11-NEXT: v_dual_mov_b32 v8, 1 :: v_dual_mov_b32 v9, 1
; GFX11-NEXT: v_dual_mov_b32 v10, 2 :: v_dual_mov_b32 v11, 2		; GFX11-NEXT: v_dual_mov_b32 v10, 2 :: v_dual_mov_b32 v11, 2
; GFX11-NEXT: v_dual_mov_b32 v12, 2 :: v_dual_mov_b32 v13, 2		; GFX11-NEXT: v_dual_mov_b32 v12, 2 :: v_dual_mov_b32 v13, 2
; GFX11-NEXT: v_dual_mov_b32 v14, 2 :: v_dual_mov_b32 v15, 3		; GFX11-NEXT: v_dual_mov_b32 v14, 2 :: v_dual_mov_b32 v15, 3
(40 lines not shown)
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 13		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 13
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 14		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 14
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 15		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 15
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 8		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 8
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 9		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 9
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 10		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 10
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 11		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 11
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16		; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16
; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[4:7], s32		; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[4:7], s32
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 1		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 1
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 1		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 1
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 1		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 1
(72 lines not shown)
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8
; GFX9-NEXT: v_mov_b32_e32 v0, 0x41300000		; GFX9-NEXT: v_mov_b32_e32 v0, 0x41300000
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12
; GFX9-NEXT: v_mov_b32_e32 v0, 0x41400000		; GFX9-NEXT: v_mov_b32_e32 v0, 0x41400000
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:16		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:16
; GFX9-NEXT: v_mov_b32_e32 v0, 0x41500000		; GFX9-NEXT: v_mov_b32_e32 v0, 0x41500000
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:20		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:20
; GFX9-NEXT: v_mov_b32_e32 v0, 0x41600000		; GFX9-NEXT: v_mov_b32_e32 v0, 0x41600000
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:24		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:24
; GFX9-NEXT: v_mov_b32_e32 v0, 0x41700000		; GFX9-NEXT: v_mov_b32_e32 v0, 0x41700000
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:28		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:28
; GFX9-NEXT: v_mov_b32_e32 v0, 0		; GFX9-NEXT: v_mov_b32_e32 v0, 0
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: v_mov_b32_e32 v2, 0		; GFX9-NEXT: v_mov_b32_e32 v2, 0
; GFX9-NEXT: v_mov_b32_e32 v3, 0		; GFX9-NEXT: v_mov_b32_e32 v3, 0
(60 lines not shown)
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: v_mov_b32_e32 v3, 0x41600000		; GFX10-NEXT: v_mov_b32_e32 v3, 0x41600000
; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32		; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32
; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4		; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4
; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8		; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8
; GFX10-NEXT: v_mov_b32_e32 v0, 0x41300000		; GFX10-NEXT: v_mov_b32_e32 v0, 0x41300000
; GFX10-NEXT: v_mov_b32_e32 v1, 0x41400000		; GFX10-NEXT: v_mov_b32_e32 v1, 0x41400000
; GFX10-NEXT: v_mov_b32_e32 v2, 0x41500000		; GFX10-NEXT: v_mov_b32_e32 v2, 0x41500000
		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: v_mov_b32_e32 v4, 0x41700000		; GFX10-NEXT: v_mov_b32_e32 v4, 0x41700000
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12		; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12
; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:16		; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:16
; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:20		; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:20
; GFX10-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:24		; GFX10-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:24
; GFX10-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:28		; GFX10-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:28
; GFX10-NEXT: v_mov_b32_e32 v0, 0		; GFX10-NEXT: v_mov_b32_e32 v0, 0
(63 lines not shown)
; GFX11-NEXT: v_mov_b32_e32 v1, 0x41500000		; GFX11-NEXT: v_mov_b32_e32 v1, 0x41500000
; GFX11-NEXT: v_mov_b32_e32 v2, 0x41600000		; GFX11-NEXT: v_mov_b32_e32 v2, 0x41600000
; GFX11-NEXT: v_mov_b32_e32 v3, 0x41700000		; GFX11-NEXT: v_mov_b32_e32 v3, 0x41700000
; GFX11-NEXT: v_mov_b32_e32 v4, 0x41000000		; GFX11-NEXT: v_mov_b32_e32 v4, 0x41000000
; GFX11-NEXT: v_mov_b32_e32 v5, 0x41100000		; GFX11-NEXT: v_mov_b32_e32 v5, 0x41100000
; GFX11-NEXT: v_mov_b32_e32 v6, 0x41200000		; GFX11-NEXT: v_mov_b32_e32 v6, 0x41200000
; GFX11-NEXT: v_mov_b32_e32 v7, 0x41300000		; GFX11-NEXT: v_mov_b32_e32 v7, 0x41300000
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:16		; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:16
; GFX11-NEXT: scratch_store_b128 off, v[4:7], s32		; GFX11-NEXT: scratch_store_b128 off, v[4:7], s32
; GFX11-NEXT: v_mov_b32_e32 v6, 1.0		; GFX11-NEXT: v_mov_b32_e32 v6, 1.0
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0		; GFX11-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 0
; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 0		; GFX11-NEXT: v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 0
; GFX11-NEXT: v_dual_mov_b32 v4, 0 :: v_dual_mov_b32 v5, 1.0		; GFX11-NEXT: v_dual_mov_b32 v4, 0 :: v_dual_mov_b32 v5, 1.0
; GFX11-NEXT: v_dual_mov_b32 v7, 1.0 :: v_dual_mov_b32 v8, 1.0		; GFX11-NEXT: v_dual_mov_b32 v7, 1.0 :: v_dual_mov_b32 v8, 1.0
; GFX11-NEXT: v_dual_mov_b32 v9, 1.0 :: v_dual_mov_b32 v10, 2.0		; GFX11-NEXT: v_dual_mov_b32 v9, 1.0 :: v_dual_mov_b32 v10, 2.0
; GFX11-NEXT: v_dual_mov_b32 v11, 2.0 :: v_dual_mov_b32 v12, 2.0		; GFX11-NEXT: v_dual_mov_b32 v11, 2.0 :: v_dual_mov_b32 v12, 2.0
; GFX11-NEXT: v_dual_mov_b32 v13, 2.0 :: v_dual_mov_b32 v14, 2.0		; GFX11-NEXT: v_dual_mov_b32 v13, 2.0 :: v_dual_mov_b32 v14, 2.0
; GFX11-NEXT: v_dual_mov_b32 v15, 0x40400000 :: v_dual_mov_b32 v16, 0x40400000		; GFX11-NEXT: v_dual_mov_b32 v15, 0x40400000 :: v_dual_mov_b32 v16, 0x40400000
(41 lines not shown)
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x41500000		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x41500000
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0x41600000		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0x41600000
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x41700000		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x41700000
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0x41000000		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0x41000000
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 0x41100000		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 0x41100000
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 0x41200000		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 0x41200000
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 0x41300000		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 0x41300000
; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16		; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-SCRATCH-NEXT: ; implicit-def: $vgpr40
; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16		; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16
; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[4:7], s32		; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[4:7], s32
		; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 1.0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 1.0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 1.0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 1.0
; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 1.0		; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 1.0
(68 lines not shown)

llvm/test/CodeGen/AMDGPU/gfx-callable-preserved-registers.ll

(9 lines not shown)
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s5, 1		; GFX9-NEXT: v_writelane_b32 v40, s5, 1
; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 2		; GFX9-NEXT: v_writelane_b32 v40, s30, 2
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 3		; GFX9-NEXT: v_writelane_b32 v40, s31, 3
; GFX9-NEXT: s_getpc_b64 s[4:5]		; GFX9-NEXT: s_getpc_b64 s[4:5]
; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4		; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
(19 lines not shown)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: v_writelane_b32 v40, s5, 1		; GFX10-NEXT: v_writelane_b32 v40, s5, 1
; GFX10-NEXT: s_getpc_b64 s[4:5]		; GFX10-NEXT: s_getpc_b64 s[4:5]
; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4		; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s30, 2		; GFX10-NEXT: v_writelane_b32 v40, s30, 2
; GFX10-NEXT: v_writelane_b32 v40, s31, 3		; GFX10-NEXT: v_writelane_b32 v40, s31, 3
; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
(22 lines not shown)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: v_writelane_b32 v40, s5, 1		; GFX11-NEXT: v_writelane_b32 v40, s5, 1
; GFX11-NEXT: s_getpc_b64 s[4:5]		; GFX11-NEXT: s_getpc_b64 s[4:5]
; GFX11-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4		; GFX11-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s30, 2		; GFX11-NEXT: v_writelane_b32 v40, s30, 2
; GFX11-NEXT: v_writelane_b32 v40, s31, 3		; GFX11-NEXT: v_writelane_b32 v40, s31, 3
; GFX11-NEXT: s_swappc_b64 s[30:31], s[4:5]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[4:5]
(23 lines not shown)

define amdgpu_gfx void @void_func_void_clobber_s28_s29() #1 {		define amdgpu_gfx void @void_func_void_clobber_s28_s29() #1 {
; GFX9-LABEL: void_func_void_clobber_s28_s29:		; GFX9-LABEL: void_func_void_clobber_s28_s29:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1		; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[34:35]		; GFX9-NEXT: s_mov_b64 exec, s[34:35]
		; GFX9-NEXT: ; implicit-def: $vgpr0
; GFX9-NEXT: v_writelane_b32 v0, s28, 0		; GFX9-NEXT: v_writelane_b32 v0, s28, 0
; GFX9-NEXT: v_writelane_b32 v0, s29, 1		; GFX9-NEXT: v_writelane_b32 v0, s29, 1
; GFX9-NEXT: v_writelane_b32 v0, s30, 2		; GFX9-NEXT: v_writelane_b32 v0, s30, 2
; GFX9-NEXT: v_writelane_b32 v0, s31, 3		; GFX9-NEXT: v_writelane_b32 v0, s31, 3
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; clobber		; GFX9-NEXT: ; clobber
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
(12 lines not shown)
; GFX10-LABEL: void_func_void_clobber_s28_s29:		; GFX10-LABEL: void_func_void_clobber_s28_s29:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_xor_saveexec_b32 s34, -1		; GFX10-NEXT: s_xor_saveexec_b32 s34, -1
; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s34		; GFX10-NEXT: s_mov_b32 exec_lo, s34
		; GFX10-NEXT: ; implicit-def: $vgpr0
; GFX10-NEXT: v_writelane_b32 v0, s28, 0		; GFX10-NEXT: v_writelane_b32 v0, s28, 0
; GFX10-NEXT: v_writelane_b32 v0, s29, 1		; GFX10-NEXT: v_writelane_b32 v0, s29, 1
; GFX10-NEXT: v_writelane_b32 v0, s30, 2		; GFX10-NEXT: v_writelane_b32 v0, s30, 2
; GFX10-NEXT: v_writelane_b32 v0, s31, 3		; GFX10-NEXT: v_writelane_b32 v0, s31, 3
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; clobber		; GFX10-NEXT: ; clobber
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
(10 lines not shown)
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: void_func_void_clobber_s28_s29:		; GFX11-LABEL: void_func_void_clobber_s28_s29:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_xor_saveexec_b32 s0, -1		; GFX11-NEXT: s_xor_saveexec_b32 s1, -1
; GFX11-NEXT: scratch_store_b32 off, v0, s32 ; 4-byte Folded Spill		; GFX11-NEXT: scratch_store_b32 off, v0, s32 ; 4-byte Folded Spill
; GFX11-NEXT: s_mov_b32 exec_lo, s0		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr0
; GFX11-NEXT: v_writelane_b32 v0, s28, 0		; GFX11-NEXT: v_writelane_b32 v0, s28, 0
; GFX11-NEXT: v_writelane_b32 v0, s29, 1		; GFX11-NEXT: v_writelane_b32 v0, s29, 1
; GFX11-NEXT: v_writelane_b32 v0, s30, 2		; GFX11-NEXT: v_writelane_b32 v0, s30, 2
; GFX11-NEXT: v_writelane_b32 v0, s31, 3		; GFX11-NEXT: v_writelane_b32 v0, s31, 3
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; clobber		; GFX11-NEXT: ; clobber
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; clobber		; GFX11-NEXT: ; clobber
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v0, 3		; GFX11-NEXT: v_readlane_b32 s31, v0, 3
; GFX11-NEXT: v_readlane_b32 s30, v0, 2		; GFX11-NEXT: v_readlane_b32 s30, v0, 2
; GFX11-NEXT: v_readlane_b32 s29, v0, 1		; GFX11-NEXT: v_readlane_b32 s29, v0, 1
; GFX11-NEXT: v_readlane_b32 s28, v0, 0		; GFX11-NEXT: v_readlane_b32 s28, v0, 0
; GFX11-NEXT: s_xor_saveexec_b32 s0, -1		; GFX11-NEXT: s_xor_saveexec_b32 s1, -1
; GFX11-NEXT: scratch_load_b32 v0, off, s32 ; 4-byte Folded Reload		; GFX11-NEXT: scratch_load_b32 v0, off, s32 ; 4-byte Folded Reload
; GFX11-NEXT: s_mov_b32 exec_lo, s0		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
call void asm sideeffect "; clobber", "~{s[30:31]}"() #0		call void asm sideeffect "; clobber", "~{s[30:31]}"() #0
; GCN: v_writelane_b32 v0, s28, 0		; GCN: v_writelane_b32 v0, s28, 0
; GCN: v_writelane_b32 v0, s29, 1		; GCN: v_writelane_b32 v0, s29, 1

; GCN: v_readlane_b32 s28, v0, 0		; GCN: v_readlane_b32 s28, v0, 0
; GCN: v_readlane_b32 s29, v0, 1		; GCN: v_readlane_b32 s29, v0, 1
call void asm sideeffect "; clobber", "~{s[28:29]}"() #0		call void asm sideeffect "; clobber", "~{s[28:29]}"() #0
ret void		ret void
}		}

define amdgpu_gfx void @test_call_void_func_void_mayclobber_s31(i32 addrspace(1)* %out) #0 {		define amdgpu_gfx void @test_call_void_func_void_mayclobber_s31(i32 addrspace(1)* %out) #0 {
; GFX9-LABEL: test_call_void_func_void_mayclobber_s31:		; GFX9-LABEL: test_call_void_func_void_mayclobber_s31:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s30, 1		; GFX9-NEXT: v_writelane_b32 v40, s30, 1
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 2		; GFX9-NEXT: v_writelane_b32 v40, s31, 2
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; def s31		; GFX9-NEXT: ; def s31
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: s_mov_b32 s4, s31		; GFX9-NEXT: s_mov_b32 s4, s31
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
[23 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s30, 1		; GFX10-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; def s31		; GFX10-NEXT: ; def s31
[25 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s30, 1		; GFX11-NEXT: v_writelane_b32 v40, s30, 1
; GFX11-NEXT: v_writelane_b32 v40, s31, 2		; GFX11-NEXT: v_writelane_b32 v40, s31, 2
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; def s31		; GFX11-NEXT: ; def s31
[25 lines not shown]

define amdgpu_gfx void @test_call_void_func_void_mayclobber_v31(i32 addrspace(1)* %out) #0 {		define amdgpu_gfx void @test_call_void_func_void_mayclobber_v31(i32 addrspace(1)* %out) #0 {
; GFX9-LABEL: test_call_void_func_void_mayclobber_v31:		; GFX9-LABEL: test_call_void_func_void_mayclobber_v31:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr41
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v41, s30, 0
; GFX9-NEXT: v_writelane_b32 v42, s34, 0		; GFX9-NEXT: v_writelane_b32 v42, s34, 0
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v41, s31, 1
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; def v31		; GFX9-NEXT: ; def v31
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: v_mov_b32_e32 v41, v31		; GFX9-NEXT: v_mov_b32_e32 v40, v31
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX9-NEXT: v_mov_b32_e32 v31, v41		; GFX9-NEXT: v_mov_b32_e32 v31, v40
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; use v31		; GFX9-NEXT: ; use v31
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
; GFX9-NEXT: v_readlane_b32 s31, v40, 1		; GFX9-NEXT: v_readlane_b32 s31, v41, 1
; GFX9-NEXT: v_readlane_b32 s30, v40, 0		; GFX9-NEXT: v_readlane_b32 s30, v41, 0
; GFX9-NEXT: v_readlane_b32 s34, v42, 0		; GFX9-NEXT: v_readlane_b32 s34, v42, 0
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: s_addk_i32 s32, 0xfc00		; GFX9-NEXT: s_addk_i32 s32, 0xfc00
; GFX9-NEXT: s_mov_b32 s33, s34		; GFX9-NEXT: s_mov_b32 s33, s34
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: test_call_void_func_void_mayclobber_v31:		; GFX10-LABEL: test_call_void_func_void_mayclobber_v31:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr41
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v41, s30, 0
; GFX10-NEXT: v_writelane_b32 v42, s34, 0		; GFX10-NEXT: v_writelane_b32 v42, s34, 0
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; def v31		; GFX10-NEXT: ; def v31
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_mov_b32_e32 v40, v31
; GFX10-NEXT: v_mov_b32_e32 v41, v31		; GFX10-NEXT: v_writelane_b32 v41, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_mov_b32_e32 v31, v41		; GFX10-NEXT: v_mov_b32_e32 v31, v40
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; use v31		; GFX10-NEXT: ; use v31
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v41, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v41, 0
; GFX10-NEXT: v_readlane_b32 s34, v42, 0		; GFX10-NEXT: v_readlane_b32 s34, v42, 0
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4		; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4
; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8		; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: s_addk_i32 s32, 0xfe00		; GFX10-NEXT: s_addk_i32 s32, 0xfe00
; GFX10-NEXT: s_mov_b32 s33, s34		; GFX10-NEXT: s_mov_b32 s33, s34
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: test_call_void_func_void_mayclobber_v31:		; GFX11-LABEL: test_call_void_func_void_mayclobber_v31:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: scratch_store_b32 off, v42, s33 offset:8		; GFX11-NEXT: scratch_store_b32 off, v42, s33 offset:8
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr41
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v41, s30, 0
; GFX11-NEXT: v_writelane_b32 v42, s0, 0		; GFX11-NEXT: v_writelane_b32 v42, s0, 0
; GFX11-NEXT: scratch_store_b32 off, v41, s33 ; 4-byte Folded Spill		; GFX11-NEXT: scratch_store_b32 off, v40, s33 ; 4-byte Folded Spill
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; def v31		; GFX11-NEXT: ; def v31
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_mov_b32_e32 v40, v31
; GFX11-NEXT: v_mov_b32_e32 v41, v31		; GFX11-NEXT: v_writelane_b32 v41, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: v_mov_b32_e32 v31, v41		; GFX11-NEXT: v_mov_b32_e32 v31, v40
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; use v31		; GFX11-NEXT: ; use v31
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: scratch_load_b32 v41, off, s33 ; 4-byte Folded Reload		; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v41, 1
; GFX11-NEXT: v_readlane_b32 s30, v40, 0		; GFX11-NEXT: v_readlane_b32 s30, v41, 0
; GFX11-NEXT: v_readlane_b32 s0, v42, 0		; GFX11-NEXT: v_readlane_b32 s0, v42, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:4		; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4
; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:8		; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:8
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: s_add_i32 s32, s32, -16		; GFX11-NEXT: s_add_i32 s32, s32, -16
; GFX11-NEXT: s_mov_b32 s33, s0		; GFX11-NEXT: s_mov_b32 s33, s0
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%v31 = call i32 asm sideeffect "; def $0", "={v31}"()		%v31 = call i32 asm sideeffect "; def $0", "={v31}"()
call amdgpu_gfx void @external_void_func_void()		call amdgpu_gfx void @external_void_func_void()
call void asm sideeffect "; use $0", "{v31}"(i32 %v31)		call void asm sideeffect "; use $0", "{v31}"(i32 %v31)
ret void		ret void
}		}


define amdgpu_gfx void @test_call_void_func_void_preserves_s33(i32 addrspace(1)* %out) #0 {		define amdgpu_gfx void @test_call_void_func_void_preserves_s33(i32 addrspace(1)* %out) #0 {
; GFX9-LABEL: test_call_void_func_void_preserves_s33:		; GFX9-LABEL: test_call_void_func_void_preserves_s33:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s30, 1		; GFX9-NEXT: v_writelane_b32 v40, s30, 1
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 2		; GFX9-NEXT: v_writelane_b32 v40, s31, 2
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; def s33		; GFX9-NEXT: ; def s33
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: s_mov_b32 s4, s33		; GFX9-NEXT: s_mov_b32 s4, s33
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
[23 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; def s33		; GFX10-NEXT: ; def s33
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-NEXT: s_mov_b32 s4, s33		; GFX10-NEXT: s_mov_b32 s4, s33
		; GFX10-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: s_mov_b32 s33, s4		; GFX10-NEXT: s_mov_b32 s33, s4
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; use s33		; GFX10-NEXT: ; use s33
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-NEXT: v_readlane_b32 s31, v40, 2
; GFX10-NEXT: v_readlane_b32 s30, v40, 1		; GFX10-NEXT: v_readlane_b32 s30, v40, 1
[16 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; def s33		; GFX11-NEXT: ; def s33
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: v_writelane_b32 v40, s30, 1
; GFX11-NEXT: s_mov_b32 s4, s33		; GFX11-NEXT: s_mov_b32 s4, s33
		; GFX11-NEXT: v_writelane_b32 v40, s30, 1
; GFX11-NEXT: v_writelane_b32 v40, s31, 2		; GFX11-NEXT: v_writelane_b32 v40, s31, 2
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_mov_b32 s33, s4		; GFX11-NEXT: s_mov_b32 s33, s4
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; use s33		; GFX11-NEXT: ; use s33
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 2		; GFX11-NEXT: v_readlane_b32 s31, v40, 2
[20 lines not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 1		; GFX9-NEXT: v_writelane_b32 v40, s30, 1
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; def s34		; GFX9-NEXT: ; def s34
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: v_writelane_b32 v40, s31, 2		; GFX9-NEXT: v_writelane_b32 v40, s31, 2
; GFX9-NEXT: s_mov_b32 s4, s34		; GFX9-NEXT: s_mov_b32 s4, s34
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
[23 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[36:37]		; GFX10-NEXT: s_getpc_b64 s[36:37]
; GFX10-NEXT: s_add_u32 s36, s36, external_void_func_void@rel32@lo+4		; GFX10-NEXT: s_add_u32 s36, s36, external_void_func_void@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s37, s37, external_void_func_void@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s37, s37, external_void_func_void@rel32@hi+12
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; def s34		; GFX10-NEXT: ; def s34
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-NEXT: s_mov_b32 s4, s34		; GFX10-NEXT: s_mov_b32 s4, s34
		; GFX10-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-NEXT: s_swappc_b64 s[30:31], s[36:37]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[36:37]
; GFX10-NEXT: s_mov_b32 s34, s4		; GFX10-NEXT: s_mov_b32 s34, s4
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; use s34		; GFX10-NEXT: ; use s34
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-NEXT: v_readlane_b32 s31, v40, 2
; GFX10-NEXT: v_readlane_b32 s30, v40, 1		; GFX10-NEXT: v_readlane_b32 s30, v40, 1
[16 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; def s34		; GFX11-NEXT: ; def s34
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: v_writelane_b32 v40, s30, 1
; GFX11-NEXT: s_mov_b32 s4, s34		; GFX11-NEXT: s_mov_b32 s4, s34
		; GFX11-NEXT: v_writelane_b32 v40, s30, 1
; GFX11-NEXT: v_writelane_b32 v40, s31, 2		; GFX11-NEXT: v_writelane_b32 v40, s31, 2
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_mov_b32 s34, s4		; GFX11-NEXT: s_mov_b32 s34, s4
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; use s34		; GFX11-NEXT: ; use s34
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 2		; GFX11-NEXT: v_readlane_b32 s31, v40, 2
[20 lines not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr41
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v41, s30, 0		; GFX9-NEXT: v_writelane_b32 v41, s30, 0
; GFX9-NEXT: v_writelane_b32 v42, s34, 0		; GFX9-NEXT: v_writelane_b32 v42, s34, 0
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: v_writelane_b32 v41, s31, 1		; GFX9-NEXT: v_writelane_b32 v41, s31, 1
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; def v40		; GFX9-NEXT: ; def v40
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
[23 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v41, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr41
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v41, s30, 0
; GFX10-NEXT: v_writelane_b32 v42, s34, 0		; GFX10-NEXT: v_writelane_b32 v42, s34, 0
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; def v40		; GFX10-NEXT: ; def v40
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: v_writelane_b32 v41, s31, 1
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
		; GFX10-NEXT: v_writelane_b32 v41, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; use v40		; GFX10-NEXT: ; use v40
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
; GFX10-NEXT: v_readlane_b32 s31, v41, 1		; GFX10-NEXT: v_readlane_b32 s31, v41, 1
; GFX10-NEXT: v_readlane_b32 s30, v41, 0		; GFX10-NEXT: v_readlane_b32 s30, v41, 0
; GFX10-NEXT: v_readlane_b32 s34, v42, 0		; GFX10-NEXT: v_readlane_b32 s34, v42, 0
[14 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: scratch_store_b32 off, v42, s33 offset:8		; GFX11-NEXT: scratch_store_b32 off, v42, s33 offset:8
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v41, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr41
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v41, s30, 0
; GFX11-NEXT: v_writelane_b32 v42, s0, 0		; GFX11-NEXT: v_writelane_b32 v42, s0, 0
; GFX11-NEXT: scratch_store_b32 off, v40, s33 ; 4-byte Folded Spill		; GFX11-NEXT: scratch_store_b32 off, v40, s33 ; 4-byte Folded Spill
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; def v40		; GFX11-NEXT: ; def v40
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: v_writelane_b32 v41, s31, 1
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)		; GFX11-NEXT: v_writelane_b32 v41, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; use v40		; GFX11-NEXT: ; use v40
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload		; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
; GFX11-NEXT: v_readlane_b32 s31, v41, 1		; GFX11-NEXT: v_readlane_b32 s31, v41, 1
; GFX11-NEXT: v_readlane_b32 s30, v41, 0		; GFX11-NEXT: v_readlane_b32 s30, v41, 0
; GFX11-NEXT: v_readlane_b32 s0, v42, 0		; GFX11-NEXT: v_readlane_b32 s0, v42, 0
[11 lines not shown]	; GFX11-NEXT: s_setpc_b64 s[30:31]
call void asm sideeffect "; use $0", "{v40}"(i32 %v40)		call void asm sideeffect "; use $0", "{v40}"(i32 %v40)
ret void		ret void
}		}

define hidden void @void_func_void_clobber_s33() #1 {		define hidden void @void_func_void_clobber_s33() #1 {
; GFX9-LABEL: void_func_void_clobber_s33:		; GFX9-LABEL: void_func_void_clobber_s33:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_xor_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_xor_saveexec_b64 s[6:7], -1
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[4:5]		; GFX9-NEXT: s_mov_b64 exec, s[6:7]
		; GFX9-NEXT: ; implicit-def: $vgpr0
; GFX9-NEXT: v_writelane_b32 v0, s33, 0		; GFX9-NEXT: v_writelane_b32 v0, s33, 0
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; clobber		; GFX9-NEXT: ; clobber
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: v_readlane_b32 s33, v0, 0		; GFX9-NEXT: v_readlane_b32 s33, v0, 0
; GFX9-NEXT: s_xor_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_xor_saveexec_b64 s[6:7], -1
; GFX9-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload
; GFX9-NEXT: s_mov_b64 exec, s[4:5]		; GFX9-NEXT: s_mov_b64 exec, s[6:7]
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: void_func_void_clobber_s33:		; GFX10-LABEL: void_func_void_clobber_s33:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_xor_saveexec_b32 s4, -1		; GFX10-NEXT: s_xor_saveexec_b32 s5, -1
; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s4		; GFX10-NEXT: s_mov_b32 exec_lo, s5
		; GFX10-NEXT: ; implicit-def: $vgpr0
; GFX10-NEXT: v_writelane_b32 v0, s33, 0		; GFX10-NEXT: v_writelane_b32 v0, s33, 0
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; clobber		; GFX10-NEXT: ; clobber
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: v_readlane_b32 s33, v0, 0		; GFX10-NEXT: v_readlane_b32 s33, v0, 0
; GFX10-NEXT: s_xor_saveexec_b32 s4, -1		; GFX10-NEXT: s_xor_saveexec_b32 s5, -1
; GFX10-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload		; GFX10-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s4		; GFX10-NEXT: s_mov_b32 exec_lo, s5
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: void_func_void_clobber_s33:		; GFX11-LABEL: void_func_void_clobber_s33:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_xor_saveexec_b32 s0, -1		; GFX11-NEXT: s_xor_saveexec_b32 s1, -1
; GFX11-NEXT: scratch_store_b32 off, v0, s32 ; 4-byte Folded Spill		; GFX11-NEXT: scratch_store_b32 off, v0, s32 ; 4-byte Folded Spill
; GFX11-NEXT: s_mov_b32 exec_lo, s0		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr0
; GFX11-NEXT: v_writelane_b32 v0, s33, 0		; GFX11-NEXT: v_writelane_b32 v0, s33, 0
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; clobber		; GFX11-NEXT: ; clobber
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s33, v0, 0		; GFX11-NEXT: v_readlane_b32 s33, v0, 0
; GFX11-NEXT: s_xor_saveexec_b32 s0, -1		; GFX11-NEXT: s_xor_saveexec_b32 s1, -1
; GFX11-NEXT: scratch_load_b32 v0, off, s32 ; 4-byte Folded Reload		; GFX11-NEXT: scratch_load_b32 v0, off, s32 ; 4-byte Folded Reload
; GFX11-NEXT: s_mov_b32 exec_lo, s0		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
call void asm sideeffect "; clobber", "~{s33}"() #0		call void asm sideeffect "; clobber", "~{s33}"() #0
ret void		ret void
}		}

define hidden void @void_func_void_clobber_s34() #1 {		define hidden void @void_func_void_clobber_s34() #1 {
; GFX9-LABEL: void_func_void_clobber_s34:		; GFX9-LABEL: void_func_void_clobber_s34:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_xor_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_xor_saveexec_b64 s[6:7], -1
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[4:5]		; GFX9-NEXT: s_mov_b64 exec, s[6:7]
		; GFX9-NEXT: ; implicit-def: $vgpr0
; GFX9-NEXT: v_writelane_b32 v0, s34, 0		; GFX9-NEXT: v_writelane_b32 v0, s34, 0
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; clobber		; GFX9-NEXT: ; clobber
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: v_readlane_b32 s34, v0, 0		; GFX9-NEXT: v_readlane_b32 s34, v0, 0
; GFX9-NEXT: s_xor_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_xor_saveexec_b64 s[6:7], -1
; GFX9-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload
; GFX9-NEXT: s_mov_b64 exec, s[4:5]		; GFX9-NEXT: s_mov_b64 exec, s[6:7]
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: void_func_void_clobber_s34:		; GFX10-LABEL: void_func_void_clobber_s34:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_xor_saveexec_b32 s4, -1		; GFX10-NEXT: s_xor_saveexec_b32 s5, -1
; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s4		; GFX10-NEXT: s_mov_b32 exec_lo, s5
		; GFX10-NEXT: ; implicit-def: $vgpr0
; GFX10-NEXT: v_writelane_b32 v0, s34, 0		; GFX10-NEXT: v_writelane_b32 v0, s34, 0
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; clobber		; GFX10-NEXT: ; clobber
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: v_readlane_b32 s34, v0, 0		; GFX10-NEXT: v_readlane_b32 s34, v0, 0
; GFX10-NEXT: s_xor_saveexec_b32 s4, -1		; GFX10-NEXT: s_xor_saveexec_b32 s5, -1
; GFX10-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload		; GFX10-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s4		; GFX10-NEXT: s_mov_b32 exec_lo, s5
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: void_func_void_clobber_s34:		; GFX11-LABEL: void_func_void_clobber_s34:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_xor_saveexec_b32 s0, -1		; GFX11-NEXT: s_xor_saveexec_b32 s1, -1
; GFX11-NEXT: scratch_store_b32 off, v0, s32 ; 4-byte Folded Spill		; GFX11-NEXT: scratch_store_b32 off, v0, s32 ; 4-byte Folded Spill
; GFX11-NEXT: s_mov_b32 exec_lo, s0		; GFX11-NEXT: s_mov_b32 exec_lo, s1
		; GFX11-NEXT: ; implicit-def: $vgpr0
; GFX11-NEXT: v_writelane_b32 v0, s34, 0		; GFX11-NEXT: v_writelane_b32 v0, s34, 0
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; clobber		; GFX11-NEXT: ; clobber
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s34, v0, 0		; GFX11-NEXT: v_readlane_b32 s34, v0, 0
; GFX11-NEXT: s_xor_saveexec_b32 s0, -1		; GFX11-NEXT: s_xor_saveexec_b32 s1, -1
; GFX11-NEXT: scratch_load_b32 v0, off, s32 ; 4-byte Folded Reload		; GFX11-NEXT: scratch_load_b32 v0, off, s32 ; 4-byte Folded Reload
; GFX11-NEXT: s_mov_b32 exec_lo, s0		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
call void asm sideeffect "; clobber", "~{s34}"() #0		call void asm sideeffect "; clobber", "~{s34}"() #0
ret void		ret void
}		}

define amdgpu_gfx void @test_call_void_func_void_clobber_s33() #0 {		define amdgpu_gfx void @test_call_void_func_void_clobber_s33() #0 {
; GFX9-LABEL: test_call_void_func_void_clobber_s33:		; GFX9-LABEL: test_call_void_func_void_clobber_s33:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s33@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s33@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s33@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s33@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
[15 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s33@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s33@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s33@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s33@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
[15 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, void_func_void_clobber_s33@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, void_func_void_clobber_s33@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, void_func_void_clobber_s33@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, void_func_void_clobber_s33@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
[17 lines not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s34@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s34@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s34@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s34@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
[15 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s30, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s34@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s34@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s34@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s34@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: v_readlane_b32 s31, v40, 1		; GFX10-NEXT: v_readlane_b32 s31, v40, 1
; GFX10-NEXT: v_readlane_b32 s30, v40, 0		; GFX10-NEXT: v_readlane_b32 s30, v40, 0
[15 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s30, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s30, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, void_func_void_clobber_s34@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, void_func_void_clobber_s34@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, void_func_void_clobber_s34@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, void_func_void_clobber_s34@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 1		; GFX11-NEXT: v_writelane_b32 v40, s31, 1
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 1		; GFX11-NEXT: v_readlane_b32 s31, v40, 1
[17 lines not shown]
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: ; implicit-def: $vgpr40
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
		; GFX9-NEXT: v_writelane_b32 v40, s4, 0
; GFX9-NEXT: v_writelane_b32 v40, s30, 1		; GFX9-NEXT: v_writelane_b32 v40, s30, 1
; GFX9-NEXT: v_writelane_b32 v41, s34, 0		; GFX9-NEXT: v_writelane_b32 v41, s34, 0
; GFX9-NEXT: v_writelane_b32 v40, s31, 2		; GFX9-NEXT: v_writelane_b32 v40, s31, 2
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; def s40		; GFX9-NEXT: ; def s40
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: s_mov_b32 s4, s40		; GFX9-NEXT: s_mov_b32 s4, s40
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
[22 lines not shown]
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr40
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v40, s4, 0
; GFX10-NEXT: v_writelane_b32 v41, s34, 0		; GFX10-NEXT: v_writelane_b32 v41, s34, 0
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; def s40		; GFX10-NEXT: ; def s40
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-NEXT: s_mov_b32 s4, s40		; GFX10-NEXT: s_mov_b32 s4, s40
		; GFX10-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; use s4		; GFX10-NEXT: ; use s4
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-NEXT: v_readlane_b32 s31, v40, 2
; GFX10-NEXT: v_readlane_b32 s30, v40, 1		; GFX10-NEXT: v_readlane_b32 s30, v40, 1
; GFX10-NEXT: v_readlane_b32 s4, v40, 0		; GFX10-NEXT: v_readlane_b32 s4, v40, 0
[15 lines not shown]
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33		; GFX11-NEXT: scratch_store_b32 off, v40, s33
; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr40
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v40, s4, 0
; GFX11-NEXT: v_writelane_b32 v41, s0, 0		; GFX11-NEXT: v_writelane_b32 v41, s0, 0
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; def s40		; GFX11-NEXT: ; def s40
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: v_writelane_b32 v40, s30, 1
; GFX11-NEXT: s_mov_b32 s4, s40		; GFX11-NEXT: s_mov_b32 s4, s40
		; GFX11-NEXT: v_writelane_b32 v40, s30, 1
; GFX11-NEXT: v_writelane_b32 v40, s31, 2		; GFX11-NEXT: v_writelane_b32 v40, s31, 2
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; use s4		; GFX11-NEXT: ; use s4
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_readlane_b32 s31, v40, 2		; GFX11-NEXT: v_readlane_b32 s31, v40, 2
; GFX11-NEXT: v_readlane_b32 s30, v40, 1		; GFX11-NEXT: v_readlane_b32 s30, v40, 1
[16 lines not shown]

define amdgpu_gfx void @callee_saved_sgpr_vgpr_kernel() #1 {		define amdgpu_gfx void @callee_saved_sgpr_vgpr_kernel() #1 {
; GFX9-LABEL: callee_saved_sgpr_vgpr_kernel:		; GFX9-LABEL: callee_saved_sgpr_vgpr_kernel:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s34, s33		; GFX9-NEXT: s_mov_b32 s34, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: v_writelane_b32 v40, s4, 0		; GFX9-NEXT: ; implicit-def: $vgpr41
; GFX9-NEXT: s_addk_i32 s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: v_writelane_b32 v40, s30, 1		; GFX9-NEXT: v_writelane_b32 v41, s4, 0
		; GFX9-NEXT: v_writelane_b32 v41, s30, 1
; GFX9-NEXT: v_writelane_b32 v42, s34, 0		; GFX9-NEXT: v_writelane_b32 v42, s34, 0
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: v_writelane_b32 v40, s31, 2		; GFX9-NEXT: v_writelane_b32 v41, s31, 2
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; def s40		; GFX9-NEXT: ; def s40
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: s_mov_b32 s4, s40		; GFX9-NEXT: s_mov_b32 s4, s40
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; def v32		; GFX9-NEXT: ; def v32
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: v_mov_b32_e32 v41, v32		; GFX9-NEXT: v_mov_b32_e32 v40, v32
; GFX9-NEXT: s_getpc_b64 s[34:35]		; GFX9-NEXT: s_getpc_b64 s[34:35]
; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4		; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; use s4		; GFX9-NEXT: ; use s4
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; use v41		; GFX9-NEXT: ; use v40
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
; GFX9-NEXT: v_readlane_b32 s31, v40, 2		; GFX9-NEXT: v_readlane_b32 s31, v41, 2
; GFX9-NEXT: v_readlane_b32 s30, v40, 1		; GFX9-NEXT: v_readlane_b32 s30, v41, 1
; GFX9-NEXT: v_readlane_b32 s4, v40, 0		; GFX9-NEXT: v_readlane_b32 s4, v41, 0
; GFX9-NEXT: v_readlane_b32 s34, v42, 0		; GFX9-NEXT: v_readlane_b32 s34, v42, 0
; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
; GFX9-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-NEXT: s_addk_i32 s32, 0xfc00		; GFX9-NEXT: s_addk_i32 s32, 0xfc00
; GFX9-NEXT: s_mov_b32 s33, s34		; GFX9-NEXT: s_mov_b32 s33, s34
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: callee_saved_sgpr_vgpr_kernel:		; GFX10-LABEL: callee_saved_sgpr_vgpr_kernel:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s34, s33		; GFX10-NEXT: s_mov_b32 s34, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: v_writelane_b32 v40, s4, 0		; GFX10-NEXT: ; implicit-def: $vgpr41
; GFX10-NEXT: s_addk_i32 s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
		; GFX10-NEXT: v_writelane_b32 v41, s4, 0
; GFX10-NEXT: v_writelane_b32 v42, s34, 0		; GFX10-NEXT: v_writelane_b32 v42, s34, 0
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; def s40		; GFX10-NEXT: ; def s40
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-NEXT: s_mov_b32 s4, s40		; GFX10-NEXT: s_mov_b32 s4, s40
		; GFX10-NEXT: v_writelane_b32 v41, s30, 1
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; def v32		; GFX10-NEXT: ; def v32
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: v_mov_b32_e32 v41, v32		; GFX10-NEXT: v_mov_b32_e32 v40, v32
; GFX10-NEXT: s_getpc_b64 s[34:35]		; GFX10-NEXT: s_getpc_b64 s[34:35]
; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4		; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
; GFX10-NEXT: v_writelane_b32 v40, s31, 2		; GFX10-NEXT: v_writelane_b32 v41, s31, 2
; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; use s4		; GFX10-NEXT: ; use s4
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; use v41		; GFX10-NEXT: ; use v40
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
; GFX10-NEXT: v_readlane_b32 s31, v40, 2		; GFX10-NEXT: v_readlane_b32 s31, v41, 2
; GFX10-NEXT: v_readlane_b32 s30, v40, 1		; GFX10-NEXT: v_readlane_b32 s30, v41, 1
; GFX10-NEXT: v_readlane_b32 s4, v40, 0		; GFX10-NEXT: v_readlane_b32 s4, v41, 0
; GFX10-NEXT: v_readlane_b32 s34, v42, 0		; GFX10-NEXT: v_readlane_b32 s34, v42, 0
; GFX10-NEXT: s_or_saveexec_b32 s35, -1		; GFX10-NEXT: s_or_saveexec_b32 s35, -1
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4		; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4
; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8		; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s35		; GFX10-NEXT: s_mov_b32 exec_lo, s35
; GFX10-NEXT: s_addk_i32 s32, 0xfe00		; GFX10-NEXT: s_addk_i32 s32, 0xfe00
; GFX10-NEXT: s_mov_b32 s33, s34		; GFX10-NEXT: s_mov_b32 s33, s34
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: callee_saved_sgpr_vgpr_kernel:		; GFX11-LABEL: callee_saved_sgpr_vgpr_kernel:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: s_mov_b32 s0, s33		; GFX11-NEXT: s_mov_b32 s0, s33
; GFX11-NEXT: s_mov_b32 s33, s32		; GFX11-NEXT: s_mov_b32 s33, s32
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_store_b32 off, v40, s33 offset:4		; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:4
; GFX11-NEXT: scratch_store_b32 off, v42, s33 offset:8		; GFX11-NEXT: scratch_store_b32 off, v42, s33 offset:8
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: v_writelane_b32 v40, s4, 0		; GFX11-NEXT: ; implicit-def: $vgpr41
; GFX11-NEXT: s_add_i32 s32, s32, 16		; GFX11-NEXT: s_add_i32 s32, s32, 16
		; GFX11-NEXT: v_writelane_b32 v41, s4, 0
; GFX11-NEXT: v_writelane_b32 v42, s0, 0		; GFX11-NEXT: v_writelane_b32 v42, s0, 0
; GFX11-NEXT: scratch_store_b32 off, v41, s33 ; 4-byte Folded Spill		; GFX11-NEXT: scratch_store_b32 off, v40, s33 ; 4-byte Folded Spill
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; def s40		; GFX11-NEXT: ; def s40
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: v_writelane_b32 v40, s30, 1
; GFX11-NEXT: s_mov_b32 s4, s40		; GFX11-NEXT: s_mov_b32 s4, s40
		; GFX11-NEXT: v_writelane_b32 v41, s30, 1
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; def v32		; GFX11-NEXT: ; def v32
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: v_mov_b32_e32 v41, v32		; GFX11-NEXT: v_mov_b32_e32 v40, v32
; GFX11-NEXT: s_getpc_b64 s[0:1]		; GFX11-NEXT: s_getpc_b64 s[0:1]
; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4		; GFX11-NEXT: s_add_u32 s0, s0, external_void_func_void@rel32@lo+4
; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12		; GFX11-NEXT: s_addc_u32 s1, s1, external_void_func_void@rel32@hi+12
; GFX11-NEXT: v_writelane_b32 v40, s31, 2		; GFX11-NEXT: v_writelane_b32 v41, s31, 2
; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]		; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; use s4		; GFX11-NEXT: ; use s4
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: ;;#ASMSTART		; GFX11-NEXT: ;;#ASMSTART
; GFX11-NEXT: ; use v41		; GFX11-NEXT: ; use v40
; GFX11-NEXT: ;;#ASMEND		; GFX11-NEXT: ;;#ASMEND
; GFX11-NEXT: scratch_load_b32 v41, off, s33 ; 4-byte Folded Reload		; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
; GFX11-NEXT: v_readlane_b32 s31, v40, 2		; GFX11-NEXT: v_readlane_b32 s31, v41, 2
; GFX11-NEXT: v_readlane_b32 s30, v40, 1		; GFX11-NEXT: v_readlane_b32 s30, v41, 1
; GFX11-NEXT: v_readlane_b32 s4, v40, 0		; GFX11-NEXT: v_readlane_b32 s4, v41, 0
; GFX11-NEXT: v_readlane_b32 s0, v42, 0		; GFX11-NEXT: v_readlane_b32 s0, v42, 0
; GFX11-NEXT: s_or_saveexec_b32 s1, -1		; GFX11-NEXT: s_or_saveexec_b32 s1, -1
; GFX11-NEXT: s_clause 0x1		; GFX11-NEXT: s_clause 0x1
; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:4		; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4
; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:8		; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:8
; GFX11-NEXT: s_mov_b32 exec_lo, s1		; GFX11-NEXT: s_mov_b32 exec_lo, s1
; GFX11-NEXT: s_add_i32 s32, s32, -16		; GFX11-NEXT: s_add_i32 s32, s32, -16
; GFX11-NEXT: s_mov_b32 s33, s0		; GFX11-NEXT: s_mov_b32 s33, s0
; GFX11-NEXT: s_waitcnt vmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0)
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0		%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0
%v32 = call i32 asm sideeffect "; def v32", "={v32}"() #0		%v32 = call i32 asm sideeffect "; def v32", "={v32}"() #0
call amdgpu_gfx void @external_void_func_void()		call amdgpu_gfx void @external_void_func_void()
call void asm sideeffect "; use $0", "s"(i32 %s40) #0		call void asm sideeffect "; use $0", "s"(i32 %s40) #0
call void asm sideeffect "; use $0", "v"(i32 %v32) #0		call void asm sideeffect "; use $0", "v"(i32 %v32) #0
ret void		ret void
}		}

attributes #0 = { nounwind }		attributes #0 = { nounwind }
attributes #1 = { nounwind noinline }		attributes #1 = { nounwind noinline }

llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll

	[28 lines not shown]
	; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, return_i1@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, return_i1@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, return_i1@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, return_i1@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
				; GFX9-NEXT: ; implicit-def: $vgpr1
	; GFX9-NEXT: v_writelane_b32 v1, s30, 0			; GFX9-NEXT: v_writelane_b32 v1, s30, 0
	; GFX9-NEXT: v_writelane_b32 v1, s31, 1			; GFX9-NEXT: v_writelane_b32 v1, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v1, 1			; GFX9-NEXT: v_readlane_b32 s31, v1, 1
	; GFX9-NEXT: v_readlane_b32 s30, v1, 0			; GFX9-NEXT: v_readlane_b32 s30, v1, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload
	[12 lines not shown]
	; GFX10-NEXT: s_xor_saveexec_b32 s34, -1			; GFX10-NEXT: s_xor_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, return_i1@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, return_i1@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, return_i1@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, return_i1@gotpcrel32@hi+12
				; GFX10-NEXT: ; implicit-def: $vgpr1
	; GFX10-NEXT: v_writelane_b32 v1, s30, 0			; GFX10-NEXT: v_writelane_b32 v1, s30, 0
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v1, s31, 1			; GFX10-NEXT: v_writelane_b32 v1, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v1, 1			; GFX10-NEXT: v_readlane_b32 s31, v1, 1
	; GFX10-NEXT: v_readlane_b32 s30, v1, 0			; GFX10-NEXT: v_readlane_b32 s30, v1, 0
	; GFX10-NEXT: s_xor_saveexec_b32 s34, -1			; GFX10-NEXT: s_xor_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: s_mov_b32 s33, s36			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_i1:			; GFX11-LABEL: call_i1:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_mov_b32 s2, s33			; GFX11-NEXT: s_mov_b32 s3, s33
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_xor_saveexec_b32 s0, -1			; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v1, s33 ; 4-byte Folded Spill			; GFX11-NEXT: scratch_store_b32 off, v1, s33 ; 4-byte Folded Spill
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, return_i1@gotpcrel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, return_i1@gotpcrel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, return_i1@gotpcrel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, return_i1@gotpcrel32@hi+12
				; GFX11-NEXT: ; implicit-def: $vgpr1
	; GFX11-NEXT: v_writelane_b32 v1, s30, 0			; GFX11-NEXT: v_writelane_b32 v1, s30, 0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_writelane_b32 v1, s31, 1			; GFX11-NEXT: v_writelane_b32 v1, s31, 1
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v1, 1			; GFX11-NEXT: v_readlane_b32 s31, v1, 1
	; GFX11-NEXT: v_readlane_b32 s30, v1, 0			; GFX11-NEXT: v_readlane_b32 s30, v1, 0
	; GFX11-NEXT: s_xor_saveexec_b32 s0, -1			; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v1, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v1, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: s_mov_b32 s33, s2			; GFX11-NEXT: s_mov_b32 s33, s3
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx i1 @return_i1()			call amdgpu_gfx i1 @return_i1()
	ret void			ret void
	}			}

	define amdgpu_gfx i16 @return_i16() #0 {			define amdgpu_gfx i16 @return_i16() #0 {
	[22 lines not shown]
	; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, return_i16@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, return_i16@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, return_i16@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, return_i16@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
				; GFX9-NEXT: ; implicit-def: $vgpr1
	; GFX9-NEXT: v_writelane_b32 v1, s30, 0			; GFX9-NEXT: v_writelane_b32 v1, s30, 0
	; GFX9-NEXT: v_writelane_b32 v1, s31, 1			; GFX9-NEXT: v_writelane_b32 v1, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v1, 1			; GFX9-NEXT: v_readlane_b32 s31, v1, 1
	; GFX9-NEXT: v_readlane_b32 s30, v1, 0			; GFX9-NEXT: v_readlane_b32 s30, v1, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload
	[12 lines not shown]
	; GFX10-NEXT: s_xor_saveexec_b32 s34, -1			; GFX10-NEXT: s_xor_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, return_i16@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, return_i16@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, return_i16@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, return_i16@gotpcrel32@hi+12
				; GFX10-NEXT: ; implicit-def: $vgpr1
	; GFX10-NEXT: v_writelane_b32 v1, s30, 0			; GFX10-NEXT: v_writelane_b32 v1, s30, 0
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v1, s31, 1			; GFX10-NEXT: v_writelane_b32 v1, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v1, 1			; GFX10-NEXT: v_readlane_b32 s31, v1, 1
	; GFX10-NEXT: v_readlane_b32 s30, v1, 0			; GFX10-NEXT: v_readlane_b32 s30, v1, 0
	; GFX10-NEXT: s_xor_saveexec_b32 s34, -1			; GFX10-NEXT: s_xor_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: s_mov_b32 s33, s36			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_i16:			; GFX11-LABEL: call_i16:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_mov_b32 s2, s33			; GFX11-NEXT: s_mov_b32 s3, s33
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_xor_saveexec_b32 s0, -1			; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v1, s33 ; 4-byte Folded Spill			; GFX11-NEXT: scratch_store_b32 off, v1, s33 ; 4-byte Folded Spill
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, return_i16@gotpcrel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, return_i16@gotpcrel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, return_i16@gotpcrel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, return_i16@gotpcrel32@hi+12
				; GFX11-NEXT: ; implicit-def: $vgpr1
	; GFX11-NEXT: v_writelane_b32 v1, s30, 0			; GFX11-NEXT: v_writelane_b32 v1, s30, 0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_writelane_b32 v1, s31, 1			; GFX11-NEXT: v_writelane_b32 v1, s31, 1
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v1, 1			; GFX11-NEXT: v_readlane_b32 s31, v1, 1
	; GFX11-NEXT: v_readlane_b32 s30, v1, 0			; GFX11-NEXT: v_readlane_b32 s30, v1, 0
	; GFX11-NEXT: s_xor_saveexec_b32 s0, -1			; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v1, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v1, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: s_mov_b32 s33, s2			; GFX11-NEXT: s_mov_b32 s33, s3
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx i16 @return_i16()			call amdgpu_gfx i16 @return_i16()
	ret void			ret void
	}			}

	define amdgpu_gfx <2 x i16> @return_2xi16() #0 {			define amdgpu_gfx <2 x i16> @return_2xi16() #0 {
	[22 lines not shown]
	; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, return_2xi16@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, return_2xi16@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, return_2xi16@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, return_2xi16@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
				; GFX9-NEXT: ; implicit-def: $vgpr1
	; GFX9-NEXT: v_writelane_b32 v1, s30, 0			; GFX9-NEXT: v_writelane_b32 v1, s30, 0
	; GFX9-NEXT: v_writelane_b32 v1, s31, 1			; GFX9-NEXT: v_writelane_b32 v1, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v1, 1			; GFX9-NEXT: v_readlane_b32 s31, v1, 1
	; GFX9-NEXT: v_readlane_b32 s30, v1, 0			; GFX9-NEXT: v_readlane_b32 s30, v1, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload
	[12 lines not shown]
	; GFX10-NEXT: s_xor_saveexec_b32 s34, -1			; GFX10-NEXT: s_xor_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, return_2xi16@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, return_2xi16@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, return_2xi16@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, return_2xi16@gotpcrel32@hi+12
				; GFX10-NEXT: ; implicit-def: $vgpr1
	; GFX10-NEXT: v_writelane_b32 v1, s30, 0			; GFX10-NEXT: v_writelane_b32 v1, s30, 0
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v1, s31, 1			; GFX10-NEXT: v_writelane_b32 v1, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v1, 1			; GFX10-NEXT: v_readlane_b32 s31, v1, 1
	; GFX10-NEXT: v_readlane_b32 s30, v1, 0			; GFX10-NEXT: v_readlane_b32 s30, v1, 0
	; GFX10-NEXT: s_xor_saveexec_b32 s34, -1			; GFX10-NEXT: s_xor_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: s_mov_b32 s33, s36			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_2xi16:			; GFX11-LABEL: call_2xi16:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_mov_b32 s2, s33			; GFX11-NEXT: s_mov_b32 s3, s33
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_xor_saveexec_b32 s0, -1			; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v1, s33 ; 4-byte Folded Spill			; GFX11-NEXT: scratch_store_b32 off, v1, s33 ; 4-byte Folded Spill
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, return_2xi16@gotpcrel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, return_2xi16@gotpcrel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, return_2xi16@gotpcrel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, return_2xi16@gotpcrel32@hi+12
				; GFX11-NEXT: ; implicit-def: $vgpr1
	; GFX11-NEXT: v_writelane_b32 v1, s30, 0			; GFX11-NEXT: v_writelane_b32 v1, s30, 0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_writelane_b32 v1, s31, 1			; GFX11-NEXT: v_writelane_b32 v1, s31, 1
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v1, 1			; GFX11-NEXT: v_readlane_b32 s31, v1, 1
	; GFX11-NEXT: v_readlane_b32 s30, v1, 0			; GFX11-NEXT: v_readlane_b32 s30, v1, 0
	; GFX11-NEXT: s_xor_saveexec_b32 s0, -1			; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v1, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v1, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: s_mov_b32 s33, s2			; GFX11-NEXT: s_mov_b32 s33, s3
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx <2 x i16> @return_2xi16()			call amdgpu_gfx <2 x i16> @return_2xi16()
	ret void			ret void
	}			}

	define amdgpu_gfx <3 x i16> @return_3xi16() #0 {			define amdgpu_gfx <3 x i16> @return_3xi16() #0 {
	[31 lines not shown]
	; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v2, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v2, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, return_3xi16@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, return_3xi16@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, return_3xi16@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, return_3xi16@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
				; GFX9-NEXT: ; implicit-def: $vgpr2
	; GFX9-NEXT: v_writelane_b32 v2, s30, 0			; GFX9-NEXT: v_writelane_b32 v2, s30, 0
	; GFX9-NEXT: v_writelane_b32 v2, s31, 1			; GFX9-NEXT: v_writelane_b32 v2, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v2, 1			; GFX9-NEXT: v_readlane_b32 s31, v2, 1
	; GFX9-NEXT: v_readlane_b32 s30, v2, 0			; GFX9-NEXT: v_readlane_b32 s30, v2, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload
	[12 lines not shown]
	; GFX10-NEXT: s_xor_saveexec_b32 s34, -1			; GFX10-NEXT: s_xor_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, return_3xi16@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, return_3xi16@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, return_3xi16@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, return_3xi16@gotpcrel32@hi+12
				; GFX10-NEXT: ; implicit-def: $vgpr2
	; GFX10-NEXT: v_writelane_b32 v2, s30, 0			; GFX10-NEXT: v_writelane_b32 v2, s30, 0
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v2, s31, 1			; GFX10-NEXT: v_writelane_b32 v2, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v2, 1			; GFX10-NEXT: v_readlane_b32 s31, v2, 1
	; GFX10-NEXT: v_readlane_b32 s30, v2, 0			; GFX10-NEXT: v_readlane_b32 s30, v2, 0
	; GFX10-NEXT: s_xor_saveexec_b32 s34, -1			; GFX10-NEXT: s_xor_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: s_mov_b32 s33, s36			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_3xi16:			; GFX11-LABEL: call_3xi16:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_mov_b32 s2, s33			; GFX11-NEXT: s_mov_b32 s3, s33
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_xor_saveexec_b32 s0, -1			; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v2, s33 ; 4-byte Folded Spill			; GFX11-NEXT: scratch_store_b32 off, v2, s33 ; 4-byte Folded Spill
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, return_3xi16@gotpcrel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, return_3xi16@gotpcrel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, return_3xi16@gotpcrel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, return_3xi16@gotpcrel32@hi+12
				; GFX11-NEXT: ; implicit-def: $vgpr2
	; GFX11-NEXT: v_writelane_b32 v2, s30, 0			; GFX11-NEXT: v_writelane_b32 v2, s30, 0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_writelane_b32 v2, s31, 1			; GFX11-NEXT: v_writelane_b32 v2, s31, 1
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v2, 1			; GFX11-NEXT: v_readlane_b32 s31, v2, 1
	; GFX11-NEXT: v_readlane_b32 s30, v2, 0			; GFX11-NEXT: v_readlane_b32 s30, v2, 0
	; GFX11-NEXT: s_xor_saveexec_b32 s0, -1			; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v2, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v2, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_add_i32 s32, s32, -16
	; GFX11-NEXT: s_mov_b32 s33, s2			; GFX11-NEXT: s_mov_b32 s33, s3
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx <3 x i16> @return_3xi16()			call amdgpu_gfx <3 x i16> @return_3xi16()
	ret void			ret void
	}			}

	; Check that return values that do not fit in registers do not crash			; Check that return values that do not fit in registers do not crash
	; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:2048 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:2048 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_add_i32 s32, s32, 0x60000			; GFX9-NEXT: s_add_i32 s32, s32, 0x60000
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, return_512xi32@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, return_512xi32@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, return_512xi32@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, return_512xi32@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v2, s30, 0			; GFX9-NEXT: ; implicit-def: $vgpr2
	; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33			; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33
				; GFX9-NEXT: v_writelane_b32 v2, s30, 0
	; GFX9-NEXT: v_writelane_b32 v2, s31, 1			; GFX9-NEXT: v_writelane_b32 v2, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v2, 1			; GFX9-NEXT: v_readlane_b32 s31, v2, 1
	; GFX9-NEXT: v_readlane_b32 s30, v2, 0			; GFX9-NEXT: v_readlane_b32 s30, v2, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:2048 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:2048 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX10-NEXT: s_xor_saveexec_b32 s34, -1			; GFX10-NEXT: s_xor_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:2048 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:2048 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_add_i32 s32, s32, 0x30000			; GFX10-NEXT: s_add_i32 s32, s32, 0x30000
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, return_512xi32@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, return_512xi32@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, return_512xi32@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, return_512xi32@gotpcrel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v2, s30, 0			; GFX10-NEXT: ; implicit-def: $vgpr2
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33			; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33
				; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
				; GFX10-NEXT: v_writelane_b32 v2, s30, 0
	; GFX10-NEXT: v_writelane_b32 v2, s31, 1			; GFX10-NEXT: v_writelane_b32 v2, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v2, 1			; GFX10-NEXT: v_readlane_b32 s31, v2, 1
	; GFX10-NEXT: v_readlane_b32 s30, v2, 0			; GFX10-NEXT: v_readlane_b32 s30, v2, 0
	; GFX10-NEXT: s_xor_saveexec_b32 s34, -1			; GFX10-NEXT: s_xor_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:2048 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:2048 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX11-NEXT: s_and_b32 s33, s33, 0xfffff800			; GFX11-NEXT: s_and_b32 s33, s33, 0xfffff800
	; GFX11-NEXT: s_xor_saveexec_b32 s0, -1			; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_store_b32 off, v5, s33 offset:2048 ; 4-byte Folded Spill			; GFX11-NEXT: scratch_store_b32 off, v5, s33 offset:2048 ; 4-byte Folded Spill
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_addk_i32 s32, 0x1800			; GFX11-NEXT: s_addk_i32 s32, 0x1800
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, return_512xi32@gotpcrel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, return_512xi32@gotpcrel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, return_512xi32@gotpcrel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, return_512xi32@gotpcrel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v5, s30, 0			; GFX11-NEXT: ; implicit-def: $vgpr5
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_mov_b32_e32 v0, s33			; GFX11-NEXT: v_mov_b32_e32 v0, s33
				; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
				; GFX11-NEXT: v_writelane_b32 v5, s30, 0
	; GFX11-NEXT: v_writelane_b32 v5, s31, 1			; GFX11-NEXT: v_writelane_b32 v5, s31, 1
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v5, 1			; GFX11-NEXT: v_readlane_b32 s31, v5, 1
	; GFX11-NEXT: v_readlane_b32 s30, v5, 0			; GFX11-NEXT: v_readlane_b32 s30, v5, 0
	; GFX11-NEXT: s_xor_saveexec_b32 s0, -1			; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v5, off, s33 offset:2048 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v5, off, s33 offset:2048 ; 4-byte Folded Reload

llvm/test/CodeGen/AMDGPU/indirect-call.ll

; GCN-NEXT: s_mov_b32 s16, s33		; GCN-NEXT: s_mov_b32 s16, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1		; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[18:19]		; GCN-NEXT: s_mov_b64 exec, s[18:19]
; GCN-NEXT: v_writelane_b32 v41, s16, 0		; GCN-NEXT: v_writelane_b32 v41, s16, 0
; GCN-NEXT: s_addk_i32 s32, 0x400		; GCN-NEXT: s_addk_i32 s32, 0x400
		; GCN-NEXT: ; implicit-def: $vgpr40
; GCN-NEXT: v_writelane_b32 v40, s30, 0		; GCN-NEXT: v_writelane_b32 v40, s30, 0
; GCN-NEXT: v_writelane_b32 v40, s31, 1		; GCN-NEXT: v_writelane_b32 v40, s31, 1
; GCN-NEXT: v_writelane_b32 v40, s34, 2		; GCN-NEXT: v_writelane_b32 v40, s34, 2
; GCN-NEXT: v_writelane_b32 v40, s35, 3		; GCN-NEXT: v_writelane_b32 v40, s35, 3
; GCN-NEXT: v_writelane_b32 v40, s36, 4		; GCN-NEXT: v_writelane_b32 v40, s36, 4
; GCN-NEXT: v_writelane_b32 v40, s37, 5		; GCN-NEXT: v_writelane_b32 v40, s37, 5
; GCN-NEXT: v_writelane_b32 v40, s38, 6		; GCN-NEXT: v_writelane_b32 v40, s38, 6
; GCN-NEXT: v_writelane_b32 v40, s39, 7		; GCN-NEXT: v_writelane_b32 v40, s39, 7
; GISEL-NEXT: s_mov_b32 s16, s33		; GISEL-NEXT: s_mov_b32 s16, s33
; GISEL-NEXT: s_mov_b32 s33, s32		; GISEL-NEXT: s_mov_b32 s33, s32
; GISEL-NEXT: s_or_saveexec_b64 s[18:19], -1		; GISEL-NEXT: s_or_saveexec_b64 s[18:19], -1
; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GISEL-NEXT: s_mov_b64 exec, s[18:19]		; GISEL-NEXT: s_mov_b64 exec, s[18:19]
; GISEL-NEXT: v_writelane_b32 v41, s16, 0		; GISEL-NEXT: v_writelane_b32 v41, s16, 0
; GISEL-NEXT: s_addk_i32 s32, 0x400		; GISEL-NEXT: s_addk_i32 s32, 0x400
		; GISEL-NEXT: ; implicit-def: $vgpr40
; GISEL-NEXT: v_writelane_b32 v40, s30, 0		; GISEL-NEXT: v_writelane_b32 v40, s30, 0
; GISEL-NEXT: v_writelane_b32 v40, s31, 1		; GISEL-NEXT: v_writelane_b32 v40, s31, 1
; GISEL-NEXT: v_writelane_b32 v40, s34, 2		; GISEL-NEXT: v_writelane_b32 v40, s34, 2
; GISEL-NEXT: v_writelane_b32 v40, s35, 3		; GISEL-NEXT: v_writelane_b32 v40, s35, 3
; GISEL-NEXT: v_writelane_b32 v40, s36, 4		; GISEL-NEXT: v_writelane_b32 v40, s36, 4
; GISEL-NEXT: v_writelane_b32 v40, s37, 5		; GISEL-NEXT: v_writelane_b32 v40, s37, 5
; GISEL-NEXT: v_writelane_b32 v40, s38, 6		; GISEL-NEXT: v_writelane_b32 v40, s38, 6
; GISEL-NEXT: v_writelane_b32 v40, s39, 7		; GISEL-NEXT: v_writelane_b32 v40, s39, 7
; GCN-NEXT: s_mov_b32 s16, s33		; GCN-NEXT: s_mov_b32 s16, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1		; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[18:19]		; GCN-NEXT: s_mov_b64 exec, s[18:19]
; GCN-NEXT: v_writelane_b32 v41, s16, 0		; GCN-NEXT: v_writelane_b32 v41, s16, 0
; GCN-NEXT: s_addk_i32 s32, 0x400		; GCN-NEXT: s_addk_i32 s32, 0x400
		; GCN-NEXT: ; implicit-def: $vgpr40
; GCN-NEXT: v_writelane_b32 v40, s30, 0		; GCN-NEXT: v_writelane_b32 v40, s30, 0
; GCN-NEXT: v_writelane_b32 v40, s31, 1		; GCN-NEXT: v_writelane_b32 v40, s31, 1
; GCN-NEXT: v_writelane_b32 v40, s34, 2		; GCN-NEXT: v_writelane_b32 v40, s34, 2
; GCN-NEXT: v_writelane_b32 v40, s35, 3		; GCN-NEXT: v_writelane_b32 v40, s35, 3
; GCN-NEXT: v_writelane_b32 v40, s36, 4		; GCN-NEXT: v_writelane_b32 v40, s36, 4
; GCN-NEXT: v_writelane_b32 v40, s37, 5		; GCN-NEXT: v_writelane_b32 v40, s37, 5
; GCN-NEXT: v_writelane_b32 v40, s38, 6		; GCN-NEXT: v_writelane_b32 v40, s38, 6
; GCN-NEXT: v_writelane_b32 v40, s39, 7		; GCN-NEXT: v_writelane_b32 v40, s39, 7
; GISEL-NEXT: s_mov_b32 s16, s33		; GISEL-NEXT: s_mov_b32 s16, s33
; GISEL-NEXT: s_mov_b32 s33, s32		; GISEL-NEXT: s_mov_b32 s33, s32
; GISEL-NEXT: s_or_saveexec_b64 s[18:19], -1		; GISEL-NEXT: s_or_saveexec_b64 s[18:19], -1
; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GISEL-NEXT: s_mov_b64 exec, s[18:19]		; GISEL-NEXT: s_mov_b64 exec, s[18:19]
; GISEL-NEXT: v_writelane_b32 v41, s16, 0		; GISEL-NEXT: v_writelane_b32 v41, s16, 0
; GISEL-NEXT: s_addk_i32 s32, 0x400		; GISEL-NEXT: s_addk_i32 s32, 0x400
		; GISEL-NEXT: ; implicit-def: $vgpr40
; GISEL-NEXT: v_writelane_b32 v40, s30, 0		; GISEL-NEXT: v_writelane_b32 v40, s30, 0
; GISEL-NEXT: v_writelane_b32 v40, s31, 1		; GISEL-NEXT: v_writelane_b32 v40, s31, 1
; GISEL-NEXT: v_writelane_b32 v40, s34, 2		; GISEL-NEXT: v_writelane_b32 v40, s34, 2
; GISEL-NEXT: v_writelane_b32 v40, s35, 3		; GISEL-NEXT: v_writelane_b32 v40, s35, 3
; GISEL-NEXT: v_writelane_b32 v40, s36, 4		; GISEL-NEXT: v_writelane_b32 v40, s36, 4
; GISEL-NEXT: v_writelane_b32 v40, s37, 5		; GISEL-NEXT: v_writelane_b32 v40, s37, 5
; GISEL-NEXT: v_writelane_b32 v40, s38, 6		; GISEL-NEXT: v_writelane_b32 v40, s38, 6
; GISEL-NEXT: v_writelane_b32 v40, s39, 7		; GISEL-NEXT: v_writelane_b32 v40, s39, 7
; GCN-NEXT: s_mov_b32 s16, s33		; GCN-NEXT: s_mov_b32 s16, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1		; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[18:19]		; GCN-NEXT: s_mov_b64 exec, s[18:19]
; GCN-NEXT: v_writelane_b32 v41, s16, 0		; GCN-NEXT: v_writelane_b32 v41, s16, 0
; GCN-NEXT: s_addk_i32 s32, 0x400		; GCN-NEXT: s_addk_i32 s32, 0x400
		; GCN-NEXT: ; implicit-def: $vgpr40
; GCN-NEXT: v_writelane_b32 v40, s30, 0		; GCN-NEXT: v_writelane_b32 v40, s30, 0
; GCN-NEXT: v_writelane_b32 v40, s31, 1		; GCN-NEXT: v_writelane_b32 v40, s31, 1
; GCN-NEXT: v_writelane_b32 v40, s34, 2		; GCN-NEXT: v_writelane_b32 v40, s34, 2
; GCN-NEXT: v_writelane_b32 v40, s35, 3		; GCN-NEXT: v_writelane_b32 v40, s35, 3
; GCN-NEXT: v_writelane_b32 v40, s36, 4		; GCN-NEXT: v_writelane_b32 v40, s36, 4
; GCN-NEXT: v_writelane_b32 v40, s37, 5		; GCN-NEXT: v_writelane_b32 v40, s37, 5
; GCN-NEXT: v_writelane_b32 v40, s38, 6		; GCN-NEXT: v_writelane_b32 v40, s38, 6
; GCN-NEXT: v_writelane_b32 v40, s39, 7		; GCN-NEXT: v_writelane_b32 v40, s39, 7
; GISEL-NEXT: s_mov_b32 s16, s33		; GISEL-NEXT: s_mov_b32 s16, s33
; GISEL-NEXT: s_mov_b32 s33, s32		; GISEL-NEXT: s_mov_b32 s33, s32
; GISEL-NEXT: s_or_saveexec_b64 s[18:19], -1		; GISEL-NEXT: s_or_saveexec_b64 s[18:19], -1
; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GISEL-NEXT: s_mov_b64 exec, s[18:19]		; GISEL-NEXT: s_mov_b64 exec, s[18:19]
; GISEL-NEXT: v_writelane_b32 v41, s16, 0		; GISEL-NEXT: v_writelane_b32 v41, s16, 0
; GISEL-NEXT: s_addk_i32 s32, 0x400		; GISEL-NEXT: s_addk_i32 s32, 0x400
		; GISEL-NEXT: ; implicit-def: $vgpr40
; GISEL-NEXT: v_writelane_b32 v40, s30, 0		; GISEL-NEXT: v_writelane_b32 v40, s30, 0
; GISEL-NEXT: v_writelane_b32 v40, s31, 1		; GISEL-NEXT: v_writelane_b32 v40, s31, 1
; GISEL-NEXT: v_writelane_b32 v40, s34, 2		; GISEL-NEXT: v_writelane_b32 v40, s34, 2
; GISEL-NEXT: v_writelane_b32 v40, s35, 3		; GISEL-NEXT: v_writelane_b32 v40, s35, 3
; GISEL-NEXT: v_writelane_b32 v40, s36, 4		; GISEL-NEXT: v_writelane_b32 v40, s36, 4
; GISEL-NEXT: v_writelane_b32 v40, s37, 5		; GISEL-NEXT: v_writelane_b32 v40, s37, 5
; GISEL-NEXT: v_writelane_b32 v40, s38, 6		; GISEL-NEXT: v_writelane_b32 v40, s38, 6
; GISEL-NEXT: v_writelane_b32 v40, s39, 7		; GISEL-NEXT: v_writelane_b32 v40, s39, 7
; GCN-NEXT: s_mov_b32 s16, s33		; GCN-NEXT: s_mov_b32 s16, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1		; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[18:19]		; GCN-NEXT: s_mov_b64 exec, s[18:19]
; GCN-NEXT: v_writelane_b32 v41, s16, 0		; GCN-NEXT: v_writelane_b32 v41, s16, 0
; GCN-NEXT: s_addk_i32 s32, 0x400		; GCN-NEXT: s_addk_i32 s32, 0x400
		; GCN-NEXT: ; implicit-def: $vgpr40
; GCN-NEXT: v_writelane_b32 v40, s30, 0		; GCN-NEXT: v_writelane_b32 v40, s30, 0
; GCN-NEXT: v_writelane_b32 v40, s31, 1		; GCN-NEXT: v_writelane_b32 v40, s31, 1
; GCN-NEXT: v_writelane_b32 v40, s34, 2		; GCN-NEXT: v_writelane_b32 v40, s34, 2
; GCN-NEXT: v_writelane_b32 v40, s35, 3		; GCN-NEXT: v_writelane_b32 v40, s35, 3
; GCN-NEXT: v_writelane_b32 v40, s36, 4		; GCN-NEXT: v_writelane_b32 v40, s36, 4
; GCN-NEXT: v_writelane_b32 v40, s37, 5		; GCN-NEXT: v_writelane_b32 v40, s37, 5
; GCN-NEXT: v_writelane_b32 v40, s38, 6		; GCN-NEXT: v_writelane_b32 v40, s38, 6
; GCN-NEXT: v_writelane_b32 v40, s39, 7		; GCN-NEXT: v_writelane_b32 v40, s39, 7
; GISEL-NEXT: s_mov_b32 s16, s33		; GISEL-NEXT: s_mov_b32 s16, s33
; GISEL-NEXT: s_mov_b32 s33, s32		; GISEL-NEXT: s_mov_b32 s33, s32
; GISEL-NEXT: s_or_saveexec_b64 s[18:19], -1		; GISEL-NEXT: s_or_saveexec_b64 s[18:19], -1
; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GISEL-NEXT: s_mov_b64 exec, s[18:19]		; GISEL-NEXT: s_mov_b64 exec, s[18:19]
; GISEL-NEXT: v_writelane_b32 v41, s16, 0		; GISEL-NEXT: v_writelane_b32 v41, s16, 0
; GISEL-NEXT: s_addk_i32 s32, 0x400		; GISEL-NEXT: s_addk_i32 s32, 0x400
		; GISEL-NEXT: ; implicit-def: $vgpr40
; GISEL-NEXT: v_writelane_b32 v40, s30, 0		; GISEL-NEXT: v_writelane_b32 v40, s30, 0
; GISEL-NEXT: v_writelane_b32 v40, s31, 1		; GISEL-NEXT: v_writelane_b32 v40, s31, 1
; GISEL-NEXT: v_writelane_b32 v40, s34, 2		; GISEL-NEXT: v_writelane_b32 v40, s34, 2
; GISEL-NEXT: v_writelane_b32 v40, s35, 3		; GISEL-NEXT: v_writelane_b32 v40, s35, 3
; GISEL-NEXT: v_writelane_b32 v40, s36, 4		; GISEL-NEXT: v_writelane_b32 v40, s36, 4
; GISEL-NEXT: v_writelane_b32 v40, s37, 5		; GISEL-NEXT: v_writelane_b32 v40, s37, 5
; GISEL-NEXT: v_writelane_b32 v40, s38, 6		; GISEL-NEXT: v_writelane_b32 v40, s38, 6
; GISEL-NEXT: v_writelane_b32 v40, s39, 7		; GISEL-NEXT: v_writelane_b32 v40, s39, 7
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_mov_b32 s5, s33		; GCN-NEXT: s_mov_b32 s5, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1		; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[6:7]		; GCN-NEXT: s_mov_b64 exec, s[6:7]
; GCN-NEXT: s_addk_i32 s32, 0x400		; GCN-NEXT: s_addk_i32 s32, 0x400
		; GCN-NEXT: ; implicit-def: $vgpr40
; GCN-NEXT: v_writelane_b32 v40, s30, 0		; GCN-NEXT: v_writelane_b32 v40, s30, 0
; GCN-NEXT: v_writelane_b32 v40, s31, 1		; GCN-NEXT: v_writelane_b32 v40, s31, 1
; GCN-NEXT: v_writelane_b32 v40, s34, 2		; GCN-NEXT: v_writelane_b32 v40, s34, 2
; GCN-NEXT: v_writelane_b32 v40, s35, 3		; GCN-NEXT: v_writelane_b32 v40, s35, 3
; GCN-NEXT: v_writelane_b32 v40, s36, 4		; GCN-NEXT: v_writelane_b32 v40, s36, 4
; GCN-NEXT: v_writelane_b32 v40, s37, 5		; GCN-NEXT: v_writelane_b32 v40, s37, 5
; GCN-NEXT: v_writelane_b32 v40, s38, 6		; GCN-NEXT: v_writelane_b32 v40, s38, 6
; GCN-NEXT: v_writelane_b32 v40, s39, 7		; GCN-NEXT: v_writelane_b32 v40, s39, 7
; GISEL: ; %bb.0:		; GISEL: ; %bb.0:
; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GISEL-NEXT: s_mov_b32 s5, s33		; GISEL-NEXT: s_mov_b32 s5, s33
; GISEL-NEXT: s_mov_b32 s33, s32		; GISEL-NEXT: s_mov_b32 s33, s32
; GISEL-NEXT: s_or_saveexec_b64 s[6:7], -1		; GISEL-NEXT: s_or_saveexec_b64 s[6:7], -1
; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GISEL-NEXT: s_mov_b64 exec, s[6:7]		; GISEL-NEXT: s_mov_b64 exec, s[6:7]
; GISEL-NEXT: s_addk_i32 s32, 0x400		; GISEL-NEXT: s_addk_i32 s32, 0x400
		; GISEL-NEXT: ; implicit-def: $vgpr40
; GISEL-NEXT: v_writelane_b32 v40, s30, 0		; GISEL-NEXT: v_writelane_b32 v40, s30, 0
; GISEL-NEXT: v_writelane_b32 v40, s31, 1		; GISEL-NEXT: v_writelane_b32 v40, s31, 1
; GISEL-NEXT: v_writelane_b32 v40, s34, 2		; GISEL-NEXT: v_writelane_b32 v40, s34, 2
; GISEL-NEXT: v_writelane_b32 v40, s35, 3		; GISEL-NEXT: v_writelane_b32 v40, s35, 3
; GISEL-NEXT: v_writelane_b32 v40, s36, 4		; GISEL-NEXT: v_writelane_b32 v40, s36, 4
; GISEL-NEXT: v_writelane_b32 v40, s37, 5		; GISEL-NEXT: v_writelane_b32 v40, s37, 5
; GISEL-NEXT: v_writelane_b32 v40, s38, 6		; GISEL-NEXT: v_writelane_b32 v40, s38, 6
; GISEL-NEXT: v_writelane_b32 v40, s39, 7		; GISEL-NEXT: v_writelane_b32 v40, s39, 7
; GISEL-NEXT: s_setpc_b64 s[30:31]
call amdgpu_gfx void %fptr(i32 inreg 123)		call amdgpu_gfx void %fptr(i32 inreg 123)
ret void		ret void
}		}

define i32 @test_indirect_call_vgpr_ptr_arg_and_reuse(i32 %i, void(i32)* %fptr) {		define i32 @test_indirect_call_vgpr_ptr_arg_and_reuse(i32 %i, void(i32)* %fptr) {
; GCN-LABEL: test_indirect_call_vgpr_ptr_arg_and_reuse:		; GCN-LABEL: test_indirect_call_vgpr_ptr_arg_and_reuse:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_mov_b32 s10, s33		; GCN-NEXT: s_mov_b32 s12, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1		; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_addk_i32 s32, 0x400		; GCN-NEXT: s_addk_i32 s32, 0x400
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-NEXT: v_writelane_b32 v40, s30, 0		; GCN-NEXT: ; implicit-def: $vgpr41
; GCN-NEXT: v_writelane_b32 v40, s31, 1		; GCN-NEXT: v_writelane_b32 v41, s30, 0
; GCN-NEXT: v_writelane_b32 v40, s34, 2		; GCN-NEXT: v_writelane_b32 v41, s31, 1
; GCN-NEXT: v_writelane_b32 v40, s35, 3		; GCN-NEXT: v_writelane_b32 v41, s34, 2
; GCN-NEXT: v_writelane_b32 v40, s36, 4		; GCN-NEXT: v_writelane_b32 v41, s35, 3
; GCN-NEXT: v_writelane_b32 v40, s37, 5		; GCN-NEXT: v_writelane_b32 v41, s36, 4
; GCN-NEXT: v_writelane_b32 v40, s38, 6		; GCN-NEXT: v_writelane_b32 v41, s37, 5
; GCN-NEXT: v_writelane_b32 v40, s39, 7		; GCN-NEXT: v_writelane_b32 v41, s38, 6
; GCN-NEXT: v_writelane_b32 v40, s40, 8		; GCN-NEXT: v_writelane_b32 v41, s39, 7
; GCN-NEXT: v_writelane_b32 v40, s41, 9		; GCN-NEXT: v_writelane_b32 v41, s40, 8
; GCN-NEXT: v_writelane_b32 v40, s42, 10		; GCN-NEXT: v_writelane_b32 v41, s41, 9
; GCN-NEXT: v_writelane_b32 v40, s43, 11		; GCN-NEXT: v_writelane_b32 v41, s42, 10
; GCN-NEXT: v_writelane_b32 v40, s44, 12		; GCN-NEXT: v_writelane_b32 v41, s43, 11
; GCN-NEXT: v_writelane_b32 v40, s45, 13		; GCN-NEXT: v_writelane_b32 v41, s44, 12
; GCN-NEXT: v_writelane_b32 v40, s46, 14		; GCN-NEXT: v_writelane_b32 v41, s45, 13
; GCN-NEXT: v_writelane_b32 v40, s47, 15		; GCN-NEXT: v_writelane_b32 v41, s46, 14
; GCN-NEXT: v_writelane_b32 v40, s48, 16		; GCN-NEXT: v_writelane_b32 v41, s47, 15
; GCN-NEXT: v_writelane_b32 v40, s49, 17		; GCN-NEXT: v_writelane_b32 v41, s48, 16
; GCN-NEXT: v_writelane_b32 v40, s50, 18		; GCN-NEXT: v_writelane_b32 v41, s49, 17
; GCN-NEXT: v_writelane_b32 v40, s51, 19		; GCN-NEXT: v_writelane_b32 v41, s50, 18
; GCN-NEXT: v_writelane_b32 v40, s52, 20		; GCN-NEXT: v_writelane_b32 v41, s51, 19
; GCN-NEXT: v_writelane_b32 v40, s53, 21		; GCN-NEXT: v_writelane_b32 v41, s52, 20
; GCN-NEXT: v_writelane_b32 v40, s54, 22		; GCN-NEXT: v_writelane_b32 v41, s53, 21
; GCN-NEXT: v_writelane_b32 v40, s55, 23		; GCN-NEXT: v_writelane_b32 v41, s54, 22
; GCN-NEXT: v_writelane_b32 v40, s56, 24		; GCN-NEXT: v_writelane_b32 v41, s55, 23
; GCN-NEXT: v_writelane_b32 v40, s57, 25		; GCN-NEXT: v_writelane_b32 v41, s56, 24
; GCN-NEXT: v_writelane_b32 v40, s58, 26		; GCN-NEXT: v_writelane_b32 v41, s57, 25
; GCN-NEXT: v_writelane_b32 v40, s59, 27		; GCN-NEXT: v_writelane_b32 v41, s58, 26
; GCN-NEXT: v_writelane_b32 v40, s60, 28		; GCN-NEXT: v_writelane_b32 v41, s59, 27
; GCN-NEXT: v_writelane_b32 v40, s61, 29		; GCN-NEXT: v_writelane_b32 v41, s60, 28
; GCN-NEXT: v_writelane_b32 v40, s62, 30		; GCN-NEXT: v_writelane_b32 v41, s61, 29
; GCN-NEXT: v_writelane_b32 v40, s63, 31		; GCN-NEXT: v_writelane_b32 v41, s62, 30
; GCN-NEXT: v_mov_b32_e32 v41, v0		; GCN-NEXT: v_writelane_b32 v41, s63, 31
		; GCN-NEXT: v_mov_b32_e32 v40, v0
; GCN-NEXT: s_mov_b64 s[4:5], exec		; GCN-NEXT: s_mov_b64 s[4:5], exec
; GCN-NEXT: .LBB7_1: ; =>This Inner Loop Header: Depth=1		; GCN-NEXT: .LBB7_1: ; =>This Inner Loop Header: Depth=1
; GCN-NEXT: v_readfirstlane_b32 s6, v1		; GCN-NEXT: v_readfirstlane_b32 s6, v1
; GCN-NEXT: v_readfirstlane_b32 s7, v2		; GCN-NEXT: v_readfirstlane_b32 s7, v2
; GCN-NEXT: v_cmp_eq_u64_e32 vcc, s[6:7], v[1:2]		; GCN-NEXT: v_cmp_eq_u64_e32 vcc, s[6:7], v[1:2]
; GCN-NEXT: s_and_saveexec_b64 s[8:9], vcc		; GCN-NEXT: s_and_saveexec_b64 s[8:9], vcc
; GCN-NEXT: v_mov_b32_e32 v0, v41		; GCN-NEXT: v_mov_b32_e32 v0, v40
; GCN-NEXT: s_swappc_b64 s[30:31], s[6:7]		; GCN-NEXT: s_swappc_b64 s[30:31], s[6:7]
; GCN-NEXT: ; implicit-def: $vgpr1_vgpr2		; GCN-NEXT: ; implicit-def: $vgpr1_vgpr2
; GCN-NEXT: s_xor_b64 exec, exec, s[8:9]		; GCN-NEXT: s_xor_b64 exec, exec, s[8:9]
; GCN-NEXT: s_cbranch_execnz .LBB7_1		; GCN-NEXT: s_cbranch_execnz .LBB7_1
; GCN-NEXT: ; %bb.2:		; GCN-NEXT: ; %bb.2:
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: v_mov_b32_e32 v0, v41		; GCN-NEXT: v_mov_b32_e32 v0, v40
; GCN-NEXT: v_readlane_b32 s63, v40, 31		; GCN-NEXT: v_readlane_b32 s63, v41, 31
; GCN-NEXT: v_readlane_b32 s62, v40, 30		; GCN-NEXT: v_readlane_b32 s62, v41, 30
; GCN-NEXT: v_readlane_b32 s61, v40, 29		; GCN-NEXT: v_readlane_b32 s61, v41, 29
; GCN-NEXT: v_readlane_b32 s60, v40, 28		; GCN-NEXT: v_readlane_b32 s60, v41, 28
; GCN-NEXT: v_readlane_b32 s59, v40, 27		; GCN-NEXT: v_readlane_b32 s59, v41, 27
; GCN-NEXT: v_readlane_b32 s58, v40, 26		; GCN-NEXT: v_readlane_b32 s58, v41, 26
; GCN-NEXT: v_readlane_b32 s57, v40, 25		; GCN-NEXT: v_readlane_b32 s57, v41, 25
; GCN-NEXT: v_readlane_b32 s56, v40, 24		; GCN-NEXT: v_readlane_b32 s56, v41, 24
; GCN-NEXT: v_readlane_b32 s55, v40, 23		; GCN-NEXT: v_readlane_b32 s55, v41, 23
; GCN-NEXT: v_readlane_b32 s54, v40, 22		; GCN-NEXT: v_readlane_b32 s54, v41, 22
; GCN-NEXT: v_readlane_b32 s53, v40, 21		; GCN-NEXT: v_readlane_b32 s53, v41, 21
; GCN-NEXT: v_readlane_b32 s52, v40, 20		; GCN-NEXT: v_readlane_b32 s52, v41, 20
; GCN-NEXT: v_readlane_b32 s51, v40, 19		; GCN-NEXT: v_readlane_b32 s51, v41, 19
; GCN-NEXT: v_readlane_b32 s50, v40, 18		; GCN-NEXT: v_readlane_b32 s50, v41, 18
; GCN-NEXT: v_readlane_b32 s49, v40, 17		; GCN-NEXT: v_readlane_b32 s49, v41, 17
; GCN-NEXT: v_readlane_b32 s48, v40, 16		; GCN-NEXT: v_readlane_b32 s48, v41, 16
; GCN-NEXT: v_readlane_b32 s47, v40, 15		; GCN-NEXT: v_readlane_b32 s47, v41, 15
; GCN-NEXT: v_readlane_b32 s46, v40, 14		; GCN-NEXT: v_readlane_b32 s46, v41, 14
; GCN-NEXT: v_readlane_b32 s45, v40, 13		; GCN-NEXT: v_readlane_b32 s45, v41, 13
; GCN-NEXT: v_readlane_b32 s44, v40, 12		; GCN-NEXT: v_readlane_b32 s44, v41, 12
; GCN-NEXT: v_readlane_b32 s43, v40, 11		; GCN-NEXT: v_readlane_b32 s43, v41, 11
; GCN-NEXT: v_readlane_b32 s42, v40, 10		; GCN-NEXT: v_readlane_b32 s42, v41, 10
; GCN-NEXT: v_readlane_b32 s41, v40, 9		; GCN-NEXT: v_readlane_b32 s41, v41, 9
; GCN-NEXT: v_readlane_b32 s40, v40, 8		; GCN-NEXT: v_readlane_b32 s40, v41, 8
; GCN-NEXT: v_readlane_b32 s39, v40, 7		; GCN-NEXT: v_readlane_b32 s39, v41, 7
; GCN-NEXT: v_readlane_b32 s38, v40, 6		; GCN-NEXT: v_readlane_b32 s38, v41, 6
; GCN-NEXT: v_readlane_b32 s37, v40, 5		; GCN-NEXT: v_readlane_b32 s37, v41, 5
; GCN-NEXT: v_readlane_b32 s36, v40, 4		; GCN-NEXT: v_readlane_b32 s36, v41, 4
; GCN-NEXT: v_readlane_b32 s35, v40, 3		; GCN-NEXT: v_readlane_b32 s35, v41, 3
; GCN-NEXT: v_readlane_b32 s34, v40, 2		; GCN-NEXT: v_readlane_b32 s34, v41, 2
; GCN-NEXT: v_readlane_b32 s31, v40, 1		; GCN-NEXT: v_readlane_b32 s31, v41, 1
; GCN-NEXT: v_readlane_b32 s30, v40, 0		; GCN-NEXT: v_readlane_b32 s30, v41, 0
; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1		; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_addk_i32 s32, 0xfc00		; GCN-NEXT: s_addk_i32 s32, 0xfc00
; GCN-NEXT: s_mov_b32 s33, s10		; GCN-NEXT: s_mov_b32 s33, s12
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
;		;
; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg_and_reuse:		; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg_and_reuse:
; GISEL: ; %bb.0:		; GISEL: ; %bb.0:
; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GISEL-NEXT: s_mov_b32 s10, s33		; GISEL-NEXT: s_mov_b32 s12, s33
; GISEL-NEXT: s_mov_b32 s33, s32		; GISEL-NEXT: s_mov_b32 s33, s32
; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1		; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GISEL-NEXT: s_mov_b64 exec, s[4:5]		; GISEL-NEXT: s_mov_b64 exec, s[4:5]
; GISEL-NEXT: s_addk_i32 s32, 0x400		; GISEL-NEXT: s_addk_i32 s32, 0x400
; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill		; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GISEL-NEXT: v_writelane_b32 v40, s30, 0		; GISEL-NEXT: ; implicit-def: $vgpr41
; GISEL-NEXT: v_writelane_b32 v40, s31, 1		; GISEL-NEXT: v_writelane_b32 v41, s30, 0
; GISEL-NEXT: v_writelane_b32 v40, s34, 2		; GISEL-NEXT: v_writelane_b32 v41, s31, 1
; GISEL-NEXT: v_writelane_b32 v40, s35, 3		; GISEL-NEXT: v_writelane_b32 v41, s34, 2
; GISEL-NEXT: v_writelane_b32 v40, s36, 4		; GISEL-NEXT: v_writelane_b32 v41, s35, 3
; GISEL-NEXT: v_writelane_b32 v40, s37, 5		; GISEL-NEXT: v_writelane_b32 v41, s36, 4
; GISEL-NEXT: v_writelane_b32 v40, s38, 6		; GISEL-NEXT: v_writelane_b32 v41, s37, 5
; GISEL-NEXT: v_writelane_b32 v40, s39, 7		; GISEL-NEXT: v_writelane_b32 v41, s38, 6
; GISEL-NEXT: v_writelane_b32 v40, s40, 8		; GISEL-NEXT: v_writelane_b32 v41, s39, 7
; GISEL-NEXT: v_writelane_b32 v40, s41, 9		; GISEL-NEXT: v_writelane_b32 v41, s40, 8
; GISEL-NEXT: v_writelane_b32 v40, s42, 10		; GISEL-NEXT: v_writelane_b32 v41, s41, 9
; GISEL-NEXT: v_writelane_b32 v40, s43, 11		; GISEL-NEXT: v_writelane_b32 v41, s42, 10
; GISEL-NEXT: v_writelane_b32 v40, s44, 12		; GISEL-NEXT: v_writelane_b32 v41, s43, 11
; GISEL-NEXT: v_writelane_b32 v40, s45, 13		; GISEL-NEXT: v_writelane_b32 v41, s44, 12
; GISEL-NEXT: v_writelane_b32 v40, s46, 14		; GISEL-NEXT: v_writelane_b32 v41, s45, 13
; GISEL-NEXT: v_writelane_b32 v40, s47, 15		; GISEL-NEXT: v_writelane_b32 v41, s46, 14
; GISEL-NEXT: v_writelane_b32 v40, s48, 16		; GISEL-NEXT: v_writelane_b32 v41, s47, 15
; GISEL-NEXT: v_writelane_b32 v40, s49, 17		; GISEL-NEXT: v_writelane_b32 v41, s48, 16
; GISEL-NEXT: v_writelane_b32 v40, s50, 18		; GISEL-NEXT: v_writelane_b32 v41, s49, 17
; GISEL-NEXT: v_writelane_b32 v40, s51, 19		; GISEL-NEXT: v_writelane_b32 v41, s50, 18
; GISEL-NEXT: v_writelane_b32 v40, s52, 20		; GISEL-NEXT: v_writelane_b32 v41, s51, 19
; GISEL-NEXT: v_writelane_b32 v40, s53, 21		; GISEL-NEXT: v_writelane_b32 v41, s52, 20
; GISEL-NEXT: v_writelane_b32 v40, s54, 22		; GISEL-NEXT: v_writelane_b32 v41, s53, 21
; GISEL-NEXT: v_writelane_b32 v40, s55, 23		; GISEL-NEXT: v_writelane_b32 v41, s54, 22
; GISEL-NEXT: v_writelane_b32 v40, s56, 24		; GISEL-NEXT: v_writelane_b32 v41, s55, 23
; GISEL-NEXT: v_writelane_b32 v40, s57, 25		; GISEL-NEXT: v_writelane_b32 v41, s56, 24
; GISEL-NEXT: v_writelane_b32 v40, s58, 26		; GISEL-NEXT: v_writelane_b32 v41, s57, 25
; GISEL-NEXT: v_writelane_b32 v40, s59, 27		; GISEL-NEXT: v_writelane_b32 v41, s58, 26
; GISEL-NEXT: v_writelane_b32 v40, s60, 28		; GISEL-NEXT: v_writelane_b32 v41, s59, 27
; GISEL-NEXT: v_writelane_b32 v40, s61, 29		; GISEL-NEXT: v_writelane_b32 v41, s60, 28
; GISEL-NEXT: v_writelane_b32 v40, s62, 30		; GISEL-NEXT: v_writelane_b32 v41, s61, 29
; GISEL-NEXT: v_writelane_b32 v40, s63, 31		; GISEL-NEXT: v_writelane_b32 v41, s62, 30
; GISEL-NEXT: v_mov_b32_e32 v41, v0		; GISEL-NEXT: v_writelane_b32 v41, s63, 31
		; GISEL-NEXT: v_mov_b32_e32 v40, v0
; GISEL-NEXT: s_mov_b64 s[4:5], exec		; GISEL-NEXT: s_mov_b64 s[4:5], exec
; GISEL-NEXT: .LBB7_1: ; =>This Inner Loop Header: Depth=1		; GISEL-NEXT: .LBB7_1: ; =>This Inner Loop Header: Depth=1
; GISEL-NEXT: v_readfirstlane_b32 s6, v1		; GISEL-NEXT: v_readfirstlane_b32 s6, v1
; GISEL-NEXT: v_readfirstlane_b32 s7, v2		; GISEL-NEXT: v_readfirstlane_b32 s7, v2
; GISEL-NEXT: v_cmp_eq_u64_e32 vcc, s[6:7], v[1:2]		; GISEL-NEXT: v_cmp_eq_u64_e32 vcc, s[6:7], v[1:2]
; GISEL-NEXT: s_and_saveexec_b64 s[8:9], vcc		; GISEL-NEXT: s_and_saveexec_b64 s[8:9], vcc
; GISEL-NEXT: v_mov_b32_e32 v0, v41		; GISEL-NEXT: v_mov_b32_e32 v0, v40
; GISEL-NEXT: s_swappc_b64 s[30:31], s[6:7]		; GISEL-NEXT: s_swappc_b64 s[30:31], s[6:7]
; GISEL-NEXT: ; implicit-def: $vgpr1		; GISEL-NEXT: ; implicit-def: $vgpr1
; GISEL-NEXT: s_xor_b64 exec, exec, s[8:9]		; GISEL-NEXT: s_xor_b64 exec, exec, s[8:9]
; GISEL-NEXT: s_cbranch_execnz .LBB7_1		; GISEL-NEXT: s_cbranch_execnz .LBB7_1
; GISEL-NEXT: ; %bb.2:		; GISEL-NEXT: ; %bb.2:
; GISEL-NEXT: s_mov_b64 exec, s[4:5]		; GISEL-NEXT: s_mov_b64 exec, s[4:5]
; GISEL-NEXT: v_mov_b32_e32 v0, v41		; GISEL-NEXT: v_mov_b32_e32 v0, v40
; GISEL-NEXT: v_readlane_b32 s63, v40, 31		; GISEL-NEXT: v_readlane_b32 s63, v41, 31
; GISEL-NEXT: v_readlane_b32 s62, v40, 30		; GISEL-NEXT: v_readlane_b32 s62, v41, 30
; GISEL-NEXT: v_readlane_b32 s61, v40, 29		; GISEL-NEXT: v_readlane_b32 s61, v41, 29
; GISEL-NEXT: v_readlane_b32 s60, v40, 28		; GISEL-NEXT: v_readlane_b32 s60, v41, 28
; GISEL-NEXT: v_readlane_b32 s59, v40, 27		; GISEL-NEXT: v_readlane_b32 s59, v41, 27
; GISEL-NEXT: v_readlane_b32 s58, v40, 26		; GISEL-NEXT: v_readlane_b32 s58, v41, 26
; GISEL-NEXT: v_readlane_b32 s57, v40, 25		; GISEL-NEXT: v_readlane_b32 s57, v41, 25
; GISEL-NEXT: v_readlane_b32 s56, v40, 24		; GISEL-NEXT: v_readlane_b32 s56, v41, 24
; GISEL-NEXT: v_readlane_b32 s55, v40, 23		; GISEL-NEXT: v_readlane_b32 s55, v41, 23
; GISEL-NEXT: v_readlane_b32 s54, v40, 22		; GISEL-NEXT: v_readlane_b32 s54, v41, 22
; GISEL-NEXT: v_readlane_b32 s53, v40, 21		; GISEL-NEXT: v_readlane_b32 s53, v41, 21
; GISEL-NEXT: v_readlane_b32 s52, v40, 20		; GISEL-NEXT: v_readlane_b32 s52, v41, 20
; GISEL-NEXT: v_readlane_b32 s51, v40, 19		; GISEL-NEXT: v_readlane_b32 s51, v41, 19
; GISEL-NEXT: v_readlane_b32 s50, v40, 18		; GISEL-NEXT: v_readlane_b32 s50, v41, 18
; GISEL-NEXT: v_readlane_b32 s49, v40, 17		; GISEL-NEXT: v_readlane_b32 s49, v41, 17
; GISEL-NEXT: v_readlane_b32 s48, v40, 16		; GISEL-NEXT: v_readlane_b32 s48, v41, 16
; GISEL-NEXT: v_readlane_b32 s47, v40, 15		; GISEL-NEXT: v_readlane_b32 s47, v41, 15
; GISEL-NEXT: v_readlane_b32 s46, v40, 14		; GISEL-NEXT: v_readlane_b32 s46, v41, 14
; GISEL-NEXT: v_readlane_b32 s45, v40, 13		; GISEL-NEXT: v_readlane_b32 s45, v41, 13
; GISEL-NEXT: v_readlane_b32 s44, v40, 12		; GISEL-NEXT: v_readlane_b32 s44, v41, 12
; GISEL-NEXT: v_readlane_b32 s43, v40, 11		; GISEL-NEXT: v_readlane_b32 s43, v41, 11
; GISEL-NEXT: v_readlane_b32 s42, v40, 10		; GISEL-NEXT: v_readlane_b32 s42, v41, 10
; GISEL-NEXT: v_readlane_b32 s41, v40, 9		; GISEL-NEXT: v_readlane_b32 s41, v41, 9
; GISEL-NEXT: v_readlane_b32 s40, v40, 8		; GISEL-NEXT: v_readlane_b32 s40, v41, 8
; GISEL-NEXT: v_readlane_b32 s39, v40, 7		; GISEL-NEXT: v_readlane_b32 s39, v41, 7
; GISEL-NEXT: v_readlane_b32 s38, v40, 6		; GISEL-NEXT: v_readlane_b32 s38, v41, 6
; GISEL-NEXT: v_readlane_b32 s37, v40, 5		; GISEL-NEXT: v_readlane_b32 s37, v41, 5
; GISEL-NEXT: v_readlane_b32 s36, v40, 4		; GISEL-NEXT: v_readlane_b32 s36, v41, 4
; GISEL-NEXT: v_readlane_b32 s35, v40, 3		; GISEL-NEXT: v_readlane_b32 s35, v41, 3
; GISEL-NEXT: v_readlane_b32 s34, v40, 2		; GISEL-NEXT: v_readlane_b32 s34, v41, 2
; GISEL-NEXT: v_readlane_b32 s31, v40, 1		; GISEL-NEXT: v_readlane_b32 s31, v41, 1
; GISEL-NEXT: v_readlane_b32 s30, v40, 0		; GISEL-NEXT: v_readlane_b32 s30, v41, 0
; GISEL-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload		; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1		; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GISEL-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GISEL-NEXT: s_mov_b64 exec, s[4:5]		; GISEL-NEXT: s_mov_b64 exec, s[4:5]
; GISEL-NEXT: s_addk_i32 s32, 0xfc00		; GISEL-NEXT: s_addk_i32 s32, 0xfc00
; GISEL-NEXT: s_mov_b32 s33, s10		; GISEL-NEXT: s_mov_b32 s33, s12
; GISEL-NEXT: s_waitcnt vmcnt(0)		; GISEL-NEXT: s_waitcnt vmcnt(0)
; GISEL-NEXT: s_setpc_b64 s[30:31]		; GISEL-NEXT: s_setpc_b64 s[30:31]
call amdgpu_gfx void %fptr(i32 %i)		call amdgpu_gfx void %fptr(i32 %i)
ret i32 %i		ret i32 %i
}		}

; Use a variable inside a waterfall loop and use the return variable after the loop.		; Use a variable inside a waterfall loop and use the return variable after the loop.
; TODO The argument and return variable could be in the same physical register, but the register		; TODO The argument and return variable could be in the same physical register, but the register
; allocator is not able to do that because the return value clashes with the liverange of an		; allocator is not able to do that because the return value clashes with the liverange of an
; IMPLICIT_DEF of the argument.		; IMPLICIT_DEF of the argument.
define i32 @test_indirect_call_vgpr_ptr_arg_and_return(i32 %i, i32(i32)* %fptr) {		define i32 @test_indirect_call_vgpr_ptr_arg_and_return(i32 %i, i32(i32)* %fptr) {
; GCN-LABEL: test_indirect_call_vgpr_ptr_arg_and_return:		; GCN-LABEL: test_indirect_call_vgpr_ptr_arg_and_return:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_mov_b32 s10, s33		; GCN-NEXT: s_mov_b32 s12, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1		; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_addk_i32 s32, 0x400		; GCN-NEXT: s_addk_i32 s32, 0x400
		; GCN-NEXT: ; implicit-def: $vgpr40
; GCN-NEXT: v_writelane_b32 v40, s30, 0		; GCN-NEXT: v_writelane_b32 v40, s30, 0
; GCN-NEXT: v_writelane_b32 v40, s31, 1		; GCN-NEXT: v_writelane_b32 v40, s31, 1
; GCN-NEXT: v_writelane_b32 v40, s34, 2		; GCN-NEXT: v_writelane_b32 v40, s34, 2
; GCN-NEXT: v_writelane_b32 v40, s35, 3		; GCN-NEXT: v_writelane_b32 v40, s35, 3
; GCN-NEXT: v_writelane_b32 v40, s36, 4		; GCN-NEXT: v_writelane_b32 v40, s36, 4
; GCN-NEXT: v_writelane_b32 v40, s37, 5		; GCN-NEXT: v_writelane_b32 v40, s37, 5
; GCN-NEXT: v_writelane_b32 v40, s38, 6		; GCN-NEXT: v_writelane_b32 v40, s38, 6
; GCN-NEXT: v_writelane_b32 v40, s39, 7		; GCN-NEXT: v_writelane_b32 v40, s39, 7
[67 lines not shown]
; GCN-NEXT: v_readlane_b32 s35, v40, 3		; GCN-NEXT: v_readlane_b32 s35, v40, 3
; GCN-NEXT: v_readlane_b32 s34, v40, 2		; GCN-NEXT: v_readlane_b32 s34, v40, 2
; GCN-NEXT: v_readlane_b32 s31, v40, 1		; GCN-NEXT: v_readlane_b32 s31, v40, 1
; GCN-NEXT: v_readlane_b32 s30, v40, 0		; GCN-NEXT: v_readlane_b32 s30, v40, 0
; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1		; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_addk_i32 s32, 0xfc00		; GCN-NEXT: s_addk_i32 s32, 0xfc00
; GCN-NEXT: s_mov_b32 s33, s10		; GCN-NEXT: s_mov_b32 s33, s12
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
;		;
; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg_and_return:		; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg_and_return:
; GISEL: ; %bb.0:		; GISEL: ; %bb.0:
; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GISEL-NEXT: s_mov_b32 s10, s33		; GISEL-NEXT: s_mov_b32 s12, s33
; GISEL-NEXT: s_mov_b32 s33, s32		; GISEL-NEXT: s_mov_b32 s33, s32
; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1		; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GISEL-NEXT: s_mov_b64 exec, s[4:5]		; GISEL-NEXT: s_mov_b64 exec, s[4:5]
; GISEL-NEXT: s_addk_i32 s32, 0x400		; GISEL-NEXT: s_addk_i32 s32, 0x400
		; GISEL-NEXT: ; implicit-def: $vgpr40
; GISEL-NEXT: v_writelane_b32 v40, s30, 0		; GISEL-NEXT: v_writelane_b32 v40, s30, 0
; GISEL-NEXT: v_writelane_b32 v40, s31, 1		; GISEL-NEXT: v_writelane_b32 v40, s31, 1
; GISEL-NEXT: v_writelane_b32 v40, s34, 2		; GISEL-NEXT: v_writelane_b32 v40, s34, 2
; GISEL-NEXT: v_writelane_b32 v40, s35, 3		; GISEL-NEXT: v_writelane_b32 v40, s35, 3
; GISEL-NEXT: v_writelane_b32 v40, s36, 4		; GISEL-NEXT: v_writelane_b32 v40, s36, 4
; GISEL-NEXT: v_writelane_b32 v40, s37, 5		; GISEL-NEXT: v_writelane_b32 v40, s37, 5
; GISEL-NEXT: v_writelane_b32 v40, s38, 6		; GISEL-NEXT: v_writelane_b32 v40, s38, 6
; GISEL-NEXT: v_writelane_b32 v40, s39, 7		; GISEL-NEXT: v_writelane_b32 v40, s39, 7
[67 lines not shown]
; GISEL-NEXT: v_readlane_b32 s35, v40, 3		; GISEL-NEXT: v_readlane_b32 s35, v40, 3
; GISEL-NEXT: v_readlane_b32 s34, v40, 2		; GISEL-NEXT: v_readlane_b32 s34, v40, 2
; GISEL-NEXT: v_readlane_b32 s31, v40, 1		; GISEL-NEXT: v_readlane_b32 s31, v40, 1
; GISEL-NEXT: v_readlane_b32 s30, v40, 0		; GISEL-NEXT: v_readlane_b32 s30, v40, 0
; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1		; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload		; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
; GISEL-NEXT: s_mov_b64 exec, s[4:5]		; GISEL-NEXT: s_mov_b64 exec, s[4:5]
; GISEL-NEXT: s_addk_i32 s32, 0xfc00		; GISEL-NEXT: s_addk_i32 s32, 0xfc00
; GISEL-NEXT: s_mov_b32 s33, s10		; GISEL-NEXT: s_mov_b32 s33, s12
; GISEL-NEXT: s_waitcnt vmcnt(0)		; GISEL-NEXT: s_waitcnt vmcnt(0)
; GISEL-NEXT: s_setpc_b64 s[30:31]		; GISEL-NEXT: s_setpc_b64 s[30:31]
%ret = call amdgpu_gfx i32 %fptr(i32 %i)		%ret = call amdgpu_gfx i32 %fptr(i32 %i)
ret i32 %ret		ret i32 %ret
}		}

; Calling a vgpr can never be a tail call.		; Calling a vgpr can never be a tail call.
define void @test_indirect_tail_call_vgpr_ptr(void()* %fptr) {		define void @test_indirect_tail_call_vgpr_ptr(void()* %fptr) {
; GCN-LABEL: test_indirect_tail_call_vgpr_ptr:		; GCN-LABEL: test_indirect_tail_call_vgpr_ptr:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_mov_b32 s10, s33		; GCN-NEXT: s_mov_b32 s12, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1		; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_addk_i32 s32, 0x400		; GCN-NEXT: s_addk_i32 s32, 0x400
		; GCN-NEXT: ; implicit-def: $vgpr40
; GCN-NEXT: v_writelane_b32 v40, s30, 0		; GCN-NEXT: v_writelane_b32 v40, s30, 0
; GCN-NEXT: v_writelane_b32 v40, s31, 1		; GCN-NEXT: v_writelane_b32 v40, s31, 1
; GCN-NEXT: v_writelane_b32 v40, s34, 2		; GCN-NEXT: v_writelane_b32 v40, s34, 2
; GCN-NEXT: v_writelane_b32 v40, s35, 3		; GCN-NEXT: v_writelane_b32 v40, s35, 3
; GCN-NEXT: v_writelane_b32 v40, s36, 4		; GCN-NEXT: v_writelane_b32 v40, s36, 4
; GCN-NEXT: v_writelane_b32 v40, s37, 5		; GCN-NEXT: v_writelane_b32 v40, s37, 5
; GCN-NEXT: v_writelane_b32 v40, s38, 6		; GCN-NEXT: v_writelane_b32 v40, s38, 6
; GCN-NEXT: v_writelane_b32 v40, s39, 7		; GCN-NEXT: v_writelane_b32 v40, s39, 7
[64 lines not shown]
; GCN-NEXT: v_readlane_b32 s35, v40, 3		; GCN-NEXT: v_readlane_b32 s35, v40, 3
; GCN-NEXT: v_readlane_b32 s34, v40, 2		; GCN-NEXT: v_readlane_b32 s34, v40, 2
; GCN-NEXT: v_readlane_b32 s31, v40, 1		; GCN-NEXT: v_readlane_b32 s31, v40, 1
; GCN-NEXT: v_readlane_b32 s30, v40, 0		; GCN-NEXT: v_readlane_b32 s30, v40, 0
; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1		; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_addk_i32 s32, 0xfc00		; GCN-NEXT: s_addk_i32 s32, 0xfc00
; GCN-NEXT: s_mov_b32 s33, s10		; GCN-NEXT: s_mov_b32 s33, s12
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
;		;
; GISEL-LABEL: test_indirect_tail_call_vgpr_ptr:		; GISEL-LABEL: test_indirect_tail_call_vgpr_ptr:
; GISEL: ; %bb.0:		; GISEL: ; %bb.0:
; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GISEL-NEXT: s_mov_b32 s10, s33		; GISEL-NEXT: s_mov_b32 s12, s33
; GISEL-NEXT: s_mov_b32 s33, s32		; GISEL-NEXT: s_mov_b32 s33, s32
; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1		; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GISEL-NEXT: s_mov_b64 exec, s[4:5]		; GISEL-NEXT: s_mov_b64 exec, s[4:5]
; GISEL-NEXT: s_addk_i32 s32, 0x400		; GISEL-NEXT: s_addk_i32 s32, 0x400
		; GISEL-NEXT: ; implicit-def: $vgpr40
; GISEL-NEXT: v_writelane_b32 v40, s30, 0		; GISEL-NEXT: v_writelane_b32 v40, s30, 0
; GISEL-NEXT: v_writelane_b32 v40, s31, 1		; GISEL-NEXT: v_writelane_b32 v40, s31, 1
; GISEL-NEXT: v_writelane_b32 v40, s34, 2		; GISEL-NEXT: v_writelane_b32 v40, s34, 2
; GISEL-NEXT: v_writelane_b32 v40, s35, 3		; GISEL-NEXT: v_writelane_b32 v40, s35, 3
; GISEL-NEXT: v_writelane_b32 v40, s36, 4		; GISEL-NEXT: v_writelane_b32 v40, s36, 4
; GISEL-NEXT: v_writelane_b32 v40, s37, 5		; GISEL-NEXT: v_writelane_b32 v40, s37, 5
; GISEL-NEXT: v_writelane_b32 v40, s38, 6		; GISEL-NEXT: v_writelane_b32 v40, s38, 6
; GISEL-NEXT: v_writelane_b32 v40, s39, 7		; GISEL-NEXT: v_writelane_b32 v40, s39, 7
[64 lines not shown]
; GISEL-NEXT: v_readlane_b32 s35, v40, 3		; GISEL-NEXT: v_readlane_b32 s35, v40, 3
; GISEL-NEXT: v_readlane_b32 s34, v40, 2		; GISEL-NEXT: v_readlane_b32 s34, v40, 2
; GISEL-NEXT: v_readlane_b32 s31, v40, 1		; GISEL-NEXT: v_readlane_b32 s31, v40, 1
; GISEL-NEXT: v_readlane_b32 s30, v40, 0		; GISEL-NEXT: v_readlane_b32 s30, v40, 0
; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1		; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload		; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
; GISEL-NEXT: s_mov_b64 exec, s[4:5]		; GISEL-NEXT: s_mov_b64 exec, s[4:5]
; GISEL-NEXT: s_addk_i32 s32, 0xfc00		; GISEL-NEXT: s_addk_i32 s32, 0xfc00
; GISEL-NEXT: s_mov_b32 s33, s10		; GISEL-NEXT: s_mov_b32 s33, s12
; GISEL-NEXT: s_waitcnt vmcnt(0)		; GISEL-NEXT: s_waitcnt vmcnt(0)
; GISEL-NEXT: s_setpc_b64 s[30:31]		; GISEL-NEXT: s_setpc_b64 s[30:31]
tail call amdgpu_gfx void %fptr()		tail call amdgpu_gfx void %fptr()
ret void		ret void
}		}
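
Note on the waterfall loops in the checks above (for example `.LBB7_1`): the function pointer lives in VGPRs and may differ per lane, so the generated code calls each distinct target once, with only the matching lanes enabled. The following is a minimal Python sketch of that control flow, not the backend's implementation. The function name `waterfall_call` and the set/dict representation of the exec mask and per-lane pointers are illustrative assumptions.

```python
def waterfall_call(exec_mask, fptr_per_lane):
    """Return the (target, lanes) calls a waterfall loop would issue.

    exec_mask: set of active lane indices (hypothetical stand-in for EXEC).
    fptr_per_lane: dict mapping lane index -> call target held in that lane.
    """
    saved_exec = set(exec_mask)       # s_mov_b64 s[4:5], exec
    active = set(exec_mask)           # lanes still waiting to make their call
    calls = []
    while active:                     # s_cbranch_execnz loops while EXEC != 0
        first_lane = min(active)      # v_readfirstlane_b32 reads the lowest active lane
        target = fptr_per_lane[first_lane]
        # v_cmp_eq_u64 + s_and_saveexec_b64: keep only lanes with the same target
        matching = {lane for lane in active if fptr_per_lane[lane] == target}
        calls.append((target, sorted(matching)))  # s_swappc_b64 with those lanes enabled
        active -= matching            # s_xor_b64 exec, exec, s[8:9]
    restored_exec = saved_exec        # s_mov_b64 exec, s[4:5]
    return calls, restored_exec


calls, restored = waterfall_call({0, 1, 2, 3}, {0: "f", 1: "g", 2: "f", 3: "h"})
print(calls)
```

The loop runs once per distinct target rather than once per lane, which is why a uniform pointer costs a single iteration.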

llvm/test/CodeGen/AMDGPU/kernel-vgpr-spill-mubuf-with-voffset.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 -O0 -verify-machineinstrs %s -o - \| FileCheck %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 -O0 -verify-machineinstrs %s -o - \| FileCheck %s

	; The forced spill to preserve the scratch VGPR requires the voffset to hold the large offset			; The forced spill to preserve the scratch VGPR requires the voffset to hold the large offset
	; value in the MUBUF instruction being emitted before s_cbranch_scc1 as it clobbers the SCC.			; value in the MUBUF instruction being emitted before s_cbranch_scc1 as it clobbers the SCC.

	define amdgpu_kernel void @test_kernel(i32 %val) #0 {			define amdgpu_kernel void @test_kernel(i32 %val) #0 {
	; CHECK-LABEL: test_kernel:			; CHECK-LABEL: test_kernel:
	; CHECK: ; %bb.0:			; CHECK: ; %bb.0:
	; CHECK-NEXT: s_mov_b32 s32, 0x180000			; CHECK-NEXT: s_mov_b32 s32, 0x180000
	; CHECK-NEXT: s_mov_b32 s33, 0			; CHECK-NEXT: s_mov_b32 s33, 0
	; CHECK-NEXT: s_add_u32 flat_scratch_lo, s12, s17			; CHECK-NEXT: s_add_u32 flat_scratch_lo, s12, s17
	; CHECK-NEXT: s_addc_u32 flat_scratch_hi, s13, 0			; CHECK-NEXT: s_addc_u32 flat_scratch_hi, s13, 0
	; CHECK-NEXT: s_add_u32 s0, s0, s17			; CHECK-NEXT: s_add_u32 s0, s0, s17
	; CHECK-NEXT: s_addc_u32 s1, s1, 0			; CHECK-NEXT: s_addc_u32 s1, s1, 0
	; CHECK-NEXT: v_writelane_b32 v40, s16, 0			; CHECK-NEXT: ; implicit-def: $vgpr3
				; CHECK-NEXT: v_writelane_b32 v3, s16, 0
				; CHECK-NEXT: s_or_saveexec_b64 s[34:35], -1
				; CHECK-NEXT: s_add_i32 s12, s33, 0x100200
				; CHECK-NEXT: buffer_store_dword v3, off, s[0:3], s12 ; 4-byte Folded Spill
				; CHECK-NEXT: s_mov_b64 exec, s[34:35]
	; CHECK-NEXT: s_mov_b32 s13, s15			; CHECK-NEXT: s_mov_b32 s13, s15
	; CHECK-NEXT: s_mov_b32 s12, s14			; CHECK-NEXT: s_mov_b32 s12, s14
	; CHECK-NEXT: v_readlane_b32 s14, v40, 0			; CHECK-NEXT: v_readlane_b32 s14, v3, 0
	; CHECK-NEXT: s_mov_b64 s[16:17], s[8:9]			; CHECK-NEXT: s_mov_b64 s[16:17], s[8:9]
	; CHECK-NEXT: v_mov_b32_e32 v3, v2			; CHECK-NEXT: v_mov_b32_e32 v3, v2
	; CHECK-NEXT: v_mov_b32_e32 v2, v1			; CHECK-NEXT: v_mov_b32_e32 v2, v1
	; CHECK-NEXT: v_mov_b32_e32 v1, v0			; CHECK-NEXT: v_mov_b32_e32 v1, v0
				; CHECK-NEXT: s_or_saveexec_b64 s[34:35], -1
				; CHECK-NEXT: s_add_i32 s8, s33, 0x100200
				; CHECK-NEXT: buffer_load_dword v0, off, s[0:3], s8 ; 4-byte Folded Reload
				; CHECK-NEXT: s_mov_b64 exec, s[34:35]
	; CHECK-NEXT: s_load_dword s8, s[16:17], 0x0			; CHECK-NEXT: s_load_dword s8, s[16:17], 0x0
	; CHECK-NEXT: s_waitcnt lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; CHECK-NEXT: v_writelane_b32 v40, s8, 1			; CHECK-NEXT: v_writelane_b32 v0, s8, 1
				; CHECK-NEXT: s_or_saveexec_b64 s[34:35], -1
				; CHECK-NEXT: s_add_i32 s8, s33, 0x100200
				; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s8 ; 4-byte Folded Spill
				; CHECK-NEXT: s_mov_b64 exec, s[34:35]
	; CHECK-NEXT: ;;#ASMSTART			; CHECK-NEXT: ;;#ASMSTART
	; CHECK-NEXT: ; def vgpr10			; CHECK-NEXT: ; def vgpr10
	; CHECK-NEXT: ;;#ASMEND			; CHECK-NEXT: ;;#ASMEND
	; CHECK-NEXT: s_add_i32 s8, s33, 0x100100			; CHECK-NEXT: s_add_i32 s8, s33, 0x100100
	; CHECK-NEXT: buffer_store_dword v10, off, s[0:3], s8 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v10, off, s[0:3], s8 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 s[18:19], 8			; CHECK-NEXT: s_mov_b64 s[18:19], 8
	; CHECK-NEXT: s_mov_b32 s8, s16			; CHECK-NEXT: s_mov_b32 s8, s16
	; CHECK-NEXT: s_mov_b32 s9, s17			; CHECK-NEXT: s_mov_b32 s9, s17
	[16 lines not shown]
	; CHECK-NEXT: s_mov_b32 s15, 10			; CHECK-NEXT: s_mov_b32 s15, 10
	; CHECK-NEXT: v_lshlrev_b32_e64 v2, s15, v2			; CHECK-NEXT: v_lshlrev_b32_e64 v2, s15, v2
	; CHECK-NEXT: v_or3_b32 v31, v1, v2, v3			; CHECK-NEXT: v_or3_b32 v31, v1, v2, v3
	; CHECK-NEXT: ; implicit-def: $sgpr15			; CHECK-NEXT: ; implicit-def: $sgpr15
	; CHECK-NEXT: s_mov_b64 s[0:1], s[20:21]			; CHECK-NEXT: s_mov_b64 s[0:1], s[20:21]
	; CHECK-NEXT: s_mov_b64 s[2:3], s[22:23]			; CHECK-NEXT: s_mov_b64 s[2:3], s[22:23]
	; CHECK-NEXT: s_waitcnt lgkmcnt(0)			; CHECK-NEXT: s_waitcnt lgkmcnt(0)
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]
				; CHECK-NEXT: s_or_saveexec_b64 s[34:35], -1
				; CHECK-NEXT: s_add_i32 s4, s33, 0x100200
				; CHECK-NEXT: buffer_load_dword v0, off, s[0:3], s4 ; 4-byte Folded Reload
				; CHECK-NEXT: s_mov_b64 exec, s[34:35]
	; CHECK-NEXT: s_add_i32 s4, s33, 0x100100			; CHECK-NEXT: s_add_i32 s4, s33, 0x100100
	; CHECK-NEXT: buffer_load_dword v10, off, s[0:3], s4 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v10, off, s[0:3], s4 ; 4-byte Folded Reload
	; CHECK-NEXT: v_readlane_b32 s4, v40, 1			; CHECK-NEXT: s_waitcnt vmcnt(1)
				; CHECK-NEXT: v_readlane_b32 s4, v0, 1
	; CHECK-NEXT: s_mov_b32 s5, 0			; CHECK-NEXT: s_mov_b32 s5, 0
	; CHECK-NEXT: s_cmp_eq_u32 s4, s5			; CHECK-NEXT: s_cmp_eq_u32 s4, s5
	; CHECK-NEXT: v_mov_b32_e32 v0, 0x4000			; CHECK-NEXT: v_mov_b32_e32 v0, 0x4000
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: buffer_store_dword v10, v0, s[0:3], s33 offen ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v10, v0, s[0:3], s33 offen ; 4-byte Folded Spill
	; CHECK-NEXT: s_cbranch_scc1 .LBB0_2			; CHECK-NEXT: s_cbranch_scc1 .LBB0_2
	; CHECK-NEXT: ; %bb.1: ; %store			; CHECK-NEXT: ; %bb.1: ; %store
	; CHECK-NEXT: s_add_i32 s4, s33, 0x100000			; CHECK-NEXT: s_add_i32 s4, s33, 0x100000
	[25 lines not shown]
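
Note on the offset handling this test exercises: the checks show two ways of materializing a large spill offset, an `s_add_i32` into an SGPR (as with `0x100100`) and a `v_mov_b32` into a VGPR used with `offen` (as with `0x4000`, emitted between `s_cmp_eq_u32` and `s_cbranch_scc1`). The sketch below is a simplified Python illustration of that choice, not the logic in SIRegisterInfo.cpp. The function name, the returned labels, and the 4095 limit (assuming a 12-bit unsigned MUBUF immediate offset field) are assumptions made for illustration.

```python
# Assumption: the MUBUF immediate offset field is 12 bits, unsigned.
MUBUF_MAX_IMM_OFFSET = 4095


def pick_spill_offset_form(offset, scc_live):
    """Pick how a spill's byte offset could be encoded (hypothetical helper).

    offset: byte offset of the spill slot.
    scc_live: True if SCC holds a value that a later branch still needs.
    """
    if offset <= MUBUF_MAX_IMM_OFFSET:
        return "immediate"    # fits directly in the instruction encoding
    if not scc_live:
        return "sgpr_add"     # s_add_i32 is fine: nothing depends on SCC here
    return "vgpr_offen"       # v_mov_b32 leaves SCC intact, then use offen


print(pick_spill_offset_form(0x100100, scc_live=False))
print(pick_spill_offset_form(0x4000, scc_live=True))
```

The key point is only the last branch: scalar adds overwrite SCC, so when a compare result is still pending the offset has to go through a vector move instead.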

llvm/test/CodeGen/AMDGPU/load-constant-i16.ll

	[3,031 lines not shown]
	; GCN-HSA-NEXT: s_lshr_b32 s24, s5, 16			; GCN-HSA-NEXT: s_lshr_b32 s24, s5, 16
	; GCN-HSA-NEXT: s_lshr_b32 s26, s4, 16			; GCN-HSA-NEXT: s_lshr_b32 s26, s4, 16
	; GCN-HSA-NEXT: s_lshr_b32 s28, s7, 16			; GCN-HSA-NEXT: s_lshr_b32 s28, s7, 16
	; GCN-HSA-NEXT: s_lshr_b32 s30, s6, 16			; GCN-HSA-NEXT: s_lshr_b32 s30, s6, 16
	; GCN-HSA-NEXT: s_lshr_b32 s33, s9, 16			; GCN-HSA-NEXT: s_lshr_b32 s33, s9, 16
	; GCN-HSA-NEXT: s_lshr_b32 s35, s8, 16			; GCN-HSA-NEXT: s_lshr_b32 s35, s8, 16
	; GCN-HSA-NEXT: s_lshr_b32 s37, s11, 16			; GCN-HSA-NEXT: s_lshr_b32 s37, s11, 16
	; GCN-HSA-NEXT: s_lshr_b32 s38, s10, 16			; GCN-HSA-NEXT: s_lshr_b32 s38, s10, 16
	; GCN-HSA-NEXT: s_lshr_b32 s39, s13, 16			; GCN-HSA-NEXT: s_lshr_b32 s40, s13, 16
	; GCN-HSA-NEXT: s_lshr_b32 s40, s12, 16			; GCN-HSA-NEXT: s_lshr_b32 s41, s12, 16
	; GCN-HSA-NEXT: s_lshr_b32 s41, s15, 16			; GCN-HSA-NEXT: s_lshr_b32 s42, s15, 16
	; GCN-HSA-NEXT: s_lshr_b32 s42, s14, 16			; GCN-HSA-NEXT: s_lshr_b32 s43, s14, 16
	; GCN-HSA-NEXT: s_and_b32 s25, s1, 0xffff			; GCN-HSA-NEXT: s_and_b32 s25, s1, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s27, s0, 0xffff			; GCN-HSA-NEXT: s_and_b32 s27, s0, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s29, s3, 0xffff			; GCN-HSA-NEXT: s_and_b32 s29, s3, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s31, s2, 0xffff			; GCN-HSA-NEXT: s_and_b32 s31, s2, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s34, s5, 0xffff			; GCN-HSA-NEXT: s_and_b32 s34, s5, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s36, s4, 0xffff			; GCN-HSA-NEXT: s_and_b32 s36, s4, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s43, s7, 0xffff			; GCN-HSA-NEXT: s_and_b32 s39, s7, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s44, s6, 0xffff			; GCN-HSA-NEXT: s_and_b32 s44, s6, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s45, s9, 0xffff			; GCN-HSA-NEXT: s_and_b32 s45, s9, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s46, s8, 0xffff			; GCN-HSA-NEXT: s_and_b32 s46, s8, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s47, s11, 0xffff			; GCN-HSA-NEXT: s_and_b32 s47, s11, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s48, s10, 0xffff			; GCN-HSA-NEXT: s_and_b32 s48, s10, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s49, s13, 0xffff			; GCN-HSA-NEXT: s_and_b32 s49, s13, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s50, s12, 0xffff			; GCN-HSA-NEXT: s_and_b32 s50, s12, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s51, s15, 0xffff			; GCN-HSA-NEXT: s_and_b32 s51, s15, 0xffff
	[108 lines not shown]
	; GCN-HSA-NEXT: v_mov_b32_e32 v4, s68			; GCN-HSA-NEXT: v_mov_b32_e32 v4, s68
	; GCN-HSA-NEXT: v_mov_b32_e32 v23, s53			; GCN-HSA-NEXT: v_mov_b32_e32 v23, s53
	; GCN-HSA-NEXT: v_mov_b32_e32 v5, s19			; GCN-HSA-NEXT: v_mov_b32_e32 v5, s19
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0
	; GCN-HSA-NEXT: v_mov_b32_e32 v6, s67			; GCN-HSA-NEXT: v_mov_b32_e32 v6, s67
	; GCN-HSA-NEXT: v_mov_b32_e32 v8, s52			; GCN-HSA-NEXT: v_mov_b32_e32 v8, s52
	; GCN-HSA-NEXT: v_mov_b32_e32 v7, s18			; GCN-HSA-NEXT: v_mov_b32_e32 v7, s18
	; GCN-HSA-NEXT: flat_store_dwordx4 v[9:10], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[9:10], v[0:3]
	; GCN-HSA-NEXT: v_mov_b32_e32 v9, s42			; GCN-HSA-NEXT: v_mov_b32_e32 v9, s43
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s50			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s50
	; GCN-HSA-NEXT: v_mov_b32_e32 v10, s51			; GCN-HSA-NEXT: v_mov_b32_e32 v10, s51
	; GCN-HSA-NEXT: v_mov_b32_e32 v11, s41			; GCN-HSA-NEXT: v_mov_b32_e32 v11, s42
	; GCN-HSA-NEXT: v_mov_b32_e32 v1, s40			; GCN-HSA-NEXT: v_mov_b32_e32 v1, s41
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s49			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s49
	; GCN-HSA-NEXT: v_mov_b32_e32 v3, s39			; GCN-HSA-NEXT: v_mov_b32_e32 v3, s40
	; GCN-HSA-NEXT: flat_store_dwordx4 v[12:13], v[20:23]			; GCN-HSA-NEXT: flat_store_dwordx4 v[12:13], v[20:23]
	; GCN-HSA-NEXT: flat_store_dwordx4 v[14:15], v[4:7]			; GCN-HSA-NEXT: flat_store_dwordx4 v[14:15], v[4:7]
	; GCN-HSA-NEXT: flat_store_dwordx4 v[16:17], v[8:11]			; GCN-HSA-NEXT: flat_store_dwordx4 v[16:17], v[8:11]
	; GCN-HSA-NEXT: flat_store_dwordx4 v[18:19], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[18:19], v[0:3]
	; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1
	; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 64			; GCN-HSA-NEXT: s_add_u32 s0, s16, 64
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s48			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s48
	[11 lines not shown]
	; GCN-HSA-NEXT: v_mov_b32_e32 v3, s33			; GCN-HSA-NEXT: v_mov_b32_e32 v3, s33
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0
	; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
	; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1
	; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 32			; GCN-HSA-NEXT: s_add_u32 s0, s16, 32
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s44			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s44
	; GCN-HSA-NEXT: v_mov_b32_e32 v1, s30			; GCN-HSA-NEXT: v_mov_b32_e32 v1, s30
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s43			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s39
	; GCN-HSA-NEXT: v_mov_b32_e32 v3, s28			; GCN-HSA-NEXT: v_mov_b32_e32 v3, s28
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0
	; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
	; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1
	; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 16			; GCN-HSA-NEXT: s_add_u32 s0, s16, 16
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s36			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s36
	; GCN-HSA-NEXT: v_mov_b32_e32 v1, s26			; GCN-HSA-NEXT: v_mov_b32_e32 v1, s26
	[2,958 lines not shown]
	; GCN-HSA-NEXT: s_lshr_b32 s27, s14, 16			; GCN-HSA-NEXT: s_lshr_b32 s27, s14, 16
	; GCN-HSA-NEXT: s_lshr_b32 s28, s12, 16			; GCN-HSA-NEXT: s_lshr_b32 s28, s12, 16
	; GCN-HSA-NEXT: s_lshr_b32 s29, s10, 16			; GCN-HSA-NEXT: s_lshr_b32 s29, s10, 16
	; GCN-HSA-NEXT: s_lshr_b32 s30, s8, 16			; GCN-HSA-NEXT: s_lshr_b32 s30, s8, 16
	; GCN-HSA-NEXT: s_lshr_b32 s31, s6, 16			; GCN-HSA-NEXT: s_lshr_b32 s31, s6, 16
	; GCN-HSA-NEXT: s_lshr_b32 s33, s4, 16			; GCN-HSA-NEXT: s_lshr_b32 s33, s4, 16
	; GCN-HSA-NEXT: s_lshr_b32 s34, s2, 16			; GCN-HSA-NEXT: s_lshr_b32 s34, s2, 16
	; GCN-HSA-NEXT: s_lshr_b32 s18, s0, 16			; GCN-HSA-NEXT: s_lshr_b32 s18, s0, 16
	; GCN-HSA-NEXT: s_and_b32 s35, s0, 0xffff			; GCN-HSA-NEXT: s_and_b32 s0, s0, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s2, s2, 0xffff			; GCN-HSA-NEXT: s_and_b32 s35, s2, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s4, s4, 0xffff			; GCN-HSA-NEXT: s_and_b32 s4, s4, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s6, s6, 0xffff			; GCN-HSA-NEXT: s_and_b32 s6, s6, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s8, s8, 0xffff			; GCN-HSA-NEXT: s_and_b32 s8, s8, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s10, s10, 0xffff			; GCN-HSA-NEXT: s_and_b32 s10, s10, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s12, s12, 0xffff			; GCN-HSA-NEXT: s_and_b32 s12, s12, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s14, s14, 0xffff			; GCN-HSA-NEXT: s_and_b32 s14, s14, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s36, s1, 0xffff			; GCN-HSA-NEXT: s_and_b32 s1, s1, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s3, s3, 0xffff			; GCN-HSA-NEXT: s_and_b32 s36, s3, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s5, s5, 0xffff			; GCN-HSA-NEXT: s_and_b32 s5, s5, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s7, s7, 0xffff			; GCN-HSA-NEXT: s_and_b32 s7, s7, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s9, s9, 0xffff			; GCN-HSA-NEXT: s_and_b32 s9, s9, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s11, s11, 0xffff			; GCN-HSA-NEXT: s_and_b32 s11, s11, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s13, s13, 0xffff			; GCN-HSA-NEXT: s_and_b32 s13, s13, 0xffff
	; GCN-HSA-NEXT: s_and_b32 s15, s15, 0xffff			; GCN-HSA-NEXT: s_and_b32 s15, s15, 0xffff
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 0xf0			; GCN-HSA-NEXT: s_add_u32 s2, s16, 0xf0
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s3, s17, 0
	; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v5, s3
	; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v4, s2
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 0xd0			; GCN-HSA-NEXT: s_add_u32 s2, s16, 0xd0
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s3, s17, 0
	; GCN-HSA-NEXT: v_mov_b32_e32 v7, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v7, s3
	; GCN-HSA-NEXT: v_mov_b32_e32 v6, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v6, s2
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 0xb0			; GCN-HSA-NEXT: s_add_u32 s2, s16, 0xb0
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s3, s17, 0
	; GCN-HSA-NEXT: v_mov_b32_e32 v9, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v9, s3
	; GCN-HSA-NEXT: v_mov_b32_e32 v8, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v8, s2
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 0x90			; GCN-HSA-NEXT: s_add_u32 s2, s16, 0x90
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s3, s17, 0
	; GCN-HSA-NEXT: v_mov_b32_e32 v11, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v11, s3
	; GCN-HSA-NEXT: v_mov_b32_e32 v10, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v10, s2
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s15			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s15
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s26			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s26
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 0x70			; GCN-HSA-NEXT: s_add_u32 s2, s16, 0x70
	; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s3, s17, 0
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s13			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s13
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s25			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s25
	; GCN-HSA-NEXT: flat_store_dwordx4 v[6:7], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[6:7], v[0:3]
	; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v5, s3
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s11			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s11
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s24			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s24
	; GCN-HSA-NEXT: flat_store_dwordx4 v[8:9], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[8:9], v[0:3]
	; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v4, s2
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s9			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s9
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s23			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s23
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 0x50			; GCN-HSA-NEXT: s_add_u32 s2, s16, 0x50
	; GCN-HSA-NEXT: flat_store_dwordx4 v[10:11], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[10:11], v[0:3]
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s3, s17, 0
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s7			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s7
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s22			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s22
	; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
	; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v5, s3
	; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v4, s2
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 48			; GCN-HSA-NEXT: s_add_u32 s2, s16, 48
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s5			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s5
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s21			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s21
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s3, s17, 0
	; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
	; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v5, s3
	; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v4, s2
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 16			; GCN-HSA-NEXT: s_add_u32 s2, s16, 16
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s3			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s36
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s20			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s20
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s3, s17, 0
	; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
	; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v5, s3
	; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v4, s2
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 0xe0			; GCN-HSA-NEXT: s_add_u32 s2, s16, 0xe0
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s36			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s1
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s19			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s19
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s3, s17, 0
	; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
	; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v5, s3
	; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v4, s2
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 0xc0			; GCN-HSA-NEXT: s_add_u32 s2, s16, 0xc0
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s14			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s14
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s27			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s27
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s3, s17, 0
	; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
	; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v5, s3
	; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v4, s2
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 0xa0			; GCN-HSA-NEXT: s_add_u32 s2, s16, 0xa0
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s12			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s12
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s28			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s28
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s3, s17, 0
	; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
	; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v5, s3
	; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v4, s2
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 0x80			; GCN-HSA-NEXT: s_add_u32 s2, s16, 0x80
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s10			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s10
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s29			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s29
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s3, s17, 0
	; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
	; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v5, s3
	; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v4, s2
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 0x60			; GCN-HSA-NEXT: s_add_u32 s2, s16, 0x60
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s8			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s8
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s30			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s30
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s3, s17, 0
	; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
	; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v5, s3
	; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v4, s2
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 64			; GCN-HSA-NEXT: s_add_u32 s2, s16, 64
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s6			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s6
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s31			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s31
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s3, s17, 0
	; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
	; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v5, s3
	; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v4, s2
	; GCN-HSA-NEXT: s_add_u32 s0, s16, 32			; GCN-HSA-NEXT: s_add_u32 s2, s16, 32
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s4			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s4
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s33			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s33
	; GCN-HSA-NEXT: s_addc_u32 s1, s17, 0			; GCN-HSA-NEXT: s_addc_u32 s3, s17, 0
	; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
	; GCN-HSA-NEXT: v_mov_b32_e32 v5, s1			; GCN-HSA-NEXT: v_mov_b32_e32 v5, s3
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s2			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s35
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s34			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s34
	; GCN-HSA-NEXT: v_mov_b32_e32 v4, s0			; GCN-HSA-NEXT: v_mov_b32_e32 v4, s2
	; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
	; GCN-HSA-NEXT: v_mov_b32_e32 v4, s16			; GCN-HSA-NEXT: v_mov_b32_e32 v4, s16
	; GCN-HSA-NEXT: v_mov_b32_e32 v0, s35			; GCN-HSA-NEXT: v_mov_b32_e32 v0, s0
	; GCN-HSA-NEXT: v_mov_b32_e32 v2, s18			; GCN-HSA-NEXT: v_mov_b32_e32 v2, s18
	; GCN-HSA-NEXT: v_mov_b32_e32 v5, s17			; GCN-HSA-NEXT: v_mov_b32_e32 v5, s17
	; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]			; GCN-HSA-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
	; GCN-HSA-NEXT: s_endpgm			; GCN-HSA-NEXT: s_endpgm
	;			;
	; GCN-NOHSA-VI-LABEL: constant_zextload_v32i16_to_v32i64:			; GCN-NOHSA-VI-LABEL: constant_zextload_v32i16_to_v32i64:
	; GCN-NOHSA-VI: ; %bb.0:			; GCN-NOHSA-VI: ; %bb.0:
	; GCN-NOHSA-VI-NEXT: s_load_dwordx4 s[16:19], s[0:1], 0x24			; GCN-NOHSA-VI-NEXT: s_load_dwordx4 s[16:19], s[0:1], 0x24
	[... 269 lines not shown ...]
	; GCN-NOHSA-SI-NEXT: s_mov_b32 s46, s7			; GCN-NOHSA-SI-NEXT: s_mov_b32 s46, s7
	; GCN-NOHSA-SI-NEXT: s_mov_b32 s44, s5			; GCN-NOHSA-SI-NEXT: s_mov_b32 s44, s5
	; GCN-NOHSA-SI-NEXT: s_mov_b32 s36, s3			; GCN-NOHSA-SI-NEXT: s_mov_b32 s36, s3
	; GCN-NOHSA-SI-NEXT: s_mov_b32 s38, s1			; GCN-NOHSA-SI-NEXT: s_mov_b32 s38, s1
	; GCN-NOHSA-SI-NEXT: s_lshr_b32 s22, s14, 16			; GCN-NOHSA-SI-NEXT: s_lshr_b32 s22, s14, 16
	; GCN-NOHSA-SI-NEXT: s_lshr_b32 s26, s12, 16			; GCN-NOHSA-SI-NEXT: s_lshr_b32 s26, s12, 16
	; GCN-NOHSA-SI-NEXT: s_lshr_b32 s28, s10, 16			; GCN-NOHSA-SI-NEXT: s_lshr_b32 s28, s10, 16
	; GCN-NOHSA-SI-NEXT: s_lshr_b32 s30, s8, 16			; GCN-NOHSA-SI-NEXT: s_lshr_b32 s30, s8, 16
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[48:49], s[20:21], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[50:51], s[20:21], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[50:51], s[18:19], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[52:53], s[18:19], 0x100000
	; GCN-NOHSA-SI-NEXT: s_lshr_b32 s52, s6, 16			; GCN-NOHSA-SI-NEXT: s_lshr_b32 s54, s6, 16
	; GCN-NOHSA-SI-NEXT: s_lshr_b32 s54, s4, 16			; GCN-NOHSA-SI-NEXT: s_lshr_b32 s56, s4, 16
	; GCN-NOHSA-SI-NEXT: s_lshr_b32 s56, s2, 16			; GCN-NOHSA-SI-NEXT: s_lshr_b32 s58, s2, 16
	; GCN-NOHSA-SI-NEXT: s_lshr_b32 s58, s0, 16			; GCN-NOHSA-SI-NEXT: s_lshr_b32 s60, s0, 16
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[18:19], s[0:1], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[18:19], s[0:1], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[20:21], s[2:3], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[20:21], s[2:3], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[24:25], s[4:5], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[24:25], s[4:5], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[34:35], s[6:7], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[34:35], s[6:7], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[60:61], s[8:9], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[48:49], s[8:9], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[62:63], s[10:11], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[62:63], s[10:11], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[64:65], s[12:13], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[64:65], s[12:13], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[66:67], s[14:15], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[66:67], s[14:15], 0x100000
	; GCN-NOHSA-SI-NEXT: s_ashr_i64 s[68:69], s[0:1], 48			; GCN-NOHSA-SI-NEXT: s_ashr_i64 s[68:69], s[0:1], 48
	; GCN-NOHSA-SI-NEXT: s_ashr_i64 s[70:71], s[2:3], 48			; GCN-NOHSA-SI-NEXT: s_ashr_i64 s[70:71], s[2:3], 48
	; GCN-NOHSA-SI-NEXT: s_ashr_i64 s[6:7], s[6:7], 48			; GCN-NOHSA-SI-NEXT: s_ashr_i64 s[6:7], s[6:7], 48
	; GCN-NOHSA-SI-NEXT: s_ashr_i64 s[8:9], s[8:9], 48			; GCN-NOHSA-SI-NEXT: s_ashr_i64 s[8:9], s[8:9], 48
	; GCN-NOHSA-SI-NEXT: s_ashr_i64 s[10:11], s[10:11], 48			; GCN-NOHSA-SI-NEXT: s_ashr_i64 s[10:11], s[10:11], 48
	; GCN-NOHSA-SI-NEXT: s_ashr_i64 s[2:3], s[12:13], 48			; GCN-NOHSA-SI-NEXT: s_ashr_i64 s[2:3], s[12:13], 48
	; GCN-NOHSA-SI-NEXT: s_ashr_i64 s[12:13], s[14:15], 48			; GCN-NOHSA-SI-NEXT: s_ashr_i64 s[12:13], s[14:15], 48
	; GCN-NOHSA-SI-NEXT: s_ashr_i64 s[4:5], s[4:5], 48			; GCN-NOHSA-SI-NEXT: s_ashr_i64 s[4:5], s[4:5], 48
	; GCN-NOHSA-SI-NEXT: s_mov_b32 s0, s16			; GCN-NOHSA-SI-NEXT: s_mov_b32 s0, s16
	; GCN-NOHSA-SI-NEXT: s_mov_b32 s1, s17			; GCN-NOHSA-SI-NEXT: s_mov_b32 s1, s17
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v0, s50			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v0, s52
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v1, s51			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v1, s53
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v2, s12			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v2, s12
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v3, s13			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v3, s13
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v4, s48			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v4, s50
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v5, s49			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v5, s51
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v6, s2			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v6, s2
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v7, s3			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v7, s3
	; GCN-NOHSA-SI-NEXT: s_mov_b32 s3, 0xf000			; GCN-NOHSA-SI-NEXT: s_mov_b32 s3, 0xf000
	; GCN-NOHSA-SI-NEXT: s_mov_b32 s2, -1			; GCN-NOHSA-SI-NEXT: s_mov_b32 s2, -1
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[12:13], s[46:47], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[12:13], s[46:47], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[14:15], s[42:43], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[14:15], s[42:43], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[16:17], s[40:41], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[16:17], s[40:41], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[40:41], s[44:45], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[40:41], s[44:45], 0x100000
	[... 12 lines not shown ...]
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v18, s6			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v18, s6
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v19, s7			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v19, s7
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v20, s40			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v20, s40
	; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:240			; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:240
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v21, s41			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v21, s41
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v22, s4			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v22, s4
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v23, s5			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v23, s5
	; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[4:7], off, s[0:3], 0 offset:208			; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[4:7], off, s[0:3], 0 offset:208
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[4:5], s[58:59], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[4:5], s[60:61], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[6:7], s[56:57], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[6:7], s[58:59], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[8:9], s[54:55], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[8:9], s[56:57], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[10:11], s[52:53], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[10:11], s[54:55], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[12:13], s[30:31], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[12:13], s[30:31], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[14:15], s[28:29], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[14:15], s[28:29], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[16:17], s[26:27], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[16:17], s[26:27], 0x100000
	; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[22:23], s[22:23], 0x100000			; GCN-NOHSA-SI-NEXT: s_bfe_i64 s[22:23], s[22:23], 0x100000
	; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[8:11], off, s[0:3], 0 offset:176			; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[8:11], off, s[0:3], 0 offset:176
	; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[12:15], off, s[0:3], 0 offset:144			; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[12:15], off, s[0:3], 0 offset:144
	; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[16:19], off, s[0:3], 0 offset:112			; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[16:19], off, s[0:3], 0 offset:112
	; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[20:23], off, s[0:3], 0 offset:80			; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[20:23], off, s[0:3], 0 offset:80
	[... 11 lines not shown ...]
	; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:16			; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:16
	; GCN-NOHSA-SI-NEXT: s_waitcnt expcnt(0)			; GCN-NOHSA-SI-NEXT: s_waitcnt expcnt(0)
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v0, s66			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v0, s66
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v1, s67			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v1, s67
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v4, s64			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v4, s64
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v5, s65			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v5, s65
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v8, s62			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v8, s62
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v9, s63			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v9, s63
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v12, s60			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v12, s48
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v13, s61			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v13, s49
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v16, s34			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v16, s34
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v17, s35			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v17, s35
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v20, s24			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v20, s24
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v21, s25			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v21, s25
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v24, s20			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v24, s20
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v25, s21			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v25, s21
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v2, s22			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v2, s22
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v3, s23			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v3, s23
	[... 227 lines not shown ...]
	; GCN-HSA-NEXT: s_endpgm			; GCN-HSA-NEXT: s_endpgm
	;			;
	; GCN-NOHSA-VI-LABEL: constant_sextload_v32i16_to_v32i64:			; GCN-NOHSA-VI-LABEL: constant_sextload_v32i16_to_v32i64:
	; GCN-NOHSA-VI: ; %bb.0:			; GCN-NOHSA-VI: ; %bb.0:
	; GCN-NOHSA-VI-NEXT: s_load_dwordx4 s[16:19], s[0:1], 0x24			; GCN-NOHSA-VI-NEXT: s_load_dwordx4 s[16:19], s[0:1], 0x24
	; GCN-NOHSA-VI-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NOHSA-VI-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NOHSA-VI-NEXT: s_load_dwordx16 s[0:15], s[18:19], 0x0			; GCN-NOHSA-VI-NEXT: s_load_dwordx16 s[0:15], s[18:19], 0x0
	; GCN-NOHSA-VI-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NOHSA-VI-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NOHSA-VI-NEXT: s_mov_b32 s36, s15			; GCN-NOHSA-VI-NEXT: s_mov_b32 s38, s15
	; GCN-NOHSA-VI-NEXT: s_mov_b32 s38, s13			; GCN-NOHSA-VI-NEXT: s_mov_b32 s40, s13
	; GCN-NOHSA-VI-NEXT: s_ashr_i64 s[82:83], s[14:15], 48			; GCN-NOHSA-VI-NEXT: s_ashr_i64 s[82:83], s[14:15], 48
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[36:37], s[36:37], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[38:39], s[38:39], 0x100000
	; GCN-NOHSA-VI-NEXT: s_mov_b32 s40, s11			; GCN-NOHSA-VI-NEXT: s_mov_b32 s42, s11
	; GCN-NOHSA-VI-NEXT: s_mov_b32 s48, s3			; GCN-NOHSA-VI-NEXT: s_mov_b32 s50, s3
	; GCN-NOHSA-VI-NEXT: s_mov_b32 s50, s1			; GCN-NOHSA-VI-NEXT: s_mov_b32 s52, s1
	; GCN-NOHSA-VI-NEXT: s_lshr_b32 s64, s2, 16			; GCN-NOHSA-VI-NEXT: s_lshr_b32 s66, s2, 16
	; GCN-NOHSA-VI-NEXT: s_lshr_b32 s66, s0, 16			; GCN-NOHSA-VI-NEXT: s_lshr_b32 s68, s0, 16
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[18:19], s[0:1], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[18:19], s[0:1], 0x100000
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[20:21], s[2:3], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[20:21], s[2:3], 0x100000
	; GCN-NOHSA-VI-NEXT: s_ashr_i64 s[68:69], s[0:1], 48			; GCN-NOHSA-VI-NEXT: s_ashr_i64 s[36:37], s[0:1], 48
	; GCN-NOHSA-VI-NEXT: s_ashr_i64 s[70:71], s[2:3], 48			; GCN-NOHSA-VI-NEXT: s_ashr_i64 s[70:71], s[2:3], 48
	; GCN-NOHSA-VI-NEXT: s_ashr_i64 s[80:81], s[12:13], 48			; GCN-NOHSA-VI-NEXT: s_ashr_i64 s[80:81], s[12:13], 48
	; GCN-NOHSA-VI-NEXT: s_mov_b32 s3, 0xf000			; GCN-NOHSA-VI-NEXT: s_mov_b32 s3, 0xf000
	; GCN-NOHSA-VI-NEXT: s_mov_b32 s2, -1			; GCN-NOHSA-VI-NEXT: s_mov_b32 s2, -1
	; GCN-NOHSA-VI-NEXT: s_mov_b32 s0, s16			; GCN-NOHSA-VI-NEXT: s_mov_b32 s0, s16
	; GCN-NOHSA-VI-NEXT: s_mov_b32 s1, s17			; GCN-NOHSA-VI-NEXT: s_mov_b32 s1, s17
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[38:39], s[38:39], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[40:41], s[40:41], 0x100000
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s36			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s38
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s37			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s39
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s82			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s82
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s83			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s83
	; GCN-NOHSA-VI-NEXT: s_mov_b32 s42, s9			; GCN-NOHSA-VI-NEXT: s_mov_b32 s44, s9
	; GCN-NOHSA-VI-NEXT: s_ashr_i64 s[78:79], s[10:11], 48			; GCN-NOHSA-VI-NEXT: s_ashr_i64 s[78:79], s[10:11], 48
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[40:41], s[40:41], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[42:43], s[42:43], 0x100000
	; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:240			; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:240
	; GCN-NOHSA-VI-NEXT: s_mov_b32 s44, s7			; GCN-NOHSA-VI-NEXT: s_mov_b32 s46, s7
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s38			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s40
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s39			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s41
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s80			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s80
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s81			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s81
	; GCN-NOHSA-VI-NEXT: s_ashr_i64 s[76:77], s[8:9], 48			; GCN-NOHSA-VI-NEXT: s_ashr_i64 s[76:77], s[8:9], 48
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[42:43], s[42:43], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[44:45], s[44:45], 0x100000
	; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:208			; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:208
	; GCN-NOHSA-VI-NEXT: s_mov_b32 s46, s5			; GCN-NOHSA-VI-NEXT: s_mov_b32 s48, s5
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s40			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s42
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s41			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s43
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s78			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s78
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s79			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s79
	; GCN-NOHSA-VI-NEXT: s_ashr_i64 s[74:75], s[6:7], 48			; GCN-NOHSA-VI-NEXT: s_ashr_i64 s[74:75], s[6:7], 48
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[44:45], s[44:45], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[46:47], s[46:47], 0x100000
	; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:176			; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:176
	; GCN-NOHSA-VI-NEXT: s_ashr_i64 s[72:73], s[4:5], 48			; GCN-NOHSA-VI-NEXT: s_ashr_i64 s[72:73], s[4:5], 48
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s42			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s44
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s43			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s45
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s76			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s76
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s77			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s77
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[46:47], s[46:47], 0x100000
	; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:144
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[48:49], s[48:49], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[48:49], s[48:49], 0x100000
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s44			; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:144
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s45			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[50:51], s[50:51], 0x100000
				; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s46
				; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s47
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s74			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s74
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s75			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s75
	; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:112			; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:112
	; GCN-NOHSA-VI-NEXT: s_lshr_b32 s52, s14, 16			; GCN-NOHSA-VI-NEXT: s_lshr_b32 s54, s14, 16
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s46			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s48
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s47			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s49
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s72			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s72
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s73			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s73
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[50:51], s[50:51], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[52:53], s[52:53], 0x100000
	; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:80			; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:80
	; GCN-NOHSA-VI-NEXT: s_lshr_b32 s54, s12, 16			; GCN-NOHSA-VI-NEXT: s_lshr_b32 s56, s12, 16
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s48			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s50
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s49			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s51
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s70			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s70
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s71			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s71
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[34:35], s[14:15], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[34:35], s[14:15], 0x100000
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[52:53], s[52:53], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[54:55], s[54:55], 0x100000
	; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:48			; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:48
	; GCN-NOHSA-VI-NEXT: s_lshr_b32 s56, s10, 16			; GCN-NOHSA-VI-NEXT: s_lshr_b32 s58, s10, 16
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s50			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s52
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s51			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s53
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s68			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s36
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s69			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s37
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[30:31], s[12:13], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[30:31], s[12:13], 0x100000
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[16:17], s[54:55], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[16:17], s[56:57], 0x100000
	; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:16			; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:16
	; GCN-NOHSA-VI-NEXT: s_lshr_b32 s58, s8, 16			; GCN-NOHSA-VI-NEXT: s_lshr_b32 s60, s8, 16
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s34			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s34
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s35			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s35
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s52			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s54
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s53			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s55
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[28:29], s[10:11], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[28:29], s[10:11], 0x100000
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[14:15], s[56:57], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[14:15], s[58:59], 0x100000
	; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:224			; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:224
	; GCN-NOHSA-VI-NEXT: s_lshr_b32 s60, s6, 16			; GCN-NOHSA-VI-NEXT: s_lshr_b32 s62, s6, 16
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s30			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s30
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s31			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s31
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s16			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s16
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s17			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s17
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[26:27], s[8:9], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[26:27], s[8:9], 0x100000
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[12:13], s[58:59], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[12:13], s[60:61], 0x100000
	; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:192			; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:192
	; GCN-NOHSA-VI-NEXT: s_lshr_b32 s62, s4, 16			; GCN-NOHSA-VI-NEXT: s_lshr_b32 s64, s4, 16
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s28			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s28
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s29			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s29
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s14			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s14
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s15			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s15
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[24:25], s[6:7], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[24:25], s[6:7], 0x100000
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[10:11], s[60:61], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[10:11], s[62:63], 0x100000
	; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:160			; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:160
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[22:23], s[4:5], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[22:23], s[4:5], 0x100000
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s26			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s26
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s27			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s27
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s12			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s12
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s13			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s13
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[8:9], s[62:63], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[8:9], s[64:65], 0x100000
	; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:128			; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:128
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[6:7], s[64:65], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[6:7], s[66:67], 0x100000
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s24			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s24
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s25			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s25
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s10			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s10
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s11			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s11
	; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:96			; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:96
	; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[4:5], s[66:67], 0x100000			; GCN-NOHSA-VI-NEXT: s_bfe_i64 s[4:5], s[68:69], 0x100000
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s22			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s22
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s23			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s23
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s8			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v2, s8
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s9			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v3, s9
	; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:64			; GCN-NOHSA-VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:64
	; GCN-NOHSA-VI-NEXT: s_nop 0			; GCN-NOHSA-VI-NEXT: s_nop 0
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s20			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v0, s20
	; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s21			; GCN-NOHSA-VI-NEXT: v_mov_b32_e32 v1, s21

llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll

	; W64-O0-LABEL: mubuf_vgpr_outside_entry			; W64-O0-LABEL: mubuf_vgpr_outside_entry

	; W64-O0-DAG: s_mov_b32 [[IDX_S:s[0-9]+]], s{{[0-9]+}}			; W64-O0-DAG: s_mov_b32 [[IDX_S:s[0-9]+]], s{{[0-9]+}}
	; W64-O0-DAG: v_mov_b32_e32 [[IDX_V:v[0-9]+]], s{{[0-9]+}}			; W64-O0-DAG: v_mov_b32_e32 [[IDX_V:v[0-9]+]], s{{[0-9]+}}
	; W64-O0-DAG: buffer_store_dword [[IDX_V]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:[[RES_OFF_TMP:[0-9]+]] ; 4-byte Folded Spill			; W64-O0-DAG: buffer_store_dword [[IDX_V]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:[[RES_OFF_TMP:[0-9]+]] ; 4-byte Folded Spill
	; W64-O0-DAG: s_mov_b64 [[SAVEEXEC:s\[[0-9]+:[0-9]+\]]], exec			; W64-O0-DAG: s_mov_b64 [[SAVEEXEC:s\[[0-9]+:[0-9]+\]]], exec

	; W64-O0: [[LOOPBB0:.LBB[0-9]+_[0-9]+]]: ; =>This Inner Loop Header: Depth=1			; W64-O0: [[LOOPBB0:.LBB[0-9]+_[0-9]+]]: ; =>This Inner Loop Header: Depth=1
	; W64-O0: buffer_load_dword v[[VRSRC0:[0-9]+]], off, s[0:3], s32 offset:28 ; 4-byte Folded Reload			; W64-O0: buffer_load_dword v[[VRSRC0:[0-9]+]], off, s[0:3], s32 offset:32 ; 4-byte Folded Reload
	; W64-O0: buffer_load_dword v[[VRSRC1:[0-9]+]], off, s[0:3], s32 offset:32 ; 4-byte Folded Reload			; W64-O0: buffer_load_dword v[[VRSRC1:[0-9]+]], off, s[0:3], s32 offset:36 ; 4-byte Folded Reload
	; W64-O0: buffer_load_dword v[[VRSRC2:[0-9]+]], off, s[0:3], s32 offset:36 ; 4-byte Folded Reload			; W64-O0: buffer_load_dword v[[VRSRC2:[0-9]+]], off, s[0:3], s32 offset:40 ; 4-byte Folded Reload
	; W64-O0: buffer_load_dword v[[VRSRC3:[0-9]+]], off, s[0:3], s32 offset:40 ; 4-byte Folded Reload			; W64-O0: buffer_load_dword v[[VRSRC3:[0-9]+]], off, s[0:3], s32 offset:44 ; 4-byte Folded Reload
	; W64-O0: s_waitcnt vmcnt(0)			; W64-O0: s_waitcnt vmcnt(0)
	; W64-O0-DAG: v_readfirstlane_b32 s[[S0:[0-9]+]], v[[VRSRC0]]			; W64-O0-DAG: v_readfirstlane_b32 s[[S0:[0-9]+]], v[[VRSRC0]]
	; W64-O0-DAG: v_readfirstlane_b32 s[[SRSRCTMP1:[0-9]+]], v[[VRSRC1]]			; W64-O0-DAG: v_readfirstlane_b32 s[[SRSRCTMP1:[0-9]+]], v[[VRSRC1]]
	; W64-O0-DAG: s_mov_b32 s[[SRSRC0:[0-9]+]], s[[S0]]			; W64-O0-DAG: s_mov_b32 s[[SRSRC0:[0-9]+]], s[[S0]]
	; W64-O0-DAG: s_mov_b32 s[[SRSRC1:[0-9]+]], s[[SRSRCTMP1]]			; W64-O0-DAG: s_mov_b32 s[[SRSRC1:[0-9]+]], s[[SRSRCTMP1]]
	; W64-O0-DAG: v_cmp_eq_u64_e64 [[CMP0:s\[[0-9]+:[0-9]+\]]], s[[[SRSRC0]]:[[SRSRC1]]], v[[[VRSRC0]]:[[VRSRC1]]]			; W64-O0-DAG: v_cmp_eq_u64_e64 [[CMP0:s\[[0-9]+:[0-9]+\]]], s[[[SRSRC0]]:[[SRSRC1]]], v[[[VRSRC0]]:[[VRSRC1]]]
	; W64-O0-DAG: v_readfirstlane_b32 s[[SRSRCTMP2:[0-9]+]], v[[VRSRC2]]			; W64-O0-DAG: v_readfirstlane_b32 s[[SRSRCTMP2:[0-9]+]], v[[VRSRC2]]
	; W64-O0-DAG: v_readfirstlane_b32 s[[SRSRCTMP3:[0-9]+]], v[[VRSRC3]]			; W64-O0-DAG: v_readfirstlane_b32 s[[SRSRCTMP3:[0-9]+]], v[[VRSRC3]]
	; W64-O0-DAG: s_mov_b32 s[[SRSRC2:[0-9]+]], s[[SRSRCTMP2]]			; W64-O0-DAG: s_mov_b32 s[[SRSRC2:[0-9]+]], s[[SRSRCTMP2]]
	; W64-O0-DAG: s_mov_b32 s[[SRSRC3:[0-9]+]], s[[SRSRCTMP3]]			; W64-O0-DAG: s_mov_b32 s[[SRSRC3:[0-9]+]], s[[SRSRCTMP3]]
	; W64-O0-DAG: v_cmp_eq_u64_e64 [[CMP1:s\[[0-9]+:[0-9]+\]]], s[[[SRSRC2]]:[[SRSRC3]]], v[[[VRSRC2]]:[[VRSRC3]]]			; W64-O0-DAG: v_cmp_eq_u64_e64 [[CMP1:s\[[0-9]+:[0-9]+\]]], s[[[SRSRC2]]:[[SRSRC3]]], v[[[VRSRC2]]:[[VRSRC3]]]
	; W64-O0-DAG: s_and_b64 [[AND:s\[[0-9]+:[0-9]+\]]], [[CMP0]], [[CMP1]]			; W64-O0-DAG: s_and_b64 [[AND:s\[[0-9]+:[0-9]+\]]], [[CMP0]], [[CMP1]]
	; W64-O0-DAG: s_mov_b32 s[[S1:[0-9]+]], s[[SRSRCTMP1]]			; W64-O0-DAG: s_mov_b32 s[[S1:[0-9]+]], s[[SRSRCTMP1]]
	; W64-O0-DAG: s_mov_b32 s[[S2:[0-9]+]], s[[SRSRCTMP2]]			; W64-O0-DAG: s_mov_b32 s[[S2:[0-9]+]], s[[SRSRCTMP2]]
	; W64-O0-DAG: s_mov_b32 s[[S3:[0-9]+]], s[[SRSRCTMP3]]			; W64-O0-DAG: s_mov_b32 s[[S3:[0-9]+]], s[[SRSRCTMP3]]
	; W64-O0: s_and_saveexec_b64 [[SAVE:s\[[0-9]+:[0-9]+\]]], [[AND]]			; W64-O0: s_and_saveexec_b64 [[SAVE:s\[[0-9]+:[0-9]+\]]], [[AND]]
	; W64-O0: buffer_load_dword [[IDX:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s32 ; 4-byte Folded Reload			; W64-O0: buffer_load_dword [[IDX:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s32 offset:4 ; 4-byte Folded Reload
	; W64-O0: buffer_load_format_x [[RES:v[0-9]+]], [[IDX]], s[[[S0]]:[[S3]]], {{.*}} idxen			; W64-O0: buffer_load_format_x [[RES:v[0-9]+]], [[IDX]], s[[[S0]]:[[S3]]], {{.*}} idxen
	; W64-O0: s_waitcnt vmcnt(0)			; W64-O0: s_waitcnt vmcnt(0)
	; W64-O0: buffer_store_dword [[RES]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:[[RES_OFF_TMP:[0-9]+]] ; 4-byte Folded Spill			; W64-O0: buffer_store_dword [[RES]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:[[RES_OFF_TMP:[0-9]+]] ; 4-byte Folded Spill
	; W64-O0: s_xor_b64 exec, exec, [[SAVE]]			; W64-O0: s_xor_b64 exec, exec, [[SAVE]]
	; W64-O0-NEXT: s_cbranch_execnz [[LOOPBB0]]			; W64-O0-NEXT: s_cbranch_execnz [[LOOPBB0]]

	; XXX-W64-O0: s_mov_b64 exec, [[SAVEEXEC]]			; XXX-W64-O0: s_mov_b64 exec, [[SAVEEXEC]]
	; W64-O0: buffer_load_dword [[RES:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:[[RES_OFF_TMP]] ; 4-byte Folded Reload			; W64-O0: buffer_load_dword [[RES:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:[[RES_OFF_TMP]] ; 4-byte Folded Reload
	; W64-O0: buffer_store_dword [[RES]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:[[RES_OFF:[0-9]+]] ; 4-byte Folded Spill			; W64-O0: buffer_store_dword [[RES]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:[[RES_OFF:[0-9]+]] ; 4-byte Folded Spill
	; W64-O0: s_cbranch_execz [[TERMBB:.LBB[0-9]+_[0-9]+]]			; W64-O0: s_cbranch_execz [[TERMBB:.LBB[0-9]+_[0-9]+]]

	; W64-O0: ; %bb.{{[0-9]+}}: ; %bb1			; W64-O0: ; %bb.{{[0-9]+}}: ; %bb1
	; W64-O0-DAG: buffer_store_dword {{v[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s32 offset:[[IDX_OFF:[0-9]+]] ; 4-byte Folded Spill			; W64-O0: buffer_load_dword
				; W64-O0: buffer_store_dword {{v[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s32 offset:[[IDX_OFF:[0-9]+]] ; 4-byte Folded Spill
	; W64-O0-DAG: s_mov_b64 s[[[SAVEEXEC0:[0-9]+]]:[[SAVEEXEC1:[0-9]+]]], exec			; W64-O0-DAG: s_mov_b64 s[[[SAVEEXEC0:[0-9]+]]:[[SAVEEXEC1:[0-9]+]]], exec
	; W64-O0: v_writelane_b32 [[VSAVEEXEC:v[0-9]+]], s[[SAVEEXEC0]], [[SAVEEXEC_IDX0:[0-9]+]]			; W64-O0: v_writelane_b32 [[VSAVEEXEC:v[0-9]+]], s[[SAVEEXEC0]], [[SAVEEXEC_IDX0:[0-9]+]]
	; W64-O0: v_writelane_b32 [[VSAVEEXEC]], s[[SAVEEXEC1]], [[SAVEEXEC_IDX1:[0-9]+]]			; W64-O0: v_writelane_b32 [[VSAVEEXEC]], s[[SAVEEXEC1]], [[SAVEEXEC_IDX1:[0-9]+]]
				; W64-O0: buffer_store_dword [[VSAVEEXEC]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} ; 4-byte Folded Spill

	; W64-O0: [[LOOPBB1:.LBB[0-9]+_[0-9]+]]: ; =>This Inner Loop Header: Depth=1			; W64-O0: [[LOOPBB1:.LBB[0-9]+_[0-9]+]]: ; =>This Inner Loop Header: Depth=1
	; W64-O0: buffer_load_dword v[[VRSRC0:[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; W64-O0: buffer_load_dword v[[VRSRC0:[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; W64-O0: buffer_load_dword v[[VRSRC1:[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; W64-O0: buffer_load_dword v[[VRSRC1:[0-9]+]], off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	; W64-O0: buffer_load_dword v[[VRSRC2:[0-9]+]], off, s[0:3], s32 offset:12 ; 4-byte Folded Reload			; W64-O0: buffer_load_dword v[[VRSRC2:[0-9]+]], off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
	; W64-O0: buffer_load_dword v[[VRSRC3:[0-9]+]], off, s[0:3], s32 offset:16 ; 4-byte Folded Reload			; W64-O0: buffer_load_dword v[[VRSRC3:[0-9]+]], off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
	; W64-O0: s_waitcnt vmcnt(0)			; W64-O0: s_waitcnt vmcnt(0)
	; W64-O0-DAG: v_readfirstlane_b32 s[[S0:[0-9]+]], v[[VRSRC0]]			; W64-O0-DAG: v_readfirstlane_b32 s[[S0:[0-9]+]], v[[VRSRC0]]
	; W64-O0-DAG: v_readfirstlane_b32 s[[SRSRCTMP1:[0-9]+]], v[[VRSRC1]]			; W64-O0-DAG: v_readfirstlane_b32 s[[SRSRCTMP1:[0-9]+]], v[[VRSRC1]]
	; W64-O0-DAG: s_mov_b32 s[[SRSRC0:[0-9]+]], s[[S0]]			; W64-O0-DAG: s_mov_b32 s[[SRSRC0:[0-9]+]], s[[S0]]
	; W64-O0-DAG: s_mov_b32 s[[SRSRC1:[0-9]+]], s[[SRSRCTMP1]]			; W64-O0-DAG: s_mov_b32 s[[SRSRC1:[0-9]+]], s[[SRSRCTMP1]]
	; W64-O0-DAG: v_cmp_eq_u64_e64 [[CMP0:s\[[0-9]+:[0-9]+\]]], s[[[SRSRC0]]:[[SRSRC1]]], v[[[VRSRC0]]:[[VRSRC1]]]			; W64-O0-DAG: v_cmp_eq_u64_e64 [[CMP0:s\[[0-9]+:[0-9]+\]]], s[[[SRSRC0]]:[[SRSRC1]]], v[[[VRSRC0]]:[[VRSRC1]]]
	; W64-O0-DAG: v_readfirstlane_b32 s[[SRSRCTMP2:[0-9]+]], v[[VRSRC2]]			; W64-O0-DAG: v_readfirstlane_b32 s[[SRSRCTMP2:[0-9]+]], v[[VRSRC2]]
	; W64-O0-DAG: v_readfirstlane_b32 s[[SRSRCTMP3:[0-9]+]], v[[VRSRC3]]			; W64-O0-DAG: v_readfirstlane_b32 s[[SRSRCTMP3:[0-9]+]], v[[VRSRC3]]
	; W64-O0-DAG: s_mov_b32 s[[SRSRC2:[0-9]+]], s[[SRSRCTMP2]]			; W64-O0-DAG: s_mov_b32 s[[SRSRC2:[0-9]+]], s[[SRSRCTMP2]]
	; W64-O0-DAG: s_mov_b32 s[[SRSRC3:[0-9]+]], s[[SRSRCTMP3]]			; W64-O0-DAG: s_mov_b32 s[[SRSRC3:[0-9]+]], s[[SRSRCTMP3]]
	; W64-O0-DAG: v_cmp_eq_u64_e64 [[CMP1:s\[[0-9]+:[0-9]+\]]], s[[[SRSRC2]]:[[SRSRC3]]], v[[[VRSRC2]]:[[VRSRC3]]]			; W64-O0-DAG: v_cmp_eq_u64_e64 [[CMP1:s\[[0-9]+:[0-9]+\]]], s[[[SRSRC2]]:[[SRSRC3]]], v[[[VRSRC2]]:[[VRSRC3]]]
	; W64-O0-DAG: s_and_b64 [[AND:s\[[0-9]+:[0-9]+\]]], [[CMP0]], [[CMP1]]			; W64-O0-DAG: s_and_b64 [[AND:s\[[0-9]+:[0-9]+\]]], [[CMP0]], [[CMP1]]
	; W64-O0-DAG: s_mov_b32 s[[S1:[0-9]+]], s[[SRSRCTMP1]]			; W64-O0-DAG: s_mov_b32 s[[S1:[0-9]+]], s[[SRSRCTMP1]]
	; W64-O0-DAG: s_mov_b32 s[[S2:[0-9]+]], s[[SRSRCTMP2]]			; W64-O0-DAG: s_mov_b32 s[[S2:[0-9]+]], s[[SRSRCTMP2]]
	; W64-O0-DAG: s_mov_b32 s[[S3:[0-9]+]], s[[SRSRCTMP3]]			; W64-O0-DAG: s_mov_b32 s[[S3:[0-9]+]], s[[SRSRCTMP3]]
	; W64-O0: s_and_saveexec_b64 [[SAVE:s\[[0-9]+:[0-9]+\]]], [[AND]]			; W64-O0: s_and_saveexec_b64 [[SAVE:s\[[0-9]+:[0-9]+\]]], [[AND]]
				; W64-O0: buffer_store_dword
	; W64-O0: buffer_load_dword [[IDX:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s32 offset:[[IDX_OFF]] ; 4-byte Folded Reload			; W64-O0: buffer_load_dword [[IDX:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s32 offset:[[IDX_OFF]] ; 4-byte Folded Reload
				; W64-O0: buffer_load_dword
	; W64-O0: buffer_load_format_x [[RES:v[0-9]+]], [[IDX]], s[[[S0]]:[[S3]]], {{.*}} idxen			; W64-O0: buffer_load_format_x [[RES:v[0-9]+]], [[IDX]], s[[[S0]]:[[S3]]], {{.*}} idxen
	; W64-O0: s_waitcnt vmcnt(0)			; W64-O0: s_waitcnt vmcnt(0)
	; W64-O0: buffer_store_dword [[RES]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:[[RES_OFF_TMP:[0-9]+]] ; 4-byte Folded Spill			; W64-O0: buffer_store_dword [[RES]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:[[RES_OFF_TMP:[0-9]+]] ; 4-byte Folded Spill
	; W64-O0: s_xor_b64 exec, exec, [[SAVE]]			; W64-O0: s_xor_b64 exec, exec, [[SAVE]]
	; W64-O0-NEXT: s_cbranch_execnz [[LOOPBB1]]			; W64-O0-NEXT: s_cbranch_execnz [[LOOPBB1]]

	; W64-O0: buffer_load_dword [[RES:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:[[RES_OFF_TMP]] ; 4-byte Folded Reload			; W64-O0: buffer_load_dword [[RES:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:[[RES_OFF_TMP]] ; 4-byte Folded Reload
	; W64-O0: v_readlane_b32 s[[SAVEEXEC0:[0-9]+]], [[VSAVEEXEC]], [[SAVEEXEC_IDX0]]			; W64-O0: buffer_load_dword [[VSAVEEXEC1:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} ; 4-byte Folded Reload
	; W64-O0: v_readlane_b32 s[[SAVEEXEC1:[0-9]+]], [[VSAVEEXEC]], [[SAVEEXEC_IDX1]]			; W64-O0: v_readlane_b32 s[[SAVEEXEC0:[0-9]+]], [[VSAVEEXEC1]], [[SAVEEXEC_IDX0]]
				; W64-O0: v_readlane_b32 s[[SAVEEXEC1:[0-9]+]], [[VSAVEEXEC1]], [[SAVEEXEC_IDX1]]
	; W64-O0: s_mov_b64 exec, s[[[SAVEEXEC0]]:[[SAVEEXEC1]]]			; W64-O0: s_mov_b64 exec, s[[[SAVEEXEC0]]:[[SAVEEXEC1]]]
	; W64-O0: buffer_store_dword [[RES]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:[[RES_OFF]] ; 4-byte Folded Spill			; W64-O0: buffer_store_dword [[RES]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:[[RES_OFF]] ; 4-byte Folded Spill

	; W64-O0: [[TERMBB]]:			; W64-O0: [[TERMBB]]:
	; W64-O0: buffer_load_dword [[RES:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:[[RES_OFF]] ; 4-byte Folded Reload			; W64-O0: buffer_load_dword [[RES:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:[[RES_OFF]] ; 4-byte Folded Reload
	; W64-O0: global_store_dword v[{{[0-9]+:[0-9]+}}], [[RES]], off			; W64-O0: global_store_dword v[{{[0-9]+:[0-9]+}}], [[RES]], off

	define void @mubuf_vgpr_outside_entry(<4 x i32> %i, <4 x i32> %j, i32 %c, float addrspace(1)* %in, float addrspace(1)* %out) #0 {			define void @mubuf_vgpr_outside_entry(<4 x i32> %i, <4 x i32> %j, i32 %c, float addrspace(1)* %in, float addrspace(1)* %out) #0 {

llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll


	define void @slsr1_1(i32 %b.arg, i32 %s.arg) #0 {			define void @slsr1_1(i32 %b.arg, i32 %s.arg) #0 {
	; GFX9-LABEL: slsr1_1:			; GFX9-LABEL: slsr1_1:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_mov_b32 s4, s33			; GFX9-NEXT: s_mov_b32 s4, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: ; implicit-def: $vgpr42
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_addk_i32 s32, 0x800			; GFX9-NEXT: s_addk_i32 s32, 0x800
	; GFX9-NEXT: v_writelane_b32 v40, s34, 2			; GFX9-NEXT: v_writelane_b32 v42, s30, 0
				; GFX9-NEXT: v_writelane_b32 v42, s31, 1
				; GFX9-NEXT: v_writelane_b32 v42, s34, 2
	; GFX9-NEXT: v_writelane_b32 v44, s4, 0			; GFX9-NEXT: v_writelane_b32 v44, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s36, 3			; GFX9-NEXT: v_writelane_b32 v42, s36, 3
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, foo@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, foo@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, foo@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, foo@gotpcrel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s37, 4			; GFX9-NEXT: v_writelane_b32 v42, s37, 4
	; GFX9-NEXT: s_load_dwordx2 s[36:37], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[36:37], s[4:5], 0x0
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: v_mov_b32_e32 v41, v1			; GFX9-NEXT: v_mov_b32_e32 v40, v1
	; GFX9-NEXT: v_mov_b32_e32 v42, v0			; GFX9-NEXT: v_mov_b32_e32 v41, v0
	; GFX9-NEXT: v_mul_u32_u24_e32 v0, v42, v41			; GFX9-NEXT: v_mul_u32_u24_e32 v0, v41, v40
	; GFX9-NEXT: s_mov_b32 s34, s15			; GFX9-NEXT: s_mov_b32 s34, s15
	; GFX9-NEXT: v_and_b32_e32 v43, 0xffffff, v41			; GFX9-NEXT: v_and_b32_e32 v43, 0xffffff, v40
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[36:37]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[36:37]
	; GFX9-NEXT: v_mad_u32_u24 v41, v42, v41, v43			; GFX9-NEXT: v_mad_u32_u24 v40, v41, v40, v43
	; GFX9-NEXT: s_mov_b32 s15, s34			; GFX9-NEXT: s_mov_b32 s15, s34
	; GFX9-NEXT: v_mov_b32_e32 v0, v41			; GFX9-NEXT: v_mov_b32_e32 v0, v40
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[36:37]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[36:37]
	; GFX9-NEXT: v_add_u32_e32 v0, v41, v43			; GFX9-NEXT: v_add_u32_e32 v0, v40, v43
	; GFX9-NEXT: s_mov_b32 s15, s34			; GFX9-NEXT: s_mov_b32 s15, s34
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[36:37]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[36:37]
	; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s37, v40, 4			; GFX9-NEXT: v_readlane_b32 s37, v42, 4
	; GFX9-NEXT: v_readlane_b32 s36, v40, 3			; GFX9-NEXT: v_readlane_b32 s36, v42, 3
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v42, 2
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v42, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v42, 0
	; GFX9-NEXT: v_readlane_b32 s4, v44, 0			; GFX9-NEXT: v_readlane_b32 s4, v44, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: s_mov_b32 s33, s4			; GFX9-NEXT: s_mov_b32 s33, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	%b = and i32 %b.arg, 16777215			%b = and i32 %b.arg, 16777215
	%s = and i32 %s.arg, 16777215			%s = and i32 %s.arg, 16777215

llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll


	; Has no stack objects, but introduces them due to the CSR spill. We			; Has no stack objects, but introduces them due to the CSR spill. We
	; see the FP modified in the callee with IPRA. We should not have			; see the FP modified in the callee with IPRA. We should not have
	; redundant spills of s33 or assert.			; redundant spills of s33 or assert.
	define internal fastcc void @csr_vgpr_spill_fp_callee() #0 {			define internal fastcc void @csr_vgpr_spill_fp_callee() #0 {
	; CHECK-LABEL: csr_vgpr_spill_fp_callee:			; CHECK-LABEL: csr_vgpr_spill_fp_callee:
	; CHECK: ; %bb.0: ; %bb			; CHECK: ; %bb.0: ; %bb
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_mov_b32 s6, s33			; CHECK-NEXT: s_mov_b32 s14, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_add_i32 s32, s32, 0x400			; CHECK-NEXT: s_add_i32 s32, s32, 0x400
	; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; CHECK-NEXT: v_writelane_b32 v1, s30, 0			; CHECK-NEXT: ; implicit-def: $vgpr0
	; CHECK-NEXT: v_writelane_b32 v1, s31, 1			; CHECK-NEXT: v_writelane_b32 v0, s30, 0
				; CHECK-NEXT: v_writelane_b32 v0, s31, 1
				; CHECK-NEXT: s_or_saveexec_b64 s[12:13], -1
				; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
				; CHECK-NEXT: s_mov_b64 exec, s[12:13]
	; CHECK-NEXT: s_getpc_b64 s[4:5]			; CHECK-NEXT: s_getpc_b64 s[4:5]
	; CHECK-NEXT: s_add_u32 s4, s4, callee_has_fp@rel32@lo+4			; CHECK-NEXT: s_add_u32 s4, s4, callee_has_fp@rel32@lo+4
	; CHECK-NEXT: s_addc_u32 s5, s5, callee_has_fp@rel32@hi+12			; CHECK-NEXT: s_addc_u32 s5, s5, callee_has_fp@rel32@hi+12
	; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]			; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]
	; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]			; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]
	; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]			; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]
	; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]			; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; CHECK-NEXT: s_or_saveexec_b64 s[12:13], -1
				; CHECK-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
				; CHECK-NEXT: s_mov_b64 exec, s[12:13]
	; CHECK-NEXT: ;;#ASMSTART			; CHECK-NEXT: ;;#ASMSTART
	; CHECK-NEXT: ; clobber csr v40			; CHECK-NEXT: ; clobber csr v40
	; CHECK-NEXT: ;;#ASMEND			; CHECK-NEXT: ;;#ASMEND
	; CHECK-NEXT: v_readlane_b32 s31, v1, 1			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: v_readlane_b32 s30, v1, 0			; CHECK-NEXT: v_readlane_b32 s31, v0, 1
				; CHECK-NEXT: v_readlane_b32 s30, v0, 0
	; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00			; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00
	; CHECK-NEXT: s_mov_b32 s33, s6			; CHECK-NEXT: s_mov_b32 s33, s14
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	call fastcc void @callee_has_fp()			call fastcc void @callee_has_fp()
	call void asm sideeffect "; clobber csr v40", "~{v40}"()			call void asm sideeffect "; clobber csr v40", "~{v40}"()
	ret void			ret void
	}			}

	}			}

	; Same, except with a tail call.			; Same, except with a tail call.
	define internal fastcc void @csr_vgpr_spill_fp_tailcall_callee() #0 {			define internal fastcc void @csr_vgpr_spill_fp_tailcall_callee() #0 {
	; CHECK-LABEL: csr_vgpr_spill_fp_tailcall_callee:			; CHECK-LABEL: csr_vgpr_spill_fp_tailcall_callee:
	; CHECK: ; %bb.0: ; %bb			; CHECK: ; %bb.0: ; %bb
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; CHECK-NEXT: v_writelane_b32 v1, s33, 0			; CHECK-NEXT: ; implicit-def: $vgpr0
				; CHECK-NEXT: v_writelane_b32 v0, s33, 0
	; CHECK-NEXT: ;;#ASMSTART			; CHECK-NEXT: ;;#ASMSTART
	; CHECK-NEXT: ; clobber csr v40			; CHECK-NEXT: ; clobber csr v40
	; CHECK-NEXT: ;;#ASMEND			; CHECK-NEXT: ;;#ASMEND
	; CHECK-NEXT: s_getpc_b64 s[4:5]			; CHECK-NEXT: s_getpc_b64 s[4:5]
	; CHECK-NEXT: s_add_u32 s4, s4, callee_has_fp@rel32@lo+4			; CHECK-NEXT: s_add_u32 s4, s4, callee_has_fp@rel32@lo+4
	; CHECK-NEXT: s_addc_u32 s5, s5, callee_has_fp@rel32@hi+12			; CHECK-NEXT: s_addc_u32 s5, s5, callee_has_fp@rel32@hi+12
	; CHECK-NEXT: v_readlane_b32 s33, v1, 0			; CHECK-NEXT: v_readlane_b32 s33, v0, 0
	; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; CHECK-NEXT: s_xor_saveexec_b64 s[6:7], -1			; CHECK-NEXT: s_xor_saveexec_b64 s[8:9], -1
	; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v0, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[6:7]			; CHECK-NEXT: s_mov_b64 exec, s[8:9]
	; CHECK-NEXT: s_setpc_b64 s[4:5]			; CHECK-NEXT: s_setpc_b64 s[4:5]
	bb:			bb:
	call void asm sideeffect "; clobber csr v40", "~{v40}"()			call void asm sideeffect "; clobber csr v40", "~{v40}"()
	tail call fastcc void @callee_has_fp()			tail call fastcc void @callee_has_fp()
	ret void			ret void
	}			}

	define amdgpu_kernel void @kernel_tailcall() {			define amdgpu_kernel void @kernel_tailcall() {
	entry:			entry:
	ret i32 0			ret i32 0
	}			}

	define hidden i32 @caller_save_vgpr_spill_fp_tail_call() #0 {			define hidden i32 @caller_save_vgpr_spill_fp_tail_call() #0 {
	; CHECK-LABEL: caller_save_vgpr_spill_fp_tail_call:			; CHECK-LABEL: caller_save_vgpr_spill_fp_tail_call:
	; CHECK: ; %bb.0: ; %entry			; CHECK: ; %bb.0: ; %entry
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_mov_b32 s6, s33			; CHECK-NEXT: s_mov_b32 s12, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s33 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s33 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_add_i32 s32, s32, 0x400			; CHECK-NEXT: s_add_i32 s32, s32, 0x400
				; CHECK-NEXT: ; implicit-def: $vgpr1
	; CHECK-NEXT: v_writelane_b32 v1, s30, 0			; CHECK-NEXT: v_writelane_b32 v1, s30, 0
	; CHECK-NEXT: v_writelane_b32 v1, s31, 1			; CHECK-NEXT: v_writelane_b32 v1, s31, 1
	; CHECK-NEXT: s_getpc_b64 s[4:5]			; CHECK-NEXT: s_getpc_b64 s[4:5]
	; CHECK-NEXT: s_add_u32 s4, s4, tail_call@rel32@lo+4			; CHECK-NEXT: s_add_u32 s4, s4, tail_call@rel32@lo+4
	; CHECK-NEXT: s_addc_u32 s5, s5, tail_call@rel32@hi+12			; CHECK-NEXT: s_addc_u32 s5, s5, tail_call@rel32@hi+12
	; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]			; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]
	; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]			; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]
	; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]			; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]
	; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]			; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; CHECK-NEXT: v_readlane_b32 s31, v1, 1			; CHECK-NEXT: v_readlane_b32 s31, v1, 1
	; CHECK-NEXT: v_readlane_b32 s30, v1, 0			; CHECK-NEXT: v_readlane_b32 s30, v1, 0
	; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00			; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00
	; CHECK-NEXT: s_mov_b32 s33, s6			; CHECK-NEXT: s_mov_b32 s33, s12
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%call = call i32 @tail_call()			%call = call i32 @tail_call()
	ret i32 %call			ret i32 %call
	}			}

	define hidden i32 @caller_save_vgpr_spill_fp() #0 {			define hidden i32 @caller_save_vgpr_spill_fp() #0 {
	; CHECK-LABEL: caller_save_vgpr_spill_fp:			; CHECK-LABEL: caller_save_vgpr_spill_fp:
	; CHECK: ; %bb.0: ; %entry			; CHECK: ; %bb.0: ; %entry
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_mov_b32 s7, s33			; CHECK-NEXT: s_mov_b32 s13, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s33 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
				; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_add_i32 s32, s32, 0x400			; CHECK-NEXT: s_add_i32 s32, s32, 0x400
	; CHECK-NEXT: v_writelane_b32 v2, s30, 0			; CHECK-NEXT: ; implicit-def: $vgpr0
	; CHECK-NEXT: v_writelane_b32 v2, s31, 1			; CHECK-NEXT: v_writelane_b32 v0, s30, 0
				; CHECK-NEXT: v_writelane_b32 v0, s31, 1
				; CHECK-NEXT: s_or_saveexec_b64 s[14:15], -1
				arsenm: This is an unfortunate regression, but it is what I expected.
				; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33 ; 4-byte Folded Spill
				; CHECK-NEXT: s_mov_b64 exec, s[14:15]
	; CHECK-NEXT: s_getpc_b64 s[4:5]			; CHECK-NEXT: s_getpc_b64 s[4:5]
	; CHECK-NEXT: s_add_u32 s4, s4, caller_save_vgpr_spill_fp_tail_call@rel32@lo+4			; CHECK-NEXT: s_add_u32 s4, s4, caller_save_vgpr_spill_fp_tail_call@rel32@lo+4
	; CHECK-NEXT: s_addc_u32 s5, s5, caller_save_vgpr_spill_fp_tail_call@rel32@hi+12			; CHECK-NEXT: s_addc_u32 s5, s5, caller_save_vgpr_spill_fp_tail_call@rel32@hi+12
	; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]			; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]
	; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]			; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]
	; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]			; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]
	; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]			; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; CHECK-NEXT: v_readlane_b32 s31, v2, 1			; CHECK-NEXT: s_or_saveexec_b64 s[14:15], -1
	; CHECK-NEXT: v_readlane_b32 s30, v2, 0			; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload
				; CHECK-NEXT: s_mov_b64 exec, s[14:15]
				; CHECK-NEXT: s_waitcnt vmcnt(0)
				; CHECK-NEXT: v_readlane_b32 s31, v1, 1
				; CHECK-NEXT: v_readlane_b32 s30, v1, 0
	; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
				; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00			; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00
	; CHECK-NEXT: s_mov_b32 s33, s7			; CHECK-NEXT: s_mov_b32 s33, s13
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%call = call i32 @caller_save_vgpr_spill_fp_tail_call()			%call = call i32 @caller_save_vgpr_spill_fp_tail_call()
	ret i32 %call			ret i32 %call
	}			}

	define protected amdgpu_kernel void @kernel() {			define protected amdgpu_kernel void @kernel() {

llvm/test/CodeGen/AMDGPU/no-source-locations-in-prologue.ll

	; CHECK-NEXT: .file 0 "/tmp" "lane-info.cpp" md5 0x4ab9b75a30baffdf0f6f536a80e3e382			; CHECK-NEXT: .file 0 "/tmp" "lane-info.cpp" md5 0x4ab9b75a30baffdf0f6f536a80e3e382
	; CHECK-NEXT: .loc 0 30 0 ; lane-info.cpp:30:0			; CHECK-NEXT: .loc 0 30 0 ; lane-info.cpp:30:0
	; CHECK-NEXT: .cfi_sections .debug_frame			; CHECK-NEXT: .cfi_sections .debug_frame
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: ; %bb.0: ; %entry			; CHECK-NEXT: ; %bb.0: ; %entry
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_mov_b32 s16, s33			; CHECK-NEXT: s_mov_b32 s16, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_or_saveexec_b64 s[18:19], -1			; CHECK-NEXT: s_xor_saveexec_b64 s[18:19], -1
	; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; CHECK-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; CHECK-NEXT: s_mov_b64 exec, -1
				; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[18:19]			; CHECK-NEXT: s_mov_b64 exec, s[18:19]
	; CHECK-NEXT: v_writelane_b32 v41, s16, 0			; CHECK-NEXT: v_writelane_b32 v40, s34, 0
				; CHECK-NEXT: v_writelane_b32 v40, s35, 1
				; CHECK-NEXT: v_writelane_b32 v40, s16, 2
	; CHECK-NEXT: s_add_i32 s32, s32, 0x400			; CHECK-NEXT: s_add_i32 s32, s32, 0x400
	; CHECK-NEXT: v_writelane_b32 v40, s30, 0			; CHECK-NEXT: ; implicit-def: $vgpr0
	; CHECK-NEXT: v_writelane_b32 v40, s31, 1			; CHECK-NEXT: v_writelane_b32 v0, s30, 0
				; CHECK-NEXT: v_writelane_b32 v0, s31, 1
				; CHECK-NEXT: s_or_saveexec_b64 s[34:35], -1
				; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33 ; 4-byte Folded Spill
				; CHECK-NEXT: s_mov_b64 exec, s[34:35]
	; CHECK-NEXT: .Ltmp0:			; CHECK-NEXT: .Ltmp0:
	; CHECK-NEXT: .loc 0 31 3 prologue_end ; lane-info.cpp:31:3			; CHECK-NEXT: .loc 0 31 3 prologue_end ; lane-info.cpp:31:3
	; CHECK-NEXT: s_getpc_b64 s[16:17]			; CHECK-NEXT: s_getpc_b64 s[16:17]
	; CHECK-NEXT: s_add_u32 s16, s16, _ZL13sleep_foreverv@gotpcrel32@lo+4			; CHECK-NEXT: s_add_u32 s16, s16, _ZL13sleep_foreverv@gotpcrel32@lo+4
	; CHECK-NEXT: s_addc_u32 s17, s17, _ZL13sleep_foreverv@gotpcrel32@hi+12			; CHECK-NEXT: s_addc_u32 s17, s17, _ZL13sleep_foreverv@gotpcrel32@hi+12
	; CHECK-NEXT: s_load_dwordx2 s[16:17], s[16:17], 0x0			; CHECK-NEXT: s_load_dwordx2 s[16:17], s[16:17], 0x0
	; CHECK-NEXT: s_mov_b64 s[22:23], s[2:3]			; CHECK-NEXT: s_mov_b64 s[22:23], s[2:3]
	; CHECK-NEXT: s_mov_b64 s[20:21], s[0:1]			; CHECK-NEXT: s_mov_b64 s[20:21], s[0:1]
	; CHECK-NEXT: s_mov_b64 s[0:1], s[20:21]			; CHECK-NEXT: s_mov_b64 s[0:1], s[20:21]
	; CHECK-NEXT: s_mov_b64 s[2:3], s[22:23]			; CHECK-NEXT: s_mov_b64 s[2:3], s[22:23]
	; CHECK-NEXT: s_waitcnt lgkmcnt(0)			; CHECK-NEXT: s_waitcnt lgkmcnt(0)
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; CHECK-NEXT: .Ltmp1:			; CHECK-NEXT: .Ltmp1:
				; CHECK-NEXT: s_or_saveexec_b64 s[34:35], -1
				; CHECK-NEXT: buffer_load_dword v0, off, s[0:3], s33 ; 4-byte Folded Reload
				; CHECK-NEXT: s_mov_b64 exec, s[34:35]
	; CHECK-NEXT: .loc 0 32 1 ; lane-info.cpp:32:1			; CHECK-NEXT: .loc 0 32 1 ; lane-info.cpp:32:1
	; CHECK-NEXT: v_readlane_b32 s31, v40, 1			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: v_readlane_b32 s30, v40, 0			; CHECK-NEXT: v_readlane_b32 s31, v0, 1
	; CHECK-NEXT: v_readlane_b32 s4, v41, 0			; CHECK-NEXT: v_readlane_b32 s30, v0, 0
	; CHECK-NEXT: s_or_saveexec_b64 s[6:7], -1			; CHECK-NEXT: v_readlane_b32 s34, v40, 0
	; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; CHECK-NEXT: v_readlane_b32 s35, v40, 1
	; CHECK-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; CHECK-NEXT: v_readlane_b32 s4, v40, 2
				; CHECK-NEXT: s_xor_saveexec_b64 s[6:7], -1
				; CHECK-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
				; CHECK-NEXT: s_mov_b64 exec, -1
				; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[6:7]			; CHECK-NEXT: s_mov_b64 exec, s[6:7]
	; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00			; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00
	; CHECK-NEXT: s_mov_b32 s33, s4			; CHECK-NEXT: s_mov_b32 s33, s4
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	; CHECK-NEXT: .Ltmp2:			; CHECK-NEXT: .Ltmp2:
	entry:			entry:
	call void @_ZL13sleep_foreverv(), !dbg !1646			call void @_ZL13sleep_foreverv(), !dbg !1646

llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -O0 -march=amdgcn -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s		; RUN: llc -O0 -march=amdgcn -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

; FIXME: we should disable sdwa peephole because dead-code elimination, that		; FIXME: we should disable sdwa peephole because dead-code elimination, that
; runs after peephole, ruins this test (different register numbers)		; runs after peephole, ruins this test (different register numbers)

; Spill all SGPRs so multiple VGPRs are required for spilling all of them.		; Spill all SGPRs so multiple VGPRs are required for spilling all of them.

; Ideally we only need 2 VGPRs for all spilling. The VGPRs are		; Ideally we only need 2 VGPRs for all spilling. The VGPRs are
; allocated per-frame index, so it's possible to end up with more.		; allocated per-frame index, so it's possible to end up with more.
define amdgpu_kernel void @spill_sgprs_to_multiple_vgprs(i32 addrspace(1)* %out, i32 %in) #0 {		define amdgpu_kernel void @spill_sgprs_to_multiple_vgprs(i32 addrspace(1)* %out, i32 %in) #0 {
; GCN-LABEL: spill_sgprs_to_multiple_vgprs:		; GCN-LABEL: spill_sgprs_to_multiple_vgprs:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
		; GCN-NEXT: s_mov_b32 s92, SCRATCH_RSRC_DWORD0
		; GCN-NEXT: s_mov_b32 s93, SCRATCH_RSRC_DWORD1
		; GCN-NEXT: s_mov_b32 s94, -1
		; GCN-NEXT: s_mov_b32 s95, 0xe8f000
		; GCN-NEXT: s_add_u32 s92, s92, s3
		; GCN-NEXT: s_addc_u32 s93, s93, 0
; GCN-NEXT: s_load_dword s0, s[0:1], 0xb		; GCN-NEXT: s_load_dword s0, s[0:1], 0xb
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:11]		; GCN-NEXT: ; def s[4:11]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
		; GCN-NEXT: ; implicit-def: $vgpr0
; GCN-NEXT: v_writelane_b32 v0, s4, 0		; GCN-NEXT: v_writelane_b32 v0, s4, 0
; GCN-NEXT: v_writelane_b32 v0, s5, 1		; GCN-NEXT: v_writelane_b32 v0, s5, 1
; GCN-NEXT: v_writelane_b32 v0, s6, 2		; GCN-NEXT: v_writelane_b32 v0, s6, 2
; GCN-NEXT: v_writelane_b32 v0, s7, 3		; GCN-NEXT: v_writelane_b32 v0, s7, 3
; GCN-NEXT: v_writelane_b32 v0, s8, 4		; GCN-NEXT: v_writelane_b32 v0, s8, 4
; GCN-NEXT: v_writelane_b32 v0, s9, 5		; GCN-NEXT: v_writelane_b32 v0, s9, 5
; GCN-NEXT: v_writelane_b32 v0, s10, 6		; GCN-NEXT: v_writelane_b32 v0, s10, 6
; GCN-NEXT: v_writelane_b32 v0, s11, 7		; GCN-NEXT: v_writelane_b32 v0, s11, 7
; GCN-NEXT: v_writelane_b32 v0, s4, 56		; GCN-NEXT: v_writelane_b32 v0, s4, 56
; GCN-NEXT: v_writelane_b32 v0, s5, 57		; GCN-NEXT: v_writelane_b32 v0, s5, 57
; GCN-NEXT: v_writelane_b32 v0, s6, 58		; GCN-NEXT: v_writelane_b32 v0, s6, 58
; GCN-NEXT: v_writelane_b32 v0, s7, 59		; GCN-NEXT: v_writelane_b32 v0, s7, 59
; GCN-NEXT: v_writelane_b32 v0, s8, 60		; GCN-NEXT: v_writelane_b32 v0, s8, 60
; GCN-NEXT: v_writelane_b32 v0, s9, 61		; GCN-NEXT: v_writelane_b32 v0, s9, 61
; GCN-NEXT: v_writelane_b32 v0, s10, 62		; GCN-NEXT: v_writelane_b32 v0, s10, 62
; GCN-NEXT: v_writelane_b32 v0, s11, 63		; GCN-NEXT: v_writelane_b32 v0, s11, 63
		; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
		; GCN-NEXT: buffer_store_dword v0, off, s[92:95], 0 offset:12 ; 4-byte Folded Spill
		; GCN-NEXT: s_mov_b64 exec, s[34:35]
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:11]		; GCN-NEXT: ; def s[4:11]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v1, s4, 0		; GCN-NEXT: ; implicit-def: $vgpr0
; GCN-NEXT: v_writelane_b32 v1, s5, 1		; GCN-NEXT: v_writelane_b32 v0, s4, 0
; GCN-NEXT: v_writelane_b32 v1, s6, 2		; GCN-NEXT: v_writelane_b32 v0, s5, 1
; GCN-NEXT: v_writelane_b32 v1, s7, 3		; GCN-NEXT: v_writelane_b32 v0, s6, 2
; GCN-NEXT: v_writelane_b32 v1, s8, 4		; GCN-NEXT: v_writelane_b32 v0, s7, 3
; GCN-NEXT: v_writelane_b32 v1, s9, 5		; GCN-NEXT: v_writelane_b32 v0, s8, 4
; GCN-NEXT: v_writelane_b32 v1, s10, 6		; GCN-NEXT: v_writelane_b32 v0, s9, 5
; GCN-NEXT: v_writelane_b32 v1, s11, 7		; GCN-NEXT: v_writelane_b32 v0, s10, 6
		; GCN-NEXT: v_writelane_b32 v0, s11, 7
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:11]		; GCN-NEXT: ; def s[4:11]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v1, s4, 8		; GCN-NEXT: v_writelane_b32 v0, s4, 8
; GCN-NEXT: v_writelane_b32 v1, s5, 9		; GCN-NEXT: v_writelane_b32 v0, s5, 9
; GCN-NEXT: v_writelane_b32 v1, s6, 10		; GCN-NEXT: v_writelane_b32 v0, s6, 10
; GCN-NEXT: v_writelane_b32 v1, s7, 11		; GCN-NEXT: v_writelane_b32 v0, s7, 11
; GCN-NEXT: v_writelane_b32 v1, s8, 12		; GCN-NEXT: v_writelane_b32 v0, s8, 12
; GCN-NEXT: v_writelane_b32 v1, s9, 13		; GCN-NEXT: v_writelane_b32 v0, s9, 13
; GCN-NEXT: v_writelane_b32 v1, s10, 14		; GCN-NEXT: v_writelane_b32 v0, s10, 14
; GCN-NEXT: v_writelane_b32 v1, s11, 15		; GCN-NEXT: v_writelane_b32 v0, s11, 15
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:11]		; GCN-NEXT: ; def s[4:11]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v1, s4, 16		; GCN-NEXT: v_writelane_b32 v0, s4, 16
; GCN-NEXT: v_writelane_b32 v1, s5, 17		; GCN-NEXT: v_writelane_b32 v0, s5, 17
; GCN-NEXT: v_writelane_b32 v1, s6, 18		; GCN-NEXT: v_writelane_b32 v0, s6, 18
; GCN-NEXT: v_writelane_b32 v1, s7, 19		; GCN-NEXT: v_writelane_b32 v0, s7, 19
; GCN-NEXT: v_writelane_b32 v1, s8, 20		; GCN-NEXT: v_writelane_b32 v0, s8, 20
; GCN-NEXT: v_writelane_b32 v1, s9, 21		; GCN-NEXT: v_writelane_b32 v0, s9, 21
; GCN-NEXT: v_writelane_b32 v1, s10, 22		; GCN-NEXT: v_writelane_b32 v0, s10, 22
; GCN-NEXT: v_writelane_b32 v1, s11, 23		; GCN-NEXT: v_writelane_b32 v0, s11, 23
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:11]		; GCN-NEXT: ; def s[4:11]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v1, s4, 24		; GCN-NEXT: v_writelane_b32 v0, s4, 24
; GCN-NEXT: v_writelane_b32 v1, s5, 25		; GCN-NEXT: v_writelane_b32 v0, s5, 25
; GCN-NEXT: v_writelane_b32 v1, s6, 26		; GCN-NEXT: v_writelane_b32 v0, s6, 26
; GCN-NEXT: v_writelane_b32 v1, s7, 27		; GCN-NEXT: v_writelane_b32 v0, s7, 27
; GCN-NEXT: v_writelane_b32 v1, s8, 28		; GCN-NEXT: v_writelane_b32 v0, s8, 28
; GCN-NEXT: v_writelane_b32 v1, s9, 29		; GCN-NEXT: v_writelane_b32 v0, s9, 29
; GCN-NEXT: v_writelane_b32 v1, s10, 30		; GCN-NEXT: v_writelane_b32 v0, s10, 30
; GCN-NEXT: v_writelane_b32 v1, s11, 31		; GCN-NEXT: v_writelane_b32 v0, s11, 31
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:11]		; GCN-NEXT: ; def s[4:11]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v1, s4, 32		; GCN-NEXT: v_writelane_b32 v0, s4, 32
; GCN-NEXT: v_writelane_b32 v1, s5, 33		; GCN-NEXT: v_writelane_b32 v0, s5, 33
; GCN-NEXT: v_writelane_b32 v1, s6, 34		; GCN-NEXT: v_writelane_b32 v0, s6, 34
; GCN-NEXT: v_writelane_b32 v1, s7, 35		; GCN-NEXT: v_writelane_b32 v0, s7, 35
; GCN-NEXT: v_writelane_b32 v1, s8, 36		; GCN-NEXT: v_writelane_b32 v0, s8, 36
; GCN-NEXT: v_writelane_b32 v1, s9, 37		; GCN-NEXT: v_writelane_b32 v0, s9, 37
; GCN-NEXT: v_writelane_b32 v1, s10, 38		; GCN-NEXT: v_writelane_b32 v0, s10, 38
; GCN-NEXT: v_writelane_b32 v1, s11, 39		; GCN-NEXT: v_writelane_b32 v0, s11, 39
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:11]		; GCN-NEXT: ; def s[4:11]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v1, s4, 40		; GCN-NEXT: v_writelane_b32 v0, s4, 40
; GCN-NEXT: v_writelane_b32 v1, s5, 41		; GCN-NEXT: v_writelane_b32 v0, s5, 41
; GCN-NEXT: v_writelane_b32 v1, s6, 42		; GCN-NEXT: v_writelane_b32 v0, s6, 42
; GCN-NEXT: v_writelane_b32 v1, s7, 43		; GCN-NEXT: v_writelane_b32 v0, s7, 43
; GCN-NEXT: v_writelane_b32 v1, s8, 44		; GCN-NEXT: v_writelane_b32 v0, s8, 44
; GCN-NEXT: v_writelane_b32 v1, s9, 45		; GCN-NEXT: v_writelane_b32 v0, s9, 45
; GCN-NEXT: v_writelane_b32 v1, s10, 46		; GCN-NEXT: v_writelane_b32 v0, s10, 46
; GCN-NEXT: v_writelane_b32 v1, s11, 47		; GCN-NEXT: v_writelane_b32 v0, s11, 47
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:11]		; GCN-NEXT: ; def s[4:11]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v1, s4, 48		; GCN-NEXT: v_writelane_b32 v0, s4, 48
; GCN-NEXT: v_writelane_b32 v1, s5, 49		; GCN-NEXT: v_writelane_b32 v0, s5, 49
; GCN-NEXT: v_writelane_b32 v1, s6, 50		; GCN-NEXT: v_writelane_b32 v0, s6, 50
; GCN-NEXT: v_writelane_b32 v1, s7, 51		; GCN-NEXT: v_writelane_b32 v0, s7, 51
; GCN-NEXT: v_writelane_b32 v1, s8, 52		; GCN-NEXT: v_writelane_b32 v0, s8, 52
; GCN-NEXT: v_writelane_b32 v1, s9, 53		; GCN-NEXT: v_writelane_b32 v0, s9, 53
; GCN-NEXT: v_writelane_b32 v1, s10, 54		; GCN-NEXT: v_writelane_b32 v0, s10, 54
; GCN-NEXT: v_writelane_b32 v1, s11, 55		; GCN-NEXT: v_writelane_b32 v0, s11, 55
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:11]		; GCN-NEXT: ; def s[4:11]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v1, s4, 56		; GCN-NEXT: v_writelane_b32 v0, s4, 56
; GCN-NEXT: v_writelane_b32 v1, s5, 57		; GCN-NEXT: v_writelane_b32 v0, s5, 57
; GCN-NEXT: v_writelane_b32 v1, s6, 58		; GCN-NEXT: v_writelane_b32 v0, s6, 58
; GCN-NEXT: v_writelane_b32 v1, s7, 59		; GCN-NEXT: v_writelane_b32 v0, s7, 59
; GCN-NEXT: v_writelane_b32 v1, s8, 60		; GCN-NEXT: v_writelane_b32 v0, s8, 60
; GCN-NEXT: v_writelane_b32 v1, s9, 61		; GCN-NEXT: v_writelane_b32 v0, s9, 61
; GCN-NEXT: v_writelane_b32 v1, s10, 62		; GCN-NEXT: v_writelane_b32 v0, s10, 62
; GCN-NEXT: v_writelane_b32 v1, s11, 63		; GCN-NEXT: v_writelane_b32 v0, s11, 63
		; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
		; GCN-NEXT: buffer_store_dword v0, off, s[92:95], 0 offset:8 ; 4-byte Folded Spill
		; GCN-NEXT: s_mov_b64 exec, s[34:35]
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:11]		; GCN-NEXT: ; def s[4:11]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v2, s4, 0		; GCN-NEXT: ; implicit-def: $vgpr0
; GCN-NEXT: v_writelane_b32 v2, s5, 1		; GCN-NEXT: v_writelane_b32 v0, s4, 0
; GCN-NEXT: v_writelane_b32 v2, s6, 2		; GCN-NEXT: v_writelane_b32 v0, s5, 1
; GCN-NEXT: v_writelane_b32 v2, s7, 3		; GCN-NEXT: v_writelane_b32 v0, s6, 2
; GCN-NEXT: v_writelane_b32 v2, s8, 4		; GCN-NEXT: v_writelane_b32 v0, s7, 3
; GCN-NEXT: v_writelane_b32 v2, s9, 5		; GCN-NEXT: v_writelane_b32 v0, s8, 4
; GCN-NEXT: v_writelane_b32 v2, s10, 6		; GCN-NEXT: v_writelane_b32 v0, s9, 5
; GCN-NEXT: v_writelane_b32 v2, s11, 7		; GCN-NEXT: v_writelane_b32 v0, s10, 6
		; GCN-NEXT: v_writelane_b32 v0, s11, 7
		; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
		; GCN-NEXT: buffer_store_dword v0, off, s[92:95], 0 offset:4 ; 4-byte Folded Spill
		; GCN-NEXT: s_mov_b64 exec, s[34:35]
; GCN-NEXT: s_mov_b32 s1, 0		; GCN-NEXT: s_mov_b32 s1, 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_cmp_lg_u32 s0, s1		; GCN-NEXT: s_cmp_lg_u32 s0, s1
; GCN-NEXT: s_cbranch_scc1 .LBB0_2		; GCN-NEXT: s_cbranch_scc1 .LBB0_2
; GCN-NEXT: ; %bb.1: ; %bb0		; GCN-NEXT: ; %bb.1: ; %bb0
; GCN-NEXT: v_readlane_b32 s8, v1, 56		; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
; GCN-NEXT: v_readlane_b32 s9, v1, 57		; GCN-NEXT: buffer_load_dword v0, off, s[92:95], 0 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: v_readlane_b32 s10, v1, 58		; GCN-NEXT: s_mov_b64 exec, s[34:35]
; GCN-NEXT: v_readlane_b32 s11, v1, 59		; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
; GCN-NEXT: v_readlane_b32 s12, v1, 60		; GCN-NEXT: buffer_load_dword v1, off, s[92:95], 0 offset:12 ; 4-byte Folded Reload
; GCN-NEXT: v_readlane_b32 s13, v1, 61		; GCN-NEXT: s_mov_b64 exec, s[34:35]
; GCN-NEXT: v_readlane_b32 s14, v1, 62		; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
; GCN-NEXT: v_readlane_b32 s15, v1, 63		; GCN-NEXT: buffer_load_dword v2, off, s[92:95], 0 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: v_readlane_b32 s16, v1, 48		; GCN-NEXT: s_mov_b64 exec, s[34:35]
; GCN-NEXT: v_readlane_b32 s17, v1, 49		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_readlane_b32 s18, v1, 50		; GCN-NEXT: v_readlane_b32 s8, v2, 56
; GCN-NEXT: v_readlane_b32 s19, v1, 51		; GCN-NEXT: v_readlane_b32 s9, v2, 57
; GCN-NEXT: v_readlane_b32 s20, v1, 52		; GCN-NEXT: v_readlane_b32 s10, v2, 58
; GCN-NEXT: v_readlane_b32 s21, v1, 53		; GCN-NEXT: v_readlane_b32 s11, v2, 59
; GCN-NEXT: v_readlane_b32 s22, v1, 54		; GCN-NEXT: v_readlane_b32 s12, v2, 60
; GCN-NEXT: v_readlane_b32 s23, v1, 55		; GCN-NEXT: v_readlane_b32 s13, v2, 61
; GCN-NEXT: v_readlane_b32 s24, v1, 40		; GCN-NEXT: v_readlane_b32 s14, v2, 62
; GCN-NEXT: v_readlane_b32 s25, v1, 41		; GCN-NEXT: v_readlane_b32 s15, v2, 63
; GCN-NEXT: v_readlane_b32 s26, v1, 42		; GCN-NEXT: v_readlane_b32 s16, v2, 48
; GCN-NEXT: v_readlane_b32 s27, v1, 43		; GCN-NEXT: v_readlane_b32 s17, v2, 49
; GCN-NEXT: v_readlane_b32 s28, v1, 44		; GCN-NEXT: v_readlane_b32 s18, v2, 50
; GCN-NEXT: v_readlane_b32 s29, v1, 45		; GCN-NEXT: v_readlane_b32 s19, v2, 51
; GCN-NEXT: v_readlane_b32 s30, v1, 46		; GCN-NEXT: v_readlane_b32 s20, v2, 52
; GCN-NEXT: v_readlane_b32 s31, v1, 47		; GCN-NEXT: v_readlane_b32 s21, v2, 53
; GCN-NEXT: v_readlane_b32 s36, v1, 32		; GCN-NEXT: v_readlane_b32 s22, v2, 54
; GCN-NEXT: v_readlane_b32 s37, v1, 33		; GCN-NEXT: v_readlane_b32 s23, v2, 55
; GCN-NEXT: v_readlane_b32 s38, v1, 34		; GCN-NEXT: v_readlane_b32 s24, v2, 40
; GCN-NEXT: v_readlane_b32 s39, v1, 35		; GCN-NEXT: v_readlane_b32 s25, v2, 41
; GCN-NEXT: v_readlane_b32 s40, v1, 36		; GCN-NEXT: v_readlane_b32 s26, v2, 42
; GCN-NEXT: v_readlane_b32 s41, v1, 37		; GCN-NEXT: v_readlane_b32 s27, v2, 43
; GCN-NEXT: v_readlane_b32 s42, v1, 38		; GCN-NEXT: v_readlane_b32 s28, v2, 44
; GCN-NEXT: v_readlane_b32 s43, v1, 39		; GCN-NEXT: v_readlane_b32 s29, v2, 45
; GCN-NEXT: v_readlane_b32 s44, v1, 24		; GCN-NEXT: v_readlane_b32 s30, v2, 46
; GCN-NEXT: v_readlane_b32 s45, v1, 25		; GCN-NEXT: v_readlane_b32 s31, v2, 47
; GCN-NEXT: v_readlane_b32 s46, v1, 26		; GCN-NEXT: v_readlane_b32 s36, v2, 32
; GCN-NEXT: v_readlane_b32 s47, v1, 27		; GCN-NEXT: v_readlane_b32 s37, v2, 33
; GCN-NEXT: v_readlane_b32 s48, v1, 28		; GCN-NEXT: v_readlane_b32 s38, v2, 34
; GCN-NEXT: v_readlane_b32 s49, v1, 29		; GCN-NEXT: v_readlane_b32 s39, v2, 35
; GCN-NEXT: v_readlane_b32 s50, v1, 30		; GCN-NEXT: v_readlane_b32 s40, v2, 36
; GCN-NEXT: v_readlane_b32 s51, v1, 31		; GCN-NEXT: v_readlane_b32 s41, v2, 37
; GCN-NEXT: v_readlane_b32 s52, v1, 16		; GCN-NEXT: v_readlane_b32 s42, v2, 38
; GCN-NEXT: v_readlane_b32 s53, v1, 17		; GCN-NEXT: v_readlane_b32 s43, v2, 39
; GCN-NEXT: v_readlane_b32 s54, v1, 18		; GCN-NEXT: v_readlane_b32 s44, v2, 24
; GCN-NEXT: v_readlane_b32 s55, v1, 19		; GCN-NEXT: v_readlane_b32 s45, v2, 25
; GCN-NEXT: v_readlane_b32 s56, v1, 20		; GCN-NEXT: v_readlane_b32 s46, v2, 26
; GCN-NEXT: v_readlane_b32 s57, v1, 21		; GCN-NEXT: v_readlane_b32 s47, v2, 27
; GCN-NEXT: v_readlane_b32 s58, v1, 22		; GCN-NEXT: v_readlane_b32 s48, v2, 28
; GCN-NEXT: v_readlane_b32 s59, v1, 23		; GCN-NEXT: v_readlane_b32 s49, v2, 29
; GCN-NEXT: v_readlane_b32 s60, v1, 8		; GCN-NEXT: v_readlane_b32 s50, v2, 30
; GCN-NEXT: v_readlane_b32 s61, v1, 9		; GCN-NEXT: v_readlane_b32 s51, v2, 31
; GCN-NEXT: v_readlane_b32 s62, v1, 10		; GCN-NEXT: v_readlane_b32 s52, v2, 16
; GCN-NEXT: v_readlane_b32 s63, v1, 11		; GCN-NEXT: v_readlane_b32 s53, v2, 17
; GCN-NEXT: v_readlane_b32 s64, v1, 12		; GCN-NEXT: v_readlane_b32 s54, v2, 18
; GCN-NEXT: v_readlane_b32 s65, v1, 13		; GCN-NEXT: v_readlane_b32 s55, v2, 19
; GCN-NEXT: v_readlane_b32 s66, v1, 14		; GCN-NEXT: v_readlane_b32 s56, v2, 20
; GCN-NEXT: v_readlane_b32 s67, v1, 15		; GCN-NEXT: v_readlane_b32 s57, v2, 21
; GCN-NEXT: v_readlane_b32 s68, v1, 0		; GCN-NEXT: v_readlane_b32 s58, v2, 22
; GCN-NEXT: v_readlane_b32 s69, v1, 1		; GCN-NEXT: v_readlane_b32 s59, v2, 23
; GCN-NEXT: v_readlane_b32 s70, v1, 2		; GCN-NEXT: v_readlane_b32 s60, v2, 8
; GCN-NEXT: v_readlane_b32 s71, v1, 3		; GCN-NEXT: v_readlane_b32 s61, v2, 9
; GCN-NEXT: v_readlane_b32 s72, v1, 4		; GCN-NEXT: v_readlane_b32 s62, v2, 10
; GCN-NEXT: v_readlane_b32 s73, v1, 5		; GCN-NEXT: v_readlane_b32 s63, v2, 11
; GCN-NEXT: v_readlane_b32 s74, v1, 6		; GCN-NEXT: v_readlane_b32 s64, v2, 12
; GCN-NEXT: v_readlane_b32 s75, v1, 7		; GCN-NEXT: v_readlane_b32 s65, v2, 13
; GCN-NEXT: v_readlane_b32 s76, v0, 56		; GCN-NEXT: v_readlane_b32 s66, v2, 14
; GCN-NEXT: v_readlane_b32 s77, v0, 57		; GCN-NEXT: v_readlane_b32 s67, v2, 15
; GCN-NEXT: v_readlane_b32 s78, v0, 58		; GCN-NEXT: v_readlane_b32 s68, v2, 0
; GCN-NEXT: v_readlane_b32 s79, v0, 59		; GCN-NEXT: v_readlane_b32 s69, v2, 1
; GCN-NEXT: v_readlane_b32 s80, v0, 60		; GCN-NEXT: v_readlane_b32 s70, v2, 2
; GCN-NEXT: v_readlane_b32 s81, v0, 61		; GCN-NEXT: v_readlane_b32 s71, v2, 3
; GCN-NEXT: v_readlane_b32 s82, v0, 62		; GCN-NEXT: v_readlane_b32 s72, v2, 4
; GCN-NEXT: v_readlane_b32 s83, v0, 63		; GCN-NEXT: v_readlane_b32 s73, v2, 5
; GCN-NEXT: v_readlane_b32 s84, v0, 48		; GCN-NEXT: v_readlane_b32 s74, v2, 6
; GCN-NEXT: v_readlane_b32 s85, v0, 49		; GCN-NEXT: v_readlane_b32 s75, v2, 7
; GCN-NEXT: v_readlane_b32 s86, v0, 50		; GCN-NEXT: v_readlane_b32 s76, v1, 56
; GCN-NEXT: v_readlane_b32 s87, v0, 51		; GCN-NEXT: v_readlane_b32 s77, v1, 57
; GCN-NEXT: v_readlane_b32 s88, v0, 52		; GCN-NEXT: v_readlane_b32 s78, v1, 58
; GCN-NEXT: v_readlane_b32 s89, v0, 53		; GCN-NEXT: v_readlane_b32 s79, v1, 59
; GCN-NEXT: v_readlane_b32 s90, v0, 54		; GCN-NEXT: v_readlane_b32 s80, v1, 60
; GCN-NEXT: v_readlane_b32 s91, v0, 55		; GCN-NEXT: v_readlane_b32 s81, v1, 61
; GCN-NEXT: v_readlane_b32 s0, v0, 0		; GCN-NEXT: v_readlane_b32 s82, v1, 62
; GCN-NEXT: v_readlane_b32 s1, v0, 1		; GCN-NEXT: v_readlane_b32 s83, v1, 63
; GCN-NEXT: v_readlane_b32 s2, v0, 2		; GCN-NEXT: v_readlane_b32 s84, v1, 48
; GCN-NEXT: v_readlane_b32 s3, v0, 3		; GCN-NEXT: v_readlane_b32 s85, v1, 49
; GCN-NEXT: v_readlane_b32 s4, v0, 4		; GCN-NEXT: v_readlane_b32 s86, v1, 50
; GCN-NEXT: v_readlane_b32 s5, v0, 5		; GCN-NEXT: v_readlane_b32 s87, v1, 51
; GCN-NEXT: v_readlane_b32 s6, v0, 6		; GCN-NEXT: v_readlane_b32 s88, v1, 52
; GCN-NEXT: v_readlane_b32 s7, v0, 7		; GCN-NEXT: v_readlane_b32 s89, v1, 53
		; GCN-NEXT: v_readlane_b32 s90, v1, 54
		; GCN-NEXT: v_readlane_b32 s91, v1, 55
		; GCN-NEXT: v_readlane_b32 s0, v1, 0
		; GCN-NEXT: v_readlane_b32 s1, v1, 1
		; GCN-NEXT: v_readlane_b32 s2, v1, 2
		; GCN-NEXT: v_readlane_b32 s3, v1, 3
		; GCN-NEXT: v_readlane_b32 s4, v1, 4
		; GCN-NEXT: v_readlane_b32 s5, v1, 5
		; GCN-NEXT: v_readlane_b32 s6, v1, 6
		; GCN-NEXT: v_readlane_b32 s7, v1, 7
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]		; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_readlane_b32 s0, v0, 8		; GCN-NEXT: v_readlane_b32 s0, v1, 8
; GCN-NEXT: v_readlane_b32 s1, v0, 9		; GCN-NEXT: v_readlane_b32 s1, v1, 9
; GCN-NEXT: v_readlane_b32 s2, v0, 10		; GCN-NEXT: v_readlane_b32 s2, v1, 10
; GCN-NEXT: v_readlane_b32 s3, v0, 11		; GCN-NEXT: v_readlane_b32 s3, v1, 11
; GCN-NEXT: v_readlane_b32 s4, v0, 12		; GCN-NEXT: v_readlane_b32 s4, v1, 12
; GCN-NEXT: v_readlane_b32 s5, v0, 13		; GCN-NEXT: v_readlane_b32 s5, v1, 13
; GCN-NEXT: v_readlane_b32 s6, v0, 14		; GCN-NEXT: v_readlane_b32 s6, v1, 14
; GCN-NEXT: v_readlane_b32 s7, v0, 15		; GCN-NEXT: v_readlane_b32 s7, v1, 15
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]		; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_readlane_b32 s0, v0, 16		; GCN-NEXT: v_readlane_b32 s0, v1, 16
; GCN-NEXT: v_readlane_b32 s1, v0, 17		; GCN-NEXT: v_readlane_b32 s1, v1, 17
; GCN-NEXT: v_readlane_b32 s2, v0, 18		; GCN-NEXT: v_readlane_b32 s2, v1, 18
; GCN-NEXT: v_readlane_b32 s3, v0, 19		; GCN-NEXT: v_readlane_b32 s3, v1, 19
; GCN-NEXT: v_readlane_b32 s4, v0, 20		; GCN-NEXT: v_readlane_b32 s4, v1, 20
; GCN-NEXT: v_readlane_b32 s5, v0, 21		; GCN-NEXT: v_readlane_b32 s5, v1, 21
; GCN-NEXT: v_readlane_b32 s6, v0, 22		; GCN-NEXT: v_readlane_b32 s6, v1, 22
; GCN-NEXT: v_readlane_b32 s7, v0, 23		; GCN-NEXT: v_readlane_b32 s7, v1, 23
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]		; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_readlane_b32 s0, v0, 24		; GCN-NEXT: v_readlane_b32 s0, v1, 24
; GCN-NEXT: v_readlane_b32 s1, v0, 25		; GCN-NEXT: v_readlane_b32 s1, v1, 25
; GCN-NEXT: v_readlane_b32 s2, v0, 26		; GCN-NEXT: v_readlane_b32 s2, v1, 26
; GCN-NEXT: v_readlane_b32 s3, v0, 27		; GCN-NEXT: v_readlane_b32 s3, v1, 27
; GCN-NEXT: v_readlane_b32 s4, v0, 28		; GCN-NEXT: v_readlane_b32 s4, v1, 28
; GCN-NEXT: v_readlane_b32 s5, v0, 29		; GCN-NEXT: v_readlane_b32 s5, v1, 29
; GCN-NEXT: v_readlane_b32 s6, v0, 30		; GCN-NEXT: v_readlane_b32 s6, v1, 30
; GCN-NEXT: v_readlane_b32 s7, v0, 31		; GCN-NEXT: v_readlane_b32 s7, v1, 31
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]		; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_readlane_b32 s0, v0, 32		; GCN-NEXT: v_readlane_b32 s0, v1, 32
; GCN-NEXT: v_readlane_b32 s1, v0, 33		; GCN-NEXT: v_readlane_b32 s1, v1, 33
; GCN-NEXT: v_readlane_b32 s2, v0, 34		; GCN-NEXT: v_readlane_b32 s2, v1, 34
; GCN-NEXT: v_readlane_b32 s3, v0, 35		; GCN-NEXT: v_readlane_b32 s3, v1, 35
; GCN-NEXT: v_readlane_b32 s4, v0, 36		; GCN-NEXT: v_readlane_b32 s4, v1, 36
; GCN-NEXT: v_readlane_b32 s5, v0, 37		; GCN-NEXT: v_readlane_b32 s5, v1, 37
; GCN-NEXT: v_readlane_b32 s6, v0, 38		; GCN-NEXT: v_readlane_b32 s6, v1, 38
; GCN-NEXT: v_readlane_b32 s7, v0, 39		; GCN-NEXT: v_readlane_b32 s7, v1, 39
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]		; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_readlane_b32 s0, v0, 40		; GCN-NEXT: v_readlane_b32 s0, v1, 40
; GCN-NEXT: v_readlane_b32 s1, v0, 41		; GCN-NEXT: v_readlane_b32 s1, v1, 41
; GCN-NEXT: v_readlane_b32 s2, v0, 42		; GCN-NEXT: v_readlane_b32 s2, v1, 42
; GCN-NEXT: v_readlane_b32 s3, v0, 43		; GCN-NEXT: v_readlane_b32 s3, v1, 43
; GCN-NEXT: v_readlane_b32 s4, v0, 44		; GCN-NEXT: v_readlane_b32 s4, v1, 44
; GCN-NEXT: v_readlane_b32 s5, v0, 45		; GCN-NEXT: v_readlane_b32 s5, v1, 45
; GCN-NEXT: v_readlane_b32 s6, v0, 46		; GCN-NEXT: v_readlane_b32 s6, v1, 46
; GCN-NEXT: v_readlane_b32 s7, v0, 47		; GCN-NEXT: v_readlane_b32 s7, v1, 47
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:7]		; GCN-NEXT: ; use s[0:7]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_readlane_b32 s0, v2, 0		; GCN-NEXT: v_readlane_b32 s0, v0, 0
; GCN-NEXT: v_readlane_b32 s1, v2, 1		; GCN-NEXT: v_readlane_b32 s1, v0, 1
; GCN-NEXT: v_readlane_b32 s2, v2, 2		; GCN-NEXT: v_readlane_b32 s2, v0, 2
; GCN-NEXT: v_readlane_b32 s3, v2, 3		; GCN-NEXT: v_readlane_b32 s3, v0, 3
; GCN-NEXT: v_readlane_b32 s4, v2, 4		; GCN-NEXT: v_readlane_b32 s4, v0, 4
; GCN-NEXT: v_readlane_b32 s5, v2, 5		; GCN-NEXT: v_readlane_b32 s5, v0, 5
; GCN-NEXT: v_readlane_b32 s6, v2, 6		; GCN-NEXT: v_readlane_b32 s6, v0, 6
; GCN-NEXT: v_readlane_b32 s7, v2, 7		; GCN-NEXT: v_readlane_b32 s7, v0, 7
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[84:91]		; GCN-NEXT: ; use s[84:91]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[76:83]		; GCN-NEXT: ; use s[76:83]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[68:75]		; GCN-NEXT: ; use s[68:75]
ret:		ret:
ret void		ret void
}		}

; Some of the lanes of an SGPR spill are in one VGPR and some forced		; Some of the lanes of an SGPR spill are in one VGPR and some forced
; into the next available VGPR.		; into the next available VGPR.
define amdgpu_kernel void @split_sgpr_spill_2_vgprs(i32 addrspace(1)* %out, i32 %in) #1 {		define amdgpu_kernel void @split_sgpr_spill_2_vgprs(i32 addrspace(1)* %out, i32 %in) #1 {
; GCN-LABEL: split_sgpr_spill_2_vgprs:		; GCN-LABEL: split_sgpr_spill_2_vgprs:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
		; GCN-NEXT: s_mov_b32 s52, SCRATCH_RSRC_DWORD0
		; GCN-NEXT: s_mov_b32 s53, SCRATCH_RSRC_DWORD1
		; GCN-NEXT: s_mov_b32 s54, -1
		; GCN-NEXT: s_mov_b32 s55, 0xe8f000
		; GCN-NEXT: s_add_u32 s52, s52, s3
		; GCN-NEXT: s_addc_u32 s53, s53, 0
; GCN-NEXT: s_load_dword s0, s[0:1], 0xb		; GCN-NEXT: s_load_dword s0, s[0:1], 0xb
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:19]		; GCN-NEXT: ; def s[4:19]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
		; GCN-NEXT: ; implicit-def: $vgpr0
; GCN-NEXT: v_writelane_b32 v0, s4, 0		; GCN-NEXT: v_writelane_b32 v0, s4, 0
; GCN-NEXT: v_writelane_b32 v0, s5, 1		; GCN-NEXT: v_writelane_b32 v0, s5, 1
; GCN-NEXT: v_writelane_b32 v0, s6, 2		; GCN-NEXT: v_writelane_b32 v0, s6, 2
; GCN-NEXT: v_writelane_b32 v0, s7, 3		; GCN-NEXT: v_writelane_b32 v0, s7, 3
; GCN-NEXT: v_writelane_b32 v0, s8, 4		; GCN-NEXT: v_writelane_b32 v0, s8, 4
; GCN-NEXT: v_writelane_b32 v0, s9, 5		; GCN-NEXT: v_writelane_b32 v0, s9, 5
; GCN-NEXT: v_writelane_b32 v0, s10, 6		; GCN-NEXT: v_writelane_b32 v0, s10, 6
; GCN-NEXT: v_writelane_b32 v0, s11, 7		; GCN-NEXT: v_writelane_b32 v0, s11, 7
; GCN-NEXT: v_writelane_b32 v0, s12, 56		; GCN-NEXT: v_writelane_b32 v0, s12, 56
; GCN-NEXT: v_writelane_b32 v0, s13, 57		; GCN-NEXT: v_writelane_b32 v0, s13, 57
; GCN-NEXT: v_writelane_b32 v0, s14, 58		; GCN-NEXT: v_writelane_b32 v0, s14, 58
; GCN-NEXT: v_writelane_b32 v0, s15, 59		; GCN-NEXT: v_writelane_b32 v0, s15, 59
; GCN-NEXT: v_writelane_b32 v0, s16, 60		; GCN-NEXT: v_writelane_b32 v0, s16, 60
; GCN-NEXT: v_writelane_b32 v0, s17, 61		; GCN-NEXT: v_writelane_b32 v0, s17, 61
; GCN-NEXT: v_writelane_b32 v0, s18, 62		; GCN-NEXT: v_writelane_b32 v0, s18, 62
; GCN-NEXT: v_writelane_b32 v0, s19, 63		; GCN-NEXT: v_writelane_b32 v0, s19, 63
		; GCN-NEXT: s_or_saveexec_b64 s[28:29], -1
		; GCN-NEXT: buffer_store_dword v0, off, s[52:55], 0 offset:8 ; 4-byte Folded Spill
		; GCN-NEXT: s_mov_b64 exec, s[28:29]
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:11]		; GCN-NEXT: ; def s[4:11]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v1, s4, 0		; GCN-NEXT: ; implicit-def: $vgpr0
; GCN-NEXT: v_writelane_b32 v1, s5, 1		; GCN-NEXT: v_writelane_b32 v0, s4, 0
; GCN-NEXT: v_writelane_b32 v1, s6, 2		; GCN-NEXT: v_writelane_b32 v0, s5, 1
; GCN-NEXT: v_writelane_b32 v1, s7, 3		; GCN-NEXT: v_writelane_b32 v0, s6, 2
; GCN-NEXT: v_writelane_b32 v1, s8, 4		; GCN-NEXT: v_writelane_b32 v0, s7, 3
; GCN-NEXT: v_writelane_b32 v1, s9, 5		; GCN-NEXT: v_writelane_b32 v0, s8, 4
; GCN-NEXT: v_writelane_b32 v1, s10, 6		; GCN-NEXT: v_writelane_b32 v0, s9, 5
; GCN-NEXT: v_writelane_b32 v1, s11, 7		; GCN-NEXT: v_writelane_b32 v0, s10, 6
		; GCN-NEXT: v_writelane_b32 v0, s11, 7
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[2:3]		; GCN-NEXT: ; def s[2:3]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v1, s2, 8		; GCN-NEXT: v_writelane_b32 v0, s2, 8
; GCN-NEXT: v_writelane_b32 v1, s3, 9		; GCN-NEXT: v_writelane_b32 v0, s3, 9
		; GCN-NEXT: s_or_saveexec_b64 s[28:29], -1
		; GCN-NEXT: buffer_store_dword v0, off, s[52:55], 0 offset:4 ; 4-byte Folded Spill
		; GCN-NEXT: s_mov_b64 exec, s[28:29]
; GCN-NEXT: s_mov_b32 s1, 0		; GCN-NEXT: s_mov_b32 s1, 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_cmp_lg_u32 s0, s1		; GCN-NEXT: s_cmp_lg_u32 s0, s1
; GCN-NEXT: s_cbranch_scc1 .LBB1_2		; GCN-NEXT: s_cbranch_scc1 .LBB1_2
; GCN-NEXT: ; %bb.1: ; %bb0		; GCN-NEXT: ; %bb.1: ; %bb0
		; GCN-NEXT: s_or_saveexec_b64 s[28:29], -1
		; GCN-NEXT: buffer_load_dword v0, off, s[52:55], 0 offset:8 ; 4-byte Folded Reload
		; GCN-NEXT: s_mov_b64 exec, s[28:29]
		; GCN-NEXT: s_or_saveexec_b64 s[28:29], -1
		; GCN-NEXT: buffer_load_dword v1, off, s[52:55], 0 offset:4 ; 4-byte Folded Reload
		; GCN-NEXT: s_mov_b64 exec, s[28:29]
		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_readlane_b32 s16, v1, 8		; GCN-NEXT: v_readlane_b32 s16, v1, 8
; GCN-NEXT: v_readlane_b32 s17, v1, 9		; GCN-NEXT: v_readlane_b32 s17, v1, 9
; GCN-NEXT: v_readlane_b32 s20, v1, 0		; GCN-NEXT: v_readlane_b32 s20, v1, 0
; GCN-NEXT: v_readlane_b32 s21, v1, 1		; GCN-NEXT: v_readlane_b32 s21, v1, 1
; GCN-NEXT: v_readlane_b32 s22, v1, 2		; GCN-NEXT: v_readlane_b32 s22, v1, 2
; GCN-NEXT: v_readlane_b32 s23, v1, 3		; GCN-NEXT: v_readlane_b32 s23, v1, 3
; GCN-NEXT: v_readlane_b32 s24, v1, 4		; GCN-NEXT: v_readlane_b32 s24, v1, 4
; GCN-NEXT: v_readlane_b32 s25, v1, 5		; GCN-NEXT: v_readlane_b32 s25, v1, 5
[129 lines not shown]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:19]		; GCN-NEXT: ; def s[4:19]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v31, s4, 0		; GCN-NEXT: ; implicit-def: $vgpr0
; GCN-NEXT: v_writelane_b32 v31, s5, 1		; GCN-NEXT: v_writelane_b32 v0, s4, 0
; GCN-NEXT: v_writelane_b32 v31, s6, 2		; GCN-NEXT: v_writelane_b32 v0, s5, 1
; GCN-NEXT: v_writelane_b32 v31, s7, 3		; GCN-NEXT: v_writelane_b32 v0, s6, 2
; GCN-NEXT: v_writelane_b32 v31, s8, 4		; GCN-NEXT: v_writelane_b32 v0, s7, 3
; GCN-NEXT: v_writelane_b32 v31, s9, 5		; GCN-NEXT: v_writelane_b32 v0, s8, 4
; GCN-NEXT: v_writelane_b32 v31, s10, 6		; GCN-NEXT: v_writelane_b32 v0, s9, 5
; GCN-NEXT: v_writelane_b32 v31, s11, 7		; GCN-NEXT: v_writelane_b32 v0, s10, 6
; GCN-NEXT: v_writelane_b32 v31, s12, 8		; GCN-NEXT: v_writelane_b32 v0, s11, 7
; GCN-NEXT: v_writelane_b32 v31, s13, 9		; GCN-NEXT: v_writelane_b32 v0, s12, 8
; GCN-NEXT: v_writelane_b32 v31, s14, 10		; GCN-NEXT: v_writelane_b32 v0, s13, 9
; GCN-NEXT: v_writelane_b32 v31, s15, 11		; GCN-NEXT: v_writelane_b32 v0, s14, 10
; GCN-NEXT: v_writelane_b32 v31, s16, 12		; GCN-NEXT: v_writelane_b32 v0, s15, 11
; GCN-NEXT: v_writelane_b32 v31, s17, 13		; GCN-NEXT: v_writelane_b32 v0, s16, 12
; GCN-NEXT: v_writelane_b32 v31, s18, 14		; GCN-NEXT: v_writelane_b32 v0, s17, 13
; GCN-NEXT: v_writelane_b32 v31, s19, 15		; GCN-NEXT: v_writelane_b32 v0, s18, 14
		; GCN-NEXT: v_writelane_b32 v0, s19, 15
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:19]		; GCN-NEXT: ; def s[4:19]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v31, s4, 16		; GCN-NEXT: v_writelane_b32 v0, s4, 16
; GCN-NEXT: v_writelane_b32 v31, s5, 17		; GCN-NEXT: v_writelane_b32 v0, s5, 17
; GCN-NEXT: v_writelane_b32 v31, s6, 18		; GCN-NEXT: v_writelane_b32 v0, s6, 18
; GCN-NEXT: v_writelane_b32 v31, s7, 19		; GCN-NEXT: v_writelane_b32 v0, s7, 19
; GCN-NEXT: v_writelane_b32 v31, s8, 20		; GCN-NEXT: v_writelane_b32 v0, s8, 20
; GCN-NEXT: v_writelane_b32 v31, s9, 21		; GCN-NEXT: v_writelane_b32 v0, s9, 21
; GCN-NEXT: v_writelane_b32 v31, s10, 22		; GCN-NEXT: v_writelane_b32 v0, s10, 22
; GCN-NEXT: v_writelane_b32 v31, s11, 23		; GCN-NEXT: v_writelane_b32 v0, s11, 23
; GCN-NEXT: v_writelane_b32 v31, s12, 24		; GCN-NEXT: v_writelane_b32 v0, s12, 24
; GCN-NEXT: v_writelane_b32 v31, s13, 25		; GCN-NEXT: v_writelane_b32 v0, s13, 25
; GCN-NEXT: v_writelane_b32 v31, s14, 26		; GCN-NEXT: v_writelane_b32 v0, s14, 26
; GCN-NEXT: v_writelane_b32 v31, s15, 27		; GCN-NEXT: v_writelane_b32 v0, s15, 27
; GCN-NEXT: v_writelane_b32 v31, s16, 28		; GCN-NEXT: v_writelane_b32 v0, s16, 28
; GCN-NEXT: v_writelane_b32 v31, s17, 29		; GCN-NEXT: v_writelane_b32 v0, s17, 29
; GCN-NEXT: v_writelane_b32 v31, s18, 30		; GCN-NEXT: v_writelane_b32 v0, s18, 30
; GCN-NEXT: v_writelane_b32 v31, s19, 31		; GCN-NEXT: v_writelane_b32 v0, s19, 31
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:19]		; GCN-NEXT: ; def s[4:19]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v31, s4, 32		; GCN-NEXT: v_writelane_b32 v0, s4, 32
; GCN-NEXT: v_writelane_b32 v31, s5, 33		; GCN-NEXT: v_writelane_b32 v0, s5, 33
; GCN-NEXT: v_writelane_b32 v31, s6, 34		; GCN-NEXT: v_writelane_b32 v0, s6, 34
; GCN-NEXT: v_writelane_b32 v31, s7, 35		; GCN-NEXT: v_writelane_b32 v0, s7, 35
; GCN-NEXT: v_writelane_b32 v31, s8, 36		; GCN-NEXT: v_writelane_b32 v0, s8, 36
; GCN-NEXT: v_writelane_b32 v31, s9, 37		; GCN-NEXT: v_writelane_b32 v0, s9, 37
; GCN-NEXT: v_writelane_b32 v31, s10, 38		; GCN-NEXT: v_writelane_b32 v0, s10, 38
; GCN-NEXT: v_writelane_b32 v31, s11, 39		; GCN-NEXT: v_writelane_b32 v0, s11, 39
; GCN-NEXT: v_writelane_b32 v31, s12, 40		; GCN-NEXT: v_writelane_b32 v0, s12, 40
; GCN-NEXT: v_writelane_b32 v31, s13, 41		; GCN-NEXT: v_writelane_b32 v0, s13, 41
; GCN-NEXT: v_writelane_b32 v31, s14, 42		; GCN-NEXT: v_writelane_b32 v0, s14, 42
; GCN-NEXT: v_writelane_b32 v31, s15, 43		; GCN-NEXT: v_writelane_b32 v0, s15, 43
; GCN-NEXT: v_writelane_b32 v31, s16, 44		; GCN-NEXT: v_writelane_b32 v0, s16, 44
; GCN-NEXT: v_writelane_b32 v31, s17, 45		; GCN-NEXT: v_writelane_b32 v0, s17, 45
; GCN-NEXT: v_writelane_b32 v31, s18, 46		; GCN-NEXT: v_writelane_b32 v0, s18, 46
; GCN-NEXT: v_writelane_b32 v31, s19, 47		; GCN-NEXT: v_writelane_b32 v0, s19, 47
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:19]		; GCN-NEXT: ; def s[4:19]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v31, s4, 48		; GCN-NEXT: v_writelane_b32 v0, s4, 48
; GCN-NEXT: v_writelane_b32 v31, s5, 49		; GCN-NEXT: v_writelane_b32 v0, s5, 49
; GCN-NEXT: v_writelane_b32 v31, s6, 50		; GCN-NEXT: v_writelane_b32 v0, s6, 50
; GCN-NEXT: v_writelane_b32 v31, s7, 51		; GCN-NEXT: v_writelane_b32 v0, s7, 51
; GCN-NEXT: v_writelane_b32 v31, s8, 52		; GCN-NEXT: v_writelane_b32 v0, s8, 52
; GCN-NEXT: v_writelane_b32 v31, s9, 53		; GCN-NEXT: v_writelane_b32 v0, s9, 53
; GCN-NEXT: v_writelane_b32 v31, s10, 54		; GCN-NEXT: v_writelane_b32 v0, s10, 54
; GCN-NEXT: v_writelane_b32 v31, s11, 55		; GCN-NEXT: v_writelane_b32 v0, s11, 55
; GCN-NEXT: v_writelane_b32 v31, s12, 56		; GCN-NEXT: v_writelane_b32 v0, s12, 56
; GCN-NEXT: v_writelane_b32 v31, s13, 57		; GCN-NEXT: v_writelane_b32 v0, s13, 57
; GCN-NEXT: v_writelane_b32 v31, s14, 58		; GCN-NEXT: v_writelane_b32 v0, s14, 58
; GCN-NEXT: v_writelane_b32 v31, s15, 59		; GCN-NEXT: v_writelane_b32 v0, s15, 59
; GCN-NEXT: v_writelane_b32 v31, s16, 60		; GCN-NEXT: v_writelane_b32 v0, s16, 60
; GCN-NEXT: v_writelane_b32 v31, s17, 61		; GCN-NEXT: v_writelane_b32 v0, s17, 61
; GCN-NEXT: v_writelane_b32 v31, s18, 62		; GCN-NEXT: v_writelane_b32 v0, s18, 62
; GCN-NEXT: v_writelane_b32 v31, s19, 63		; GCN-NEXT: v_writelane_b32 v0, s19, 63
		; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
		; GCN-NEXT: buffer_store_dword v0, off, s[52:55], 0 offset:8 ; 4-byte Folded Spill
		; GCN-NEXT: s_mov_b64 exec, s[34:35]
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[2:3]		; GCN-NEXT: ; def s[2:3]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: s_mov_b64 s[4:5], exec		; GCN-NEXT: ; implicit-def: $vgpr0
; GCN-NEXT: s_mov_b64 exec, 3
; GCN-NEXT: buffer_store_dword v0, off, s[52:55], 0
; GCN-NEXT: v_writelane_b32 v0, s2, 0		; GCN-NEXT: v_writelane_b32 v0, s2, 0
; GCN-NEXT: v_writelane_b32 v0, s3, 1		; GCN-NEXT: v_writelane_b32 v0, s3, 1
		; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
; GCN-NEXT: buffer_store_dword v0, off, s[52:55], 0 offset:4 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v0, off, s[52:55], 0 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: buffer_load_dword v0, off, s[52:55], 0		; GCN-NEXT: s_mov_b64 exec, s[34:35]
; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_mov_b32 s1, 0		; GCN-NEXT: s_mov_b32 s1, 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_cmp_lg_u32 s0, s1		; GCN-NEXT: s_cmp_lg_u32 s0, s1
; GCN-NEXT: s_cbranch_scc1 .LBB2_2		; GCN-NEXT: s_cbranch_scc1 .LBB2_2
; GCN-NEXT: ; %bb.1: ; %bb0		; GCN-NEXT: ; %bb.1: ; %bb0
; GCN-NEXT: v_readlane_b32 s36, v31, 32		; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
; GCN-NEXT: v_readlane_b32 s37, v31, 33		; GCN-NEXT: buffer_load_dword v0, off, s[52:55], 0 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: v_readlane_b32 s38, v31, 34		; GCN-NEXT: s_mov_b64 exec, s[34:35]
; GCN-NEXT: v_readlane_b32 s39, v31, 35		; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
; GCN-NEXT: v_readlane_b32 s40, v31, 36		; GCN-NEXT: buffer_load_dword v1, off, s[52:55], 0 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: v_readlane_b32 s41, v31, 37		; GCN-NEXT: s_mov_b64 exec, s[34:35]
; GCN-NEXT: v_readlane_b32 s42, v31, 38		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_readlane_b32 s43, v31, 39		; GCN-NEXT: v_readlane_b32 s36, v1, 32
; GCN-NEXT: v_readlane_b32 s44, v31, 40		; GCN-NEXT: v_readlane_b32 s37, v1, 33
; GCN-NEXT: v_readlane_b32 s45, v31, 41		; GCN-NEXT: v_readlane_b32 s38, v1, 34
; GCN-NEXT: v_readlane_b32 s46, v31, 42		; GCN-NEXT: v_readlane_b32 s39, v1, 35
; GCN-NEXT: v_readlane_b32 s47, v31, 43		; GCN-NEXT: v_readlane_b32 s40, v1, 36
; GCN-NEXT: v_readlane_b32 s48, v31, 44		; GCN-NEXT: v_readlane_b32 s41, v1, 37
; GCN-NEXT: v_readlane_b32 s49, v31, 45		; GCN-NEXT: v_readlane_b32 s42, v1, 38
; GCN-NEXT: v_readlane_b32 s50, v31, 46		; GCN-NEXT: v_readlane_b32 s43, v1, 39
; GCN-NEXT: v_readlane_b32 s51, v31, 47		; GCN-NEXT: v_readlane_b32 s44, v1, 40
; GCN-NEXT: v_readlane_b32 s0, v31, 16		; GCN-NEXT: v_readlane_b32 s45, v1, 41
; GCN-NEXT: v_readlane_b32 s1, v31, 17		; GCN-NEXT: v_readlane_b32 s46, v1, 42
; GCN-NEXT: v_readlane_b32 s2, v31, 18		; GCN-NEXT: v_readlane_b32 s47, v1, 43
; GCN-NEXT: v_readlane_b32 s3, v31, 19		; GCN-NEXT: v_readlane_b32 s48, v1, 44
; GCN-NEXT: v_readlane_b32 s4, v31, 20		; GCN-NEXT: v_readlane_b32 s49, v1, 45
; GCN-NEXT: v_readlane_b32 s5, v31, 21		; GCN-NEXT: v_readlane_b32 s50, v1, 46
; GCN-NEXT: v_readlane_b32 s6, v31, 22		; GCN-NEXT: v_readlane_b32 s51, v1, 47
; GCN-NEXT: v_readlane_b32 s7, v31, 23		; GCN-NEXT: v_readlane_b32 s0, v1, 16
; GCN-NEXT: v_readlane_b32 s8, v31, 24		; GCN-NEXT: v_readlane_b32 s1, v1, 17
; GCN-NEXT: v_readlane_b32 s9, v31, 25		; GCN-NEXT: v_readlane_b32 s2, v1, 18
; GCN-NEXT: v_readlane_b32 s10, v31, 26		; GCN-NEXT: v_readlane_b32 s3, v1, 19
; GCN-NEXT: v_readlane_b32 s11, v31, 27		; GCN-NEXT: v_readlane_b32 s4, v1, 20
; GCN-NEXT: v_readlane_b32 s12, v31, 28		; GCN-NEXT: v_readlane_b32 s5, v1, 21
; GCN-NEXT: v_readlane_b32 s13, v31, 29		; GCN-NEXT: v_readlane_b32 s6, v1, 22
; GCN-NEXT: v_readlane_b32 s14, v31, 30		; GCN-NEXT: v_readlane_b32 s7, v1, 23
; GCN-NEXT: v_readlane_b32 s15, v31, 31		; GCN-NEXT: v_readlane_b32 s8, v1, 24
; GCN-NEXT: v_readlane_b32 s16, v31, 0		; GCN-NEXT: v_readlane_b32 s9, v1, 25
; GCN-NEXT: v_readlane_b32 s17, v31, 1		; GCN-NEXT: v_readlane_b32 s10, v1, 26
; GCN-NEXT: v_readlane_b32 s18, v31, 2		; GCN-NEXT: v_readlane_b32 s11, v1, 27
; GCN-NEXT: v_readlane_b32 s19, v31, 3		; GCN-NEXT: v_readlane_b32 s12, v1, 28
; GCN-NEXT: v_readlane_b32 s20, v31, 4		; GCN-NEXT: v_readlane_b32 s13, v1, 29
; GCN-NEXT: v_readlane_b32 s21, v31, 5		; GCN-NEXT: v_readlane_b32 s14, v1, 30
; GCN-NEXT: v_readlane_b32 s22, v31, 6		; GCN-NEXT: v_readlane_b32 s15, v1, 31
; GCN-NEXT: v_readlane_b32 s23, v31, 7		; GCN-NEXT: v_readlane_b32 s16, v1, 0
; GCN-NEXT: v_readlane_b32 s24, v31, 8		; GCN-NEXT: v_readlane_b32 s17, v1, 1
; GCN-NEXT: v_readlane_b32 s25, v31, 9		; GCN-NEXT: v_readlane_b32 s18, v1, 2
; GCN-NEXT: v_readlane_b32 s26, v31, 10		; GCN-NEXT: v_readlane_b32 s19, v1, 3
; GCN-NEXT: v_readlane_b32 s27, v31, 11		; GCN-NEXT: v_readlane_b32 s20, v1, 4
; GCN-NEXT: v_readlane_b32 s28, v31, 12		; GCN-NEXT: v_readlane_b32 s21, v1, 5
; GCN-NEXT: v_readlane_b32 s29, v31, 13		; GCN-NEXT: v_readlane_b32 s22, v1, 6
; GCN-NEXT: v_readlane_b32 s30, v31, 14		; GCN-NEXT: v_readlane_b32 s23, v1, 7
; GCN-NEXT: v_readlane_b32 s31, v31, 15		; GCN-NEXT: v_readlane_b32 s24, v1, 8
		; GCN-NEXT: v_readlane_b32 s25, v1, 9
		; GCN-NEXT: v_readlane_b32 s26, v1, 10
		; GCN-NEXT: v_readlane_b32 s27, v1, 11
		; GCN-NEXT: v_readlane_b32 s28, v1, 12
		; GCN-NEXT: v_readlane_b32 s29, v1, 13
		; GCN-NEXT: v_readlane_b32 s30, v1, 14
		; GCN-NEXT: v_readlane_b32 s31, v1, 15
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[16:31]		; GCN-NEXT: ; use s[16:31]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:15]		; GCN-NEXT: ; use s[0:15]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_readlane_b32 s4, v31, 48		; GCN-NEXT: v_readlane_b32 s4, v1, 48
; GCN-NEXT: v_readlane_b32 s5, v31, 49		; GCN-NEXT: v_readlane_b32 s5, v1, 49
; GCN-NEXT: v_readlane_b32 s6, v31, 50		; GCN-NEXT: v_readlane_b32 s6, v1, 50
; GCN-NEXT: v_readlane_b32 s7, v31, 51		; GCN-NEXT: v_readlane_b32 s7, v1, 51
; GCN-NEXT: v_readlane_b32 s8, v31, 52		; GCN-NEXT: v_readlane_b32 s8, v1, 52
; GCN-NEXT: v_readlane_b32 s9, v31, 53		; GCN-NEXT: v_readlane_b32 s9, v1, 53
; GCN-NEXT: v_readlane_b32 s10, v31, 54		; GCN-NEXT: v_readlane_b32 s10, v1, 54
; GCN-NEXT: v_readlane_b32 s11, v31, 55		; GCN-NEXT: v_readlane_b32 s11, v1, 55
; GCN-NEXT: v_readlane_b32 s12, v31, 56		; GCN-NEXT: v_readlane_b32 s12, v1, 56
; GCN-NEXT: v_readlane_b32 s13, v31, 57		; GCN-NEXT: v_readlane_b32 s13, v1, 57
; GCN-NEXT: v_readlane_b32 s14, v31, 58		; GCN-NEXT: v_readlane_b32 s14, v1, 58
; GCN-NEXT: v_readlane_b32 s15, v31, 59		; GCN-NEXT: v_readlane_b32 s15, v1, 59
; GCN-NEXT: v_readlane_b32 s16, v31, 60		; GCN-NEXT: v_readlane_b32 s16, v1, 60
; GCN-NEXT: v_readlane_b32 s17, v31, 61		; GCN-NEXT: v_readlane_b32 s17, v1, 61
; GCN-NEXT: v_readlane_b32 s18, v31, 62		; GCN-NEXT: v_readlane_b32 s18, v1, 62
; GCN-NEXT: v_readlane_b32 s19, v31, 63		; GCN-NEXT: v_readlane_b32 s19, v1, 63
; GCN-NEXT: s_mov_b64 s[2:3], exec
; GCN-NEXT: s_mov_b64 exec, 3
; GCN-NEXT: buffer_store_dword v0, off, s[52:55], 0
; GCN-NEXT: buffer_load_dword v0, off, s[52:55], 0 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_readlane_b32 s0, v0, 0		; GCN-NEXT: v_readlane_b32 s0, v0, 0
; GCN-NEXT: v_readlane_b32 s1, v0, 1		; GCN-NEXT: v_readlane_b32 s1, v0, 1
; GCN-NEXT: buffer_load_dword v0, off, s[52:55], 0
; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_mov_b64 exec, s[2:3]
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[36:51]		; GCN-NEXT: ; use s[36:51]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[4:19]		; GCN-NEXT: ; use s[4:19]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:1]		; GCN-NEXT: ; use s[0:1]
[49 lines not shown]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:19]		; GCN-NEXT: ; def s[4:19]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v31, s4, 0		; GCN-NEXT: ; implicit-def: $vgpr0
; GCN-NEXT: v_writelane_b32 v31, s5, 1		; GCN-NEXT: v_writelane_b32 v0, s4, 0
; GCN-NEXT: v_writelane_b32 v31, s6, 2		; GCN-NEXT: v_writelane_b32 v0, s5, 1
; GCN-NEXT: v_writelane_b32 v31, s7, 3		; GCN-NEXT: v_writelane_b32 v0, s6, 2
; GCN-NEXT: v_writelane_b32 v31, s8, 4		; GCN-NEXT: v_writelane_b32 v0, s7, 3
; GCN-NEXT: v_writelane_b32 v31, s9, 5		; GCN-NEXT: v_writelane_b32 v0, s8, 4
; GCN-NEXT: v_writelane_b32 v31, s10, 6		; GCN-NEXT: v_writelane_b32 v0, s9, 5
; GCN-NEXT: v_writelane_b32 v31, s11, 7		; GCN-NEXT: v_writelane_b32 v0, s10, 6
; GCN-NEXT: v_writelane_b32 v31, s12, 8		; GCN-NEXT: v_writelane_b32 v0, s11, 7
; GCN-NEXT: v_writelane_b32 v31, s13, 9		; GCN-NEXT: v_writelane_b32 v0, s12, 8
; GCN-NEXT: v_writelane_b32 v31, s14, 10		; GCN-NEXT: v_writelane_b32 v0, s13, 9
; GCN-NEXT: v_writelane_b32 v31, s15, 11		; GCN-NEXT: v_writelane_b32 v0, s14, 10
; GCN-NEXT: v_writelane_b32 v31, s16, 12		; GCN-NEXT: v_writelane_b32 v0, s15, 11
; GCN-NEXT: v_writelane_b32 v31, s17, 13		; GCN-NEXT: v_writelane_b32 v0, s16, 12
; GCN-NEXT: v_writelane_b32 v31, s18, 14		; GCN-NEXT: v_writelane_b32 v0, s17, 13
; GCN-NEXT: v_writelane_b32 v31, s19, 15		; GCN-NEXT: v_writelane_b32 v0, s18, 14
		; GCN-NEXT: v_writelane_b32 v0, s19, 15
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:19]		; GCN-NEXT: ; def s[4:19]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v31, s4, 16		; GCN-NEXT: v_writelane_b32 v0, s4, 16
; GCN-NEXT: v_writelane_b32 v31, s5, 17		; GCN-NEXT: v_writelane_b32 v0, s5, 17
; GCN-NEXT: v_writelane_b32 v31, s6, 18		; GCN-NEXT: v_writelane_b32 v0, s6, 18
; GCN-NEXT: v_writelane_b32 v31, s7, 19		; GCN-NEXT: v_writelane_b32 v0, s7, 19
; GCN-NEXT: v_writelane_b32 v31, s8, 20		; GCN-NEXT: v_writelane_b32 v0, s8, 20
; GCN-NEXT: v_writelane_b32 v31, s9, 21		; GCN-NEXT: v_writelane_b32 v0, s9, 21
; GCN-NEXT: v_writelane_b32 v31, s10, 22		; GCN-NEXT: v_writelane_b32 v0, s10, 22
; GCN-NEXT: v_writelane_b32 v31, s11, 23		; GCN-NEXT: v_writelane_b32 v0, s11, 23
; GCN-NEXT: v_writelane_b32 v31, s12, 24		; GCN-NEXT: v_writelane_b32 v0, s12, 24
; GCN-NEXT: v_writelane_b32 v31, s13, 25		; GCN-NEXT: v_writelane_b32 v0, s13, 25
; GCN-NEXT: v_writelane_b32 v31, s14, 26		; GCN-NEXT: v_writelane_b32 v0, s14, 26
; GCN-NEXT: v_writelane_b32 v31, s15, 27		; GCN-NEXT: v_writelane_b32 v0, s15, 27
; GCN-NEXT: v_writelane_b32 v31, s16, 28		; GCN-NEXT: v_writelane_b32 v0, s16, 28
; GCN-NEXT: v_writelane_b32 v31, s17, 29		; GCN-NEXT: v_writelane_b32 v0, s17, 29
; GCN-NEXT: v_writelane_b32 v31, s18, 30		; GCN-NEXT: v_writelane_b32 v0, s18, 30
; GCN-NEXT: v_writelane_b32 v31, s19, 31		; GCN-NEXT: v_writelane_b32 v0, s19, 31
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:19]		; GCN-NEXT: ; def s[4:19]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v31, s4, 32		; GCN-NEXT: v_writelane_b32 v0, s4, 32
; GCN-NEXT: v_writelane_b32 v31, s5, 33		; GCN-NEXT: v_writelane_b32 v0, s5, 33
; GCN-NEXT: v_writelane_b32 v31, s6, 34		; GCN-NEXT: v_writelane_b32 v0, s6, 34
; GCN-NEXT: v_writelane_b32 v31, s7, 35		; GCN-NEXT: v_writelane_b32 v0, s7, 35
; GCN-NEXT: v_writelane_b32 v31, s8, 36		; GCN-NEXT: v_writelane_b32 v0, s8, 36
; GCN-NEXT: v_writelane_b32 v31, s9, 37		; GCN-NEXT: v_writelane_b32 v0, s9, 37
; GCN-NEXT: v_writelane_b32 v31, s10, 38		; GCN-NEXT: v_writelane_b32 v0, s10, 38
; GCN-NEXT: v_writelane_b32 v31, s11, 39		; GCN-NEXT: v_writelane_b32 v0, s11, 39
; GCN-NEXT: v_writelane_b32 v31, s12, 40		; GCN-NEXT: v_writelane_b32 v0, s12, 40
; GCN-NEXT: v_writelane_b32 v31, s13, 41		; GCN-NEXT: v_writelane_b32 v0, s13, 41
; GCN-NEXT: v_writelane_b32 v31, s14, 42		; GCN-NEXT: v_writelane_b32 v0, s14, 42
; GCN-NEXT: v_writelane_b32 v31, s15, 43		; GCN-NEXT: v_writelane_b32 v0, s15, 43
; GCN-NEXT: v_writelane_b32 v31, s16, 44		; GCN-NEXT: v_writelane_b32 v0, s16, 44
; GCN-NEXT: v_writelane_b32 v31, s17, 45		; GCN-NEXT: v_writelane_b32 v0, s17, 45
; GCN-NEXT: v_writelane_b32 v31, s18, 46		; GCN-NEXT: v_writelane_b32 v0, s18, 46
; GCN-NEXT: v_writelane_b32 v31, s19, 47		; GCN-NEXT: v_writelane_b32 v0, s19, 47
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[4:19]		; GCN-NEXT: ; def s[4:19]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v31, s4, 48		; GCN-NEXT: v_writelane_b32 v0, s4, 48
; GCN-NEXT: v_writelane_b32 v31, s5, 49		; GCN-NEXT: v_writelane_b32 v0, s5, 49
; GCN-NEXT: v_writelane_b32 v31, s6, 50		; GCN-NEXT: v_writelane_b32 v0, s6, 50
; GCN-NEXT: v_writelane_b32 v31, s7, 51		; GCN-NEXT: v_writelane_b32 v0, s7, 51
; GCN-NEXT: v_writelane_b32 v31, s8, 52		; GCN-NEXT: v_writelane_b32 v0, s8, 52
; GCN-NEXT: v_writelane_b32 v31, s9, 53		; GCN-NEXT: v_writelane_b32 v0, s9, 53
; GCN-NEXT: v_writelane_b32 v31, s10, 54		; GCN-NEXT: v_writelane_b32 v0, s10, 54
; GCN-NEXT: v_writelane_b32 v31, s11, 55		; GCN-NEXT: v_writelane_b32 v0, s11, 55
; GCN-NEXT: v_writelane_b32 v31, s12, 56		; GCN-NEXT: v_writelane_b32 v0, s12, 56
; GCN-NEXT: v_writelane_b32 v31, s13, 57		; GCN-NEXT: v_writelane_b32 v0, s13, 57
; GCN-NEXT: v_writelane_b32 v31, s14, 58		; GCN-NEXT: v_writelane_b32 v0, s14, 58
; GCN-NEXT: v_writelane_b32 v31, s15, 59		; GCN-NEXT: v_writelane_b32 v0, s15, 59
; GCN-NEXT: v_writelane_b32 v31, s16, 60		; GCN-NEXT: v_writelane_b32 v0, s16, 60
; GCN-NEXT: v_writelane_b32 v31, s17, 61		; GCN-NEXT: v_writelane_b32 v0, s17, 61
; GCN-NEXT: v_writelane_b32 v31, s18, 62		; GCN-NEXT: v_writelane_b32 v0, s18, 62
; GCN-NEXT: v_writelane_b32 v31, s19, 63		; GCN-NEXT: v_writelane_b32 v0, s19, 63
		; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
		; GCN-NEXT: buffer_store_dword v0, off, s[52:55], 0 offset:8 ; 4-byte Folded Spill
		; GCN-NEXT: s_mov_b64 exec, s[34:35]
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s[2:3]		; GCN-NEXT: ; def s[2:3]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: s_mov_b64 s[4:5], exec		; GCN-NEXT: ; implicit-def: $vgpr0
; GCN-NEXT: s_mov_b64 exec, 3
; GCN-NEXT: buffer_store_dword v0, off, s[52:55], 0
; GCN-NEXT: v_writelane_b32 v0, s2, 0		; GCN-NEXT: v_writelane_b32 v0, s2, 0
; GCN-NEXT: v_writelane_b32 v0, s3, 1		; GCN-NEXT: v_writelane_b32 v0, s3, 1
		; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
; GCN-NEXT: buffer_store_dword v0, off, s[52:55], 0 offset:4 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v0, off, s[52:55], 0 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: buffer_load_dword v0, off, s[52:55], 0		; GCN-NEXT: s_mov_b64 exec, s[34:35]
; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_mov_b32 s1, 0		; GCN-NEXT: s_mov_b32 s1, 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_cmp_lg_u32 s0, s1		; GCN-NEXT: s_cmp_lg_u32 s0, s1
; GCN-NEXT: s_cbranch_scc1 .LBB3_2		; GCN-NEXT: s_cbranch_scc1 .LBB3_2
; GCN-NEXT: ; %bb.1: ; %bb0		; GCN-NEXT: ; %bb.1: ; %bb0
; GCN-NEXT: v_readlane_b32 s36, v31, 32		; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
; GCN-NEXT: v_readlane_b32 s37, v31, 33		; GCN-NEXT: buffer_load_dword v1, off, s[52:55], 0 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: v_readlane_b32 s38, v31, 34		; GCN-NEXT: s_mov_b64 exec, s[34:35]
; GCN-NEXT: v_readlane_b32 s39, v31, 35		; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
; GCN-NEXT: v_readlane_b32 s40, v31, 36		; GCN-NEXT: buffer_load_dword v2, off, s[52:55], 0 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: v_readlane_b32 s41, v31, 37		; GCN-NEXT: s_mov_b64 exec, s[34:35]
; GCN-NEXT: v_readlane_b32 s42, v31, 38		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_readlane_b32 s43, v31, 39		; GCN-NEXT: v_readlane_b32 s36, v2, 32
; GCN-NEXT: v_readlane_b32 s44, v31, 40		; GCN-NEXT: v_readlane_b32 s37, v2, 33
; GCN-NEXT: v_readlane_b32 s45, v31, 41		; GCN-NEXT: v_readlane_b32 s38, v2, 34
; GCN-NEXT: v_readlane_b32 s46, v31, 42		; GCN-NEXT: v_readlane_b32 s39, v2, 35
; GCN-NEXT: v_readlane_b32 s47, v31, 43		; GCN-NEXT: v_readlane_b32 s40, v2, 36
; GCN-NEXT: v_readlane_b32 s48, v31, 44		; GCN-NEXT: v_readlane_b32 s41, v2, 37
; GCN-NEXT: v_readlane_b32 s49, v31, 45		; GCN-NEXT: v_readlane_b32 s42, v2, 38
; GCN-NEXT: v_readlane_b32 s50, v31, 46		; GCN-NEXT: v_readlane_b32 s43, v2, 39
; GCN-NEXT: v_readlane_b32 s51, v31, 47		; GCN-NEXT: v_readlane_b32 s44, v2, 40
; GCN-NEXT: v_readlane_b32 s0, v31, 16		; GCN-NEXT: v_readlane_b32 s45, v2, 41
; GCN-NEXT: v_readlane_b32 s1, v31, 17		; GCN-NEXT: v_readlane_b32 s46, v2, 42
; GCN-NEXT: v_readlane_b32 s2, v31, 18		; GCN-NEXT: v_readlane_b32 s47, v2, 43
; GCN-NEXT: v_readlane_b32 s3, v31, 19		; GCN-NEXT: v_readlane_b32 s48, v2, 44
; GCN-NEXT: v_readlane_b32 s4, v31, 20		; GCN-NEXT: v_readlane_b32 s49, v2, 45
; GCN-NEXT: v_readlane_b32 s5, v31, 21		; GCN-NEXT: v_readlane_b32 s50, v2, 46
; GCN-NEXT: v_readlane_b32 s6, v31, 22		; GCN-NEXT: v_readlane_b32 s51, v2, 47
; GCN-NEXT: v_readlane_b32 s7, v31, 23		; GCN-NEXT: v_readlane_b32 s0, v2, 16
; GCN-NEXT: v_readlane_b32 s8, v31, 24		; GCN-NEXT: v_readlane_b32 s1, v2, 17
; GCN-NEXT: v_readlane_b32 s9, v31, 25		; GCN-NEXT: v_readlane_b32 s2, v2, 18
; GCN-NEXT: v_readlane_b32 s10, v31, 26		; GCN-NEXT: v_readlane_b32 s3, v2, 19
; GCN-NEXT: v_readlane_b32 s11, v31, 27		; GCN-NEXT: v_readlane_b32 s4, v2, 20
; GCN-NEXT: v_readlane_b32 s12, v31, 28		; GCN-NEXT: v_readlane_b32 s5, v2, 21
; GCN-NEXT: v_readlane_b32 s13, v31, 29		; GCN-NEXT: v_readlane_b32 s6, v2, 22
; GCN-NEXT: v_readlane_b32 s14, v31, 30		; GCN-NEXT: v_readlane_b32 s7, v2, 23
; GCN-NEXT: v_readlane_b32 s15, v31, 31		; GCN-NEXT: v_readlane_b32 s8, v2, 24
; GCN-NEXT: v_readlane_b32 s16, v31, 0		; GCN-NEXT: v_readlane_b32 s9, v2, 25
; GCN-NEXT: v_readlane_b32 s17, v31, 1		; GCN-NEXT: v_readlane_b32 s10, v2, 26
; GCN-NEXT: v_readlane_b32 s18, v31, 2		; GCN-NEXT: v_readlane_b32 s11, v2, 27
; GCN-NEXT: v_readlane_b32 s19, v31, 3		; GCN-NEXT: v_readlane_b32 s12, v2, 28
; GCN-NEXT: v_readlane_b32 s20, v31, 4		; GCN-NEXT: v_readlane_b32 s13, v2, 29
; GCN-NEXT: v_readlane_b32 s21, v31, 5		; GCN-NEXT: v_readlane_b32 s14, v2, 30
; GCN-NEXT: v_readlane_b32 s22, v31, 6		; GCN-NEXT: v_readlane_b32 s15, v2, 31
; GCN-NEXT: v_readlane_b32 s23, v31, 7		; GCN-NEXT: v_readlane_b32 s16, v2, 0
; GCN-NEXT: v_readlane_b32 s24, v31, 8		; GCN-NEXT: v_readlane_b32 s17, v2, 1
; GCN-NEXT: v_readlane_b32 s25, v31, 9		; GCN-NEXT: v_readlane_b32 s18, v2, 2
; GCN-NEXT: v_readlane_b32 s26, v31, 10		; GCN-NEXT: v_readlane_b32 s19, v2, 3
; GCN-NEXT: v_readlane_b32 s27, v31, 11		; GCN-NEXT: v_readlane_b32 s20, v2, 4
; GCN-NEXT: v_readlane_b32 s28, v31, 12		; GCN-NEXT: v_readlane_b32 s21, v2, 5
; GCN-NEXT: v_readlane_b32 s29, v31, 13		; GCN-NEXT: v_readlane_b32 s22, v2, 6
; GCN-NEXT: v_readlane_b32 s30, v31, 14		; GCN-NEXT: v_readlane_b32 s23, v2, 7
; GCN-NEXT: v_readlane_b32 s31, v31, 15		; GCN-NEXT: v_readlane_b32 s24, v2, 8
		; GCN-NEXT: v_readlane_b32 s25, v2, 9
		; GCN-NEXT: v_readlane_b32 s26, v2, 10
		; GCN-NEXT: v_readlane_b32 s27, v2, 11
		; GCN-NEXT: v_readlane_b32 s28, v2, 12
		; GCN-NEXT: v_readlane_b32 s29, v2, 13
		; GCN-NEXT: v_readlane_b32 s30, v2, 14
		; GCN-NEXT: v_readlane_b32 s31, v2, 15
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def v0		; GCN-NEXT: ; def v0
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[16:31]		; GCN-NEXT: ; use s[16:31]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:15]		; GCN-NEXT: ; use s[0:15]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_readlane_b32 s4, v31, 48		; GCN-NEXT: v_readlane_b32 s4, v2, 48
; GCN-NEXT: v_readlane_b32 s5, v31, 49		; GCN-NEXT: v_readlane_b32 s5, v2, 49
; GCN-NEXT: v_readlane_b32 s6, v31, 50		; GCN-NEXT: v_readlane_b32 s6, v2, 50
; GCN-NEXT: v_readlane_b32 s7, v31, 51		; GCN-NEXT: v_readlane_b32 s7, v2, 51
; GCN-NEXT: v_readlane_b32 s8, v31, 52		; GCN-NEXT: v_readlane_b32 s8, v2, 52
; GCN-NEXT: v_readlane_b32 s9, v31, 53		; GCN-NEXT: v_readlane_b32 s9, v2, 53
; GCN-NEXT: v_readlane_b32 s10, v31, 54		; GCN-NEXT: v_readlane_b32 s10, v2, 54
; GCN-NEXT: v_readlane_b32 s11, v31, 55		; GCN-NEXT: v_readlane_b32 s11, v2, 55
; GCN-NEXT: v_readlane_b32 s12, v31, 56		; GCN-NEXT: v_readlane_b32 s12, v2, 56
; GCN-NEXT: v_readlane_b32 s13, v31, 57		; GCN-NEXT: v_readlane_b32 s13, v2, 57
; GCN-NEXT: v_readlane_b32 s14, v31, 58		; GCN-NEXT: v_readlane_b32 s14, v2, 58
; GCN-NEXT: v_readlane_b32 s15, v31, 59		; GCN-NEXT: v_readlane_b32 s15, v2, 59
; GCN-NEXT: v_readlane_b32 s16, v31, 60		; GCN-NEXT: v_readlane_b32 s16, v2, 60
; GCN-NEXT: v_readlane_b32 s17, v31, 61		; GCN-NEXT: v_readlane_b32 s17, v2, 61
; GCN-NEXT: v_readlane_b32 s18, v31, 62		; GCN-NEXT: v_readlane_b32 s18, v2, 62
; GCN-NEXT: v_readlane_b32 s19, v31, 63		; GCN-NEXT: v_readlane_b32 s19, v2, 63
; GCN-NEXT: s_mov_b64 s[2:3], exec
; GCN-NEXT: s_mov_b64 exec, 3
; GCN-NEXT: buffer_store_dword v1, off, s[52:55], 0
; GCN-NEXT: buffer_load_dword v1, off, s[52:55], 0 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_readlane_b32 s0, v1, 0		; GCN-NEXT: v_readlane_b32 s0, v1, 0
; GCN-NEXT: v_readlane_b32 s1, v1, 1		; GCN-NEXT: v_readlane_b32 s1, v1, 1
; GCN-NEXT: buffer_load_dword v1, off, s[52:55], 0
; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_mov_b64 exec, s[2:3]
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[36:51]		; GCN-NEXT: ; use s[36:51]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[4:19]		; GCN-NEXT: ; use s[4:19]
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s[0:1]		; GCN-NEXT: ; use s[0:1]
[37 lines not shown]

llvm/test/CodeGen/AMDGPU/scc-clobbered-sgpr-to-vmem-spill.ll

	; RUN: not --crash llc -mtriple=amdgcn--amdhsa -mcpu=gfx900 -verify-machineinstrs -o /dev/null %s 2>&1 | FileCheck %s			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck %s

				; This was a negative test to catch an extreme case when all options are exhausted
				; while trying to spill SGPRs to memory. Now that SGPR spills go into virtual VGPRs,
				; the edge case no longer arises and the test always compiles.

	; This ends up needing to spill SGPRs to memory, and also does not
	; have any free SGPRs available to save the exec mask when doing so.
	; The register scavenger also needs to use the emergency stack slot,
	; which tries to place the scavenged register restore instruction as
	; far down the block as possible, near the terminator. This places a
	; restore instruction between the condition and the conditional
	; branch, which gets expanded into a sequence involving s_not_b64 on
	; the exec mask, clobbering the SCC value before the branch. We probably
	; have to stop relying on being able to flip and restore the exec
	; mask, and always require a free SGPR for saving exec.

	; CHECK: * Bad machine code: Using an undefined physical register *
	; CHECK-NEXT: - function: kernel0
	; CHECK-NEXT: - basic block: %bb.0
	; CHECK-NEXT: - instruction: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
	; CHECK-NEXT: - operand 1: implicit killed $scc
	define amdgpu_kernel void @kernel0(i32 addrspace(1)* %out, i32 %in) #1 {			define amdgpu_kernel void @kernel0(i32 addrspace(1)* %out, i32 %in) #1 {
				; CHECK-LABEL: kernel0:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[2:3]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ; implicit-def: $vgpr0
				; CHECK-NEXT: s_load_dword s0, s[4:5], 0x8
				; CHECK-NEXT: v_writelane_b32 v0, s2, 0
				; CHECK-NEXT: v_writelane_b32 v0, s3, 1
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[4:7]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_writelane_b32 v0, s4, 2
				; CHECK-NEXT: v_writelane_b32 v0, s5, 3
				; CHECK-NEXT: v_writelane_b32 v0, s6, 4
				; CHECK-NEXT: v_writelane_b32 v0, s7, 5
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[4:11]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_writelane_b32 v0, s4, 6
				; CHECK-NEXT: v_writelane_b32 v0, s5, 7
				; CHECK-NEXT: v_writelane_b32 v0, s6, 8
				; CHECK-NEXT: v_writelane_b32 v0, s7, 9
				; CHECK-NEXT: v_writelane_b32 v0, s8, 10
				; CHECK-NEXT: v_writelane_b32 v0, s9, 11
				; CHECK-NEXT: v_writelane_b32 v0, s10, 12
				; CHECK-NEXT: v_writelane_b32 v0, s11, 13
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[4:19]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_writelane_b32 v0, s4, 14
				; CHECK-NEXT: v_writelane_b32 v0, s5, 15
				; CHECK-NEXT: v_writelane_b32 v0, s6, 16
				; CHECK-NEXT: v_writelane_b32 v0, s7, 17
				; CHECK-NEXT: v_writelane_b32 v0, s8, 18
				; CHECK-NEXT: v_writelane_b32 v0, s9, 19
				; CHECK-NEXT: v_writelane_b32 v0, s10, 20
				; CHECK-NEXT: v_writelane_b32 v0, s11, 21
				; CHECK-NEXT: v_writelane_b32 v0, s12, 22
				; CHECK-NEXT: v_writelane_b32 v0, s13, 23
				; CHECK-NEXT: v_writelane_b32 v0, s14, 24
				; CHECK-NEXT: v_writelane_b32 v0, s15, 25
				; CHECK-NEXT: v_writelane_b32 v0, s16, 26
				; CHECK-NEXT: v_writelane_b32 v0, s17, 27
				; CHECK-NEXT: v_writelane_b32 v0, s18, 28
				; CHECK-NEXT: v_writelane_b32 v0, s19, 29
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[2:3]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_writelane_b32 v0, s2, 30
				; CHECK-NEXT: v_writelane_b32 v0, s3, 31
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[4:7]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_writelane_b32 v0, s4, 32
				; CHECK-NEXT: v_writelane_b32 v0, s5, 33
				; CHECK-NEXT: v_writelane_b32 v0, s6, 34
				; CHECK-NEXT: v_writelane_b32 v0, s7, 35
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[4:11]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_writelane_b32 v0, s4, 36
				; CHECK-NEXT: v_writelane_b32 v0, s5, 37
				; CHECK-NEXT: v_writelane_b32 v0, s6, 38
				; CHECK-NEXT: v_writelane_b32 v0, s7, 39
				; CHECK-NEXT: v_writelane_b32 v0, s8, 40
				; CHECK-NEXT: v_writelane_b32 v0, s9, 41
				; CHECK-NEXT: v_writelane_b32 v0, s10, 42
				; CHECK-NEXT: v_writelane_b32 v0, s11, 43
				; CHECK-NEXT: s_waitcnt lgkmcnt(0)
				; CHECK-NEXT: s_cmp_lg_u32 s0, 0
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[16:31]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[52:53]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[48:51]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[36:43]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[0:15]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_writelane_b32 v0, s0, 44
				; CHECK-NEXT: v_writelane_b32 v0, s1, 45
				; CHECK-NEXT: v_writelane_b32 v0, s2, 46
				; CHECK-NEXT: v_writelane_b32 v0, s3, 47
				; CHECK-NEXT: v_writelane_b32 v0, s4, 48
				; CHECK-NEXT: v_writelane_b32 v0, s5, 49
				; CHECK-NEXT: v_writelane_b32 v0, s6, 50
				; CHECK-NEXT: v_writelane_b32 v0, s7, 51
				; CHECK-NEXT: v_writelane_b32 v0, s8, 52
				; CHECK-NEXT: v_writelane_b32 v0, s9, 53
				; CHECK-NEXT: v_writelane_b32 v0, s10, 54
				; CHECK-NEXT: v_writelane_b32 v0, s11, 55
				; CHECK-NEXT: v_writelane_b32 v0, s12, 56
				; CHECK-NEXT: v_writelane_b32 v0, s13, 57
				; CHECK-NEXT: v_writelane_b32 v0, s14, 58
				; CHECK-NEXT: v_writelane_b32 v0, s15, 59
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[34:35]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[44:47]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[0:7]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ; implicit-def: $vgpr1
				; CHECK-NEXT: v_writelane_b32 v0, s0, 60
				; CHECK-NEXT: v_writelane_b32 v1, s4, 0
				; CHECK-NEXT: v_writelane_b32 v0, s1, 61
				; CHECK-NEXT: v_writelane_b32 v1, s5, 1
				; CHECK-NEXT: v_writelane_b32 v0, s2, 62
				; CHECK-NEXT: v_writelane_b32 v1, s6, 2
				; CHECK-NEXT: v_writelane_b32 v0, s3, 63
				; CHECK-NEXT: v_writelane_b32 v1, s7, 3
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[0:15]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_writelane_b32 v1, s0, 4
				; CHECK-NEXT: v_writelane_b32 v1, s1, 5
				; CHECK-NEXT: v_writelane_b32 v1, s2, 6
				; CHECK-NEXT: v_writelane_b32 v1, s3, 7
				; CHECK-NEXT: v_writelane_b32 v1, s4, 8
				; CHECK-NEXT: v_writelane_b32 v1, s5, 9
				; CHECK-NEXT: v_writelane_b32 v1, s6, 10
				; CHECK-NEXT: v_writelane_b32 v1, s7, 11
				; CHECK-NEXT: v_writelane_b32 v1, s8, 12
				; CHECK-NEXT: v_writelane_b32 v1, s9, 13
				; CHECK-NEXT: v_writelane_b32 v1, s10, 14
				; CHECK-NEXT: v_writelane_b32 v1, s11, 15
				; CHECK-NEXT: v_writelane_b32 v1, s12, 16
				; CHECK-NEXT: v_writelane_b32 v1, s13, 17
				; CHECK-NEXT: v_writelane_b32 v1, s14, 18
				; CHECK-NEXT: v_writelane_b32 v1, s15, 19
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[54:55]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[0:3]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_writelane_b32 v1, s0, 20
				; CHECK-NEXT: v_writelane_b32 v1, s1, 21
				; CHECK-NEXT: v_writelane_b32 v1, s2, 22
				; CHECK-NEXT: v_writelane_b32 v1, s3, 23
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[0:7]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_writelane_b32 v1, s0, 24
				; CHECK-NEXT: v_writelane_b32 v1, s1, 25
				; CHECK-NEXT: v_writelane_b32 v1, s2, 26
				; CHECK-NEXT: v_writelane_b32 v1, s3, 27
				; CHECK-NEXT: v_writelane_b32 v1, s4, 28
				; CHECK-NEXT: v_writelane_b32 v1, s5, 29
				; CHECK-NEXT: v_writelane_b32 v1, s6, 30
				; CHECK-NEXT: v_writelane_b32 v1, s7, 31
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; def s[0:15]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_writelane_b32 v1, s0, 32
				; CHECK-NEXT: v_writelane_b32 v1, s1, 33
				; CHECK-NEXT: v_writelane_b32 v1, s2, 34
				; CHECK-NEXT: v_writelane_b32 v1, s3, 35
				; CHECK-NEXT: v_writelane_b32 v1, s4, 36
				; CHECK-NEXT: v_writelane_b32 v1, s5, 37
				; CHECK-NEXT: v_writelane_b32 v1, s6, 38
				; CHECK-NEXT: v_writelane_b32 v1, s7, 39
				; CHECK-NEXT: v_writelane_b32 v1, s8, 40
				; CHECK-NEXT: v_writelane_b32 v1, s9, 41
				; CHECK-NEXT: v_writelane_b32 v1, s10, 42
				; CHECK-NEXT: v_writelane_b32 v1, s11, 43
				; CHECK-NEXT: v_writelane_b32 v1, s12, 44
				; CHECK-NEXT: v_writelane_b32 v1, s13, 45
				; CHECK-NEXT: v_writelane_b32 v1, s14, 46
				; CHECK-NEXT: v_writelane_b32 v1, s15, 47
				; CHECK-NEXT: s_cbranch_scc0 .LBB0_2
				; CHECK-NEXT: ; %bb.1: ; %ret
				; CHECK-NEXT: s_endpgm
				; CHECK-NEXT: .LBB0_2: ; %bb0
				; CHECK-NEXT: v_readlane_b32 s0, v0, 0
				; CHECK-NEXT: v_readlane_b32 s1, v0, 1
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[0:1]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_readlane_b32 s0, v0, 2
				; CHECK-NEXT: v_readlane_b32 s1, v0, 3
				; CHECK-NEXT: v_readlane_b32 s2, v0, 4
				; CHECK-NEXT: v_readlane_b32 s3, v0, 5
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[0:3]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_readlane_b32 s0, v0, 6
				; CHECK-NEXT: v_readlane_b32 s1, v0, 7
				; CHECK-NEXT: v_readlane_b32 s2, v0, 8
				; CHECK-NEXT: v_readlane_b32 s3, v0, 9
				; CHECK-NEXT: v_readlane_b32 s4, v0, 10
				; CHECK-NEXT: v_readlane_b32 s5, v0, 11
				; CHECK-NEXT: v_readlane_b32 s6, v0, 12
				; CHECK-NEXT: v_readlane_b32 s7, v0, 13
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[0:7]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_readlane_b32 s0, v0, 14
				; CHECK-NEXT: v_readlane_b32 s1, v0, 15
				; CHECK-NEXT: v_readlane_b32 s2, v0, 16
				; CHECK-NEXT: v_readlane_b32 s3, v0, 17
				; CHECK-NEXT: v_readlane_b32 s4, v0, 18
				; CHECK-NEXT: v_readlane_b32 s5, v0, 19
				; CHECK-NEXT: v_readlane_b32 s6, v0, 20
				; CHECK-NEXT: v_readlane_b32 s7, v0, 21
				; CHECK-NEXT: v_readlane_b32 s8, v0, 22
				; CHECK-NEXT: v_readlane_b32 s9, v0, 23
				; CHECK-NEXT: v_readlane_b32 s10, v0, 24
				; CHECK-NEXT: v_readlane_b32 s11, v0, 25
				; CHECK-NEXT: v_readlane_b32 s12, v0, 26
				; CHECK-NEXT: v_readlane_b32 s13, v0, 27
				; CHECK-NEXT: v_readlane_b32 s14, v0, 28
				; CHECK-NEXT: v_readlane_b32 s15, v0, 29
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[0:15]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_readlane_b32 s0, v0, 30
				; CHECK-NEXT: v_readlane_b32 s1, v0, 31
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[0:1]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_readlane_b32 s0, v0, 32
				; CHECK-NEXT: v_readlane_b32 s1, v0, 33
				; CHECK-NEXT: v_readlane_b32 s2, v0, 34
				; CHECK-NEXT: v_readlane_b32 s3, v0, 35
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[0:3]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_readlane_b32 s0, v0, 36
				; CHECK-NEXT: v_readlane_b32 s1, v0, 37
				; CHECK-NEXT: v_readlane_b32 s2, v0, 38
				; CHECK-NEXT: v_readlane_b32 s3, v0, 39
				; CHECK-NEXT: v_readlane_b32 s4, v0, 40
				; CHECK-NEXT: v_readlane_b32 s5, v0, 41
				; CHECK-NEXT: v_readlane_b32 s6, v0, 42
				; CHECK-NEXT: v_readlane_b32 s7, v0, 43
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[0:7]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_readlane_b32 s0, v0, 44
				; CHECK-NEXT: v_readlane_b32 s1, v0, 45
				; CHECK-NEXT: v_readlane_b32 s2, v0, 46
				; CHECK-NEXT: v_readlane_b32 s3, v0, 47
				; CHECK-NEXT: v_readlane_b32 s4, v0, 48
				; CHECK-NEXT: v_readlane_b32 s5, v0, 49
				; CHECK-NEXT: v_readlane_b32 s6, v0, 50
				; CHECK-NEXT: v_readlane_b32 s7, v0, 51
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[16:31]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[52:53]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[48:51]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[36:43]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_readlane_b32 s8, v0, 52
				; CHECK-NEXT: v_readlane_b32 s9, v0, 53
				; CHECK-NEXT: v_readlane_b32 s10, v0, 54
				; CHECK-NEXT: v_readlane_b32 s11, v0, 55
				; CHECK-NEXT: v_readlane_b32 s12, v0, 56
				; CHECK-NEXT: v_readlane_b32 s13, v0, 57
				; CHECK-NEXT: v_readlane_b32 s14, v0, 58
				; CHECK-NEXT: v_readlane_b32 s15, v0, 59
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[0:15]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_readlane_b32 s0, v0, 60
				; CHECK-NEXT: v_readlane_b32 s1, v0, 61
				; CHECK-NEXT: v_readlane_b32 s2, v0, 62
				; CHECK-NEXT: v_readlane_b32 s3, v0, 63
				; CHECK-NEXT: v_readlane_b32 s4, v1, 0
				; CHECK-NEXT: v_readlane_b32 s5, v1, 1
				; CHECK-NEXT: v_readlane_b32 s6, v1, 2
				; CHECK-NEXT: v_readlane_b32 s7, v1, 3
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[34:35]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[44:47]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[0:7]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_readlane_b32 s0, v1, 4
				; CHECK-NEXT: v_readlane_b32 s1, v1, 5
				; CHECK-NEXT: v_readlane_b32 s2, v1, 6
				; CHECK-NEXT: v_readlane_b32 s3, v1, 7
				; CHECK-NEXT: v_readlane_b32 s4, v1, 8
				; CHECK-NEXT: v_readlane_b32 s5, v1, 9
				; CHECK-NEXT: v_readlane_b32 s6, v1, 10
				; CHECK-NEXT: v_readlane_b32 s7, v1, 11
				; CHECK-NEXT: v_readlane_b32 s8, v1, 12
				; CHECK-NEXT: v_readlane_b32 s9, v1, 13
				; CHECK-NEXT: v_readlane_b32 s10, v1, 14
				; CHECK-NEXT: v_readlane_b32 s11, v1, 15
				; CHECK-NEXT: v_readlane_b32 s12, v1, 16
				; CHECK-NEXT: v_readlane_b32 s13, v1, 17
				; CHECK-NEXT: v_readlane_b32 s14, v1, 18
				; CHECK-NEXT: v_readlane_b32 s15, v1, 19
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[0:15]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_readlane_b32 s0, v1, 20
				; CHECK-NEXT: v_readlane_b32 s1, v1, 21
				; CHECK-NEXT: v_readlane_b32 s2, v1, 22
				; CHECK-NEXT: v_readlane_b32 s3, v1, 23
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[54:55]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[0:3]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_readlane_b32 s0, v1, 24
				; CHECK-NEXT: v_readlane_b32 s1, v1, 25
				; CHECK-NEXT: v_readlane_b32 s2, v1, 26
				; CHECK-NEXT: v_readlane_b32 s3, v1, 27
				; CHECK-NEXT: v_readlane_b32 s4, v1, 28
				; CHECK-NEXT: v_readlane_b32 s5, v1, 29
				; CHECK-NEXT: v_readlane_b32 s6, v1, 30
				; CHECK-NEXT: v_readlane_b32 s7, v1, 31
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[0:7]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: v_readlane_b32 s0, v1, 32
				; CHECK-NEXT: v_readlane_b32 s1, v1, 33
				; CHECK-NEXT: v_readlane_b32 s2, v1, 34
				; CHECK-NEXT: v_readlane_b32 s3, v1, 35
				; CHECK-NEXT: v_readlane_b32 s4, v1, 36
				; CHECK-NEXT: v_readlane_b32 s5, v1, 37
				; CHECK-NEXT: v_readlane_b32 s6, v1, 38
				; CHECK-NEXT: v_readlane_b32 s7, v1, 39
				; CHECK-NEXT: v_readlane_b32 s8, v1, 40
				; CHECK-NEXT: v_readlane_b32 s9, v1, 41
				; CHECK-NEXT: v_readlane_b32 s10, v1, 42
				; CHECK-NEXT: v_readlane_b32 s11, v1, 43
				; CHECK-NEXT: v_readlane_b32 s12, v1, 44
				; CHECK-NEXT: v_readlane_b32 s13, v1, 45
				; CHECK-NEXT: v_readlane_b32 s14, v1, 46
				; CHECK-NEXT: v_readlane_b32 s15, v1, 47
				; CHECK-NEXT: ;;#ASMSTART
				; CHECK-NEXT: ; use s[0:15]
				; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: s_endpgm
	call void asm sideeffect "", "~{v[0:7]}" () #0			call void asm sideeffect "", "~{v[0:7]}" () #0
	call void asm sideeffect "", "~{v[8:15]}" () #0			call void asm sideeffect "", "~{v[8:15]}" () #0
	call void asm sideeffect "", "~{v[16:19]}"() #0			call void asm sideeffect "", "~{v[16:19]}"() #0
	call void asm sideeffect "", "~{v[20:21]}"() #0			call void asm sideeffect "", "~{v[20:21]}"() #0
	call void asm sideeffect "", "~{v22}"() #0			call void asm sideeffect "", "~{v22}"() #0

	%val0 = call <2 x i32> asm sideeffect "; def $0", "=s" () #0			%val0 = call <2 x i32> asm sideeffect "; def $0", "=s" () #0
	%val1 = call <4 x i32> asm sideeffect "; def $0", "=s" () #0			%val1 = call <4 x i32> asm sideeffect "; def $0", "=s" () #0
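The lane numbers in the CHECK lines above follow a regular pattern: each spilled 32-bit SGPR takes the next free lane of a VGPR, and a second VGPR is started once lane 63 has been used (v0 lanes 0 to 63, then v1 lane 0). The following is a minimal Python sketch of that index arithmetic only. It assumes a 64-lane wavefront, and `lane_for_spill` is a hypothetical helper written for illustration, not a function in LLVM. The VGPR numbers it prints are ordinals of the spill VGPRs; the physical registers are chosen later by register allocation.

```python
WAVE_SIZE = 64  # assumption: one VGPR offers 64 lanes (wave64)


def lane_for_spill(index, wave_size=WAVE_SIZE):
    """Map the index-th spilled 32-bit SGPR to (vgpr_ordinal, lane)."""
    return divmod(index, wave_size)


# Eight consecutive spills at indices 60..67 straddle two VGPRs,
# matching the s[0:7] group that is split between v0 and v1 above.
for i in range(60, 68):
    vgpr, lane = lane_for_spill(i)
    print(f"s{i - 60} -> v{vgpr} lane {lane}")
```

The sketch covers only the slot arithmetic; it says nothing about how the pass orders the `v_writelane_b32` instructions.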

llvm/test/CodeGen/AMDGPU/sgpr-spill-dead-frame-in-dbg-value.mir

# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -amdgpu-spill-sgpr-to-vgpr=true -verify-machineinstrs -run-pass=si-lower-sgpr-spills,prologepilog -o - %s | FileCheck %s		# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
		# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -amdgpu-spill-sgpr-to-vgpr=true -verify-machineinstrs -run-pass=si-lower-sgpr-spills -o - %s | FileCheck -check-prefix=SGPR_SPILL %s
		# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -amdgpu-spill-sgpr-to-vgpr=true -verify-machineinstrs --start-before=si-lower-sgpr-spills --stop-after=prologepilog -o - %s | FileCheck -check-prefix=PEI %s

# After handling the SGPR spill to VGPR in SILowerSGPRSpills pass, replace the dead frame index in the DBG_VALUE instruction with reg 0.		# After handling the SGPR spill to VGPR in SILowerSGPRSpills pass, replace the dead frame index in the DBG_VALUE instruction with reg 0.
# Otherwise, the test would crash during PEI while trying to replace the dead frame index.		# Otherwise, the test would crash during PEI while trying to replace the dead frame index.
--- |		--- |
define amdgpu_kernel void @test() { ret void }		define amdgpu_kernel void @test() { ret void }

!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !4, producer: "llvm", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !4, retainedTypes: !4)		!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !4, producer: "llvm", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !4, retainedTypes: !4)
!1 = !DILocalVariable(name: "a", scope: !2, file: !4, line: 126, type: !6)		!1 = !DILocalVariable(name: "a", scope: !2, file: !4, line: 126, type: !6)
machineFunctionInfo:
hasSpilledSGPRs: true		hasSpilledSGPRs: true
argumentInfo:		argumentInfo:
privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }		privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
dispatchPtr: { reg: '$sgpr4_sgpr5' }		dispatchPtr: { reg: '$sgpr4_sgpr5' }
kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }		kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }
workGroupIDX: { reg: '$sgpr8' }		workGroupIDX: { reg: '$sgpr8' }
privateSegmentWaveByteOffset: { reg: '$sgpr9' }		privateSegmentWaveByteOffset: { reg: '$sgpr9' }
body: |		body: |
; CHECK-LABEL: name: test		; SGPR_SPILL-LABEL: name: test
; CHECK: bb.0:		; SGPR_SPILL: bb.0:
; CHECK: $vgpr0 = V_WRITELANE_B32 killed $sgpr10, 0, $vgpr0		; SGPR_SPILL: [[VGPR:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
; CHECK: DBG_VALUE $noreg, 0		; SGPR_SPILL: [[VGPR]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr10, 0, [[VGPR]]
; CHECK: bb.1:		; SGPR_SPILL: DBG_VALUE $noreg, 0
; CHECK: $sgpr10 = V_READLANE_B32 $vgpr0, 0		; SGPR_SPILL: bb.1:
; CHECK: S_ENDPGM 0		; SGPR_SPILL: $sgpr10 = V_READLANE_B32 [[VGPR]], 0
		; SGPR_SPILL: S_ENDPGM 0
		; PEI-LABEL: name: test
		; PEI: bb.0:
		; PEI: renamable $[[VGPR:vgpr[0-9]+]] = IMPLICIT_DEF
		; PEI: renamable $[[VGPR]] = V_WRITELANE_B32 killed $sgpr10, 0, killed $[[VGPR]]
		; PEI: bb.1:
		; PEI: $sgpr10 = V_READLANE_B32 killed $[[VGPR]], 0
		; PEI: S_ENDPGM 0
bb.0:		bb.0:
renamable $sgpr10 = IMPLICIT_DEF		renamable $sgpr10 = IMPLICIT_DEF
SI_SPILL_S32_SAVE killed $sgpr10, %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32		SI_SPILL_S32_SAVE killed $sgpr10, %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32
DBG_VALUE %stack.0, 0, !1, !8, debug-location !9		DBG_VALUE %stack.0, 0, !1, !8, debug-location !9

bb.1:		bb.1:
renamable $sgpr10 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32		renamable $sgpr10 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32
S_ENDPGM 0		S_ENDPGM 0

llvm/test/CodeGen/AMDGPU/sgpr-spill-fi-skip-processing-stack-arg-dbg-value.mir

# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -amdgpu-spill-sgpr-to-vgpr=true -verify-machineinstrs -run-pass=si-lower-sgpr-spills,prologepilog -o - %s | FileCheck %s		# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -amdgpu-spill-sgpr-to-vgpr=true -verify-machineinstrs -run-pass=si-lower-sgpr-spills -o - %s | FileCheck %s

# After handling the SGPR spill to VGPR in SILowerSGPRSpills pass, we replace the dead frame index in the DBG_VALUE instruction with reg 0.		# After handling the SGPR spill to VGPR in SILowerSGPRSpills pass, we replace the dead frame index in the DBG_VALUE instruction with reg 0.
# Skip looking for frame indices in the debug value instruction for incoming arguments passed via stack. The test would crash otherwise.		# Skip looking for frame indices in the debug value instruction for incoming arguments passed via stack. The test would crash otherwise.
# It is safe to skip the fixed stack objects as they will never become the spill objects.		# It is safe to skip the fixed stack objects as they will never become the spill objects.

--- |		--- |
define amdgpu_kernel void @test() { ret void }		define amdgpu_kernel void @test() { ret void }

argumentInfo:
privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }		privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
dispatchPtr: { reg: '$sgpr4_sgpr5' }		dispatchPtr: { reg: '$sgpr4_sgpr5' }
kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }		kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }
workGroupIDX: { reg: '$sgpr8' }		workGroupIDX: { reg: '$sgpr8' }
privateSegmentWaveByteOffset: { reg: '$sgpr9' }		privateSegmentWaveByteOffset: { reg: '$sgpr9' }
body: |		body: |
; CHECK-LABEL: name: test		; CHECK-LABEL: name: test
; CHECK: bb.0:		; CHECK: bb.0:
; CHECK: DBG_VALUE $noreg, 0		; CHECK: DBG_VALUE
bb.0:		bb.0:
renamable $sgpr10 = IMPLICIT_DEF		renamable $sgpr10 = IMPLICIT_DEF
SI_SPILL_S32_SAVE killed $sgpr10, %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32		SI_SPILL_S32_SAVE killed $sgpr10, %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32
DBG_VALUE %fixed-stack.0, 0, !1, !8, debug-location !9		DBG_VALUE %fixed-stack.0, 0, !1, !8, debug-location !9

bb.1:		bb.1:
renamable $sgpr10 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32		renamable $sgpr10 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32
S_ENDPGM 0		S_ENDPGM 0
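
The two MIR tests above describe one rule for debug values: a DBG_VALUE that refers to a spill slot whose SGPR spill was lowered to a VGPR lane has its frame index replaced with register 0, while a DBG_VALUE that refers to a fixed stack object (an incoming stack argument) is left alone. The following is a toy Python model of that rule, not LLVM's implementation. The tuple encoding of operands and the name `rewrite_dbg_operand` are assumptions made for illustration. It relies on the MachineFrameInfo convention that fixed stack objects have negative frame indices.

```python
def rewrite_dbg_operand(operand, dead_spill_slots):
    """Return the DBG_VALUE location operand after SGPR spill lowering.

    operand is ("fi", index) for a frame index or ("reg", number) for a
    register; dead_spill_slots is the set of frame indices whose spills
    were lowered to VGPR lanes and no longer exist on the stack.
    """
    kind, value = operand
    if kind != "fi":
        return operand      # not a frame index, nothing to rewrite
    if value < 0:
        return operand      # fixed stack object: never a spill slot, skip
    if value in dead_spill_slots:
        return ("reg", 0)   # dead frame index becomes $noreg
    return operand


# %stack.0 was lowered to a VGPR lane, so it is dead afterwards.
print(rewrite_dbg_operand(("fi", 0), {0}))   # DBG_VALUE %stack.0
print(rewrite_dbg_operand(("fi", -1), {0}))  # DBG_VALUE %fixed-stack.0
```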

llvm/test/CodeGen/AMDGPU/sgpr-spill-no-vgprs.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	; The first 64 SGPR spills can go to a VGPR, but there isn't a second			; This test was originally written when SGPRs are spilled directly to physical VGPRs and
	; so some spills must be to memory. The last 16 element spill runs out of lanes at the 15th element.			; stressed a case when there wasn't enough VGPRs to accommodate all spills.
				; When we started spilling them into virtual VGPR lanes, we always succeed in doing so.
				; The regalloc pass later takes care of allocating VGPRs to these virtual registers.

	define amdgpu_kernel void @partial_no_vgprs_last_sgpr_spill(i32 addrspace(1)* %out, i32 %in) #1 {			define amdgpu_kernel void @partial_no_vgprs_last_sgpr_spill(i32 addrspace(1)* %out, i32 %in) #1 {
	; GCN-LABEL: partial_no_vgprs_last_sgpr_spill:			; GCN-LABEL: partial_no_vgprs_last_sgpr_spill:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_add_u32 s0, s0, s7			; GCN-NEXT: s_add_u32 s0, s0, s7
	; GCN-NEXT: s_addc_u32 s1, s1, 0			; GCN-NEXT: s_addc_u32 s1, s1, 0
	; GCN-NEXT: s_load_dword s4, s[4:5], 0x2			; GCN-NEXT: s_load_dword s4, s[4:5], 0x2
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: ; def s[8:23]			; GCN-NEXT: ; def s[8:23]
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: v_writelane_b32 v23, s8, 0			; GCN-NEXT: ; implicit-def: $vgpr0
	; GCN-NEXT: v_writelane_b32 v23, s9, 1			; GCN-NEXT: v_writelane_b32 v0, s8, 0
	; GCN-NEXT: v_writelane_b32 v23, s10, 2			; GCN-NEXT: v_writelane_b32 v0, s9, 1
	; GCN-NEXT: v_writelane_b32 v23, s11, 3			; GCN-NEXT: v_writelane_b32 v0, s10, 2
	; GCN-NEXT: v_writelane_b32 v23, s12, 4			; GCN-NEXT: v_writelane_b32 v0, s11, 3
	; GCN-NEXT: v_writelane_b32 v23, s13, 5			; GCN-NEXT: v_writelane_b32 v0, s12, 4
	; GCN-NEXT: v_writelane_b32 v23, s14, 6			; GCN-NEXT: v_writelane_b32 v0, s13, 5
	; GCN-NEXT: v_writelane_b32 v23, s15, 7			; GCN-NEXT: v_writelane_b32 v0, s14, 6
	; GCN-NEXT: v_writelane_b32 v23, s16, 8			; GCN-NEXT: v_writelane_b32 v0, s15, 7
	; GCN-NEXT: v_writelane_b32 v23, s17, 9			; GCN-NEXT: v_writelane_b32 v0, s16, 8
	; GCN-NEXT: v_writelane_b32 v23, s18, 10			; GCN-NEXT: v_writelane_b32 v0, s17, 9
	; GCN-NEXT: v_writelane_b32 v23, s19, 11			; GCN-NEXT: v_writelane_b32 v0, s18, 10
	; GCN-NEXT: v_writelane_b32 v23, s20, 12			; GCN-NEXT: v_writelane_b32 v0, s19, 11
	; GCN-NEXT: v_writelane_b32 v23, s21, 13			; GCN-NEXT: v_writelane_b32 v0, s20, 12
	; GCN-NEXT: v_writelane_b32 v23, s22, 14			; GCN-NEXT: v_writelane_b32 v0, s21, 13
	; GCN-NEXT: v_writelane_b32 v23, s23, 15			; GCN-NEXT: v_writelane_b32 v0, s22, 14
				; GCN-NEXT: v_writelane_b32 v0, s23, 15
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: ; def s[8:23]			; GCN-NEXT: ; def s[8:23]
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: v_writelane_b32 v23, s8, 16			; GCN-NEXT: v_writelane_b32 v0, s8, 16
	; GCN-NEXT: v_writelane_b32 v23, s9, 17			; GCN-NEXT: v_writelane_b32 v0, s9, 17
	; GCN-NEXT: v_writelane_b32 v23, s10, 18			; GCN-NEXT: v_writelane_b32 v0, s10, 18
	; GCN-NEXT: v_writelane_b32 v23, s11, 19			; GCN-NEXT: v_writelane_b32 v0, s11, 19
	; GCN-NEXT: v_writelane_b32 v23, s12, 20			; GCN-NEXT: v_writelane_b32 v0, s12, 20
	; GCN-NEXT: v_writelane_b32 v23, s13, 21			; GCN-NEXT: v_writelane_b32 v0, s13, 21
	; GCN-NEXT: v_writelane_b32 v23, s14, 22			; GCN-NEXT: v_writelane_b32 v0, s14, 22
	; GCN-NEXT: v_writelane_b32 v23, s15, 23			; GCN-NEXT: v_writelane_b32 v0, s15, 23
	; GCN-NEXT: v_writelane_b32 v23, s16, 24			; GCN-NEXT: v_writelane_b32 v0, s16, 24
	; GCN-NEXT: v_writelane_b32 v23, s17, 25			; GCN-NEXT: v_writelane_b32 v0, s17, 25
	; GCN-NEXT: v_writelane_b32 v23, s18, 26			; GCN-NEXT: v_writelane_b32 v0, s18, 26
	; GCN-NEXT: v_writelane_b32 v23, s19, 27			; GCN-NEXT: v_writelane_b32 v0, s19, 27
	; GCN-NEXT: v_writelane_b32 v23, s20, 28			; GCN-NEXT: v_writelane_b32 v0, s20, 28
	; GCN-NEXT: v_writelane_b32 v23, s21, 29			; GCN-NEXT: v_writelane_b32 v0, s21, 29
	; GCN-NEXT: v_writelane_b32 v23, s22, 30			; GCN-NEXT: v_writelane_b32 v0, s22, 30
	; GCN-NEXT: v_writelane_b32 v23, s23, 31			; GCN-NEXT: v_writelane_b32 v0, s23, 31
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: ; def s[8:23]			; GCN-NEXT: ; def s[8:23]
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: v_writelane_b32 v23, s8, 32			; GCN-NEXT: v_writelane_b32 v0, s8, 32
	; GCN-NEXT: v_writelane_b32 v23, s9, 33			; GCN-NEXT: v_writelane_b32 v0, s9, 33
	; GCN-NEXT: v_writelane_b32 v23, s10, 34			; GCN-NEXT: v_writelane_b32 v0, s10, 34
	; GCN-NEXT: v_writelane_b32 v23, s11, 35			; GCN-NEXT: v_writelane_b32 v0, s11, 35
	; GCN-NEXT: v_writelane_b32 v23, s12, 36			; GCN-NEXT: v_writelane_b32 v0, s12, 36
	; GCN-NEXT: v_writelane_b32 v23, s13, 37			; GCN-NEXT: v_writelane_b32 v0, s13, 37
	; GCN-NEXT: v_writelane_b32 v23, s14, 38			; GCN-NEXT: v_writelane_b32 v0, s14, 38
	; GCN-NEXT: v_writelane_b32 v23, s15, 39			; GCN-NEXT: v_writelane_b32 v0, s15, 39
	; GCN-NEXT: v_writelane_b32 v23, s16, 40			; GCN-NEXT: v_writelane_b32 v0, s16, 40
	; GCN-NEXT: v_writelane_b32 v23, s17, 41			; GCN-NEXT: v_writelane_b32 v0, s17, 41
	; GCN-NEXT: v_writelane_b32 v23, s18, 42			; GCN-NEXT: v_writelane_b32 v0, s18, 42
	; GCN-NEXT: v_writelane_b32 v23, s19, 43			; GCN-NEXT: v_writelane_b32 v0, s19, 43
	; GCN-NEXT: v_writelane_b32 v23, s20, 44			; GCN-NEXT: v_writelane_b32 v0, s20, 44
	; GCN-NEXT: v_writelane_b32 v23, s21, 45			; GCN-NEXT: v_writelane_b32 v0, s21, 45
	; GCN-NEXT: v_writelane_b32 v23, s22, 46			; GCN-NEXT: v_writelane_b32 v0, s22, 46
	; GCN-NEXT: v_writelane_b32 v23, s23, 47			; GCN-NEXT: v_writelane_b32 v0, s23, 47
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: ; def s[8:23]			; GCN-NEXT: ; def s[8:23]
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: v_writelane_b32 v23, s8, 48			; GCN-NEXT: v_writelane_b32 v0, s8, 48
	; GCN-NEXT: v_writelane_b32 v23, s9, 49			; GCN-NEXT: v_writelane_b32 v0, s9, 49
	; GCN-NEXT: v_writelane_b32 v23, s10, 50			; GCN-NEXT: v_writelane_b32 v0, s10, 50
	; GCN-NEXT: v_writelane_b32 v23, s11, 51			; GCN-NEXT: v_writelane_b32 v0, s11, 51
	; GCN-NEXT: v_writelane_b32 v23, s12, 52			; GCN-NEXT: v_writelane_b32 v0, s12, 52
	; GCN-NEXT: v_writelane_b32 v23, s13, 53			; GCN-NEXT: v_writelane_b32 v0, s13, 53
	; GCN-NEXT: v_writelane_b32 v23, s14, 54			; GCN-NEXT: v_writelane_b32 v0, s14, 54
	; GCN-NEXT: v_writelane_b32 v23, s15, 55			; GCN-NEXT: v_writelane_b32 v0, s15, 55
	; GCN-NEXT: v_writelane_b32 v23, s16, 56			; GCN-NEXT: v_writelane_b32 v0, s16, 56
	; GCN-NEXT: v_writelane_b32 v23, s17, 57			; GCN-NEXT: v_writelane_b32 v0, s17, 57
	; GCN-NEXT: v_writelane_b32 v23, s18, 58			; GCN-NEXT: v_writelane_b32 v0, s18, 58
	; GCN-NEXT: v_writelane_b32 v23, s19, 59			; GCN-NEXT: v_writelane_b32 v0, s19, 59
	; GCN-NEXT: v_writelane_b32 v23, s20, 60			; GCN-NEXT: v_writelane_b32 v0, s20, 60
	; GCN-NEXT: v_writelane_b32 v23, s21, 61			; GCN-NEXT: v_writelane_b32 v0, s21, 61
	; GCN-NEXT: v_writelane_b32 v23, s22, 62			; GCN-NEXT: v_writelane_b32 v0, s22, 62
	; GCN-NEXT: v_writelane_b32 v23, s23, 63			; GCN-NEXT: v_writelane_b32 v0, s23, 63
				; GCN-NEXT: s_or_saveexec_b64 s[24:25], -1
				; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:8 ; 4-byte Folded Spill
				; GCN-NEXT: s_mov_b64 exec, s[24:25]
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: ; def s[6:7]			; GCN-NEXT: ; def s[6:7]
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: s_mov_b64 s[8:9], exec			; GCN-NEXT: ; implicit-def: $vgpr0
	; GCN-NEXT: s_mov_b64 exec, 3
	; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GCN-NEXT: v_writelane_b32 v0, s6, 0			; GCN-NEXT: v_writelane_b32 v0, s6, 0
	; GCN-NEXT: v_writelane_b32 v0, s7, 1			; GCN-NEXT: v_writelane_b32 v0, s7, 1
				; GCN-NEXT: s_or_saveexec_b64 s[24:25], -1
	; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_load_dword v0, off, s[0:3], 0			; GCN-NEXT: s_mov_b64 exec, s[24:25]
	; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_mov_b64 exec, s[8:9]
	; GCN-NEXT: s_mov_b32 s5, 0			; GCN-NEXT: s_mov_b32 s5, 0
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: s_cmp_lg_u32 s4, s5			; GCN-NEXT: s_cmp_lg_u32 s4, s5
	; GCN-NEXT: s_cbranch_scc1 .LBB0_2			; GCN-NEXT: s_cbranch_scc1 .LBB0_2
	; GCN-NEXT: ; %bb.1: ; %bb0			; GCN-NEXT: ; %bb.1: ; %bb0
	; GCN-NEXT: v_readlane_b32 s4, v23, 0			; GCN-NEXT: s_or_saveexec_b64 s[24:25], -1
	; GCN-NEXT: v_readlane_b32 s5, v23, 1			; GCN-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: v_readlane_b32 s6, v23, 2			; GCN-NEXT: s_mov_b64 exec, s[24:25]
	; GCN-NEXT: v_readlane_b32 s7, v23, 3			; GCN-NEXT: s_or_saveexec_b64 s[24:25], -1
	; GCN-NEXT: v_readlane_b32 s8, v23, 4			; GCN-NEXT: buffer_load_dword v1, off, s[0:3], 0 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: v_readlane_b32 s9, v23, 5			; GCN-NEXT: s_mov_b64 exec, s[24:25]
	; GCN-NEXT: v_readlane_b32 s10, v23, 6			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_readlane_b32 s11, v23, 7			; GCN-NEXT: v_readlane_b32 s4, v1, 0
	; GCN-NEXT: v_readlane_b32 s12, v23, 8			; GCN-NEXT: v_readlane_b32 s5, v1, 1
	; GCN-NEXT: v_readlane_b32 s13, v23, 9			; GCN-NEXT: v_readlane_b32 s6, v1, 2
	; GCN-NEXT: v_readlane_b32 s14, v23, 10			; GCN-NEXT: v_readlane_b32 s7, v1, 3
	; GCN-NEXT: v_readlane_b32 s15, v23, 11			; GCN-NEXT: v_readlane_b32 s8, v1, 4
	; GCN-NEXT: v_readlane_b32 s16, v23, 12			; GCN-NEXT: v_readlane_b32 s9, v1, 5
	; GCN-NEXT: v_readlane_b32 s17, v23, 13			; GCN-NEXT: v_readlane_b32 s10, v1, 6
	; GCN-NEXT: v_readlane_b32 s18, v23, 14			; GCN-NEXT: v_readlane_b32 s11, v1, 7
	; GCN-NEXT: v_readlane_b32 s19, v23, 15			; GCN-NEXT: v_readlane_b32 s12, v1, 8
				; GCN-NEXT: v_readlane_b32 s13, v1, 9
				; GCN-NEXT: v_readlane_b32 s14, v1, 10
				; GCN-NEXT: v_readlane_b32 s15, v1, 11
				; GCN-NEXT: v_readlane_b32 s16, v1, 12
				; GCN-NEXT: v_readlane_b32 s17, v1, 13
				; GCN-NEXT: v_readlane_b32 s18, v1, 14
				; GCN-NEXT: v_readlane_b32 s19, v1, 15
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: ; use s[4:19]			; GCN-NEXT: ; use s[4:19]
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: v_readlane_b32 s4, v23, 16			; GCN-NEXT: v_readlane_b32 s4, v1, 16
	; GCN-NEXT: v_readlane_b32 s5, v23, 17			; GCN-NEXT: v_readlane_b32 s5, v1, 17
	; GCN-NEXT: v_readlane_b32 s6, v23, 18			; GCN-NEXT: v_readlane_b32 s6, v1, 18
	; GCN-NEXT: v_readlane_b32 s7, v23, 19			; GCN-NEXT: v_readlane_b32 s7, v1, 19
	; GCN-NEXT: v_readlane_b32 s8, v23, 20			; GCN-NEXT: v_readlane_b32 s8, v1, 20
	; GCN-NEXT: v_readlane_b32 s9, v23, 21			; GCN-NEXT: v_readlane_b32 s9, v1, 21
	; GCN-NEXT: v_readlane_b32 s10, v23, 22			; GCN-NEXT: v_readlane_b32 s10, v1, 22
	; GCN-NEXT: v_readlane_b32 s11, v23, 23			; GCN-NEXT: v_readlane_b32 s11, v1, 23
	; GCN-NEXT: v_readlane_b32 s12, v23, 24			; GCN-NEXT: v_readlane_b32 s12, v1, 24
	; GCN-NEXT: v_readlane_b32 s13, v23, 25			; GCN-NEXT: v_readlane_b32 s13, v1, 25
	; GCN-NEXT: v_readlane_b32 s14, v23, 26			; GCN-NEXT: v_readlane_b32 s14, v1, 26
	; GCN-NEXT: v_readlane_b32 s15, v23, 27			; GCN-NEXT: v_readlane_b32 s15, v1, 27
	; GCN-NEXT: v_readlane_b32 s16, v23, 28			; GCN-NEXT: v_readlane_b32 s16, v1, 28
	; GCN-NEXT: v_readlane_b32 s17, v23, 29			; GCN-NEXT: v_readlane_b32 s17, v1, 29
	; GCN-NEXT: v_readlane_b32 s18, v23, 30			; GCN-NEXT: v_readlane_b32 s18, v1, 30
	; GCN-NEXT: v_readlane_b32 s19, v23, 31			; GCN-NEXT: v_readlane_b32 s19, v1, 31
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: ; use s[4:19]			; GCN-NEXT: ; use s[4:19]
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: v_readlane_b32 s4, v23, 32			; GCN-NEXT: v_readlane_b32 s4, v1, 32
	; GCN-NEXT: v_readlane_b32 s5, v23, 33			; GCN-NEXT: v_readlane_b32 s5, v1, 33
	; GCN-NEXT: v_readlane_b32 s6, v23, 34			; GCN-NEXT: v_readlane_b32 s6, v1, 34
	; GCN-NEXT: v_readlane_b32 s7, v23, 35			; GCN-NEXT: v_readlane_b32 s7, v1, 35
	; GCN-NEXT: v_readlane_b32 s8, v23, 36			; GCN-NEXT: v_readlane_b32 s8, v1, 36
	; GCN-NEXT: v_readlane_b32 s9, v23, 37			; GCN-NEXT: v_readlane_b32 s9, v1, 37
	; GCN-NEXT: v_readlane_b32 s10, v23, 38			; GCN-NEXT: v_readlane_b32 s10, v1, 38
	; GCN-NEXT: v_readlane_b32 s11, v23, 39			; GCN-NEXT: v_readlane_b32 s11, v1, 39
	; GCN-NEXT: v_readlane_b32 s12, v23, 40			; GCN-NEXT: v_readlane_b32 s12, v1, 40
	; GCN-NEXT: v_readlane_b32 s13, v23, 41			; GCN-NEXT: v_readlane_b32 s13, v1, 41
	; GCN-NEXT: v_readlane_b32 s14, v23, 42			; GCN-NEXT: v_readlane_b32 s14, v1, 42
	; GCN-NEXT: v_readlane_b32 s15, v23, 43			; GCN-NEXT: v_readlane_b32 s15, v1, 43
	; GCN-NEXT: v_readlane_b32 s16, v23, 44			; GCN-NEXT: v_readlane_b32 s16, v1, 44
	; GCN-NEXT: v_readlane_b32 s17, v23, 45			; GCN-NEXT: v_readlane_b32 s17, v1, 45
	; GCN-NEXT: v_readlane_b32 s18, v23, 46			; GCN-NEXT: v_readlane_b32 s18, v1, 46
	; GCN-NEXT: v_readlane_b32 s19, v23, 47			; GCN-NEXT: v_readlane_b32 s19, v1, 47
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: ; use s[4:19]			; GCN-NEXT: ; use s[4:19]
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: v_readlane_b32 s8, v23, 48			; GCN-NEXT: v_readlane_b32 s8, v1, 48
	; GCN-NEXT: v_readlane_b32 s9, v23, 49			; GCN-NEXT: v_readlane_b32 s9, v1, 49
	; GCN-NEXT: v_readlane_b32 s10, v23, 50			; GCN-NEXT: v_readlane_b32 s10, v1, 50
	; GCN-NEXT: v_readlane_b32 s11, v23, 51			; GCN-NEXT: v_readlane_b32 s11, v1, 51
	; GCN-NEXT: v_readlane_b32 s12, v23, 52			; GCN-NEXT: v_readlane_b32 s12, v1, 52
	; GCN-NEXT: v_readlane_b32 s13, v23, 53			; GCN-NEXT: v_readlane_b32 s13, v1, 53
	; GCN-NEXT: v_readlane_b32 s14, v23, 54			; GCN-NEXT: v_readlane_b32 s14, v1, 54
	; GCN-NEXT: v_readlane_b32 s15, v23, 55			; GCN-NEXT: v_readlane_b32 s15, v1, 55
	; GCN-NEXT: v_readlane_b32 s16, v23, 56			; GCN-NEXT: v_readlane_b32 s16, v1, 56
	; GCN-NEXT: v_readlane_b32 s17, v23, 57			; GCN-NEXT: v_readlane_b32 s17, v1, 57
	; GCN-NEXT: v_readlane_b32 s18, v23, 58			; GCN-NEXT: v_readlane_b32 s18, v1, 58
	; GCN-NEXT: v_readlane_b32 s19, v23, 59			; GCN-NEXT: v_readlane_b32 s19, v1, 59
	; GCN-NEXT: v_readlane_b32 s20, v23, 60			; GCN-NEXT: v_readlane_b32 s20, v1, 60
	; GCN-NEXT: v_readlane_b32 s21, v23, 61			; GCN-NEXT: v_readlane_b32 s21, v1, 61
	; GCN-NEXT: v_readlane_b32 s22, v23, 62			; GCN-NEXT: v_readlane_b32 s22, v1, 62
	; GCN-NEXT: v_readlane_b32 s23, v23, 63			; GCN-NEXT: v_readlane_b32 s23, v1, 63
	; GCN-NEXT: s_mov_b64 s[6:7], exec
	; GCN-NEXT: s_mov_b64 exec, 3
	; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GCN-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_readlane_b32 s4, v0, 0			; GCN-NEXT: v_readlane_b32 s4, v0, 0
	; GCN-NEXT: v_readlane_b32 s5, v0, 1			; GCN-NEXT: v_readlane_b32 s5, v0, 1
	; GCN-NEXT: buffer_load_dword v0, off, s[0:3], 0
	; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: ; use s[8:23]			; GCN-NEXT: ; use s[8:23]
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: ; use s[4:5]			; GCN-NEXT: ; use s[4:5]
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: .LBB0_2: ; %ret			; GCN-NEXT: .LBB0_2: ; %ret
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm

llvm/test/CodeGen/AMDGPU/sgpr-spill-partially-undef.mir

	stack:			stack:
	- { id: 0, type: spill-slot, size: 8, alignment: 4, stack-id: sgpr-spill }			- { id: 0, type: spill-slot, size: 8, alignment: 4, stack-id: sgpr-spill }

	body: |			body: |
	bb.0:			bb.0:
	liveins: $sgpr4			liveins: $sgpr4

	; CHECK-LABEL: name: sgpr_spill_s64_undef_high32			; CHECK-LABEL: name: sgpr_spill_s64_undef_high32
	; CHECK: liveins: $sgpr4, $vgpr0			; CHECK: liveins: $sgpr4
	; CHECK-NEXT: {{ $}}			; CHECK-NEXT: {{ $}}
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr4, 0, $vgpr0, implicit-def $sgpr4_sgpr5, implicit $sgpr4_sgpr5			; CHECK-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr5, 1, $vgpr0, implicit $sgpr4_sgpr5			; CHECK-NEXT: [[V_WRITELANE_B32_:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 $sgpr4, 0, [[V_WRITELANE_B32_]], implicit-def $sgpr4_sgpr5, implicit $sgpr4_sgpr5
				; CHECK-NEXT: [[V_WRITELANE_B32_1:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 $sgpr5, 1, [[V_WRITELANE_B32_1]], implicit $sgpr4_sgpr5
	SI_SPILL_S64_SAVE renamable $sgpr4_sgpr5, %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store (s64) into %stack.0, align 4, addrspace 5)			SI_SPILL_S64_SAVE renamable $sgpr4_sgpr5, %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store (s64) into %stack.0, align 4, addrspace 5)

	...			...

	---			---
	name: sgpr_spill_s64_undef_low32			name: sgpr_spill_s64_undef_low32
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	hasSpilledSGPRs: true			hasSpilledSGPRs: true
	scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'			scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
	stackPtrOffsetReg: '$sgpr32'			stackPtrOffsetReg: '$sgpr32'

	stack:			stack:
	- { id: 0, type: spill-slot, size: 8, alignment: 4, stack-id: sgpr-spill }			- { id: 0, type: spill-slot, size: 8, alignment: 4, stack-id: sgpr-spill }

	body: |			body: |
	bb.0:			bb.0:
	liveins: $sgpr5			liveins: $sgpr5

	; CHECK-LABEL: name: sgpr_spill_s64_undef_low32			; CHECK-LABEL: name: sgpr_spill_s64_undef_low32
	; CHECK: liveins: $sgpr5, $vgpr0			; CHECK: liveins: $sgpr5
	; CHECK-NEXT: {{ $}}			; CHECK-NEXT: {{ $}}
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr4, 0, $vgpr0, implicit-def $sgpr4_sgpr5, implicit $sgpr4_sgpr5			; CHECK-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr5, 1, $vgpr0, implicit $sgpr4_sgpr5			; CHECK-NEXT: [[V_WRITELANE_B32_:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 $sgpr4, 0, [[V_WRITELANE_B32_]], implicit-def $sgpr4_sgpr5, implicit $sgpr4_sgpr5
				; CHECK-NEXT: [[V_WRITELANE_B32_1:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 $sgpr5, 1, [[V_WRITELANE_B32_1]], implicit $sgpr4_sgpr5
	SI_SPILL_S64_SAVE renamable $sgpr4_sgpr5, %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store (s64) into %stack.0, align 4, addrspace 5)			SI_SPILL_S64_SAVE renamable $sgpr4_sgpr5, %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store (s64) into %stack.0, align 4, addrspace 5)

	...			...

llvm/test/CodeGen/AMDGPU/sgpr-spill-update-only-slot-indexes.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=gfx900 -sgpr-regalloc=fast -vgpr-regalloc=fast -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -sgpr-regalloc=fast -vgpr-regalloc=fast -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	; Make sure there's no verifier error from improperly updated			; Make sure there's no verifier error from improperly updated
	; SlotIndexes if regalloc fast is manually used.			; SlotIndexes if regalloc fast is manually used.

	declare void @foo()			declare void @foo()

	define amdgpu_kernel void @kernel() {			define amdgpu_kernel void @kernel() {
	; GCN-LABEL: kernel:			; GCN-LABEL: kernel:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_mov_b32 s36, SCRATCH_RSRC_DWORD0			; GCN-NEXT: s_mov_b32 s36, SCRATCH_RSRC_DWORD0
	; GCN-NEXT: s_mov_b32 s37, SCRATCH_RSRC_DWORD1			; GCN-NEXT: s_mov_b32 s37, SCRATCH_RSRC_DWORD1
	; GCN-NEXT: s_mov_b32 s38, -1			; GCN-NEXT: s_mov_b32 s38, -1
				; GCN-NEXT: ; implicit-def: $vgpr3
	; GCN-NEXT: s_mov_b32 s39, 0xe00000			; GCN-NEXT: s_mov_b32 s39, 0xe00000
	; GCN-NEXT: v_writelane_b32 v40, s4, 0			; GCN-NEXT: v_writelane_b32 v3, s4, 0
	; GCN-NEXT: s_add_u32 s36, s36, s11			; GCN-NEXT: s_add_u32 s36, s36, s11
	; GCN-NEXT: v_writelane_b32 v40, s5, 1			; GCN-NEXT: v_writelane_b32 v3, s5, 1
	; GCN-NEXT: s_addc_u32 s37, s37, 0			; GCN-NEXT: s_addc_u32 s37, s37, 0
	; GCN-NEXT: s_mov_b64 s[4:5], s[0:1]			; GCN-NEXT: s_mov_b64 s[4:5], s[0:1]
	; GCN-NEXT: v_readlane_b32 s0, v40, 0			; GCN-NEXT: v_readlane_b32 s0, v3, 0
	; GCN-NEXT: s_mov_b32 s13, s9			; GCN-NEXT: s_mov_b32 s13, s9
	; GCN-NEXT: s_mov_b32 s12, s8			; GCN-NEXT: s_mov_b32 s12, s8
	; GCN-NEXT: v_readlane_b32 s1, v40, 1			; GCN-NEXT: v_readlane_b32 s1, v3, 1
	; GCN-NEXT: s_add_u32 s8, s0, 36			; GCN-NEXT: s_add_u32 s8, s0, 36
	; GCN-NEXT: s_addc_u32 s9, s1, 0			; GCN-NEXT: s_addc_u32 s9, s1, 0
	; GCN-NEXT: s_getpc_b64 s[0:1]			; GCN-NEXT: s_getpc_b64 s[0:1]
	; GCN-NEXT: s_add_u32 s0, s0, foo@gotpcrel32@lo+4			; GCN-NEXT: s_add_u32 s0, s0, foo@gotpcrel32@lo+4
	; GCN-NEXT: s_addc_u32 s1, s1, foo@gotpcrel32@hi+12			; GCN-NEXT: s_addc_u32 s1, s1, foo@gotpcrel32@hi+12
	; GCN-NEXT: s_load_dwordx2 s[16:17], s[0:1], 0x0			; GCN-NEXT: s_load_dwordx2 s[16:17], s[0:1], 0x0
	; GCN-NEXT: s_mov_b32 s14, s10			; GCN-NEXT: s_mov_b32 s14, s10
	; GCN-NEXT: s_mov_b64 s[10:11], s[6:7]			; GCN-NEXT: s_mov_b64 s[10:11], s[6:7]

llvm/test/CodeGen/AMDGPU/sgpr-spill-vmem-large-frame.mir

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py		# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -amdgpu-spill-sgpr-to-vgpr=false -verify-machineinstrs -run-pass=si-lower-sgpr-spills,prologepilog -o - %s | FileCheck %s		# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -amdgpu-spill-sgpr-to-vgpr=false -verify-machineinstrs -start-before=si-lower-sgpr-spills -stop-after=prologepilog -o - %s | FileCheck %s

# Check that we allocate 2 emergency stack slots if we're spilling		# Check that we allocate 2 emergency stack slots if we're spilling
# SGPRs to memory and potentially have an offset larger than fits in		# SGPRs to memory and potentially have an offset larger than fits in
# the addressing mode of the memory instructions.		# the addressing mode of the memory instructions.

---		---
name: test		name: test
tracksRegLiveness: true		tracksRegLiveness: true
liveins: $sgpr30_sgpr31, $sgpr10, $sgpr11		liveins: $sgpr30_sgpr31, $sgpr10, $sgpr11
; CHECK-LABEL: name: test		; CHECK-LABEL: name: test
; CHECK: liveins: $sgpr10, $sgpr11, $sgpr30_sgpr31		; CHECK: liveins: $sgpr10, $sgpr11, $sgpr30_sgpr31
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc		; CHECK-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc
; CHECK-NEXT: $sgpr6_sgpr7 = S_MOV_B64 $exec		; CHECK-NEXT: $sgpr6_sgpr7 = S_MOV_B64 $exec
; CHECK-NEXT: $exec = S_MOV_B64 1, implicit-def $vgpr2		; CHECK-NEXT: $exec = S_MOV_B64 1, implicit-def $vgpr2
; CHECK-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (store (s32) into %stack.2, addrspace 5)		; CHECK-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (store (s32) into %stack.2, addrspace 5)
; CHECK-NEXT: $vgpr2 = V_WRITELANE_B32 killed $sgpr10, 0, undef $vgpr2		; CHECK-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr10, 0, undef $vgpr2
; CHECK-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 8, 0, 0, implicit $exec :: (store (s32) into %stack.0, addrspace 5)		; CHECK-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 8, 0, 0, implicit $exec :: (store (s32) into %stack.0, addrspace 5)
; CHECK-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)		; CHECK-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)
; CHECK-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7, implicit killed $vgpr2		; CHECK-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7, implicit killed $vgpr2
; CHECK-NEXT: $sgpr4_sgpr5 = S_MOV_B64 $exec		; CHECK-NEXT: $sgpr4_sgpr5 = S_MOV_B64 $exec
; CHECK-NEXT: $exec = S_MOV_B64 1, implicit-def $vgpr1		; CHECK-NEXT: $exec = S_MOV_B64 1, implicit-def $vgpr1
; CHECK-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (store (s32) into %stack.2, addrspace 5)		; CHECK-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (store (s32) into %stack.2, addrspace 5)
; CHECK-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 8, 0, 0, implicit $exec :: (load (s32) from %stack.0, addrspace 5)		; CHECK-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 8, 0, 0, implicit $exec :: (load (s32) from %stack.0, addrspace 5)
; CHECK-NEXT: $sgpr10 = V_READLANE_B32 killed $vgpr1, 0		; CHECK-NEXT: $sgpr10 = V_READLANE_B32 killed $vgpr1, 0
; CHECK-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)		; CHECK-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)
; CHECK-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5, implicit killed $vgpr1		; CHECK-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5, implicit killed $vgpr1
; CHECK-NEXT: S_SETPC_B64 $sgpr30_sgpr31, implicit $scc		; CHECK-NEXT: S_SETPC_B64 $sgpr30_sgpr31, implicit $scc
S_CMP_EQ_U32 0, 0, implicit-def $scc		S_CMP_EQ_U32 0, 0, implicit-def $scc
SI_SPILL_S32_SAVE killed $sgpr10, %stack.0, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32		SI_SPILL_S32_SAVE killed $sgpr10, %stack.0, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32
renamable $sgpr10 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32		renamable $sgpr10 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32
S_SETPC_B64 $sgpr30_sgpr31, implicit $scc		S_SETPC_B64 $sgpr30_sgpr31, implicit $scc
...		...

llvm/test/CodeGen/AMDGPU/sgpr-spills-split-regalloc.ll

call void asm sideeffect "", "~{vcc}" () #0		call void asm sideeffect "", "~{vcc}" () #0
ret void		ret void
}		}

define void @spill_sgpr_with_no_lower_vgpr_available() #0 {		define void @spill_sgpr_with_no_lower_vgpr_available() #0 {
; GCN-LABEL: spill_sgpr_with_no_lower_vgpr_available:		; GCN-LABEL: spill_sgpr_with_no_lower_vgpr_available:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_mov_b32 s6, s33		; GCN-NEXT: s_mov_b32 s14, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1		; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1
; GCN-NEXT: buffer_store_dword v255, off, s[0:3], s33 offset:448 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:452 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_add_i32 s32, s32, 0x7400		; GCN-NEXT: s_add_i32 s32, s32, 0x7400
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:440 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:440 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:436 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:436 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:432 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:432 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:428 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:428 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:424 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:424 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:420 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:420 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v239, off, s[0:3], s33 offset:28 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v239, off, s[0:3], s33 offset:28 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v248, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v248, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v249, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v249, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v250, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v250, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v251, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v251, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v252, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v252, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v253, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v253, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v254, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v254, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-NEXT: v_writelane_b32 v255, s30, 0		; GCN-NEXT: ; implicit-def: $vgpr0
; GCN-NEXT: v_writelane_b32 v255, s31, 1		; GCN-NEXT: v_writelane_b32 v0, s30, 0
		; GCN-NEXT: v_writelane_b32 v0, s31, 1
		; GCN-NEXT: s_or_saveexec_b64 s[12:13], -1
		; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:448 ; 4-byte Folded Spill
		; GCN-NEXT: s_mov_b64 exec, s[12:13]
; GCN-NEXT: v_mov_b32_e32 v0, 0		; GCN-NEXT: v_mov_b32_e32 v0, 0
; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:444		; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:444
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
		; GCN-NEXT: s_or_saveexec_b64 s[12:13], -1
		; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:448 ; 4-byte Folded Reload
		; GCN-NEXT: s_mov_b64 exec, s[12:13]
; GCN-NEXT: s_getpc_b64 s[4:5]		; GCN-NEXT: s_getpc_b64 s[4:5]
; GCN-NEXT: s_add_u32 s4, s4, child_function@gotpcrel32@lo+4		; GCN-NEXT: s_add_u32 s4, s4, child_function@gotpcrel32@lo+4
; GCN-NEXT: s_addc_u32 s5, s5, child_function@gotpcrel32@hi+12		; GCN-NEXT: s_addc_u32 s5, s5, child_function@gotpcrel32@hi+12
; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0		; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
; GCN-NEXT: s_mov_b64 s[10:11], s[2:3]		; GCN-NEXT: s_mov_b64 s[10:11], s[2:3]
; GCN-NEXT: s_mov_b64 s[8:9], s[0:1]		; GCN-NEXT: s_mov_b64 s[8:9], s[0:1]
; GCN-NEXT: s_mov_b64 s[0:1], s[8:9]		; GCN-NEXT: s_mov_b64 s[0:1], s[8:9]
; GCN-NEXT: s_mov_b64 s[2:3], s[10:11]		; GCN-NEXT: s_mov_b64 s[2:3], s[10:11]
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]		; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
; GCN-NEXT: v_readlane_b32 s31, v255, 1		; GCN-NEXT: v_readlane_b32 s31, v0, 1
; GCN-NEXT: v_readlane_b32 s30, v255, 0		; GCN-NEXT: v_readlane_b32 s30, v0, 0
; GCN-NEXT: buffer_load_dword v254, off, s[0:3], s33 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v254, off, s[0:3], s33 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v253, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v253, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v252, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v252, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v251, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v251, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v250, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v250, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v249, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v249, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v248, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v248, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v239, off, s[0:3], s33 offset:28 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v239, off, s[0:3], s33 offset:28 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s33 offset:412 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s33 offset:412 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:416 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:416 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:420 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:420 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:424 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:424 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:428 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:428 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:432 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:432 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:436 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:436 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:440 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:440 ; 4-byte Folded Reload
; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1		; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1
; GCN-NEXT: buffer_load_dword v255, off, s[0:3], s33 offset:448 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:452 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_add_i32 s32, s32, 0xffff8c00		; GCN-NEXT: s_add_i32 s32, s32, 0xffff8c00
; GCN-NEXT: s_mov_b32 s33, s6		; GCN-NEXT: s_mov_b32 s33, s14
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
%alloca = alloca i32, align 4, addrspace(5)		%alloca = alloca i32, align 4, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca

call void asm sideeffect "",		call void asm sideeffect "",
"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}		"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}
,~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19}		,~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19}
call void @child_function()		call void @child_function()
ret void		ret void
}		}

define void @spill_to_lowest_available_vgpr() #0 {		define void @spill_to_lowest_available_vgpr() #0 {
; GCN-LABEL: spill_to_lowest_available_vgpr:		; GCN-LABEL: spill_to_lowest_available_vgpr:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_mov_b32 s6, s33		; GCN-NEXT: s_mov_b32 s14, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1		; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1
; GCN-NEXT: buffer_store_dword v254, off, s[0:3], s33 offset:444 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:448 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_add_i32 s32, s32, 0x7400		; GCN-NEXT: s_add_i32 s32, s32, 0x7400
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:436 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:436 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:432 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:432 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:428 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:428 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:424 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:424 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:420 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:420 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:416 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:416 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v238, off, s[0:3], s33 offset:28 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v238, off, s[0:3], s33 offset:28 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v239, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v239, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v248, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v248, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v249, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v249, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v250, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v250, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v251, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v251, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v252, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v252, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v253, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v253, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-NEXT: v_writelane_b32 v254, s30, 0		; GCN-NEXT: ; implicit-def: $vgpr0
; GCN-NEXT: v_writelane_b32 v254, s31, 1		; GCN-NEXT: v_writelane_b32 v0, s30, 0
		; GCN-NEXT: v_writelane_b32 v0, s31, 1
		; GCN-NEXT: s_or_saveexec_b64 s[12:13], -1
		; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:444 ; 4-byte Folded Spill
		; GCN-NEXT: s_mov_b64 exec, s[12:13]
; GCN-NEXT: v_mov_b32_e32 v0, 0		; GCN-NEXT: v_mov_b32_e32 v0, 0
; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:440		; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:440
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
		; GCN-NEXT: s_or_saveexec_b64 s[12:13], -1
		; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:444 ; 4-byte Folded Reload
		; GCN-NEXT: s_mov_b64 exec, s[12:13]
; GCN-NEXT: s_getpc_b64 s[4:5]		; GCN-NEXT: s_getpc_b64 s[4:5]
; GCN-NEXT: s_add_u32 s4, s4, child_function@gotpcrel32@lo+4		; GCN-NEXT: s_add_u32 s4, s4, child_function@gotpcrel32@lo+4
; GCN-NEXT: s_addc_u32 s5, s5, child_function@gotpcrel32@hi+12		; GCN-NEXT: s_addc_u32 s5, s5, child_function@gotpcrel32@hi+12
; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0		; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
; GCN-NEXT: s_mov_b64 s[10:11], s[2:3]		; GCN-NEXT: s_mov_b64 s[10:11], s[2:3]
; GCN-NEXT: s_mov_b64 s[8:9], s[0:1]		; GCN-NEXT: s_mov_b64 s[8:9], s[0:1]
; GCN-NEXT: s_mov_b64 s[0:1], s[8:9]		; GCN-NEXT: s_mov_b64 s[0:1], s[8:9]
; GCN-NEXT: s_mov_b64 s[2:3], s[10:11]		; GCN-NEXT: s_mov_b64 s[2:3], s[10:11]
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]		; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
; GCN-NEXT: v_readlane_b32 s31, v254, 1		; GCN-NEXT: v_readlane_b32 s31, v0, 1
; GCN-NEXT: v_readlane_b32 s30, v254, 0		; GCN-NEXT: v_readlane_b32 s30, v0, 0
; GCN-NEXT: buffer_load_dword v253, off, s[0:3], s33 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v253, off, s[0:3], s33 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v252, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v252, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v251, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v251, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v250, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v250, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v249, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v249, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v248, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v248, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v239, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v239, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v238, off, s[0:3], s33 offset:28 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v238, off, s[0:3], s33 offset:28 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s33 offset:408 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s33 offset:408 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:412 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:412 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:416 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:416 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:420 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:420 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:424 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:424 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:428 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:428 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:432 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:432 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:436 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:436 ; 4-byte Folded Reload
; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1		; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1
; GCN-NEXT: buffer_load_dword v254, off, s[0:3], s33 offset:444 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:448 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_add_i32 s32, s32, 0xffff8c00		; GCN-NEXT: s_add_i32 s32, s32, 0xffff8c00
; GCN-NEXT: s_mov_b32 s33, s6		; GCN-NEXT: s_mov_b32 s33, s14
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
%alloca = alloca i32, align 4, addrspace(5)		%alloca = alloca i32, align 4, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca

call void asm sideeffect "",		call void asm sideeffect "",
"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}		"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}
,~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19}		,~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19}
call void @child_function()		call void @child_function()
ret void		ret void
}		}

define void @spill_sgpr_with_sgpr_uses() #0 {		define void @spill_sgpr_with_sgpr_uses() #0 {
; GCN-LABEL: spill_sgpr_with_sgpr_uses:		; GCN-LABEL: spill_sgpr_with_sgpr_uses:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1		; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1
; GCN-NEXT: buffer_store_dword v254, off, s[0:3], s32 offset:444 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:448 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:436 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:436 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:432 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:432 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:428 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:428 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:424 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:424 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:420 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:420 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:416 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:416 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:412 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:412 ; 4-byte Folded Spill
; GCN-NEXT: v_mov_b32_e32 v0, 0		; GCN-NEXT: v_mov_b32_e32 v0, 0
; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:440		; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:440
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; def s4		; GCN-NEXT: ; def s4
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_writelane_b32 v254, s4, 0		; GCN-NEXT: ; implicit-def: $vgpr0
		; GCN-NEXT: v_writelane_b32 v0, s4, 0
		; GCN-NEXT: s_or_saveexec_b64 s[8:9], -1
		; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:444 ; 4-byte Folded Spill
		; GCN-NEXT: s_mov_b64 exec, s[8:9]
; GCN-NEXT: s_cbranch_scc1 .LBB3_2		; GCN-NEXT: s_cbranch_scc1 .LBB3_2
; GCN-NEXT: ; %bb.1: ; %bb0		; GCN-NEXT: ; %bb.1: ; %bb0
; GCN-NEXT: v_readlane_b32 s4, v254, 0		; GCN-NEXT: s_or_saveexec_b64 s[8:9], -1
		; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s32 offset:444 ; 4-byte Folded Reload
		; GCN-NEXT: s_mov_b64 exec, s[8:9]
		; GCN-NEXT: s_waitcnt vmcnt(0)
		; GCN-NEXT: v_readlane_b32 s4, v0, 0
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s4		; GCN-NEXT: ; use s4
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: .LBB3_2: ; %ret		; GCN-NEXT: .LBB3_2: ; %ret
; GCN-NEXT: buffer_load_dword v253, off, s[0:3], s32 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v253, off, s[0:3], s32 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v252, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v252, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v251, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v251, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v250, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v250, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:408 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s32 offset:408 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:412 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:412 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:416 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:416 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:420 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:420 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:424 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:424 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:428 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:428 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:432 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:432 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:436 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:436 ; 4-byte Folded Reload
; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1		; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1
; GCN-NEXT: buffer_load_dword v254, off, s[0:3], s32 offset:444 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s32 offset:448 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
%alloca = alloca i32, align 4, addrspace(5)		%alloca = alloca i32, align 4, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca

call void asm sideeffect "",		call void asm sideeffect "",
"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}		"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}
ret void		ret void
}		}

define void @spill_sgpr_no_free_vgpr(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) #0 {		define void @spill_sgpr_no_free_vgpr(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) #0 {
; GCN-LABEL: spill_sgpr_no_free_vgpr:		; GCN-LABEL: spill_sgpr_no_free_vgpr:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1		; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1
; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:464 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:468 ; 4-byte Folded Spill
		; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:472 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:444 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:444 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:440 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:440 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:436 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:436 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:432 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:432 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:428 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:428 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:424 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:424 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:420 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:420 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v248, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v248, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v249, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v249, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v250, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v250, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v251, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v251, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v252, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v252, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v253, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v253, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v254, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v254, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v255, off, s[0:3], s32 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v255, off, s[0:3], s32 ; 4-byte Folded Spill
		; GCN-NEXT: ; implicit-def: $vgpr4
; GCN-NEXT: v_writelane_b32 v4, s34, 0		; GCN-NEXT: v_writelane_b32 v4, s34, 0
; GCN-NEXT: v_writelane_b32 v4, s35, 1		; GCN-NEXT: v_writelane_b32 v4, s35, 1
; GCN-NEXT: v_writelane_b32 v4, s36, 2		; GCN-NEXT: v_writelane_b32 v4, s36, 2
; GCN-NEXT: v_writelane_b32 v4, s37, 3		; GCN-NEXT: v_writelane_b32 v4, s37, 3
		; GCN-NEXT: s_or_saveexec_b64 s[8:9], -1
		; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:464 ; 4-byte Folded Spill
		; GCN-NEXT: s_mov_b64 exec, s[8:9]
; GCN-NEXT: v_mov_b32_e32 v5, v3		; GCN-NEXT: v_mov_b32_e32 v5, v3
; GCN-NEXT: v_mov_b32_e32 v3, v1		; GCN-NEXT: v_mov_b32_e32 v3, v2
		; GCN-NEXT: v_mov_b32_e32 v4, v1
		; GCN-NEXT: v_mov_b32_e32 v1, v0
		; GCN-NEXT: s_or_saveexec_b64 s[8:9], -1
		; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s32 offset:464 ; 4-byte Folded Reload
		; GCN-NEXT: s_mov_b64 exec, s[8:9]
; GCN-NEXT: ; implicit-def: $sgpr4		; GCN-NEXT: ; implicit-def: $sgpr4
; GCN-NEXT: ; implicit-def: $sgpr4		; GCN-NEXT: ; implicit-def: $sgpr4
; GCN-NEXT: ; kill: def $vgpr3 killed $vgpr3 killed $exec		; GCN-NEXT: ; kill: def $vgpr4 killed $vgpr4 killed $exec
; GCN-NEXT: ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec		; GCN-NEXT: ; kill: def $vgpr1 killed $vgpr1 def $vgpr1_vgpr2 killed $exec
; GCN-NEXT: v_mov_b32_e32 v1, v3		; GCN-NEXT: v_mov_b32_e32 v2, v4
; GCN-NEXT: ; implicit-def: $sgpr4		; GCN-NEXT: ; implicit-def: $sgpr4
; GCN-NEXT: ; implicit-def: $sgpr4		; GCN-NEXT: ; implicit-def: $sgpr4
; GCN-NEXT: ; kill: def $vgpr5 killed $vgpr5 killed $exec		; GCN-NEXT: ; kill: def $vgpr5 killed $vgpr5 killed $exec
; GCN-NEXT: ; kill: def $vgpr2 killed $vgpr2 def $vgpr2_vgpr3 killed $exec		; GCN-NEXT: ; kill: def $vgpr3 killed $vgpr3 def $vgpr3_vgpr4 killed $exec
; GCN-NEXT: v_mov_b32_e32 v3, v5		; GCN-NEXT: v_mov_b32_e32 v4, v5
; GCN-NEXT: ; implicit-def: $sgpr4_sgpr5		; GCN-NEXT: ; implicit-def: $sgpr4_sgpr5
; GCN-NEXT: ; implicit-def: $sgpr4_sgpr5		; GCN-NEXT: ; implicit-def: $sgpr4_sgpr5
; GCN-NEXT: flat_load_dwordx4 v[5:8], v[2:3]		; GCN-NEXT: flat_load_dwordx4 v[3:6], v[3:4]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s32 offset:448 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:448 ; 4-byte Folded Spill
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_store_dword v6, off, s[0:3], s32 offset:452 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:452 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v7, off, s[0:3], s32 offset:456 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s32 offset:456 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v8, off, s[0:3], s32 offset:460 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v6, off, s[0:3], s32 offset:460 ; 4-byte Folded Spill
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: buffer_load_dword v5, off, s[0:3], s32 offset:448 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v3, off, s[0:3], s32 offset:448 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v6, off, s[0:3], s32 offset:452 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v4, off, s[0:3], s32 offset:452 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v7, off, s[0:3], s32 offset:456 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v5, off, s[0:3], s32 offset:456 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v8, off, s[0:3], s32 offset:460 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v6, off, s[0:3], s32 offset:460 ; 4-byte Folded Reload
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: flat_store_dwordx4 v[0:1], v[5:8]		; GCN-NEXT: flat_store_dwordx4 v[1:2], v[3:6]
; GCN-NEXT: v_readlane_b32 s37, v4, 3		; GCN-NEXT: v_readlane_b32 s37, v0, 3
; GCN-NEXT: v_readlane_b32 s36, v4, 2		; GCN-NEXT: v_readlane_b32 s36, v0, 2
; GCN-NEXT: v_readlane_b32 s35, v4, 1		; GCN-NEXT: v_readlane_b32 s35, v0, 1
; GCN-NEXT: v_readlane_b32 s34, v4, 0		; GCN-NEXT: v_readlane_b32 s34, v0, 0
; GCN-NEXT: buffer_load_dword v255, off, s[0:3], s32 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v255, off, s[0:3], s32 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v254, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v254, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v253, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v253, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v252, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v252, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v251, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v251, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v250, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v250, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v249, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v249, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v248, off, s[0:3], s32 offset:28 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v248, off, s[0:3], s32 offset:28 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:420 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:420 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:424 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:424 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:428 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:428 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:432 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:432 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:436 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:436 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:440 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:440 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:444 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:444 ; 4-byte Folded Reload
; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1		; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1
; GCN-NEXT: buffer_load_dword v4, off, s[0:3], s32 offset:464 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v4, off, s[0:3], s32 offset:468 ; 4-byte Folded Reload
		; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s32 offset:472 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
%a = load <4 x i32>, <4 x i32> addrspace(1)* %in		%a = load <4 x i32>, <4 x i32> addrspace(1)* %in
call void asm sideeffect "",		call void asm sideeffect "",
"~{v6},~{v7},~{v8},~{v9}		"~{v6},~{v7},~{v8},~{v9}
,~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19}		,~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19}
,~{v20},~{v21},~{v22},~{v23},~{v24},~{v25},~{v26},~{v27},~{v28},~{v29}		,~{v20},~{v21},~{v22},~{v23},~{v24},~{v25},~{v26},~{v27},~{v28},~{v29}
,~{v250},~{v251},~{v252},~{v253},~{v254},~{v255}" () #0		,~{v250},~{v251},~{v252},~{v253},~{v254},~{v255}" () #0
ret void		ret void
}		}

define void @spill_sgpr_no_free_vgpr_ipra() #0 {		define void @spill_sgpr_no_free_vgpr_ipra() #0 {
; GCN-LABEL: spill_sgpr_no_free_vgpr_ipra:		; GCN-LABEL: spill_sgpr_no_free_vgpr_ipra:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_mov_b32 s6, s33		; GCN-NEXT: s_mov_b32 s14, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
		; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1
		; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:452 ; 4-byte Folded Spill
		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_add_i32 s32, s32, 0x7400		; GCN-NEXT: s_add_i32 s32, s32, 0x7400
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:444 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:444 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:440 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:440 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:436 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:436 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:432 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:432 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:428 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:428 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:424 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:424 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:420 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:420 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v248, off, s[0:3], s33 offset:28 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v248, off, s[0:3], s33 offset:28 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v249, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v249, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v250, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v250, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v251, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v251, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v252, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v252, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v253, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v253, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v254, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v254, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v255, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v255, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 s[14:15], exec		; GCN-NEXT: ; implicit-def: $vgpr0
; GCN-NEXT: s_mov_b64 exec, 1		; GCN-NEXT: v_writelane_b32 v0, s30, 0
; GCN-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:456		; GCN-NEXT: v_writelane_b32 v0, s31, 1
; GCN-NEXT: v_writelane_b32 v1, s30, 0		; GCN-NEXT: s_or_saveexec_b64 s[12:13], -1
; GCN-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:448 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:448 ; 4-byte Folded Spill
; GCN-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:456
; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_mov_b64 exec, s[14:15]
; GCN-NEXT: s_mov_b64 s[12:13], exec
; GCN-NEXT: s_mov_b64 exec, 1
; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:456
; GCN-NEXT: v_writelane_b32 v0, s31, 0
; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:452 ; 4-byte Folded Spill
; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:456
; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_mov_b64 exec, s[12:13]		; GCN-NEXT: s_mov_b64 exec, s[12:13]
; GCN-NEXT: s_getpc_b64 s[4:5]		; GCN-NEXT: s_getpc_b64 s[4:5]
; GCN-NEXT: s_add_u32 s4, s4, child_function_ipra@rel32@lo+4		; GCN-NEXT: s_add_u32 s4, s4, child_function_ipra@rel32@lo+4
; GCN-NEXT: s_addc_u32 s5, s5, child_function_ipra@rel32@hi+12		; GCN-NEXT: s_addc_u32 s5, s5, child_function_ipra@rel32@hi+12
; GCN-NEXT: s_mov_b64 s[10:11], s[2:3]		; GCN-NEXT: s_mov_b64 s[10:11], s[2:3]
; GCN-NEXT: s_mov_b64 s[8:9], s[0:1]		; GCN-NEXT: s_mov_b64 s[8:9], s[0:1]
; GCN-NEXT: s_mov_b64 s[0:1], s[8:9]		; GCN-NEXT: s_mov_b64 s[0:1], s[8:9]
; GCN-NEXT: s_mov_b64 s[2:3], s[10:11]		; GCN-NEXT: s_mov_b64 s[2:3], s[10:11]
; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]		; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
; GCN-NEXT: s_mov_b64 s[8:9], exec		; GCN-NEXT: s_or_saveexec_b64 s[12:13], -1
; GCN-NEXT: s_mov_b64 exec, 1
; GCN-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:456
; GCN-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:452 ; 4-byte Folded Reload
; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_readlane_b32 s31, v1, 0
; GCN-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:456
; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_mov_b64 exec, s[8:9]
; GCN-NEXT: s_mov_b64 s[4:5], exec
; GCN-NEXT: s_mov_b64 exec, 1
; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:456
; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:448 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:448 ; 4-byte Folded Reload
		; GCN-NEXT: s_mov_b64 exec, s[12:13]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
		; GCN-NEXT: v_readlane_b32 s31, v0, 1
; GCN-NEXT: v_readlane_b32 s30, v0, 0		; GCN-NEXT: v_readlane_b32 s30, v0, 0
; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:456
; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: buffer_load_dword v255, off, s[0:3], s33 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v255, off, s[0:3], s33 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v254, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v254, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v253, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v253, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v252, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v252, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v251, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v251, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v250, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v250, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v249, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v249, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v248, off, s[0:3], s33 offset:28 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v248, off, s[0:3], s33 offset:28 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s33 offset:416 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s33 offset:416 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:420 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:420 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:424 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:424 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:428 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:428 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:432 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:432 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:436 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:436 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:440 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:440 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:444 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:444 ; 4-byte Folded Reload
		; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1
		; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:452 ; 4-byte Folded Reload
		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_add_i32 s32, s32, 0xffff8c00		; GCN-NEXT: s_add_i32 s32, s32, 0xffff8c00
; GCN-NEXT: s_mov_b32 s33, s6		; GCN-NEXT: s_mov_b32 s33, s14
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
call void @child_function_ipra()		call void @child_function_ipra()
ret void		ret void
}		}

define internal void @child_function_ipra_tail_call() #0 {		define internal void @child_function_ipra_tail_call() #0 {
; GCN-LABEL: child_function_ipra_tail_call:		; GCN-LABEL: child_function_ipra_tail_call:

llvm/test/CodeGen/AMDGPU/si-spill-sgpr-stack.ll

	; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=ALL -check-prefix=SGPR %s			; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=ALL -check-prefix=SGPR %s

	; Make sure this doesn't crash.			; Make sure this doesn't crash.
	; ALL-LABEL: {{^}}test:			; ALL-LABEL: {{^}}test:
	; ALL: s_mov_b32 s[[LO:[0-9]+]], SCRATCH_RSRC_DWORD0			; ALL: s_mov_b32 s[[LO:[0-9]+]], SCRATCH_RSRC_DWORD0
	; ALL: s_mov_b32 s[[HI:[0-9]+]], 0xe80000			; ALL: s_mov_b32 s[[HI:[0-9]+]], 0xe80000

	; Make sure we are handling hazards correctly.			; Make sure we are handling hazards correctly.
	; SGPR: buffer_load_dword [[VHI:v[0-9]+]], off, s[{{[0-9]+:[0-9]+}}], 0 offset:4			; SGPR: v_mov_b32_e32 v0, vcc_lo
				; SGPR-NEXT: s_or_saveexec_b64 [[EXEC_COPY:s\[[0-9]+:[0-9]+\]]], -1
				; SGPR-NEXT: buffer_load_dword [[VHI:v[0-9]+]], off, s[{{[0-9]+:[0-9]+}}], 0 offset:4 ; 4-byte Folded Reload
				; SGPR-NEXT: s_mov_b64 exec, [[EXEC_COPY]]
	; SGPR-NEXT: s_waitcnt vmcnt(0)			; SGPR-NEXT: s_waitcnt vmcnt(0)
	; SGPR-NEXT: v_readlane_b32 s{{[0-9]+}}, [[VHI]], 0			; SGPR-NEXT: v_readlane_b32 s{{[0-9]+}}, [[VHI]], 0
	; SGPR-NEXT: v_readlane_b32 s{{[0-9]+}}, [[VHI]], 1			; SGPR-NEXT: v_readlane_b32 s{{[0-9]+}}, [[VHI]], 1
	; SGPR-NEXT: v_readlane_b32 s{{[0-9]+}}, [[VHI]], 2			; SGPR-NEXT: v_readlane_b32 s{{[0-9]+}}, [[VHI]], 2
	; SGPR-NEXT: v_readlane_b32 s[[HI:[0-9]+]], [[VHI]], 3			; SGPR-NEXT: v_readlane_b32 s[[HI:[0-9]+]], [[VHI]], 3
	; SGPR-NEXT: buffer_load_dword [[VHI]], off, s[96:99], 0			; SGPR-NEXT: s_nop 4
	; SGPR-NEXT: s_waitcnt vmcnt(0)
	; SGPR-NEXT: s_mov_b64 exec, s[4:5]
	; SGPR-NEXT: s_nop 1
	; SGPR-NEXT: buffer_store_dword v0, off, s[0:3], 0			; SGPR-NEXT: buffer_store_dword v0, off, s[0:3], 0

	; ALL: s_endpgm			; ALL: s_endpgm
	define amdgpu_kernel void @test(i32 addrspace(1)* %out, i32 %in) {			define amdgpu_kernel void @test(i32 addrspace(1)* %out, i32 %in) {
	call void asm sideeffect "", "~{s[0:7]}" ()			call void asm sideeffect "", "~{s[0:7]}" ()
	call void asm sideeffect "", "~{s[8:15]}" ()			call void asm sideeffect "", "~{s[8:15]}" ()
	call void asm sideeffect "", "~{s[16:23]}" ()			call void asm sideeffect "", "~{s[16:23]}" ()
	call void asm sideeffect "", "~{s[24:31]}" ()			call void asm sideeffect "", "~{s[24:31]}" ()

llvm/test/CodeGen/AMDGPU/sibling-call.ll

	; GCN-DAG: s_addk_i32 s32, 0x800			; GCN-DAG: s_addk_i32 s32, 0x800
	; GCN: v_writelane_b32 [[CSRV_1]], [[FP_SCRATCH_COPY]], 0			; GCN: v_writelane_b32 [[CSRV_1]], [[FP_SCRATCH_COPY]], 0

	; GCN-DAG: s_getpc_b64 s[4:5]			; GCN-DAG: s_getpc_b64 s[4:5]
	; GCN-DAG: s_add_u32 s4, s4, i32_fastcc_i32_i32@gotpcrel32@lo+4			; GCN-DAG: s_add_u32 s4, s4, i32_fastcc_i32_i32@gotpcrel32@lo+4
	; GCN-DAG: s_addc_u32 s5, s5, i32_fastcc_i32_i32@gotpcrel32@hi+12			; GCN-DAG: s_addc_u32 s5, s5, i32_fastcc_i32_i32@gotpcrel32@hi+12

	; GCN-DAG: v_writelane_b32 [[CSRV]], s30, 0			; GCN-DAG: v_writelane_b32 [[CSRV]], s30, 0
	; GCN-DAG: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-DAG: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-DAG: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-DAG: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-DAG: v_writelane_b32 [[CSRV]], s31, 1			; GCN-DAG: v_writelane_b32 [[CSRV]], s31, 1


	; GCN: s_swappc_b64			; GCN: s_swappc_b64

	; GCN-DAG: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-DAG: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-DAG: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN-DAG: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload

	; GCN: s_getpc_b64 s[4:5]			; GCN: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, sibling_call_i32_fastcc_i32_i32@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, sibling_call_i32_fastcc_i32_i32@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, sibling_call_i32_fastcc_i32_i32@rel32@hi+12			; GCN-NEXT: s_addc_u32 s5, s5, sibling_call_i32_fastcc_i32_i32@rel32@hi+12
	; GCN-NEXT: v_readlane_b32 s31, [[CSRV]], 1			; GCN-NEXT: v_readlane_b32 s31, [[CSRV]], 1
	; GCN-NEXT: v_readlane_b32 s30, [[CSRV]], 0			; GCN-NEXT: v_readlane_b32 s30, [[CSRV]], 0
	; GCN-NEXT: v_readlane_b32 [[FP_SCRATCH_COPY:s[0-9]+]], [[CSRV_1]], 0			; GCN-NEXT: v_readlane_b32 [[FP_SCRATCH_COPY:s[0-9]+]], [[CSRV_1]], 0
	; GCN-NEXT: s_or_saveexec_b64 s[8:9], -1			; GCN-NEXT: s_or_saveexec_b64 s[8:9], -1

llvm/test/CodeGen/AMDGPU/spill-csr-frame-ptr-reg-copy.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=1 < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=1 < %s | FileCheck -check-prefix=GCN %s

	; GCN-LABEL: {{^}}spill_csr_s5_copy:			; GCN-LABEL: {{^}}spill_csr_s5_copy:
	; GCN: s_mov_b32 [[FP_SCRATCH_COPY:s[0-9]+]], s33			; GCN: s_mov_b32 [[FP_SCRATCH_COPY:s[0-9]+]], s33
	; GCN: s_or_saveexec_b64			; GCN: s_xor_saveexec_b64
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GCN-NEXT: s_mov_b64 exec, -1
				; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec			; GCN-NEXT: s_mov_b64 exec
	; GCN: v_writelane_b32 v41, [[FP_SCRATCH_COPY]], 0			; GCN: v_writelane_b32 v40, [[FP_SCRATCH_COPY]], 2
	; GCN: s_swappc_b64			; GCN: s_swappc_b64

	; GCN: v_mov_b32_e32 [[K:v[0-9]+]], 9			; GCN: v_mov_b32_e32 [[K:v[0-9]+]], 9
	; GCN: buffer_store_dword [[K]], off, s[0:3], s33{{$}}			; GCN: buffer_store_dword [[K]], off, s[0:3], s33{{$}}

	; GCN: v_readlane_b32 [[FP_SCRATCH_COPY:s[0-9]+]], v41, 0			; GCN: v_readlane_b32 [[FP_SCRATCH_COPY:s[0-9]+]], v40, 2
	; GCN: s_or_saveexec_b64			; GCN: s_xor_saveexec_b64
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GCN-NEXT: s_mov_b64 exec, -1
				; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GCN: s_mov_b64 exec			; GCN: s_mov_b64 exec
	; GCN: s_mov_b32 s33, [[FP_SCRATCH_COPY]]			; GCN: s_mov_b32 s33, [[FP_SCRATCH_COPY]]
	; GCN: s_setpc_b64			; GCN: s_setpc_b64
	define void @spill_csr_s5_copy() #0 {			define void @spill_csr_s5_copy() #0 {
	bb:			bb:
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	%tmp = tail call i64 @func() #1			%tmp = tail call i64 @func() #1
	%tmp1 = getelementptr inbounds i32, i32 addrspace(1)* null, i64 %tmp			%tmp1 = getelementptr inbounds i32, i32 addrspace(1)* null, i64 %tmp

llvm/test/CodeGen/AMDGPU/spill-offset-calculation.ll

define void @test_sgpr_offset_function_scavenge_fail_func() #2 {		define void @test_sgpr_offset_function_scavenge_fail_func() #2 {
; MUBUF-LABEL: test_sgpr_offset_function_scavenge_fail_func:		; MUBUF-LABEL: test_sgpr_offset_function_scavenge_fail_func:
; MUBUF: ; %bb.0: ; %entry		; MUBUF: ; %bb.0: ; %entry
; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; MUBUF-NEXT: ;;#ASMSTART		; MUBUF-NEXT: ;;#ASMSTART
; MUBUF-NEXT: ;;#ASMEND		; MUBUF-NEXT: ;;#ASMEND
; MUBUF-NEXT: buffer_load_dword v0, off, s[0:3], s32 offset:8 glc		; MUBUF-NEXT: buffer_load_dword v0, off, s[0:3], s32 offset:8 glc
; MUBUF-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: v_mov_b32_e32 v1, 0x1004		; MUBUF-NEXT: s_add_i32 s10, s32, 0x40100
; MUBUF-NEXT: buffer_store_dword v0, v1, s[0:3], s32 offen ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s10 ; 4-byte Folded Spill
; MUBUF-NEXT: ;;#ASMSTART		; MUBUF-NEXT: ;;#ASMSTART
; MUBUF-NEXT: ;;#ASMEND		; MUBUF-NEXT: ;;#ASMEND
; MUBUF-NEXT: ;;#ASMSTART		; MUBUF-NEXT: ;;#ASMSTART
; MUBUF-NEXT: ;;#ASMEND		; MUBUF-NEXT: ;;#ASMEND
; MUBUF-NEXT: ;;#ASMSTART		; MUBUF-NEXT: ;;#ASMSTART
; MUBUF-NEXT: ;;#ASMEND		; MUBUF-NEXT: ;;#ASMEND
; MUBUF-NEXT: v_mov_b32_e32 v1, 0x1004		; MUBUF-NEXT: s_add_i32 s10, s32, 0x40100
; MUBUF-NEXT: buffer_load_dword v0, v1, s[0:3], s32 offen ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword v0, off, s[0:3], s10 ; 4-byte Folded Reload
; MUBUF-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: ;;#ASMSTART		; MUBUF-NEXT: ;;#ASMSTART
; MUBUF-NEXT: ;;#ASMEND		; MUBUF-NEXT: ;;#ASMEND
; MUBUF-NEXT: s_setpc_b64 s[30:31]		; MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; FLATSCR-LABEL: test_sgpr_offset_function_scavenge_fail_func:		; FLATSCR-LABEL: test_sgpr_offset_function_scavenge_fail_func:
; FLATSCR: ; %bb.0: ; %entry		; FLATSCR: ; %bb.0: ; %entry
; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; MUBUF-LABEL: test_sgpr_offset_function_scavenge_fail_kernel:		; MUBUF-LABEL: test_sgpr_offset_function_scavenge_fail_kernel:
; MUBUF: ; %bb.0: ; %entry		; MUBUF: ; %bb.0: ; %entry
; MUBUF-NEXT: s_add_u32 s0, s0, s7		; MUBUF-NEXT: s_add_u32 s0, s0, s7
; MUBUF-NEXT: s_addc_u32 s1, s1, 0		; MUBUF-NEXT: s_addc_u32 s1, s1, 0
; MUBUF-NEXT: ;;#ASMSTART		; MUBUF-NEXT: ;;#ASMSTART
; MUBUF-NEXT: ;;#ASMEND		; MUBUF-NEXT: ;;#ASMEND
; MUBUF-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8 glc		; MUBUF-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8 glc
; MUBUF-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: v_mov_b32_e32 v1, 0x1004		; MUBUF-NEXT: s_mov_b32 s10, 0x40100
; MUBUF-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s10 ; 4-byte Folded Spill
; MUBUF-NEXT: ;;#ASMSTART		; MUBUF-NEXT: ;;#ASMSTART
; MUBUF-NEXT: ;;#ASMEND		; MUBUF-NEXT: ;;#ASMEND
; MUBUF-NEXT: ;;#ASMSTART		; MUBUF-NEXT: ;;#ASMSTART
; MUBUF-NEXT: ;;#ASMEND		; MUBUF-NEXT: ;;#ASMEND
; MUBUF-NEXT: ;;#ASMSTART		; MUBUF-NEXT: ;;#ASMSTART
; MUBUF-NEXT: ;;#ASMEND		; MUBUF-NEXT: ;;#ASMEND
; MUBUF-NEXT: v_mov_b32_e32 v1, 0x1004		; MUBUF-NEXT: s_mov_b32 s10, 0x40100
; MUBUF-NEXT: buffer_load_dword v0, v1, s[0:3], 0 offen ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword v0, off, s[0:3], s10 ; 4-byte Folded Reload
; MUBUF-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: ;;#ASMSTART		; MUBUF-NEXT: ;;#ASMSTART
; MUBUF-NEXT: ;;#ASMEND		; MUBUF-NEXT: ;;#ASMEND
; MUBUF-NEXT: s_endpgm		; MUBUF-NEXT: s_endpgm
;		;
; FLATSCR-LABEL: test_sgpr_offset_function_scavenge_fail_kernel:		; FLATSCR-LABEL: test_sgpr_offset_function_scavenge_fail_kernel:
; FLATSCR: ; %bb.0: ; %entry		; FLATSCR: ; %bb.0: ; %entry
; FLATSCR-NEXT: s_add_u32 flat_scratch_lo, s0, s3		; FLATSCR-NEXT: s_add_u32 flat_scratch_lo, s0, s3
entry:
; Ensure the spill is of the full super-reg.		; Ensure the spill is of the full super-reg.
call void asm sideeffect "; $0", "r"(<2 x i32> %a)		call void asm sideeffect "; $0", "r"(<2 x i32> %a)

ret void		ret void
}		}

attributes #0 = { nounwind }		attributes #0 = { nounwind }
attributes #1 = { nounwind "amdgpu-num-sgpr"="17" "amdgpu-num-vgpr"="8" }		attributes #1 = { nounwind "amdgpu-num-sgpr"="17" "amdgpu-num-vgpr"="8" }
attributes #2 = { nounwind "amdgpu-num-sgpr"="14" "amdgpu-num-vgpr"="8" }		attributes #2 = { nounwind "amdgpu-num-sgpr"="16" "amdgpu-num-vgpr"="8" }
attributes #3 = { nounwind "amdgpu-num-sgpr"="16" "amdgpu-num-vgpr"="8" }		attributes #3 = { nounwind "amdgpu-num-sgpr"="18" "amdgpu-num-vgpr"="8" }

llvm/test/CodeGen/AMDGPU/spill-reg-tuple-super-reg-use.mir

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py		# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass=si-lower-sgpr-spills,prologepilog,machine-cp -verify-machineinstrs %s -o - | FileCheck -check-prefix=GCN %s		# RUN: llc -march=amdgcn -mcpu=gfx900 -start-before=si-lower-sgpr-spills -stop-after=prologepilog -verify-machineinstrs %s -o - | FileCheck -check-prefix=GCN %s

# Make sure the initial first $sgpr1 = COPY $sgpr2 copy is not deleted		# Make sure the initial first $sgpr1 = COPY $sgpr2 copy is not deleted
# by the copy propagation after lowering the spill.		# by the copy propagation after lowering the spill.

---		---
name: spill_sgpr128_use_subreg		name: spill_sgpr128_use_subreg
tracksRegLiveness: true		tracksRegLiveness: true
machineFunctionInfo:		machineFunctionInfo:
bb.0:

; GCN-LABEL: name: spill_sgpr128_use_subreg		; GCN-LABEL: name: spill_sgpr128_use_subreg
; GCN: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $vgpr0, $vgpr1, $vgpr2, $vgpr3		; GCN: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $vgpr0, $vgpr1, $vgpr2, $vgpr3
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: $sgpr8_sgpr9 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GCN-NEXT: $sgpr8_sgpr9 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr0, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 0, 0, 0, implicit $exec :: (store (s32) into %stack.1, addrspace 5)		; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr0, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 0, 0, 0, implicit $exec :: (store (s32) into %stack.1, addrspace 5)
; GCN-NEXT: $exec = S_MOV_B64 killed $sgpr8_sgpr9		; GCN-NEXT: $exec = S_MOV_B64 killed $sgpr8_sgpr9
; GCN-NEXT: renamable $sgpr1 = COPY $sgpr2		; GCN-NEXT: renamable $sgpr1 = COPY $sgpr2
; GCN-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr0, 0, $vgpr0, implicit-def $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr0_sgpr1_sgpr2_sgpr3		; GCN-NEXT: renamable $vgpr0 = IMPLICIT_DEF
; GCN-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr1, 1, $vgpr0, implicit $sgpr0_sgpr1_sgpr2_sgpr3		; GCN-NEXT: renamable $vgpr0 = V_WRITELANE_B32 $sgpr0, 0, killed $vgpr0, implicit-def $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr0_sgpr1_sgpr2_sgpr3
; GCN-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr2, 2, $vgpr0, implicit $sgpr0_sgpr1_sgpr2_sgpr3		; GCN-NEXT: renamable $vgpr0 = V_WRITELANE_B32 $sgpr1, 1, killed $vgpr0, implicit $sgpr0_sgpr1_sgpr2_sgpr3
; GCN-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr3, 3, $vgpr0, implicit $sgpr0_sgpr1_sgpr2_sgpr3		; GCN-NEXT: renamable $vgpr0 = V_WRITELANE_B32 $sgpr2, 2, killed $vgpr0, implicit $sgpr0_sgpr1_sgpr2_sgpr3
; GCN-NEXT: renamable $sgpr8 = COPY killed renamable $sgpr1		; GCN-NEXT: dead renamable $vgpr0 = V_WRITELANE_B32 $sgpr3, 3, killed $vgpr0, implicit $sgpr0_sgpr1_sgpr2_sgpr3
		; GCN-NEXT: renamable $sgpr8 = COPY renamable $sgpr1
; GCN-NEXT: $sgpr0_sgpr1 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GCN-NEXT: $sgpr0_sgpr1 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GCN-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)		; GCN-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)
; GCN-NEXT: $exec = S_MOV_B64 killed $sgpr0_sgpr1		; GCN-NEXT: $exec = S_MOV_B64 killed $sgpr0_sgpr1
; GCN-NEXT: S_ENDPGM 0, implicit $sgpr8		; GCN-NEXT: S_ENDPGM 0, implicit $sgpr8
renamable $sgpr1 = COPY $sgpr2		renamable $sgpr1 = COPY $sgpr2
SI_SPILL_S128_SAVE renamable $sgpr0_sgpr1_sgpr2_sgpr3, %stack.0, implicit $exec, implicit $sgpr32 :: (store (s128) into %stack.0, align 4, addrspace 5)		SI_SPILL_S128_SAVE renamable $sgpr0_sgpr1_sgpr2_sgpr3, %stack.0, implicit $exec, implicit $sgpr32 :: (store (s128) into %stack.0, align 4, addrspace 5)
renamable $sgpr8 = COPY killed renamable $sgpr1		renamable $sgpr8 = COPY killed renamable $sgpr1
S_ENDPGM 0, implicit $sgpr8		S_ENDPGM 0, implicit $sgpr8
bb.0:

; GCN-LABEL: name: spill_sgpr128_use_kill		; GCN-LABEL: name: spill_sgpr128_use_kill
; GCN: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $vgpr0, $vgpr1, $vgpr2, $vgpr3		; GCN: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $vgpr0, $vgpr1, $vgpr2, $vgpr3
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: $sgpr8_sgpr9 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GCN-NEXT: $sgpr8_sgpr9 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr0, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 0, 0, 0, implicit $exec :: (store (s32) into %stack.1, addrspace 5)		; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr0, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 0, 0, 0, implicit $exec :: (store (s32) into %stack.1, addrspace 5)
; GCN-NEXT: $exec = S_MOV_B64 killed $sgpr8_sgpr9		; GCN-NEXT: $exec = S_MOV_B64 killed $sgpr8_sgpr9
; GCN-NEXT: renamable $sgpr1 = COPY $sgpr2		; GCN-NEXT: renamable $sgpr1 = COPY $sgpr2
; GCN-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr0, 0, $vgpr0, implicit-def $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr0_sgpr1_sgpr2_sgpr3		; GCN-NEXT: renamable $vgpr0 = IMPLICIT_DEF
; GCN-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr1, 1, $vgpr0, implicit $sgpr0_sgpr1_sgpr2_sgpr3		; GCN-NEXT: renamable $vgpr0 = V_WRITELANE_B32 $sgpr0, 0, killed $vgpr0, implicit-def $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr0_sgpr1_sgpr2_sgpr3
; GCN-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr2, 2, $vgpr0, implicit $sgpr0_sgpr1_sgpr2_sgpr3		; GCN-NEXT: renamable $vgpr0 = V_WRITELANE_B32 $sgpr1, 1, killed $vgpr0, implicit $sgpr0_sgpr1_sgpr2_sgpr3
; GCN-NEXT: $vgpr0 = V_WRITELANE_B32 killed $sgpr3, 3, $vgpr0, implicit killed $sgpr0_sgpr1_sgpr2_sgpr3		; GCN-NEXT: renamable $vgpr0 = V_WRITELANE_B32 $sgpr2, 2, killed $vgpr0, implicit $sgpr0_sgpr1_sgpr2_sgpr3
		; GCN-NEXT: dead renamable $vgpr0 = V_WRITELANE_B32 $sgpr3, 3, killed $vgpr0, implicit $sgpr0_sgpr1_sgpr2_sgpr3
; GCN-NEXT: $sgpr0_sgpr1 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GCN-NEXT: $sgpr0_sgpr1 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GCN-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)		; GCN-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)
; GCN-NEXT: $exec = S_MOV_B64 killed $sgpr0_sgpr1		; GCN-NEXT: $exec = S_MOV_B64 killed $sgpr0_sgpr1
; GCN-NEXT: S_ENDPGM 0		; GCN-NEXT: S_ENDPGM 0
renamable $sgpr1 = COPY $sgpr2		renamable $sgpr1 = COPY $sgpr2
SI_SPILL_S128_SAVE renamable killed $sgpr0_sgpr1_sgpr2_sgpr3, %stack.0, implicit $exec, implicit $sgpr32 :: (store (s128) into %stack.0, align 4, addrspace 5)		SI_SPILL_S128_SAVE renamable killed $sgpr0_sgpr1_sgpr2_sgpr3, %stack.0, implicit $exec, implicit $sgpr32 :: (store (s128) into %stack.0, align 4, addrspace 5)
S_ENDPGM 0		S_ENDPGM 0
...		...

body: |		body: |
bb.0:		bb.0:
liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7		liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7

; GCN-LABEL: name: spill_vgpr128_use_subreg		; GCN-LABEL: name: spill_vgpr128_use_subreg
; GCN: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7		; GCN: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $vgpr1 = COPY $vgpr2		; GCN-NEXT: renamable $vgpr1 = COPY $vgpr2, implicit $exec
; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr0, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 0, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1_vgpr2_vgpr3, implicit $vgpr0_vgpr1_vgpr2_vgpr3 :: (store (s32) into %stack.0, addrspace 5)		; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr0, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 0, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1_vgpr2_vgpr3, implicit $vgpr0_vgpr1_vgpr2_vgpr3 :: (store (s32) into %stack.0, addrspace 5)
; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr1, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 4, 0, 0, implicit $exec, implicit $vgpr0_vgpr1_vgpr2_vgpr3 :: (store (s32) into %stack.0 + 4, addrspace 5)		; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr1, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 4, 0, 0, implicit $exec, implicit $vgpr0_vgpr1_vgpr2_vgpr3 :: (store (s32) into %stack.0 + 4, addrspace 5)
; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr2, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 8, 0, 0, implicit $exec, implicit $vgpr0_vgpr1_vgpr2_vgpr3 :: (store (s32) into %stack.0 + 8, addrspace 5)		; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr2, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 8, 0, 0, implicit $exec, implicit $vgpr0_vgpr1_vgpr2_vgpr3 :: (store (s32) into %stack.0 + 8, addrspace 5)
; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr3, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 12, 0, 0, implicit $exec, implicit $vgpr0_vgpr1_vgpr2_vgpr3 :: (store (s32) into %stack.0 + 12, addrspace 5)		; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr3, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 12, 0, 0, implicit $exec, implicit $vgpr0_vgpr1_vgpr2_vgpr3 :: (store (s32) into %stack.0 + 12, addrspace 5)
; GCN-NEXT: renamable $vgpr8 = COPY killed renamable $vgpr1		; GCN-NEXT: renamable $vgpr8 = COPY $vgpr2, implicit $exec
; GCN-NEXT: S_ENDPGM 0, implicit $vgpr8		; GCN-NEXT: S_ENDPGM 0, implicit $vgpr8
renamable $vgpr1 = COPY $vgpr2		renamable $vgpr1 = COPY $vgpr2
SI_SPILL_V128_SAVE renamable $vgpr0_vgpr1_vgpr2_vgpr3, %stack.0, $sgpr32, 0, implicit $exec :: (store (s128) into %stack.0, align 4, addrspace 5)		SI_SPILL_V128_SAVE renamable $vgpr0_vgpr1_vgpr2_vgpr3, %stack.0, $sgpr32, 0, implicit $exec :: (store (s128) into %stack.0, align 4, addrspace 5)
renamable $vgpr8 = COPY killed renamable $vgpr1		renamable $vgpr8 = COPY killed renamable $vgpr1
S_ENDPGM 0, implicit $vgpr8		S_ENDPGM 0, implicit $vgpr8
...		...

---		---
name: spill_vgpr128_use_kill		name: spill_vgpr128_use_kill
tracksRegLiveness: true		tracksRegLiveness: true
machineFunctionInfo:		machineFunctionInfo:
scratchRSrcReg: $sgpr100_sgpr101_sgpr102_sgpr103		scratchRSrcReg: $sgpr100_sgpr101_sgpr102_sgpr103
stackPtrOffsetReg: $sgpr32		stackPtrOffsetReg: $sgpr32

stack:		stack:
- { id: 0, type: spill-slot, size: 16, alignment: 4 }		- { id: 0, type: spill-slot, size: 16, alignment: 4 }

body: |		body: |
bb.0:		bb.0:
liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7		liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7

; GCN-LABEL: name: spill_vgpr128_use_kill		; GCN-LABEL: name: spill_vgpr128_use_kill
; GCN: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7		; GCN: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $vgpr1 = COPY $vgpr2		; GCN-NEXT: renamable $vgpr1 = COPY $vgpr2, implicit $exec
; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 0, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1_vgpr2_vgpr3, implicit $vgpr0_vgpr1_vgpr2_vgpr3 :: (store (s32) into %stack.0, addrspace 5)		; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr0, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 0, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1_vgpr2_vgpr3, implicit $vgpr0_vgpr1_vgpr2_vgpr3 :: (store (s32) into %stack.0, addrspace 5)
; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr1, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 4, 0, 0, implicit $exec, implicit $vgpr0_vgpr1_vgpr2_vgpr3 :: (store (s32) into %stack.0 + 4, addrspace 5)		; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr1, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 4, 0, 0, implicit $exec, implicit $vgpr0_vgpr1_vgpr2_vgpr3 :: (store (s32) into %stack.0 + 4, addrspace 5)
; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 8, 0, 0, implicit $exec, implicit $vgpr0_vgpr1_vgpr2_vgpr3 :: (store (s32) into %stack.0 + 8, addrspace 5)		; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr2, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 8, 0, 0, implicit $exec, implicit $vgpr0_vgpr1_vgpr2_vgpr3 :: (store (s32) into %stack.0 + 8, addrspace 5)
; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr3, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 12, 0, 0, implicit $exec, implicit killed $vgpr0_vgpr1_vgpr2_vgpr3 :: (store (s32) into %stack.0 + 12, addrspace 5)		; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr3, $sgpr100_sgpr101_sgpr102_sgpr103, $sgpr32, 12, 0, 0, implicit $exec, implicit $vgpr0_vgpr1_vgpr2_vgpr3 :: (store (s32) into %stack.0 + 12, addrspace 5)
; GCN-NEXT: S_ENDPGM 0		; GCN-NEXT: S_ENDPGM 0
renamable $vgpr1 = COPY $vgpr2		renamable $vgpr1 = COPY $vgpr2
SI_SPILL_V128_SAVE renamable killed $vgpr0_vgpr1_vgpr2_vgpr3, %stack.0, $sgpr32, 0, implicit $exec :: (store (s128) into %stack.0, align 4, addrspace 5)		SI_SPILL_V128_SAVE renamable killed $vgpr0_vgpr1_vgpr2_vgpr3, %stack.0, $sgpr32, 0, implicit $exec :: (store (s128) into %stack.0, align 4, addrspace 5)
S_ENDPGM 0		S_ENDPGM 0
...		...

llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll

	; GFX6-NEXT: s_mov_b32 s41, SCRATCH_RSRC_DWORD1			; GFX6-NEXT: s_mov_b32 s41, SCRATCH_RSRC_DWORD1
	; GFX6-NEXT: s_mov_b32 s42, -1			; GFX6-NEXT: s_mov_b32 s42, -1
	; GFX6-NEXT: s_mov_b32 s43, 0xe8f000			; GFX6-NEXT: s_mov_b32 s43, 0xe8f000
	; GFX6-NEXT: s_add_u32 s40, s40, s3			; GFX6-NEXT: s_add_u32 s40, s40, s3
	; GFX6-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x9			; GFX6-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x9
	; GFX6-NEXT: v_mbcnt_lo_u32_b32_e64 v0, -1, 0			; GFX6-NEXT: v_mbcnt_lo_u32_b32_e64 v0, -1, 0
	; GFX6-NEXT: v_mbcnt_hi_u32_b32_e32 v5, -1, v0			; GFX6-NEXT: v_mbcnt_hi_u32_b32_e32 v5, -1, v0
	; GFX6-NEXT: v_mov_b32_e32 v6, 0			; GFX6-NEXT: v_mov_b32_e32 v6, 0
	; GFX6-NEXT: s_mov_b32 s38, 0			; GFX6-NEXT: s_mov_b32 s6, 0
	; GFX6-NEXT: s_mov_b32 s39, 0xf000			; GFX6-NEXT: s_mov_b32 s7, 0xf000
	; GFX6-NEXT: s_waitcnt lgkmcnt(0)			; GFX6-NEXT: s_waitcnt lgkmcnt(0)
	; GFX6-NEXT: s_mov_b64 s[36:37], s[2:3]			; GFX6-NEXT: s_mov_b64 s[4:5], s[2:3]
	; GFX6-NEXT: v_lshlrev_b32_e32 v7, 8, v5			; GFX6-NEXT: v_lshlrev_b32_e32 v7, 8, v5
	; GFX6-NEXT: v_mov_b32_e32 v8, v6			; GFX6-NEXT: v_mov_b32_e32 v8, v6
	; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[36:39], 0 addr64 offset:240			; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[4:7], 0 addr64 offset:240
	; GFX6-NEXT: s_addc_u32 s41, s41, 0			; GFX6-NEXT: s_addc_u32 s41, s41, 0
				; GFX6-NEXT: s_mov_b32 s2, 0x83800
				; GFX6-NEXT: s_mov_b64 s[34:35], exec
				; GFX6-NEXT: s_waitcnt vmcnt(0)
				; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill
				; GFX6-NEXT: s_waitcnt vmcnt(0)
				; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
				; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
				; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
				; GFX6-NEXT: s_waitcnt expcnt(0)
				; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[4:7], 0 addr64 offset:224
	; GFX6-NEXT: s_mov_b32 s2, 0x83400			; GFX6-NEXT: s_mov_b32 s2, 0x83400
	; GFX6-NEXT: s_mov_b64 s[44:45], exec
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[36:39], 0 addr64 offset:224			; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[4:7], 0 addr64 offset:208
	; GFX6-NEXT: s_mov_b32 s2, 0x83000			; GFX6-NEXT: s_mov_b32 s2, 0x83000
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[36:39], 0 addr64 offset:208			; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[4:7], 0 addr64 offset:192
	; GFX6-NEXT: s_mov_b32 s2, 0x82c00			; GFX6-NEXT: s_mov_b32 s2, 0x82c00
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[36:39], 0 addr64 offset:192			; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[4:7], 0 addr64 offset:176
	; GFX6-NEXT: s_mov_b32 s2, 0x82800			; GFX6-NEXT: s_mov_b32 s2, 0x82800
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[36:39], 0 addr64 offset:176			; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[4:7], 0 addr64 offset:160
	; GFX6-NEXT: s_mov_b32 s2, 0x82400			; GFX6-NEXT: s_mov_b32 s2, 0x82400
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[36:39], 0 addr64 offset:160			; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[4:7], 0 addr64 offset:144
	; GFX6-NEXT: s_mov_b32 s2, 0x82000			; GFX6-NEXT: s_mov_b32 s2, 0x82000
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[36:39], 0 addr64 offset:144			; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[4:7], 0 addr64 offset:128
	; GFX6-NEXT: s_mov_b32 s2, 0x81c00			; GFX6-NEXT: s_mov_b32 s2, 0x81c00
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[36:39], 0 addr64 offset:128			; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[4:7], 0 addr64 offset:112
	; GFX6-NEXT: s_mov_b32 s2, 0x81800			; GFX6-NEXT: s_mov_b32 s2, 0x81800
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[36:39], 0 addr64 offset:112			; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[4:7], 0 addr64 offset:96
	; GFX6-NEXT: s_mov_b32 s2, 0x81400			; GFX6-NEXT: s_mov_b32 s2, 0x81400
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[36:39], 0 addr64 offset:96			; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[4:7], 0 addr64 offset:80
	; GFX6-NEXT: s_mov_b32 s2, 0x81000			; GFX6-NEXT: s_mov_b32 s2, 0x81000
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[36:39], 0 addr64 offset:80			; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[4:7], 0 addr64 offset:64
	; GFX6-NEXT: s_mov_b32 s2, 0x80c00			; GFX6-NEXT: s_mov_b32 s2, 0x80800
	; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[36:39], 0 addr64 offset:64
	; GFX6-NEXT: s_mov_b32 s2, 0x80400
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[36:39], 0 addr64			; GFX6-NEXT: buffer_load_dwordx4 v[0:3], v[7:8], s[4:7], 0 addr64
	; GFX6-NEXT: buffer_load_dwordx4 v[9:12], v[7:8], s[36:39], 0 addr64 offset:16			; GFX6-NEXT: buffer_load_dwordx4 v[9:12], v[7:8], s[4:7], 0 addr64 offset:16
	; GFX6-NEXT: s_mov_b32 s2, 0x80800			; GFX6-NEXT: s_mov_b32 s2, 0x80c00
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v9, off, s[40:43], s2 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v9, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v10, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v10, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v11, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v11, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v12, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v12, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_load_dwordx4 v[13:16], v[7:8], s[36:39], 0 addr64 offset:32			; GFX6-NEXT: buffer_load_dwordx4 v[13:16], v[7:8], s[4:7], 0 addr64 offset:32
	; GFX6-NEXT: buffer_load_dwordx4 v[17:20], v[7:8], s[36:39], 0 addr64 offset:48			; GFX6-NEXT: s_mov_b64 s[2:3], s[6:7]
				; GFX6-NEXT: s_waitcnt expcnt(3)
				; GFX6-NEXT: s_mov_b64 exec, 15
				; GFX6-NEXT: buffer_store_dword v9, off, s[40:43], 0
				; GFX6-NEXT: s_waitcnt expcnt(0)
				; GFX6-NEXT: v_writelane_b32 v9, s0, 0
				; GFX6-NEXT: v_writelane_b32 v9, s1, 1
				; GFX6-NEXT: v_writelane_b32 v9, s2, 2
				; GFX6-NEXT: v_writelane_b32 v9, s3, 3
				; GFX6-NEXT: s_mov_b32 s8, 0x80400
				; GFX6-NEXT: buffer_store_dword v9, off, s[40:43], s8 ; 4-byte Folded Spill
				; GFX6-NEXT: s_waitcnt expcnt(0)
				; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], 0
				; GFX6-NEXT: s_waitcnt vmcnt(0)
				; GFX6-NEXT: s_mov_b64 exec, s[34:35]
				; GFX6-NEXT: buffer_load_dwordx4 v[17:20], v[7:8], s[4:7], 0 addr64 offset:48
	; GFX6-NEXT: v_lshlrev_b32_e32 v4, 13, v0			; GFX6-NEXT: v_lshlrev_b32_e32 v4, 13, v0
	; GFX6-NEXT: v_add_i32_e32 v4, vcc, 16, v4			; GFX6-NEXT: v_add_i32_e32 v4, vcc, 16, v4
	; GFX6-NEXT: v_mov_b32_e32 v7, 1			; GFX6-NEXT: v_mov_b32_e32 v7, 1
	; GFX6-NEXT: buffer_store_dword v7, v4, s[40:43], 0 offen			; GFX6-NEXT: buffer_store_dword v7, v4, s[40:43], 0 offen
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ; def s[4:11]			; GFX6-NEXT: ; def s[4:11]
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
				; GFX6-NEXT: s_mov_b64 s[36:37], exec
	; GFX6-NEXT: s_mov_b64 exec, 0xff			; GFX6-NEXT: s_mov_b64 exec, 0xff
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: v_writelane_b32 v4, s4, 0			; GFX6-NEXT: v_writelane_b32 v4, s4, 0
	; GFX6-NEXT: v_writelane_b32 v4, s5, 1			; GFX6-NEXT: v_writelane_b32 v4, s5, 1
	; GFX6-NEXT: v_writelane_b32 v4, s6, 2			; GFX6-NEXT: v_writelane_b32 v4, s6, 2
	; GFX6-NEXT: v_writelane_b32 v4, s7, 3			; GFX6-NEXT: v_writelane_b32 v4, s7, 3
	; GFX6-NEXT: v_writelane_b32 v4, s8, 4			; GFX6-NEXT: v_writelane_b32 v4, s8, 4
	; GFX6-NEXT: v_writelane_b32 v4, s9, 5			; GFX6-NEXT: v_writelane_b32 v4, s9, 5
	; GFX6-NEXT: v_writelane_b32 v4, s10, 6			; GFX6-NEXT: v_writelane_b32 v4, s10, 6
	; GFX6-NEXT: v_writelane_b32 v4, s11, 7			; GFX6-NEXT: v_writelane_b32 v4, s11, 7
	; GFX6-NEXT: s_mov_b32 s2, 0x83800			; GFX6-NEXT: s_mov_b32 s2, 0x83c00
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], s2 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[44:45]			; GFX6-NEXT: s_mov_b64 exec, s[36:37]
	; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX6-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ; def s[8:15]			; GFX6-NEXT: ; def s[8:15]
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ; def s[16:23]			; GFX6-NEXT: ; def s[16:23]
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ; def s[24:31]			; GFX6-NEXT: ; def s[24:31]
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ; def s[4:7]			; GFX6-NEXT: ; def s[4:7]
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ; def s[2:3]			; GFX6-NEXT: ; def s[2:3]
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ; def s[36:37]
	; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ; def s33			; GFX6-NEXT: ; def s33
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: s_and_saveexec_b64 s[34:35], vcc			; GFX6-NEXT: s_and_saveexec_b64 s[34:35], vcc
	; GFX6-NEXT: s_cbranch_execz .LBB1_2			; GFX6-NEXT: s_cbranch_execz .LBB1_2
	; GFX6-NEXT: ; %bb.1: ; %bb0			; GFX6-NEXT: ; %bb.1: ; %bb0
	; GFX6-NEXT: s_mov_b64 s[44:45], exec			; GFX6-NEXT: s_mov_b64 s[38:39], exec
	; GFX6-NEXT: s_mov_b64 exec, 0xff
	; GFX6-NEXT: buffer_store_dword v9, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: v_writelane_b32 v9, s8, 0
	; GFX6-NEXT: v_writelane_b32 v9, s9, 1
	; GFX6-NEXT: v_writelane_b32 v9, s10, 2
	; GFX6-NEXT: v_writelane_b32 v9, s11, 3
	; GFX6-NEXT: v_writelane_b32 v9, s12, 4
	; GFX6-NEXT: v_writelane_b32 v9, s13, 5
	; GFX6-NEXT: v_writelane_b32 v9, s14, 6
	; GFX6-NEXT: v_writelane_b32 v9, s15, 7
	; GFX6-NEXT: v_mov_b32_e32 v4, 0x2100
	; GFX6-NEXT: buffer_store_dword v9, v4, s[40:43], 0 offen ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[44:45]
	; GFX6-NEXT: s_mov_b64 s[44:45], exec
	; GFX6-NEXT: s_mov_b64 exec, 0xff
	; GFX6-NEXT: v_mov_b32_e32 v4, 0x20e0
	; GFX6-NEXT: buffer_store_dword v8, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v8, v4, s[40:43], 0 offen ; 4-byte Folded Reload
	; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: v_readlane_b32 s8, v8, 0
	; GFX6-NEXT: v_readlane_b32 s9, v8, 1
	; GFX6-NEXT: v_readlane_b32 s10, v8, 2
	; GFX6-NEXT: v_readlane_b32 s11, v8, 3
	; GFX6-NEXT: v_readlane_b32 s12, v8, 4
	; GFX6-NEXT: v_readlane_b32 s13, v8, 5
	; GFX6-NEXT: v_readlane_b32 s14, v8, 6
	; GFX6-NEXT: v_readlane_b32 s15, v8, 7
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[44:45]
	; GFX6-NEXT: s_mov_b64 s[44:45], exec
	; GFX6-NEXT: s_mov_b64 exec, 0xff			; GFX6-NEXT: s_mov_b64 exec, 0xff
	; GFX6-NEXT: buffer_store_dword v7, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v7, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: v_writelane_b32 v7, s16, 0			; GFX6-NEXT: v_writelane_b32 v7, s8, 0
	; GFX6-NEXT: v_writelane_b32 v7, s17, 1			; GFX6-NEXT: v_writelane_b32 v7, s9, 1
	; GFX6-NEXT: v_writelane_b32 v7, s18, 2			; GFX6-NEXT: v_writelane_b32 v7, s10, 2
	; GFX6-NEXT: v_writelane_b32 v7, s19, 3			; GFX6-NEXT: v_writelane_b32 v7, s11, 3
	; GFX6-NEXT: v_writelane_b32 v7, s20, 4			; GFX6-NEXT: v_writelane_b32 v7, s12, 4
	; GFX6-NEXT: v_writelane_b32 v7, s21, 5			; GFX6-NEXT: v_writelane_b32 v7, s13, 5
	; GFX6-NEXT: v_writelane_b32 v7, s22, 6			; GFX6-NEXT: v_writelane_b32 v7, s14, 6
	; GFX6-NEXT: v_writelane_b32 v7, s23, 7			; GFX6-NEXT: v_writelane_b32 v7, s15, 7
	; GFX6-NEXT: v_mov_b32_e32 v4, 0x2120			; GFX6-NEXT: s_mov_b32 s36, 0x84400
	; GFX6-NEXT: buffer_store_dword v7, v4, s[40:43], 0 offen ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v7, off, s[40:43], s36 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[44:45]			; GFX6-NEXT: s_mov_b64 exec, s[38:39]
	; GFX6-NEXT: s_mov_b64 s[44:45], exec			; GFX6-NEXT: s_mov_b64 s[44:45], exec
	; GFX6-NEXT: s_mov_b64 exec, 0xff			; GFX6-NEXT: s_mov_b64 exec, 0xff
	; GFX6-NEXT: v_mov_b32_e32 v4, 0x2100			; GFX6-NEXT: s_mov_b32 s36, 0x83c00
	; GFX6-NEXT: buffer_store_dword v9, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v9, v4, s[40:43], 0 offen ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], s36 ; 4-byte Folded Reload
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: v_readlane_b32 s16, v9, 0			; GFX6-NEXT: v_readlane_b32 s8, v4, 0
	; GFX6-NEXT: v_readlane_b32 s17, v9, 1			; GFX6-NEXT: v_readlane_b32 s9, v4, 1
	; GFX6-NEXT: v_readlane_b32 s18, v9, 2			; GFX6-NEXT: v_readlane_b32 s10, v4, 2
	; GFX6-NEXT: v_readlane_b32 s19, v9, 3			; GFX6-NEXT: v_readlane_b32 s11, v4, 3
	; GFX6-NEXT: v_readlane_b32 s20, v9, 4			; GFX6-NEXT: v_readlane_b32 s12, v4, 4
	; GFX6-NEXT: v_readlane_b32 s21, v9, 5			; GFX6-NEXT: v_readlane_b32 s13, v4, 5
	; GFX6-NEXT: v_readlane_b32 s22, v9, 6			; GFX6-NEXT: v_readlane_b32 s14, v4, 6
	; GFX6-NEXT: v_readlane_b32 s23, v9, 7			; GFX6-NEXT: v_readlane_b32 s15, v4, 7
	; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[44:45]			; GFX6-NEXT: s_mov_b64 exec, s[44:45]
	; GFX6-NEXT: s_mov_b64 s[44:45], exec			; GFX6-NEXT: s_mov_b64 s[38:39], exec
	; GFX6-NEXT: s_mov_b64 exec, 0xff			; GFX6-NEXT: s_mov_b64 exec, 0xff
	; GFX6-NEXT: buffer_store_dword v8, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v8, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: v_writelane_b32 v8, s24, 0			; GFX6-NEXT: v_writelane_b32 v8, s16, 0
	; GFX6-NEXT: v_writelane_b32 v8, s25, 1			; GFX6-NEXT: v_writelane_b32 v8, s17, 1
	; GFX6-NEXT: v_writelane_b32 v8, s26, 2			; GFX6-NEXT: v_writelane_b32 v8, s18, 2
	; GFX6-NEXT: v_writelane_b32 v8, s27, 3			; GFX6-NEXT: v_writelane_b32 v8, s19, 3
	; GFX6-NEXT: v_writelane_b32 v8, s28, 4			; GFX6-NEXT: v_writelane_b32 v8, s20, 4
	; GFX6-NEXT: v_writelane_b32 v8, s29, 5			; GFX6-NEXT: v_writelane_b32 v8, s21, 5
	; GFX6-NEXT: v_writelane_b32 v8, s30, 6			; GFX6-NEXT: v_writelane_b32 v8, s22, 6
	; GFX6-NEXT: v_writelane_b32 v8, s31, 7			; GFX6-NEXT: v_writelane_b32 v8, s23, 7
	; GFX6-NEXT: v_mov_b32_e32 v4, 0x2140			; GFX6-NEXT: s_mov_b32 s36, 0x84c00
	; GFX6-NEXT: buffer_store_dword v8, v4, s[40:43], 0 offen ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v8, off, s[40:43], s36 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[44:45]			; GFX6-NEXT: s_mov_b64 exec, s[38:39]
	; GFX6-NEXT: s_mov_b64 s[44:45], exec			; GFX6-NEXT: s_mov_b64 s[44:45], exec
	; GFX6-NEXT: s_mov_b64 exec, 0xff			; GFX6-NEXT: s_mov_b64 exec, 0xff
	; GFX6-NEXT: v_mov_b32_e32 v4, 0x2120			; GFX6-NEXT: s_mov_b32 s36, 0x84400
	; GFX6-NEXT: buffer_store_dword v7, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v7, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v7, v4, s[40:43], 0 offen ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s36 ; 4-byte Folded Reload
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: v_readlane_b32 s24, v7, 0			; GFX6-NEXT: v_readlane_b32 s16, v7, 0
	; GFX6-NEXT: v_readlane_b32 s25, v7, 1			; GFX6-NEXT: v_readlane_b32 s17, v7, 1
	; GFX6-NEXT: v_readlane_b32 s26, v7, 2			; GFX6-NEXT: v_readlane_b32 s18, v7, 2
	; GFX6-NEXT: v_readlane_b32 s27, v7, 3			; GFX6-NEXT: v_readlane_b32 s19, v7, 3
	; GFX6-NEXT: v_readlane_b32 s28, v7, 4			; GFX6-NEXT: v_readlane_b32 s20, v7, 4
	; GFX6-NEXT: v_readlane_b32 s29, v7, 5			; GFX6-NEXT: v_readlane_b32 s21, v7, 5
	; GFX6-NEXT: v_readlane_b32 s30, v7, 6			; GFX6-NEXT: v_readlane_b32 s22, v7, 6
	; GFX6-NEXT: v_readlane_b32 s31, v7, 7			; GFX6-NEXT: v_readlane_b32 s23, v7, 7
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[44:45]			; GFX6-NEXT: s_mov_b64 exec, s[44:45]
	; GFX6-NEXT: s_mov_b64 s[44:45], exec			; GFX6-NEXT: s_mov_b64 s[38:39], exec
	; GFX6-NEXT: s_mov_b64 exec, 15			; GFX6-NEXT: s_mov_b64 exec, 0xff
	; GFX6-NEXT: buffer_store_dword v10, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: v_writelane_b32 v10, s0, 0
	; GFX6-NEXT: v_writelane_b32 v10, s1, 1
	; GFX6-NEXT: v_writelane_b32 v10, s2, 2
	; GFX6-NEXT: v_writelane_b32 v10, s3, 3
	; GFX6-NEXT: v_mov_b32_e32 v4, 0x2160
	; GFX6-NEXT: buffer_store_dword v10, v4, s[40:43], 0 offen ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v10, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[44:45]
	; GFX6-NEXT: s_mov_b64 s[44:45], exec
	; GFX6-NEXT: s_mov_b64 exec, 15
	; GFX6-NEXT: buffer_store_dword v8, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: v_writelane_b32 v8, s4, 0
	; GFX6-NEXT: v_writelane_b32 v8, s5, 1
	; GFX6-NEXT: v_writelane_b32 v8, s6, 2
	; GFX6-NEXT: v_writelane_b32 v8, s7, 3
	; GFX6-NEXT: s_mov_b32 s0, 0x85c00
	; GFX6-NEXT: buffer_store_dword v8, off, s[40:43], s0 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[44:45]
	; GFX6-NEXT: s_mov_b64 s[0:1], exec
	; GFX6-NEXT: s_mov_b64 exec, 3
	; GFX6-NEXT: buffer_store_dword v9, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: v_writelane_b32 v9, s2, 0			; GFX6-NEXT: v_writelane_b32 v4, s24, 0
	; GFX6-NEXT: v_writelane_b32 v9, s3, 1			; GFX6-NEXT: v_writelane_b32 v4, s25, 1
	; GFX6-NEXT: s_mov_b32 s4, 0x86600			; GFX6-NEXT: v_writelane_b32 v4, s26, 2
	; GFX6-NEXT: buffer_store_dword v9, off, s[40:43], s4 ; 4-byte Folded Spill			; GFX6-NEXT: v_writelane_b32 v4, s27, 3
				; GFX6-NEXT: v_writelane_b32 v4, s28, 4
				; GFX6-NEXT: v_writelane_b32 v4, s29, 5
				; GFX6-NEXT: v_writelane_b32 v4, s30, 6
				; GFX6-NEXT: v_writelane_b32 v4, s31, 7
				; GFX6-NEXT: s_mov_b32 s36, 0x85400
				; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], s36 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[0:1]			; GFX6-NEXT: s_mov_b64 exec, s[38:39]
	; GFX6-NEXT: s_mov_b64 s[44:45], exec			; GFX6-NEXT: s_mov_b64 s[44:45], exec
	; GFX6-NEXT: s_mov_b64 exec, 0xff			; GFX6-NEXT: s_mov_b64 exec, 0xff
	; GFX6-NEXT: v_mov_b32_e32 v4, 0x2140			; GFX6-NEXT: s_mov_b32 s36, 0x84c00
	; GFX6-NEXT: buffer_store_dword v7, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v9, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v7, v4, s[40:43], 0 offen ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s36 ; 4-byte Folded Reload
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: v_readlane_b32 s0, v7, 0			; GFX6-NEXT: v_readlane_b32 s24, v9, 0
	; GFX6-NEXT: v_readlane_b32 s1, v7, 1			; GFX6-NEXT: v_readlane_b32 s25, v9, 1
	; GFX6-NEXT: v_readlane_b32 s2, v7, 2			; GFX6-NEXT: v_readlane_b32 s26, v9, 2
	; GFX6-NEXT: v_readlane_b32 s3, v7, 3			; GFX6-NEXT: v_readlane_b32 s27, v9, 3
	; GFX6-NEXT: v_readlane_b32 s4, v7, 4			; GFX6-NEXT: v_readlane_b32 s28, v9, 4
	; GFX6-NEXT: v_readlane_b32 s5, v7, 5			; GFX6-NEXT: v_readlane_b32 s29, v9, 5
	; GFX6-NEXT: v_readlane_b32 s6, v7, 6			; GFX6-NEXT: v_readlane_b32 s30, v9, 6
	; GFX6-NEXT: v_readlane_b32 s7, v7, 7			; GFX6-NEXT: v_readlane_b32 s31, v9, 7
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[44:45]			; GFX6-NEXT: s_mov_b64 exec, s[44:45]
	; GFX6-NEXT: s_mov_b64 s[44:45], exec			; GFX6-NEXT: s_mov_b64 s[36:37], exec
	; GFX6-NEXT: s_mov_b64 exec, 15			; GFX6-NEXT: s_mov_b64 exec, 15
	; GFX6-NEXT: buffer_store_dword v8, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v8, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: v_writelane_b32 v8, s36, 0			; GFX6-NEXT: v_writelane_b32 v8, s0, 0
	; GFX6-NEXT: v_writelane_b32 v8, s37, 1			; GFX6-NEXT: v_writelane_b32 v8, s1, 1
	; GFX6-NEXT: v_writelane_b32 v8, s38, 2			; GFX6-NEXT: v_writelane_b32 v8, s2, 2
	; GFX6-NEXT: v_writelane_b32 v8, s39, 3			; GFX6-NEXT: v_writelane_b32 v8, s3, 3
	; GFX6-NEXT: v_mov_b32_e32 v4, 0x2180			; GFX6-NEXT: s_mov_b32 s38, 0x85c00
	; GFX6-NEXT: buffer_store_dword v8, v4, s[40:43], 0 offen ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v8, off, s[40:43], s38 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[44:45]			; GFX6-NEXT: s_mov_b64 exec, s[36:37]
	; GFX6-NEXT: s_mov_b64 s[38:39], exec			; GFX6-NEXT: s_mov_b64 s[38:39], exec
	; GFX6-NEXT: s_mov_b64 exec, 3			; GFX6-NEXT: s_mov_b64 exec, 15
	; GFX6-NEXT: buffer_store_dword v10, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: v_writelane_b32 v10, s36, 0			; GFX6-NEXT: v_writelane_b32 v4, s4, 0
	; GFX6-NEXT: v_writelane_b32 v10, s37, 1			; GFX6-NEXT: v_writelane_b32 v4, s5, 1
	; GFX6-NEXT: s_mov_b32 s44, 0x86400			; GFX6-NEXT: v_writelane_b32 v4, s6, 2
	; GFX6-NEXT: buffer_store_dword v10, off, s[40:43], s44 ; 4-byte Folded Spill			; GFX6-NEXT: v_writelane_b32 v4, s7, 3
				; GFX6-NEXT: s_mov_b32 s0, 0x86000
				; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], s0 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v10, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[38:39]			; GFX6-NEXT: s_mov_b64 exec, s[38:39]
	; GFX6-NEXT: s_mov_b64 s[44:45], exec			; GFX6-NEXT: s_mov_b64 s[44:45], exec
	; GFX6-NEXT: s_mov_b64 exec, 15			; GFX6-NEXT: s_mov_b64 exec, 3
	; GFX6-NEXT: v_mov_b32_e32 v4, 0x2170			; GFX6-NEXT: buffer_store_dword v7, off, s[40:43], 0
	; GFX6-NEXT: buffer_store_dword v9, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v9, v4, s[40:43], 0 offen ; 4-byte Folded Reload			; GFX6-NEXT: v_writelane_b32 v7, s2, 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: v_writelane_b32 v7, s3, 1
	; GFX6-NEXT: v_readlane_b32 s36, v9, 0			; GFX6-NEXT: s_mov_b32 s0, 0x86400
	; GFX6-NEXT: v_readlane_b32 s37, v9, 1			; GFX6-NEXT: buffer_store_dword v7, off, s[40:43], s0 ; 4-byte Folded Spill
	; GFX6-NEXT: v_readlane_b32 s38, v9, 2			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: v_readlane_b32 s39, v9, 3			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], 0
	; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[44:45]			; GFX6-NEXT: s_mov_b64 exec, s[44:45]
	; GFX6-NEXT: s_not_b64 exec, exec			; GFX6-NEXT: s_mov_b64 s[36:37], exec
	; GFX6-NEXT: v_mov_b32_e32 v4, 0x2190			; GFX6-NEXT: s_mov_b64 exec, 0xff
	; GFX6-NEXT: buffer_store_dword v7, off, s[40:43], 0			; GFX6-NEXT: s_mov_b32 s38, 0x85400
				; GFX6-NEXT: buffer_store_dword v9, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v7, v4, s[40:43], 0 offen ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s38 ; 4-byte Folded Reload
	; GFX6-NEXT: s_not_b64 exec, exec
	; GFX6-NEXT: v_mov_b32_e32 v4, 0x2190
	; GFX6-NEXT: buffer_load_dword v7, v4, s[40:43], 0 offen ; 4-byte Folded Reload
	; GFX6-NEXT: s_not_b64 exec, exec
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: v_readlane_b32 s44, v7, 0			; GFX6-NEXT: v_readlane_b32 s0, v9, 0
	; GFX6-NEXT: v_readlane_b32 s45, v7, 1			; GFX6-NEXT: v_readlane_b32 s1, v9, 1
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], 0			; GFX6-NEXT: v_readlane_b32 s2, v9, 2
				; GFX6-NEXT: v_readlane_b32 s3, v9, 3
				; GFX6-NEXT: v_readlane_b32 s4, v9, 4
				; GFX6-NEXT: v_readlane_b32 s5, v9, 5
				; GFX6-NEXT: v_readlane_b32 s6, v9, 6
				; GFX6-NEXT: v_readlane_b32 s7, v9, 7
				; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_not_b64 exec, exec			; GFX6-NEXT: s_mov_b64 exec, s[36:37]
	; GFX6-NEXT: s_mov_b64 vcc, s[34:35]			; GFX6-NEXT: s_mov_b64 s[44:45], exec
	; GFX6-NEXT: s_not_b64 exec, exec			; GFX6-NEXT: s_mov_b64 exec, 15
	; GFX6-NEXT: v_mov_b32_e32 v4, 0x2198			; GFX6-NEXT: v_mov_b32_e32 v4, 0x2180
	; GFX6-NEXT: buffer_store_dword v8, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v8, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v8, v4, s[40:43], 0 offen ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v8, v4, s[40:43], 0 offen ; 4-byte Folded Reload
	; GFX6-NEXT: s_not_b64 exec, exec
	; GFX6-NEXT: v_mov_b32_e32 v4, 0x2198
	; GFX6-NEXT: buffer_load_dword v8, v4, s[40:43], 0 offen ; 4-byte Folded Reload
	; GFX6-NEXT: s_not_b64 exec, exec
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: v_readlane_b32 s34, v8, 0			; GFX6-NEXT: v_readlane_b32 s36, v8, 0
	; GFX6-NEXT: v_readlane_b32 s35, v8, 1			; GFX6-NEXT: v_readlane_b32 s37, v8, 1
				; GFX6-NEXT: v_readlane_b32 s38, v8, 2
				; GFX6-NEXT: v_readlane_b32 s39, v8, 3
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_not_b64 exec, exec			; GFX6-NEXT: s_mov_b64 exec, s[44:45]
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: s_mov_b64 vcc, s[34:35]
	; GFX6-NEXT: ; use s[8:15],s[16:23],s[24:31],s[0:7],s[36:39],s[34:35],s[44:45]			; GFX6-NEXT: s_mov_b64 s[44:45], exec
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: s_mov_b64 exec, 3
	; GFX6-NEXT: s_mov_b64 s[34:35], vcc			; GFX6-NEXT: v_mov_b32_e32 v7, 0x2190
	; GFX6-NEXT: s_mov_b64 s[8:9], exec
	; GFX6-NEXT: s_mov_b64 exec, 15
	; GFX6-NEXT: s_mov_b32 s0, 0x86000
	; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v4, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], s0 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v4, v7, s[40:43], 0 offen ; 4-byte Folded Reload
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: v_readlane_b32 s36, v4, 0			; GFX6-NEXT: v_readlane_b32 s34, v4, 0
	; GFX6-NEXT: v_readlane_b32 s37, v4, 1			; GFX6-NEXT: v_readlane_b32 s35, v4, 1
	; GFX6-NEXT: v_readlane_b32 s38, v4, 2
	; GFX6-NEXT: v_readlane_b32 s39, v4, 3
	; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v4, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[8:9]			; GFX6-NEXT: s_mov_b64 exec, s[44:45]
				; GFX6-NEXT: ;;#ASMSTART
				; GFX6-NEXT: ; use s[8:15],s[16:23],s[24:31],s[0:7],s[36:39],s[34:35]
				; GFX6-NEXT: ;;#ASMEND
				; GFX6-NEXT: s_mov_b64 s[34:35], vcc
	; GFX6-NEXT: s_mov_b64 s[4:5], exec			; GFX6-NEXT: s_mov_b64 s[4:5], exec
	; GFX6-NEXT: s_mov_b64 exec, 15			; GFX6-NEXT: s_mov_b64 exec, 15
	; GFX6-NEXT: s_mov_b32 s6, 0x85800			; GFX6-NEXT: s_mov_b32 s6, 0x85c00
	; GFX6-NEXT: buffer_store_dword v7, off, s[40:43], 0			; GFX6-NEXT: buffer_store_dword v7, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s6 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s6 ; 4-byte Folded Reload
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: v_readlane_b32 s0, v7, 0			; GFX6-NEXT: v_readlane_b32 s0, v7, 0
	; GFX6-NEXT: v_readlane_b32 s1, v7, 1			; GFX6-NEXT: v_readlane_b32 s1, v7, 1
	; GFX6-NEXT: v_readlane_b32 s2, v7, 2			; GFX6-NEXT: v_readlane_b32 s2, v7, 2
	; GFX6-NEXT: v_readlane_b32 s3, v7, 3			; GFX6-NEXT: v_readlane_b32 s3, v7, 3
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], 0			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: s_mov_b64 exec, s[4:5]			; GFX6-NEXT: s_mov_b64 exec, s[4:5]
	; GFX6-NEXT: s_mov_b32 s2, 0x83800			; GFX6-NEXT: s_mov_b32 s2, 0x83c00
	; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v0, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_mov_b32 s2, 0x84000			; GFX6-NEXT: s_mov_b32 s2, 0x84400
	; GFX6-NEXT: buffer_store_dword v13, off, s[40:43], s2 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v13, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v14, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v14, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v15, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v15, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v16, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v16, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_mov_b32 s2, 0x84800			; GFX6-NEXT: s_mov_b32 s2, 0x84c00
	; GFX6-NEXT: buffer_store_dword v17, off, s[40:43], s2 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v17, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v18, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v18, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v19, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v19, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v20, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v20, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: s_mov_b32 s2, 0x84800			; GFX6-NEXT: s_mov_b32 s2, 0x84c00
	; GFX6-NEXT: buffer_load_dword v17, off, s[40:43], s2 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v17, off, s[40:43], s2 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v18, off, s[40:43], s2 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v18, off, s[40:43], s2 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v19, off, s[40:43], s2 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v19, off, s[40:43], s2 offset:8 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v20, off, s[40:43], s2 offset:12 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v20, off, s[40:43], s2 offset:12 ; 4-byte Folded Reload
	; GFX6-NEXT: s_mov_b32 s2, 0x84000			; GFX6-NEXT: s_mov_b32 s2, 0x84400
	; GFX6-NEXT: buffer_load_dword v13, off, s[40:43], s2 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v13, off, s[40:43], s2 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v14, off, s[40:43], s2 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v14, off, s[40:43], s2 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v15, off, s[40:43], s2 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v15, off, s[40:43], s2 offset:8 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v16, off, s[40:43], s2 offset:12 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v16, off, s[40:43], s2 offset:12 ; 4-byte Folded Reload
	; GFX6-NEXT: s_mov_b32 s2, 0x83800			; GFX6-NEXT: s_mov_b32 s2, 0x83c00
	; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], s2 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v0, off, s[40:43], s2 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v1, off, s[40:43], s2 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v2, off, s[40:43], s2 offset:8 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v3, off, s[40:43], s2 offset:12 ; 4-byte Folded Reload
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: .LBB1_2: ; %ret			; GFX6-NEXT: .LBB1_2: ; %ret
	; GFX6-NEXT: s_or_b64 exec, exec, s[34:35]			; GFX6-NEXT: s_or_b64 exec, exec, s[34:35]
	; GFX6-NEXT: s_mov_b32 s4, 0x83400			; GFX6-NEXT: s_mov_b64 s[8:9], exec
				; GFX6-NEXT: s_mov_b64 exec, 15
				; GFX6-NEXT: s_mov_b32 s2, 0x80400
				; GFX6-NEXT: buffer_store_dword v10, off, s[40:43], 0
				; GFX6-NEXT: s_waitcnt expcnt(0)
				; GFX6-NEXT: buffer_load_dword v10, off, s[40:43], s2 ; 4-byte Folded Reload
				; GFX6-NEXT: s_waitcnt vmcnt(0)
				; GFX6-NEXT: v_readlane_b32 s4, v10, 0
				; GFX6-NEXT: v_readlane_b32 s5, v10, 1
				; GFX6-NEXT: v_readlane_b32 s6, v10, 2
				; GFX6-NEXT: v_readlane_b32 s7, v10, 3
				; GFX6-NEXT: buffer_load_dword v10, off, s[40:43], 0
				; GFX6-NEXT: s_waitcnt vmcnt(0)
				; GFX6-NEXT: s_mov_b64 exec, s[8:9]
				; GFX6-NEXT: s_mov_b32 s4, 0x83800
	; GFX6-NEXT: v_lshl_b64 v[4:5], v[5:6], 8			; GFX6-NEXT: v_lshl_b64 v[4:5], v[5:6], 8
	; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload
	; GFX6-NEXT: s_mov_b64 s[2:3], s[38:39]			; GFX6-NEXT: s_mov_b64 s[2:3], s[6:7]
	; GFX6-NEXT: s_mov_b32 s4, 0x83000			; GFX6-NEXT: s_mov_b32 s4, 0x83400
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:240			; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:240
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload
	; GFX6-NEXT: s_mov_b32 s4, 0x82c00			; GFX6-NEXT: s_mov_b32 s4, 0x83000
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:224			; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:224
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload
	; GFX6-NEXT: s_mov_b32 s4, 0x82800			; GFX6-NEXT: s_mov_b32 s4, 0x82c00
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:208			; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:208
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload
	; GFX6-NEXT: s_mov_b32 s4, 0x82400			; GFX6-NEXT: s_mov_b32 s4, 0x82800
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:192			; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:192
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload
	; GFX6-NEXT: s_mov_b32 s4, 0x82000			; GFX6-NEXT: s_mov_b32 s4, 0x82400
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:176			; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:176
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload
	; GFX6-NEXT: s_mov_b32 s4, 0x81c00			; GFX6-NEXT: s_mov_b32 s4, 0x82000
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:160			; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:160
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload
	; GFX6-NEXT: s_mov_b32 s4, 0x81800			; GFX6-NEXT: s_mov_b32 s4, 0x81c00
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:144			; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:144
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload
	; GFX6-NEXT: s_mov_b32 s4, 0x81400			; GFX6-NEXT: s_mov_b32 s4, 0x81800
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:128			; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:128
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload
	; GFX6-NEXT: s_mov_b32 s4, 0x81000			; GFX6-NEXT: s_mov_b32 s4, 0x81400
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:112			; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:112
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload
	; GFX6-NEXT: s_mov_b32 s4, 0x80c00			; GFX6-NEXT: s_mov_b32 s4, 0x81000
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:96			; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:96
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload
	; GFX6-NEXT: s_mov_b32 s4, 0x80400			; GFX6-NEXT: s_mov_b32 s4, 0x80800
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:80			; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:80
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v9, off, s[40:43], s4 offset:12 ; 4-byte Folded Reload
	; GFX6-NEXT: s_mov_b32 s4, 0x80800			; GFX6-NEXT: s_mov_b32 s4, 0x80c00
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:64			; GFX6-NEXT: buffer_store_dwordx4 v[6:9], v[4:5], s[0:3], 0 addr64 offset:64
	; GFX6-NEXT: buffer_store_dwordx4 v[17:20], v[4:5], s[0:3], 0 addr64 offset:48			; GFX6-NEXT: buffer_store_dwordx4 v[17:20], v[4:5], s[0:3], 0 addr64 offset:48
	; GFX6-NEXT: buffer_store_dwordx4 v[13:16], v[4:5], s[0:3], 0 addr64 offset:32			; GFX6-NEXT: buffer_store_dwordx4 v[13:16], v[4:5], s[0:3], 0 addr64 offset:32
	; GFX6-NEXT: s_waitcnt expcnt(2)			; GFX6-NEXT: s_waitcnt expcnt(2)
	; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v6, off, s[40:43], s4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v7, off, s[40:43], s4 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v8, off, s[40:43], s4 offset:8 ; 4-byte Folded Reload
	; GFX9-FLATSCR-NEXT: ;;#ASMEND			; GFX9-FLATSCR-NEXT: ;;#ASMEND
	; GFX9-FLATSCR-NEXT: ;;#ASMSTART			; GFX9-FLATSCR-NEXT: ;;#ASMSTART
	; GFX9-FLATSCR-NEXT: ; def s[40:43]			; GFX9-FLATSCR-NEXT: ; def s[40:43]
	; GFX9-FLATSCR-NEXT: ;;#ASMEND			; GFX9-FLATSCR-NEXT: ;;#ASMEND
	; GFX9-FLATSCR-NEXT: ;;#ASMSTART			; GFX9-FLATSCR-NEXT: ;;#ASMSTART
	; GFX9-FLATSCR-NEXT: ; def s[38:39]			; GFX9-FLATSCR-NEXT: ; def s[38:39]
	; GFX9-FLATSCR-NEXT: ;;#ASMEND			; GFX9-FLATSCR-NEXT: ;;#ASMEND
	; GFX9-FLATSCR-NEXT: ;;#ASMSTART			; GFX9-FLATSCR-NEXT: ;;#ASMSTART
	; GFX9-FLATSCR-NEXT: ; def s[44:45]
	; GFX9-FLATSCR-NEXT: ;;#ASMEND
	; GFX9-FLATSCR-NEXT: ;;#ASMSTART
	; GFX9-FLATSCR-NEXT: ; def s33			; GFX9-FLATSCR-NEXT: ; def s33
	; GFX9-FLATSCR-NEXT: ;;#ASMEND			; GFX9-FLATSCR-NEXT: ;;#ASMEND
	; GFX9-FLATSCR-NEXT: s_and_saveexec_b64 s[34:35], vcc			; GFX9-FLATSCR-NEXT: s_and_saveexec_b64 s[34:35], vcc
	; GFX9-FLATSCR-NEXT: s_cbranch_execz .LBB1_2			; GFX9-FLATSCR-NEXT: s_cbranch_execz .LBB1_2
	; GFX9-FLATSCR-NEXT: ; %bb.1: ; %bb0			; GFX9-FLATSCR-NEXT: ; %bb.1: ; %bb0
	; GFX9-FLATSCR-NEXT: ;;#ASMSTART			; GFX9-FLATSCR-NEXT: ;;#ASMSTART
	; GFX9-FLATSCR-NEXT: ; use s[0:7],s[8:15],s[16:23],s[24:31],s[40:43],s[38:39],s[44:45]			; GFX9-FLATSCR-NEXT: ; use s[0:7],s[8:15],s[16:23],s[24:31],s[40:43],s[38:39]
	; GFX9-FLATSCR-NEXT: ;;#ASMEND			; GFX9-FLATSCR-NEXT: ;;#ASMEND
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20d0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20d0
	; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[0:3], s0 ; 16-byte Folded Spill			; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[0:3], s0 ; 16-byte Folded Spill
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20e0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20e0
	; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[16:19], s0 ; 16-byte Folded Spill			; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[16:19], s0 ; 16-byte Folded Spill
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20f0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20f0
	; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[20:23], s0 ; 16-byte Folded Spill			; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[20:23], s0 ; 16-byte Folded Spill
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2100			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2100
	; GFX10-FLATSCR-NEXT: ;;#ASMEND			; GFX10-FLATSCR-NEXT: ;;#ASMEND
	; GFX10-FLATSCR-NEXT: ;;#ASMSTART			; GFX10-FLATSCR-NEXT: ;;#ASMSTART
	; GFX10-FLATSCR-NEXT: ; def s[40:43]			; GFX10-FLATSCR-NEXT: ; def s[40:43]
	; GFX10-FLATSCR-NEXT: ;;#ASMEND			; GFX10-FLATSCR-NEXT: ;;#ASMEND
	; GFX10-FLATSCR-NEXT: ;;#ASMSTART			; GFX10-FLATSCR-NEXT: ;;#ASMSTART
	; GFX10-FLATSCR-NEXT: ; def s[34:35]			; GFX10-FLATSCR-NEXT: ; def s[34:35]
	; GFX10-FLATSCR-NEXT: ;;#ASMEND			; GFX10-FLATSCR-NEXT: ;;#ASMEND
	; GFX10-FLATSCR-NEXT: ;;#ASMSTART			; GFX10-FLATSCR-NEXT: ;;#ASMSTART
	; GFX10-FLATSCR-NEXT: ; def s[38:39]			; GFX10-FLATSCR-NEXT: ; def s38
	; GFX10-FLATSCR-NEXT: ;;#ASMEND
	; GFX10-FLATSCR-NEXT: ;;#ASMSTART
	; GFX10-FLATSCR-NEXT: ; def s44
	; GFX10-FLATSCR-NEXT: ;;#ASMEND			; GFX10-FLATSCR-NEXT: ;;#ASMEND
	; GFX10-FLATSCR-NEXT: v_cmpx_eq_u32_e32 0, v0			; GFX10-FLATSCR-NEXT: v_cmpx_eq_u32_e32 0, v0
	; GFX10-FLATSCR-NEXT: s_cbranch_execz .LBB1_2			; GFX10-FLATSCR-NEXT: s_cbranch_execz .LBB1_2
	; GFX10-FLATSCR-NEXT: ; %bb.1: ; %bb0			; GFX10-FLATSCR-NEXT: ; %bb.1: ; %bb0
	; GFX10-FLATSCR-NEXT: ;;#ASMSTART			; GFX10-FLATSCR-NEXT: ;;#ASMSTART
	; GFX10-FLATSCR-NEXT: ; use s[0:7],s[8:15],s[16:23],s[24:31],s[40:43],s[34:35],s[38:39]			; GFX10-FLATSCR-NEXT: ; use s[0:7],s[8:15],s[16:23],s[24:31],s[40:43],s[34:35]
	; GFX10-FLATSCR-NEXT: ;;#ASMEND			; GFX10-FLATSCR-NEXT: ;;#ASMEND
	; GFX10-FLATSCR-NEXT: s_movk_i32 s0, 0x2010			; GFX10-FLATSCR-NEXT: s_movk_i32 s0, 0x2010
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v88, v59			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v88, v59
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v92, v63			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v92, v63
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v87, v58			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v87, v58
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v86, v57			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v86, v57
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v85, v56			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v85, v56
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v91, v62			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v91, v62

	; fill up SGPRs			; fill up SGPRs
	%sgpr0 = call <8 x i32> asm sideeffect "; def $0", "=s" ()			%sgpr0 = call <8 x i32> asm sideeffect "; def $0", "=s" ()
	%sgpr1 = call <8 x i32> asm sideeffect "; def $0", "=s" ()			%sgpr1 = call <8 x i32> asm sideeffect "; def $0", "=s" ()
	%sgpr2 = call <8 x i32> asm sideeffect "; def $0", "=s" ()			%sgpr2 = call <8 x i32> asm sideeffect "; def $0", "=s" ()
	%sgpr3 = call <8 x i32> asm sideeffect "; def $0", "=s" ()			%sgpr3 = call <8 x i32> asm sideeffect "; def $0", "=s" ()
	%sgpr4 = call <4 x i32> asm sideeffect "; def $0", "=s" ()			%sgpr4 = call <4 x i32> asm sideeffect "; def $0", "=s" ()
	%sgpr5 = call <2 x i32> asm sideeffect "; def $0", "=s" ()			%sgpr5 = call <2 x i32> asm sideeffect "; def $0", "=s" ()
	%sgpr6 = call <2 x i32> asm sideeffect "; def $0", "=s" ()			%sgpr6 = call i32 asm sideeffect "; def $0", "=s" ()
	%sgpr7 = call i32 asm sideeffect "; def $0", "=s" ()

	%cmp = icmp eq i32 %x, 0			%cmp = icmp eq i32 %x, 0
	br i1 %cmp, label %bb0, label %ret			br i1 %cmp, label %bb0, label %ret

	bb0:			bb0:
	; create SGPR pressure			; create SGPR pressure
	call void asm sideeffect "; use $0,$1,$2,$3,$4,$5,$6", "s,s,s,s,s,s,s,s"(<8 x i32> %sgpr0, <8 x i32> %sgpr1, <8 x i32> %sgpr2, <8 x i32> %sgpr3, <4 x i32> %sgpr4, <2 x i32> %sgpr5, <2 x i32> %sgpr6, i32 %sgpr7)			call void asm sideeffect "; use $0,$1,$2,$3,$4,$5", "s,s,s,s,s,s,s"(<8 x i32> %sgpr0, <8 x i32> %sgpr1, <8 x i32> %sgpr2, <8 x i32> %sgpr3, <4 x i32> %sgpr4, <2 x i32> %sgpr5, i32 %sgpr6)

	; mark most VGPR registers as used to increase register pressure			; mark most VGPR registers as used to increase register pressure
	call void asm sideeffect "", "~{v4},~{v8},~{v12},~{v16},~{v20},~{v24},~{v28},~{v32}" ()			call void asm sideeffect "", "~{v4},~{v8},~{v12},~{v16},~{v20},~{v24},~{v28},~{v32}" ()
	call void asm sideeffect "", "~{v36},~{v40},~{v44},~{v48},~{v52},~{v56},~{v60},~{v64}" ()			call void asm sideeffect "", "~{v36},~{v40},~{v44},~{v48},~{v52},~{v56},~{v60},~{v64}" ()
	call void asm sideeffect "", "~{v68},~{v72},~{v76},~{v80},~{v84},~{v88},~{v92},~{v96}" ()			call void asm sideeffect "", "~{v68},~{v72},~{v76},~{v80},~{v84},~{v88},~{v92},~{v96}" ()
	call void asm sideeffect "", "~{v100},~{v104},~{v108},~{v112},~{v116},~{v120},~{v124},~{v128}" ()			call void asm sideeffect "", "~{v100},~{v104},~{v108},~{v112},~{v116},~{v120},~{v124},~{v128}" ()
	call void asm sideeffect "", "~{v132},~{v136},~{v140},~{v144},~{v148},~{v152},~{v156},~{v160}" ()			call void asm sideeffect "", "~{v132},~{v136},~{v140},~{v144},~{v148},~{v152},~{v156},~{v160}" ()
	call void asm sideeffect "", "~{v164},~{v168},~{v172},~{v176},~{v180},~{v184},~{v188},~{v192}" ()			call void asm sideeffect "", "~{v164},~{v168},~{v172},~{v176},~{v180},~{v184},~{v188},~{v192}" ()

llvm/test/CodeGen/AMDGPU/spill-sgpr-csr-live-ins.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -run-pass=si-lower-sgpr-spills -o - %s | FileCheck %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -run-pass=si-lower-sgpr-spills -o - %s | FileCheck %s

	---			---
	name: spill_csr_sgpr_argument			name: spill_csr_sgpr_argument
	tracksRegLiveness: true			tracksRegLiveness: true
	liveins:			liveins:
	- { reg: '$sgpr50' }			- { reg: '$sgpr50' }
	body: |			body: |
	bb.0:			bb.0:
	liveins: $sgpr50			liveins: $sgpr50
	; CHECK-LABEL: name: spill_csr_sgpr_argument			; CHECK-LABEL: name: spill_csr_sgpr_argument
	; CHECK: liveins: $sgpr50, $vgpr0			; CHECK: liveins: $sgpr50
	; CHECK-NEXT: {{ $}}			; CHECK-NEXT: {{ $}}
	; CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr50, 0, $vgpr0			; CHECK-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
				; CHECK-NEXT: [[V_WRITELANE_B32_:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 $sgpr50, 0, [[V_WRITELANE_B32_]]
	; CHECK-NEXT: S_NOP 0, implicit $sgpr50			; CHECK-NEXT: S_NOP 0, implicit $sgpr50
	; CHECK-NEXT: $sgpr50 = S_MOV_B32 0			; CHECK-NEXT: $sgpr50 = S_MOV_B32 0
	S_NOP 0, implicit $sgpr50			S_NOP 0, implicit $sgpr50
	$sgpr50 = S_MOV_B32 0			$sgpr50 = S_MOV_B32 0

	...			...
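
The CHECK lines above spill `$sgpr50` by writing it into lane 0 of a VGPR with `V_WRITELANE_B32`, and the `.ll` tests in this diff reload such values with `v_readlane_b32`. A minimal Python sketch of that lane-packing idea follows. It is only an illustrative model, not LLVM code: the helper names `spill_sgprs` and `reload_sgprs`, the 64-lane wave size, and the choice to ignore the exec mask entirely are assumptions made for this example.

```python
WAVE_SIZE = 64  # assumption: one VGPR holds 64 lanes of 32 bits (wave64)


def v_writelane_b32(vgpr, sgpr_value, lane):
    """Model of a lane write: return a copy of vgpr with one lane replaced."""
    out = list(vgpr)
    out[lane] = sgpr_value & 0xFFFFFFFF  # each lane holds a 32-bit value
    return out


def v_readlane_b32(vgpr, lane):
    """Model of a lane read: return the scalar stored in one lane."""
    return vgpr[lane]


def spill_sgprs(values):
    """Hypothetical helper: pack scalar values into consecutive lanes."""
    vgpr = [0] * WAVE_SIZE
    for lane, value in enumerate(values):
        vgpr = v_writelane_b32(vgpr, value, lane)
    return vgpr


def reload_sgprs(vgpr, count):
    """Hypothetical helper: read the first `count` lanes back as scalars."""
    return [v_readlane_b32(vgpr, lane) for lane in range(count)]
```

In this model, a five-register tuple such as `s[8:12]` occupies lanes 0 through 4 of a single VGPR, matching the lane indices 0 to 4 used by the `v_writelane_b32` and `v_readlane_b32` sequences in the tests.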

llvm/test/CodeGen/AMDGPU/spill-sgpr-stack-no-sgpr.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX10 %s			; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX10 %s

	; Spill an SGPR to scratch without having spare SGPRs available to save exec			; The test was originally written to spill an SGPR to scratch without having spare SGPRs available to save exec.
				; This scenario no longer arises now that SGPR spills go into virtual VGPRs.

	define amdgpu_kernel void @test() #1 {			define amdgpu_kernel void @test() #1 {
	; GFX10-LABEL: test:			; GFX10-LABEL: test:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_mov_b32 s8, SCRATCH_RSRC_DWORD0			; GFX10-NEXT: s_mov_b32 s12, SCRATCH_RSRC_DWORD0
	; GFX10-NEXT: s_mov_b32 s9, SCRATCH_RSRC_DWORD1			; GFX10-NEXT: s_mov_b32 s13, SCRATCH_RSRC_DWORD1
	; GFX10-NEXT: s_mov_b32 s10, -1			; GFX10-NEXT: s_mov_b32 s14, -1
	; GFX10-NEXT: s_mov_b32 s11, 0x31e16000			; GFX10-NEXT: s_mov_b32 s15, 0x31e16000
	; GFX10-NEXT: s_add_u32 s8, s8, s1			; GFX10-NEXT: s_add_u32 s12, s12, s1
	; GFX10-NEXT: s_addc_u32 s9, s9, 0			; GFX10-NEXT: s_addc_u32 s13, s13, 0
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def s[0:7]			; GFX10-NEXT: ; def s[0:7]
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def s[8:12]			; GFX10-NEXT: ; def s[8:12]
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_not_b64 exec, exec			; GFX10-NOT: s_not_b64 exec, exec
	; GFX10-NEXT: buffer_store_dword v0, off, s[8:11], 0			; GFX10-NEXT: ; implicit-def: $vgpr0
	; GFX10-NEXT: v_writelane_b32 v0, s8, 0			; GFX10-NEXT: v_writelane_b32 v0, s8, 0
	; GFX10-NEXT: v_writelane_b32 v0, s9, 1			; GFX10-NEXT: v_writelane_b32 v0, s9, 1
	; GFX10-NEXT: v_writelane_b32 v0, s10, 2			; GFX10-NEXT: v_writelane_b32 v0, s10, 2
	; GFX10-NEXT: v_writelane_b32 v0, s11, 3			; GFX10-NEXT: v_writelane_b32 v0, s11, 3
	; GFX10-NEXT: v_writelane_b32 v0, s12, 4			; GFX10-NEXT: v_writelane_b32 v0, s12, 4
	; GFX10-NEXT: buffer_store_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: s_or_saveexec_b64 s[14:15], -1
				; GFX10-NEXT: buffer_store_dword v0, off, s[12:15], 0 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_not_b64 exec, exec			; GFX10-NEXT: s_mov_b64 exec, s[14:15]
	; GFX10-NEXT: buffer_store_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_not_b64 exec, exec
	; GFX10-NEXT: buffer_load_dword v0, off, s[8:11], 0
	; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_not_b64 exec, exec
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use s[0:7]			; GFX10-NEXT: ; use s[0:7]
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_mov_b64 s[6:7], exec			; GFX10-NEXT: s_or_saveexec_b64 s[14:15], -1
	; GFX10-NEXT: s_mov_b64 exec, 31			; GFX10-NEXT: buffer_load_dword v0, off, s[12:15], 0 offset:4 ; 4-byte Folded Reload
	; GFX10-NEXT: buffer_store_dword v0, off, s[8:11], 0			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: buffer_load_dword v0, off, s[8:11], 0 offset:4 ; 4-byte Folded Reload			; GFX10-NEXT: s_mov_b64 exec, s[14:15]
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_readlane_b32 s0, v0, 0			; GFX10-NEXT: v_readlane_b32 s0, v0, 0
	; GFX10-NEXT: v_readlane_b32 s1, v0, 1			; GFX10-NEXT: v_readlane_b32 s1, v0, 1
	; GFX10-NEXT: v_readlane_b32 s2, v0, 2			; GFX10-NEXT: v_readlane_b32 s2, v0, 2
	; GFX10-NEXT: v_readlane_b32 s3, v0, 3			; GFX10-NEXT: v_readlane_b32 s3, v0, 3
	; GFX10-NEXT: v_readlane_b32 s4, v0, 4			; GFX10-NEXT: v_readlane_b32 s4, v0, 4
	; GFX10-NEXT: buffer_load_dword v0, off, s[8:11], 0
	; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b64 exec, s[6:7]
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use s[0:4]			; GFX10-NEXT: ; use s[0:4]
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	%wide.sgpr0 = call <8 x i32> asm sideeffect "; def $0", "={s[0:7]}" () #0			%wide.sgpr0 = call <8 x i32> asm sideeffect "; def $0", "={s[0:7]}" () #0
	%wide.sgpr2 = call <5 x i32> asm sideeffect "; def $0", "={s[8:12]}" () #0			%wide.sgpr2 = call <5 x i32> asm sideeffect "; def $0", "={s[8:12]}" () #0
	call void asm sideeffect "", "~{v[0:7]}" () #0			call void asm sideeffect "", "~{v[0:7]}" () #0
	call void asm sideeffect "; use $0", "s"(<8 x i32> %wide.sgpr0) #0			call void asm sideeffect "; use $0", "s"(<8 x i32> %wide.sgpr0) #0
	call void asm sideeffect "; use $0", "s"(<5 x i32> %wide.sgpr2) #0			call void asm sideeffect "; use $0", "s"(<5 x i32> %wide.sgpr2) #0
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind "amdgpu-num-sgpr"="16" "amdgpu-num-vgpr"="8" }			attributes #1 = { nounwind "amdgpu-num-sgpr"="18" "amdgpu-num-vgpr"="8" }
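The updated check lines above wrap the whole-VGPR store and reload in `s_or_saveexec_b64 s[14:15], -1` followed by `s_mov_b64 exec, s[14:15]`, replacing the earlier `s_not_b64 exec, exec` sequence. A minimal Python model of that save/enable/restore pattern is sketched below. It assumes the instruction copies the current exec mask to the destination and then ORs the source into exec. The function and constant names are hypothetical and exist only for this illustration.

```python
MASK64 = (1 << 64) - 1  # wave64: one exec bit per lane


def s_or_saveexec_b64(exec_mask, src):
    """Model of s_or_saveexec_b64: return (saved_exec, new_exec).

    Assumed semantics: the destination receives the old exec mask and
    exec becomes (old exec | src), truncated to 64 bits.
    """
    saved = exec_mask & MASK64
    new_exec = (exec_mask | src) & MASK64
    return saved, new_exec


def whole_wave_region(exec_mask):
    """Enable every lane for a whole-VGPR store/reload, then restore exec."""
    saved, exec_inside = s_or_saveexec_b64(exec_mask, -1)  # src = -1 turns on all lanes
    # ... buffer_store_dword / buffer_load_dword would execute here ...
    exec_after = saved  # s_mov_b64 exec, s[14:15]
    return exec_inside, exec_after


inside, after = whole_wave_region(0b1010)
```

With an incoming mask of `0b1010`, `inside` has all 64 bits set and `after` is `0b1010` again, so inactive lanes of the lane VGPR are also written to scratch while the original mask is preserved.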

llvm/test/CodeGen/AMDGPU/spill-sgpr-to-virtual-vgpr.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 -run-pass=si-lower-sgpr-spills -verify-machineinstrs %s -o - \| FileCheck -check-prefix=GCN %s

				# A simple SGPR spill. Implicit def for lane VGPR should be inserted just before the spill instruction.
				---
				name: sgpr32_spill
				tracksRegLiveness: true
				frameInfo:
				maxAlignment: 4
				stack:
				- { id: 0, type: spill-slot, size: 4, alignment: 4, stack-id: sgpr-spill }
				machineFunctionInfo:
				isEntryFunction: false
				scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
				stackPtrOffsetReg: '$sgpr32'
				frameOffsetReg: '$sgpr33'
				hasSpilledSGPRs: true
				body: \|
				bb.0:
				liveins: $sgpr30_sgpr31, $sgpr10
				; GCN-LABEL: name: sgpr32_spill
				; GCN: liveins: $sgpr30_sgpr31, $sgpr10
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: S_NOP 0
				; GCN-NEXT: [[V_WRITELANE_B32_:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
				; GCN-NEXT: [[V_WRITELANE_B32_]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr10, 0, [[V_WRITELANE_B32_]]
				; GCN-NEXT: $sgpr10 = V_READLANE_B32 [[V_WRITELANE_B32_]], 0
				arsenm: The test checks seem to not capture that these operands are tied
				cdevadas: The auto-generator didn't generate the tied-operands correctly. I can hand-modify this test to show the tied operand. It's the simplest case.
				; GCN-NEXT: S_SETPC_B64 $sgpr30_sgpr31
				S_NOP 0
				SI_SPILL_S32_SAVE killed $sgpr10, %stack.0, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32
				renamable $sgpr10 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32
				S_SETPC_B64 $sgpr30_sgpr31
				...

				# Needed an additional virtual lane register as the lanes of current register are fully occupied while spilling a wide SGPR tuple.
				# There must be two implicit def for the two lane VGPRs.

				---
				name: sgpr_spill_lane_crossover
				tracksRegLiveness: true
				frameInfo:
				maxAlignment: 4
				stack:
				- { id: 0, type: spill-slot, size: 4, alignment: 4, stack-id: sgpr-spill }
				- { id: 1, type: spill-slot, size: 128, alignment: 4, stack-id: sgpr-spill }
				machineFunctionInfo:
				isEntryFunction: false
				scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
				stackPtrOffsetReg: '$sgpr32'
				frameOffsetReg: '$sgpr33'
				hasSpilledSGPRs: true
				body: \|
				bb.0:
				liveins: $sgpr30_sgpr31, $sgpr10, $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71, $sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79, $sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87, $sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-LABEL: name: sgpr_spill_lane_crossover
				; GCN: liveins: $sgpr10, $sgpr64, $sgpr65, $sgpr66, $sgpr67, $sgpr68, $sgpr69, $sgpr70, $sgpr71, $sgpr72, $sgpr73, $sgpr74, $sgpr75, $sgpr76, $sgpr77, $sgpr78, $sgpr79, $sgpr80, $sgpr81, $sgpr82, $sgpr83, $sgpr84, $sgpr85, $sgpr86, $sgpr87, $sgpr88, $sgpr89, $sgpr90, $sgpr91, $sgpr92, $sgpr93, $sgpr94, $sgpr95, $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71, $sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79, $sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87, $sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95, $sgpr30_sgpr31
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: [[V_WRITELANE_B32_1:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr64, 0, [[V_WRITELANE_B32_1]]
				cdevadas: This test is already hand-modified to check the tied operands.
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr65, 1, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr66, 2, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr67, 3, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr68, 4, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr69, 5, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr70, 6, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr71, 7, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr72, 8, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr73, 9, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr74, 10, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr75, 11, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr76, 12, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr77, 13, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr78, 14, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr79, 15, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr80, 16, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr81, 17, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr82, 18, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr83, 19, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr84, 20, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr85, 21, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr86, 22, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr87, 23, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr88, 24, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr89, 25, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr90, 26, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr91, 27, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr92, 28, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr93, 29, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr94, 30, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr95, 31, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: S_NOP 0
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr10, 32, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: [[V_WRITELANE_B32_2:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr64, 33, [[V_WRITELANE_B32_1]], implicit-def $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95, implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr65, 34, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr66, 35, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr67, 36, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr68, 37, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr69, 38, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr70, 39, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr71, 40, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr72, 41, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr73, 42, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr74, 43, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr75, 44, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr76, 45, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr77, 46, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr78, 47, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr79, 48, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr80, 49, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr81, 50, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr82, 51, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr83, 52, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr84, 53, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr85, 54, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr86, 55, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr87, 56, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr88, 57, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr89, 58, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr90, 59, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr91, 60, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr92, 61, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr93, 62, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_1]]:vgpr_32 = V_WRITELANE_B32 $sgpr94, 63, [[V_WRITELANE_B32_1]], implicit $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: [[V_WRITELANE_B32_2]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr95, 0, [[V_WRITELANE_B32_2]], implicit killed $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: S_NOP 0
				; GCN-NEXT: $sgpr64 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 33, implicit-def $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
				; GCN-NEXT: $sgpr65 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 34
				; GCN-NEXT: $sgpr66 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 35
				; GCN-NEXT: $sgpr67 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 36
				; GCN-NEXT: $sgpr68 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 37
				; GCN-NEXT: $sgpr69 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 38
				; GCN-NEXT: $sgpr70 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 39
				; GCN-NEXT: $sgpr71 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 40
				; GCN-NEXT: $sgpr72 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 41
				; GCN-NEXT: $sgpr73 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 42
				; GCN-NEXT: $sgpr74 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 43
				; GCN-NEXT: $sgpr75 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 44
				; GCN-NEXT: $sgpr76 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 45
				; GCN-NEXT: $sgpr77 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 46
				; GCN-NEXT: $sgpr78 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 47
				; GCN-NEXT: $sgpr79 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 48
				; GCN-NEXT: $sgpr80 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 49
				; GCN-NEXT: $sgpr81 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 50
				; GCN-NEXT: $sgpr82 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 51
				; GCN-NEXT: $sgpr83 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 52
				; GCN-NEXT: $sgpr84 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 53
				; GCN-NEXT: $sgpr85 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 54
				; GCN-NEXT: $sgpr86 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 55
				; GCN-NEXT: $sgpr87 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 56
				; GCN-NEXT: $sgpr88 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 57
				; GCN-NEXT: $sgpr89 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 58
				; GCN-NEXT: $sgpr90 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 59
				; GCN-NEXT: $sgpr91 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 60
				; GCN-NEXT: $sgpr92 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 61
				; GCN-NEXT: $sgpr93 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 62
				; GCN-NEXT: $sgpr94 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 63
				; GCN-NEXT: $sgpr95 = V_READLANE_B32 [[V_WRITELANE_B32_2]], 0
				; GCN-NEXT: $sgpr10 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 32
				; GCN-NEXT: S_SETPC_B64 $sgpr30_sgpr31
				S_NOP 0
				SI_SPILL_S32_SAVE killed $sgpr10, %stack.0, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32
				SI_SPILL_S1024_SAVE killed $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32
				S_NOP 0
				renamable $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95 = SI_SPILL_S1024_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32
				renamable $sgpr10 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32
				S_SETPC_B64 $sgpr30_sgpr31
				...
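The lane numbers in the `sgpr_spill_lane_crossover` checks can be reproduced with simple arithmetic: lanes are handed out in order, and once all lanes of one VGPR are taken the next spill dword moves to lane 0 of a fresh VGPR. The sketch below assumes 64 lanes per VGPR (wave64 on gfx906) and strictly sequential packing. `assign_lanes` is a hypothetical helper for illustration, not code from the patch.

```python
WAVE_SIZE = 64  # assumption: wave64, so each lane VGPR holds 64 dwords


def assign_lanes(spill_sizes_in_dwords, wave_size=WAVE_SIZE):
    """Pack spills into lane VGPRs in order.

    Returns one list per spill containing (vgpr_index, lane) pairs,
    one pair per 32-bit SGPR in that spill.
    """
    next_slot = 0
    placements = []
    for size in spill_sizes_in_dwords:
        slots = []
        for _ in range(size):
            # divmod splits the running slot number into (VGPR index, lane)
            slots.append(divmod(next_slot, wave_size))
            next_slot += 1
        placements.append(slots)
    return placements


# Mirrors the test: 32 single-dword saves of $sgpr64..$sgpr95 first,
# then the S32 spill of $sgpr10, then the 32-dword S1024 tuple.
placements = assign_lanes([1] * 32 + [1, 32])
sgpr10_slot = placements[32][0]   # (0, 32)
tuple_first = placements[33][0]   # (0, 33)
tuple_last = placements[33][-1]   # (1, 0)
```

The last dword of the tuple lands at `(1, 0)`, which matches `$sgpr95` being written to lane 0 of the second lane VGPR and explains why the test expects two `IMPLICIT_DEF`s.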

				# The implicit def for the lane VGPR should be inserted at the common dominator block (the entry block here).

				---
				name: lane_vgpr_implicit_def_at_common_dominator_block
				tracksRegLiveness: true
				frameInfo:
				maxAlignment: 4
				stack:
				- { id: 0, type: spill-slot, size: 4, alignment: 4, stack-id: sgpr-spill }
				machineFunctionInfo:
				isEntryFunction: false
				scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
				stackPtrOffsetReg: '$sgpr32'
				frameOffsetReg: '$sgpr33'
				hasSpilledSGPRs: true
				body: \|
				; GCN-LABEL: name: lane_vgpr_implicit_def_at_common_dominator_block
				; GCN: bb.0:
				; GCN-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)
				; GCN-NEXT: liveins: $sgpr10, $sgpr11, $sgpr30_sgpr31
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: S_NOP 0
				; GCN-NEXT: S_CMP_EQ_U32 $sgpr11, 0, implicit-def $scc
				; GCN-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
				; GCN-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
				; GCN-NEXT: {{ $}}
				arsenm: Needs a case where the insert block has no terminators
				cdevadas: I couldn't write one successfully. Will try some unstructured flow to force one.
				cdevadas: I don't think such a case exists. A fall-through block will have only one successor and that becomes the nearest dominator for its children. It would be true even for any unstructured flow.
				; GCN-NEXT: bb.1:
				; GCN-NEXT: successors: %bb.3(0x80000000)
				; GCN-NEXT: liveins: $sgpr10, $sgpr30_sgpr31
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: $sgpr10 = S_MOV_B32 10
				; GCN-NEXT: [[V_WRITELANE_B32_:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr10, 0, [[V_WRITELANE_B32_]]
				; GCN-NEXT: S_BRANCH %bb.3
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: bb.2:
				; GCN-NEXT: successors: %bb.3(0x80000000)
				; GCN-NEXT: liveins: $sgpr10, $sgpr30_sgpr31
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: $sgpr10 = S_MOV_B32 20
				; GCN-NEXT: [[V_WRITELANE_B32_1:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr10, 0, [[V_WRITELANE_B32_1]]
				; GCN-NEXT: S_BRANCH %bb.3
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: bb.3:
				; GCN-NEXT: liveins: $sgpr10, $sgpr30_sgpr31
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: $sgpr10 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 0
				; GCN-NEXT: S_SETPC_B64 $sgpr30_sgpr31, implicit $sgpr10
				bb.0:
				liveins: $sgpr10, $sgpr11, $sgpr30_sgpr31
				S_NOP 0
				S_CMP_EQ_U32 $sgpr11, 0, implicit-def $scc
				S_CBRANCH_SCC1 %bb.2, implicit killed $scc
				bb.1:
				liveins: $sgpr10, $sgpr30_sgpr31
				$sgpr10 = S_MOV_B32 10
				SI_SPILL_S32_SAVE killed $sgpr10, %stack.0, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32
				S_BRANCH %bb.3
				bb.2:
				liveins: $sgpr10, $sgpr30_sgpr31
				$sgpr10 = S_MOV_B32 20
				SI_SPILL_S32_SAVE killed $sgpr10, %stack.0, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32
				S_BRANCH %bb.3
				bb.3:
				liveins: $sgpr10, $sgpr30_sgpr31
				renamable $sgpr10 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32
				S_SETPC_B64 $sgpr30_sgpr31, implicit $sgpr10
				...

				# The common dominator block is visited only at the end. The insertion point was initially identified to the
				# terminator instruction in the dominator block which later becomes the point where a spill get inserted in the same block.

				---
				name: dominator_block_follows_the_successors_bbs
				tracksRegLiveness: true
				frameInfo:
				maxAlignment: 4
				stack:
				- { id: 0, type: spill-slot, size: 4, alignment: 4, stack-id: sgpr-spill }
				machineFunctionInfo:
				isEntryFunction: false
				scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
				stackPtrOffsetReg: '$sgpr32'
				frameOffsetReg: '$sgpr33'
				hasSpilledSGPRs: true
				body: \|
				; GCN-LABEL: name: dominator_block_follows_the_successors_bbs
				; GCN: bb.0:
				; GCN-NEXT: successors: %bb.3(0x80000000)
				; GCN-NEXT: liveins: $sgpr10, $sgpr11, $sgpr30_sgpr31
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: S_NOP 0
				; GCN-NEXT: S_BRANCH %bb.3
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: bb.1:
				; GCN-NEXT: successors: %bb.2(0x80000000)
				; GCN-NEXT: liveins: $sgpr10, $sgpr30_sgpr31
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: $sgpr10 = V_READLANE_B32 %0, 0
				; GCN-NEXT: $sgpr10 = S_ADD_I32 $sgpr10, 15, implicit-def dead $scc
				; GCN-NEXT: S_BRANCH %bb.2
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: bb.2:
				; GCN-NEXT: successors: %bb.3(0x80000000)
				; GCN-NEXT: liveins: $sgpr10, $sgpr30_sgpr31
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: $sgpr10 = V_READLANE_B32 %0, 0
				; GCN-NEXT: $sgpr10 = S_ADD_I32 $sgpr10, 20, implicit-def dead $scc
				; GCN-NEXT: S_BRANCH %bb.3
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: bb.3:
				; GCN-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)
				; GCN-NEXT: liveins: $sgpr10, $sgpr11, $sgpr30_sgpr31
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: $sgpr10 = S_MOV_B32 10
				; GCN-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
				; GCN-NEXT: [[V_WRITELANE_B32_:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr10, 0, [[V_WRITELANE_B32_]]
				; GCN-NEXT: S_CMP_EQ_U32 $sgpr11, 0, implicit-def $scc
				; GCN-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
				; GCN-NEXT: S_BRANCH %bb.1
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: bb.4:
				; GCN-NEXT: liveins: $sgpr10, $sgpr30_sgpr31
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: S_NOP 0
				; GCN-NEXT: S_SETPC_B64 $sgpr30_sgpr31, implicit $sgpr10
				bb.0:
				liveins: $sgpr10, $sgpr11, $sgpr30_sgpr31
				S_NOP 0
				S_BRANCH %bb.3
				bb.1:
				liveins: $sgpr10, $sgpr30_sgpr31
				renamable $sgpr10 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32
				$sgpr10 = S_ADD_I32 $sgpr10, 15, implicit-def dead $scc
				S_BRANCH %bb.2
				bb.2:
				liveins: $sgpr10, $sgpr30_sgpr31
				renamable $sgpr10 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32
				$sgpr10 = S_ADD_I32 $sgpr10, 20, implicit-def dead $scc
				S_BRANCH %bb.3
				bb.3:
				liveins: $sgpr10, $sgpr11, $sgpr30_sgpr31
				$sgpr10 = S_MOV_B32 10
				SI_SPILL_S32_SAVE killed $sgpr10, %stack.0, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32
				S_CMP_EQ_U32 $sgpr11, 0, implicit-def $scc
				S_CBRANCH_SCC1 %bb.2, implicit killed $scc
				S_BRANCH %bb.1
				bb.4:
				liveins: $sgpr10, $sgpr30_sgpr31
				S_NOP 0
				S_SETPC_B64 $sgpr30_sgpr31, implicit $sgpr10
				...
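The last two tests place the lane VGPR's `IMPLICIT_DEF` in the block that dominates every block using the spill slot. A rough Python sketch of how such a nearest common dominator can be computed on these small CFGs follows. It uses a plain iterative dominator-set algorithm rather than LLVM's `MachineDominatorTree`, and it assumes that both save and restore blocks are considered. All function names are hypothetical.

```python
def dominators(cfg, entry):
    """Iterative dataflow: dom[b] = {b} | intersection of dom[p] over preds p."""
    blocks = list(cfg)
    preds = {b: [] for b in blocks}
    for b, succs in cfg.items():
        for s in succs:
            preds[s].append(b)
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue
            pred_doms = [dom[p] for p in preds[b]]
            new = {b} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


def nearest_common_dominator(cfg, entry, blocks):
    """Return the common dominator of `blocks` that is closest to them."""
    dom = dominators(cfg, entry)
    common = set.intersection(*(dom[b] for b in blocks))
    # Common dominators form a chain; the nearest one has the most dominators.
    return max(common, key=lambda b: len(dom[b]))


# CFG of lane_vgpr_implicit_def_at_common_dominator_block
diamond = {"bb.0": ["bb.2", "bb.1"], "bb.1": ["bb.3"], "bb.2": ["bb.3"], "bb.3": []}
# CFG of dominator_block_follows_the_successors_bbs (bb.4 is unreachable)
loop = {"bb.0": ["bb.3"], "bb.1": ["bb.2"], "bb.2": ["bb.3"],
        "bb.3": ["bb.2", "bb.1"], "bb.4": []}

diamond_def_block = nearest_common_dominator(diamond, "bb.0", ["bb.1", "bb.2", "bb.3"])
loop_def_block = nearest_common_dominator(loop, "bb.0", ["bb.1", "bb.2", "bb.3"])
```

For the first CFG the result is `bb.0` and for the second it is `bb.3`, which are the blocks where the check lines expect the `IMPLICIT_DEF`.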

llvm/test/CodeGen/AMDGPU/spill-vgpr-to-agpr-update-regscavenger.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -O0 -verify-machineinstrs -o - %s | FileCheck %s

				; Regression test for `processFunctionBeforeFrameFinalized`:
				; Check that it correctly updates RegisterScavenger so we
				; don't end up with bad machine code due to using undefined
				; physical registers.

				define void @test() {
				; CHECK-LABEL: test:
				; CHECK: ; %bb.0: ; %bb.0
				; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1
				; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
				; CHECK-NEXT: s_mov_b64 exec, s[4:5]
				; CHECK-NEXT: .LBB0_1: ; %bb.1
				; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: s_cbranch_scc1 .LBB0_3
				; CHECK-NEXT: ; %bb.2: ; %bb.2
				; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1
				; CHECK-NEXT: .LBB0_3: ; %bb.3
				; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1
				; CHECK-NEXT: ; implicit-def: $sgpr4
				; CHECK-NEXT: v_mov_b32_e32 v0, s4
				; CHECK-NEXT: v_readfirstlane_b32 s6, v0
				; CHECK-NEXT: s_mov_b64 s[4:5], -1
				; CHECK-NEXT: s_mov_b32 s7, 0
				; CHECK-NEXT: s_cmp_eq_u32 s6, s7
				; CHECK-NEXT: ; implicit-def: $vgpr0
				; CHECK-NEXT: v_writelane_b32 v0, s4, 0
				; CHECK-NEXT: v_writelane_b32 v0, s5, 1
				; CHECK-NEXT: s_mov_b64 s[10:11], exec
				; CHECK-NEXT: s_mov_b64 exec, -1
				; CHECK-NEXT: v_accvgpr_write_b32 a0, v0 ; Reload Reuse
				; CHECK-NEXT: s_mov_b64 exec, s[10:11]
				; CHECK-NEXT: s_cbranch_scc1 .LBB0_5
				; CHECK-NEXT: ; %bb.4: ; %bb.4
				; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1
				; CHECK-NEXT: s_or_saveexec_b64 s[10:11], -1
				; CHECK-NEXT: v_accvgpr_read_b32 v0, a0 ; Reload Reuse
				; CHECK-NEXT: s_mov_b64 exec, s[10:11]
				; CHECK-NEXT: s_mov_b64 s[4:5], 0
				; CHECK-NEXT: v_writelane_b32 v0, s4, 0
				; CHECK-NEXT: v_writelane_b32 v0, s5, 1
				; CHECK-NEXT: s_or_saveexec_b64 s[10:11], -1
				; CHECK-NEXT: s_nop 0
				; CHECK-NEXT: v_accvgpr_write_b32 a0, v0 ; Reload Reuse
				; CHECK-NEXT: s_mov_b64 exec, s[10:11]
				; CHECK-NEXT: .LBB0_5: ; %Flow
				; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1
				; CHECK-NEXT: s_or_saveexec_b64 s[10:11], -1
				; CHECK-NEXT: s_nop 0
				; CHECK-NEXT: v_accvgpr_read_b32 v0, a0 ; Reload Reuse
				; CHECK-NEXT: s_mov_b64 exec, s[10:11]
				; CHECK-NEXT: v_readlane_b32 s4, v0, 0
				; CHECK-NEXT: v_readlane_b32 s5, v0, 1
				; CHECK-NEXT: v_cndmask_b32_e64 v0, 0, 1, s[4:5]
				; CHECK-NEXT: s_mov_b32 s4, 1
				; CHECK-NEXT: ; implicit-def: $sgpr5
				; CHECK-NEXT: v_cmp_ne_u32_e64 s[4:5], v0, s4
				; CHECK-NEXT: s_and_b64 vcc, exec, s[4:5]
				; CHECK-NEXT: s_cbranch_vccnz .LBB0_1
				; CHECK-NEXT: ; %bb.6: ; %bb.5
				; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1
				; CHECK-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload
				; CHECK-NEXT: s_mov_b64 exec, s[4:5]
				; CHECK-NEXT: s_waitcnt vmcnt(0)
				; CHECK-NEXT: s_setpc_b64 s[30:31]
				bb.0:
				br label %bb.1
				bb.1: ; preds = %bb.4, %bb.0
				br i1 poison, label %bb.2, label %bb.3
				bb.2: ; preds = %bb.1
				br label %bb.3
				bb.3: ; preds = %bb.2, %bb.1
				%call = tail call i32 @llvm.amdgcn.readfirstlane(i32 poison)
				%cmp = icmp eq i32 %call, 0
				br i1 %cmp, label %bb.5, label %bb.4
				bb.4: ; preds = %bb.3
				br label %bb.1
				bb.5: ; preds = %bb.3
				ret void
				}

				declare i32 @llvm.amdgcn.readfirstlane(i32)

llvm/test/CodeGen/AMDGPU/spill-writelane-vgprs.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s

	; Callee must preserve the VGPR modified by writelane even if it is marked Caller-saved.			; Callee must preserve the VGPR modified by writelane even if it is marked Caller-saved.

	declare i32 @llvm.amdgcn.writelane(i32, i32, i32)			declare i32 @llvm.amdgcn.writelane(i32, i32, i32)

	define void @sgpr_spill_writelane() {			define void @sgpr_spill_writelane() {
	; GCN-LABEL: sgpr_spill_writelane:			; GCN-LABEL: sgpr_spill_writelane:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_xor_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
				; GCN-NEXT: ; implicit-def: $vgpr0
	; GCN-NEXT: v_writelane_b32 v0, s35, 0			; GCN-NEXT: v_writelane_b32 v0, s35, 0
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: v_readlane_b32 s35, v0, 0			; GCN-NEXT: v_readlane_b32 s35, v0, 0
	; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_xor_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	call void asm sideeffect "", "~{s35}"()			call void asm sideeffect "", "~{s35}"()
	ret void			ret void
	}			}

	; FIXME: The writelane intrinsic doesn't really overwrite any inactive lanes			; FIXME: The writelane intrinsic doesn't really overwrite any inactive lanes
	; and hence there is no need to preserve the VGPR it modifies.			; and hence there is no need to preserve the VGPR it modifies.

llvm/test/CodeGen/AMDGPU/spill192.mir

body: |
; SPILLED-NEXT: S_NOP 1		; SPILLED-NEXT: S_NOP 1
; SPILLED-NEXT: {{ $}}		; SPILLED-NEXT: {{ $}}
; SPILLED-NEXT: bb.2:		; SPILLED-NEXT: bb.2:
; SPILLED-NEXT: $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9 = SI_SPILL_S192_RESTORE %stack.0, implicit $exec, implicit $sgpr32 :: (load (s192) from %stack.0, align 4, addrspace 5)		; SPILLED-NEXT: $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9 = SI_SPILL_S192_RESTORE %stack.0, implicit $exec, implicit $sgpr32 :: (load (s192) from %stack.0, align 4, addrspace 5)
; SPILLED-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9		; SPILLED-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9
; EXPANDED-LABEL: name: spill_restore_sgpr192		; EXPANDED-LABEL: name: spill_restore_sgpr192
; EXPANDED: bb.0:		; EXPANDED: bb.0:
; EXPANDED-NEXT: successors: %bb.1(0x80000000)		; EXPANDED-NEXT: successors: %bb.1(0x80000000)
; EXPANDED-NEXT: liveins: $vgpr0
; EXPANDED-NEXT: {{ $}}		; EXPANDED-NEXT: {{ $}}
; EXPANDED-NEXT: S_NOP 0, implicit-def renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9		; EXPANDED-NEXT: S_NOP 0, implicit-def renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9
; EXPANDED-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr4, 0, $vgpr0, implicit-def $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9, implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9		; EXPANDED-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
; EXPANDED-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr5, 1, $vgpr0, implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9		; EXPANDED-NEXT: [[V_WRITELANE_B32_:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 $sgpr4, 0, [[V_WRITELANE_B32_]], implicit-def $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9, implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9
; EXPANDED-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr6, 2, $vgpr0, implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9		; EXPANDED-NEXT: [[V_WRITELANE_B32_1:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 $sgpr5, 1, [[V_WRITELANE_B32_1]], implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9
; EXPANDED-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr7, 3, $vgpr0, implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9		; EXPANDED-NEXT: [[V_WRITELANE_B32_1:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 $sgpr6, 2, [[V_WRITELANE_B32_1]], implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9
; EXPANDED-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr8, 4, $vgpr0, implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9		; EXPANDED-NEXT: [[V_WRITELANE_B32_1:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 $sgpr7, 3, [[V_WRITELANE_B32_1]], implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9
; EXPANDED-NEXT: $vgpr0 = V_WRITELANE_B32 killed $sgpr9, 5, $vgpr0, implicit killed $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9		; EXPANDED-NEXT: [[V_WRITELANE_B32_1:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 $sgpr8, 4, [[V_WRITELANE_B32_1]], implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9
		; EXPANDED-NEXT: [[V_WRITELANE_B32_1:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr9, 5, [[V_WRITELANE_B32_1]], implicit killed $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9
; EXPANDED-NEXT: S_CBRANCH_SCC1 %bb.1, implicit undef $scc		; EXPANDED-NEXT: S_CBRANCH_SCC1 %bb.1, implicit undef $scc
; EXPANDED-NEXT: {{ $}}		; EXPANDED-NEXT: {{ $}}
; EXPANDED-NEXT: bb.1:		; EXPANDED-NEXT: bb.1:
; EXPANDED-NEXT: successors: %bb.2(0x80000000)		; EXPANDED-NEXT: successors: %bb.2(0x80000000)
; EXPANDED-NEXT: liveins: $vgpr0
; EXPANDED-NEXT: {{ $}}		; EXPANDED-NEXT: {{ $}}
; EXPANDED-NEXT: S_NOP 1		; EXPANDED-NEXT: S_NOP 1
; EXPANDED-NEXT: {{ $}}		; EXPANDED-NEXT: {{ $}}
; EXPANDED-NEXT: bb.2:		; EXPANDED-NEXT: bb.2:
; EXPANDED-NEXT: liveins: $vgpr0		; EXPANDED-NEXT: $sgpr4 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 0, implicit-def $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9
; EXPANDED-NEXT: {{ $}}		; EXPANDED-NEXT: $sgpr5 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 1
; EXPANDED-NEXT: $sgpr4 = V_READLANE_B32 $vgpr0, 0, implicit-def $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9		; EXPANDED-NEXT: $sgpr6 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 2
; EXPANDED-NEXT: $sgpr5 = V_READLANE_B32 $vgpr0, 1		; EXPANDED-NEXT: $sgpr7 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 3
; EXPANDED-NEXT: $sgpr6 = V_READLANE_B32 $vgpr0, 2		; EXPANDED-NEXT: $sgpr8 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 4
; EXPANDED-NEXT: $sgpr7 = V_READLANE_B32 $vgpr0, 3		; EXPANDED-NEXT: $sgpr9 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 5
; EXPANDED-NEXT: $sgpr8 = V_READLANE_B32 $vgpr0, 4
; EXPANDED-NEXT: $sgpr9 = V_READLANE_B32 $vgpr0, 5
; EXPANDED-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9		; EXPANDED-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9
bb.0:		bb.0:
S_NOP 0, implicit-def %0:sgpr_192		S_NOP 0, implicit-def %0:sgpr_192
S_CBRANCH_SCC1 implicit undef $scc, %bb.1		S_CBRANCH_SCC1 implicit undef $scc, %bb.1

bb.1:		bb.1:
S_NOP 1		S_NOP 1


llvm/test/CodeGen/AMDGPU/spill224.mir

body: |
; SPILLED-NEXT: S_NOP 1		; SPILLED-NEXT: S_NOP 1
; SPILLED-NEXT: {{ $}}		; SPILLED-NEXT: {{ $}}
; SPILLED-NEXT: bb.2:		; SPILLED-NEXT: bb.2:
; SPILLED-NEXT: $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10 = SI_SPILL_S224_RESTORE %stack.0, implicit $exec, implicit $sgpr32 :: (load (s224) from %stack.0, align 4, addrspace 5)		; SPILLED-NEXT: $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10 = SI_SPILL_S224_RESTORE %stack.0, implicit $exec, implicit $sgpr32 :: (load (s224) from %stack.0, align 4, addrspace 5)
; SPILLED-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10		; SPILLED-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10
; EXPANDED-LABEL: name: spill_restore_sgpr224		; EXPANDED-LABEL: name: spill_restore_sgpr224
; EXPANDED: bb.0:		; EXPANDED: bb.0:
; EXPANDED-NEXT: successors: %bb.1(0x80000000)		; EXPANDED-NEXT: successors: %bb.1(0x80000000)
; EXPANDED-NEXT: liveins: $vgpr0
; EXPANDED-NEXT: {{ $}}		; EXPANDED-NEXT: {{ $}}
; EXPANDED-NEXT: S_NOP 0, implicit-def renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10		; EXPANDED-NEXT: S_NOP 0, implicit-def renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10
; EXPANDED-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr4, 0, $vgpr0, implicit-def $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10, implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10		; EXPANDED-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
; EXPANDED-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr5, 1, $vgpr0, implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10		; EXPANDED-NEXT: [[V_WRITELANE_B32_:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 $sgpr4, 0, [[V_WRITELANE_B32_]], implicit-def $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10, implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10
; EXPANDED-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr6, 2, $vgpr0, implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10		; EXPANDED-NEXT: [[V_WRITELANE_B32_1:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 $sgpr5, 1, [[V_WRITELANE_B32_1]], implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10
; EXPANDED-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr7, 3, $vgpr0, implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10		; EXPANDED-NEXT: [[V_WRITELANE_B32_1:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 $sgpr6, 2, [[V_WRITELANE_B32_1]], implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10
; EXPANDED-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr8, 4, $vgpr0, implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10		; EXPANDED-NEXT: [[V_WRITELANE_B32_1:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 $sgpr7, 3, [[V_WRITELANE_B32_1]], implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10
; EXPANDED-NEXT: $vgpr0 = V_WRITELANE_B32 $sgpr9, 5, $vgpr0, implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10		; EXPANDED-NEXT: [[V_WRITELANE_B32_1:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 $sgpr8, 4, [[V_WRITELANE_B32_1]], implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10
; EXPANDED-NEXT: $vgpr0 = V_WRITELANE_B32 killed $sgpr10, 6, $vgpr0, implicit killed $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10		; EXPANDED-NEXT: [[V_WRITELANE_B32_1:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 $sgpr9, 5, [[V_WRITELANE_B32_1]], implicit $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10
		; EXPANDED-NEXT: [[V_WRITELANE_B32_1:%[0-9]+]]:vgpr_32 = V_WRITELANE_B32 killed $sgpr10, 6, [[V_WRITELANE_B32_1]], implicit killed $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10
; EXPANDED-NEXT: S_CBRANCH_SCC1 %bb.1, implicit undef $scc		; EXPANDED-NEXT: S_CBRANCH_SCC1 %bb.1, implicit undef $scc
; EXPANDED-NEXT: {{ $}}		; EXPANDED-NEXT: {{ $}}
; EXPANDED-NEXT: bb.1:		; EXPANDED-NEXT: bb.1:
; EXPANDED-NEXT: successors: %bb.2(0x80000000)		; EXPANDED-NEXT: successors: %bb.2(0x80000000)
; EXPANDED-NEXT: liveins: $vgpr0
; EXPANDED-NEXT: {{ $}}		; EXPANDED-NEXT: {{ $}}
; EXPANDED-NEXT: S_NOP 1		; EXPANDED-NEXT: S_NOP 1
; EXPANDED-NEXT: {{ $}}		; EXPANDED-NEXT: {{ $}}
; EXPANDED-NEXT: bb.2:		; EXPANDED-NEXT: bb.2:
; EXPANDED-NEXT: liveins: $vgpr0		; EXPANDED-NEXT: $sgpr4 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 0, implicit-def $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10
; EXPANDED-NEXT: {{ $}}		; EXPANDED-NEXT: $sgpr5 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 1
; EXPANDED-NEXT: $sgpr4 = V_READLANE_B32 $vgpr0, 0, implicit-def $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10		; EXPANDED-NEXT: $sgpr6 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 2
; EXPANDED-NEXT: $sgpr5 = V_READLANE_B32 $vgpr0, 1		; EXPANDED-NEXT: $sgpr7 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 3
; EXPANDED-NEXT: $sgpr6 = V_READLANE_B32 $vgpr0, 2		; EXPANDED-NEXT: $sgpr8 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 4
; EXPANDED-NEXT: $sgpr7 = V_READLANE_B32 $vgpr0, 3		; EXPANDED-NEXT: $sgpr9 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 5
; EXPANDED-NEXT: $sgpr8 = V_READLANE_B32 $vgpr0, 4		; EXPANDED-NEXT: $sgpr10 = V_READLANE_B32 [[V_WRITELANE_B32_1]], 6
; EXPANDED-NEXT: $sgpr9 = V_READLANE_B32 $vgpr0, 5
; EXPANDED-NEXT: $sgpr10 = V_READLANE_B32 $vgpr0, 6
; EXPANDED-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10		; EXPANDED-NEXT: S_NOP 0, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10
bb.0:		bb.0:
S_NOP 0, implicit-def %0:sgpr_224		S_NOP 0, implicit-def %0:sgpr_224
S_CBRANCH_SCC1 implicit undef $scc, %bb.1		S_CBRANCH_SCC1 implicit undef $scc, %bb.1

bb.1:		bb.1:
S_NOP 1		S_NOP 1


llvm/test/CodeGen/AMDGPU/tail-call-amdgpu-gfx.ll

	; GCN-LABEL: caller:			; GCN-LABEL: caller:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_mov_b32 s36, s33			; GCN-NEXT: s_mov_b32 s36, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GCN-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GCN-NEXT: buffer_store_dword v1, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v1, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[34:35]			; GCN-NEXT: s_mov_b64 exec, s[34:35]
	; GCN-NEXT: v_writelane_b32 v1, s4, 0			; GCN-NEXT: ; implicit-def: $vgpr1
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
				; GCN-NEXT: v_writelane_b32 v1, s4, 0
	; GCN-NEXT: v_writelane_b32 v1, s30, 1			; GCN-NEXT: v_writelane_b32 v1, s30, 1
	; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0			; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
	; GCN-NEXT: s_mov_b32 s4, 2.0			; GCN-NEXT: s_mov_b32 s4, 2.0
	; GCN-NEXT: v_writelane_b32 v1, s31, 2			; GCN-NEXT: v_writelane_b32 v1, s31, 2
	; GCN-NEXT: s_getpc_b64 s[34:35]			; GCN-NEXT: s_getpc_b64 s[34:35]
	; GCN-NEXT: s_add_u32 s34, s34, callee@rel32@lo+4			; GCN-NEXT: s_add_u32 s34, s34, callee@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s35, s35, callee@rel32@hi+12			; GCN-NEXT: s_addc_u32 s35, s35, callee@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GCN-NEXT: s_swappc_b64 s[30:31], s[34:35]

llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -greedy-regclass-priority-trumps-globalness=1 -o - %s | FileCheck -check-prefixes=GFX90A,GLOBALNESS1 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -greedy-regclass-priority-trumps-globalness=1 -o - %s | FileCheck -check-prefixes=GFX90A,GLOBALNESS1 %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -greedy-regclass-priority-trumps-globalness=0 -o - %s | FileCheck -check-prefixes=GFX90A,GLOBALNESS0 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -greedy-regclass-priority-trumps-globalness=0 -o - %s | FileCheck -check-prefixes=GFX90A,GLOBALNESS0 %s

	declare void @wobble()			declare void @wobble()

	define internal fastcc void @widget() {			define internal fastcc void @widget() {
	; GFX90A-LABEL: widget:			; GFX90A-LABEL: widget:
	; GFX90A: ; %bb.0: ; %bb			; GFX90A: ; %bb.0: ; %bb
	; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX90A-NEXT: s_mov_b32 s16, s33			; GFX90A-NEXT: s_mov_b32 s16, s33
	; GFX90A-NEXT: s_mov_b32 s33, s32			; GFX90A-NEXT: s_mov_b32 s33, s32
	; GFX90A-NEXT: s_or_saveexec_b64 s[18:19], -1			; GFX90A-NEXT: s_xor_saveexec_b64 s[18:19], -1
	; GFX90A-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX90A-NEXT: buffer_store_dword v0, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX90A-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX90A-NEXT: s_mov_b64 exec, -1
				; GFX90A-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX90A-NEXT: s_mov_b64 exec, s[18:19]			; GFX90A-NEXT: s_mov_b64 exec, s[18:19]
	; GFX90A-NEXT: s_addk_i32 s32, 0x400			; GFX90A-NEXT: s_addk_i32 s32, 0x400
	; GFX90A-NEXT: v_writelane_b32 v41, s16, 0			; GFX90A-NEXT: v_writelane_b32 v40, s16, 0
	; GFX90A-NEXT: s_getpc_b64 s[16:17]			; GFX90A-NEXT: s_getpc_b64 s[16:17]
	; GFX90A-NEXT: s_add_u32 s16, s16, wobble@gotpcrel32@lo+4			; GFX90A-NEXT: s_add_u32 s16, s16, wobble@gotpcrel32@lo+4
	; GFX90A-NEXT: s_addc_u32 s17, s17, wobble@gotpcrel32@hi+12			; GFX90A-NEXT: s_addc_u32 s17, s17, wobble@gotpcrel32@hi+12
	; GFX90A-NEXT: s_load_dwordx2 s[16:17], s[16:17], 0x0			; GFX90A-NEXT: s_load_dwordx2 s[16:17], s[16:17], 0x0
	; GFX90A-NEXT: v_writelane_b32 v40, s30, 0			; GFX90A-NEXT: ; implicit-def: $vgpr0
	; GFX90A-NEXT: v_writelane_b32 v40, s31, 1			; GFX90A-NEXT: v_writelane_b32 v0, s30, 0
				; GFX90A-NEXT: v_writelane_b32 v0, s31, 1
	; GFX90A-NEXT: s_waitcnt lgkmcnt(0)			; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
	; GFX90A-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GFX90A-NEXT: s_swappc_b64 s[30:31], s[16:17]
	bb:			bb:
	tail call void @wobble()			tail call void @wobble()
	unreachable			unreachable
	}			}

	define amdgpu_kernel void @kernel(i32 addrspace(1)* %arg1.global, i1 %tmp3.i.i, i32 %tmp5.i.i, i32 %tmp427.i, i1 %tmp438.i, double %tmp27.i, i1 %tmp48.i) {			define amdgpu_kernel void @kernel(i32 addrspace(1)* %arg1.global, i1 %tmp3.i.i, i32 %tmp5.i.i, i32 %tmp427.i, i1 %tmp438.i, double %tmp27.i, i1 %tmp48.i) {
	; GLOBALNESS1-LABEL: kernel:			; GLOBALNESS1-LABEL: kernel:
	; GLOBALNESS1: ; %bb.0: ; %bb			; GLOBALNESS1: ; %bb.0: ; %bb
	; GLOBALNESS1-NEXT: s_mov_b64 s[54:55], s[6:7]			; GLOBALNESS1-NEXT: s_mov_b64 s[54:55], s[6:7]
	; GLOBALNESS1-NEXT: s_load_dwordx4 s[36:39], s[8:9], 0x0			; GLOBALNESS1-NEXT: s_load_dwordx4 s[36:39], s[8:9], 0x0
	; GLOBALNESS1-NEXT: s_load_dword s6, s[8:9], 0x14			; GLOBALNESS1-NEXT: s_load_dword s6, s[8:9], 0x14
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v43, v0			; GLOBALNESS1-NEXT: v_mov_b32_e32 v41, v0
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v40, 0			; GLOBALNESS1-NEXT: v_mov_b32_e32 v42, 0
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[0:1], 0, 0			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[0:1], 0, 0
	; GLOBALNESS1-NEXT: global_store_dword v[0:1], v40, off			; GLOBALNESS1-NEXT: global_store_dword v[0:1], v42, off
	; GLOBALNESS1-NEXT: s_waitcnt lgkmcnt(0)			; GLOBALNESS1-NEXT: s_waitcnt lgkmcnt(0)
	; GLOBALNESS1-NEXT: global_load_dword v0, v40, s[36:37]			; GLOBALNESS1-NEXT: global_load_dword v0, v42, s[36:37]
	; GLOBALNESS1-NEXT: s_add_u32 flat_scratch_lo, s12, s17			; GLOBALNESS1-NEXT: s_add_u32 flat_scratch_lo, s12, s17
	; GLOBALNESS1-NEXT: s_mov_b64 s[64:65], s[4:5]			; GLOBALNESS1-NEXT: s_mov_b64 s[64:65], s[4:5]
	; GLOBALNESS1-NEXT: s_load_dwordx2 s[4:5], s[8:9], 0x18			; GLOBALNESS1-NEXT: s_load_dwordx2 s[4:5], s[8:9], 0x18
	; GLOBALNESS1-NEXT: s_load_dword s7, s[8:9], 0x20			; GLOBALNESS1-NEXT: s_load_dword s7, s[8:9], 0x20
	; GLOBALNESS1-NEXT: s_addc_u32 flat_scratch_hi, s13, 0			; GLOBALNESS1-NEXT: s_addc_u32 flat_scratch_hi, s13, 0
	; GLOBALNESS1-NEXT: s_add_u32 s0, s0, s17			; GLOBALNESS1-NEXT: s_add_u32 s0, s0, s17
	; GLOBALNESS1-NEXT: s_addc_u32 s1, s1, 0			; GLOBALNESS1-NEXT: s_addc_u32 s1, s1, 0
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v41, 0x40994400			; GLOBALNESS1-NEXT: v_mov_b32_e32 v43, 0x40994400
	; GLOBALNESS1-NEXT: s_bitcmp1_b32 s38, 0			; GLOBALNESS1-NEXT: s_bitcmp1_b32 s38, 0
	; GLOBALNESS1-NEXT: s_waitcnt lgkmcnt(0)			; GLOBALNESS1-NEXT: s_waitcnt lgkmcnt(0)
	; GLOBALNESS1-NEXT: v_cmp_ngt_f64_e64 s[36:37], s[4:5], v[40:41]			; GLOBALNESS1-NEXT: v_cmp_ngt_f64_e64 s[40:41], s[4:5], v[42:43]
	; GLOBALNESS1-NEXT: v_cmp_ngt_f64_e64 s[40:41], s[4:5], 0			; GLOBALNESS1-NEXT: v_cmp_ngt_f64_e64 s[42:43], s[4:5], 0
	; GLOBALNESS1-NEXT: s_cselect_b64 s[4:5], -1, 0			; GLOBALNESS1-NEXT: s_cselect_b64 s[4:5], -1, 0
	; GLOBALNESS1-NEXT: s_xor_b64 s[94:95], s[4:5], -1			; GLOBALNESS1-NEXT: s_xor_b64 s[94:95], s[4:5], -1
	; GLOBALNESS1-NEXT: s_bitcmp1_b32 s6, 0			; GLOBALNESS1-NEXT: s_bitcmp1_b32 s6, 0
	; GLOBALNESS1-NEXT: v_cndmask_b32_e64 v1, 0, 1, s[4:5]			; GLOBALNESS1-NEXT: v_cndmask_b32_e64 v1, 0, 1, s[4:5]
	; GLOBALNESS1-NEXT: s_cselect_b64 s[4:5], -1, 0			; GLOBALNESS1-NEXT: s_cselect_b64 s[4:5], -1, 0
	; GLOBALNESS1-NEXT: s_xor_b64 s[88:89], s[4:5], -1			; GLOBALNESS1-NEXT: s_xor_b64 s[88:89], s[4:5], -1
	; GLOBALNESS1-NEXT: s_bitcmp1_b32 s7, 0			; GLOBALNESS1-NEXT: s_bitcmp1_b32 s7, 0
	; GLOBALNESS1-NEXT: s_cselect_b64 s[4:5], -1, 0			; GLOBALNESS1-NEXT: s_cselect_b64 s[4:5], -1, 0
	; GLOBALNESS1-NEXT: s_getpc_b64 s[6:7]			; GLOBALNESS1-NEXT: s_getpc_b64 s[6:7]
	; GLOBALNESS1-NEXT: s_add_u32 s6, s6, wobble@gotpcrel32@lo+4			; GLOBALNESS1-NEXT: s_add_u32 s6, s6, wobble@gotpcrel32@lo+4
	; GLOBALNESS1-NEXT: s_addc_u32 s7, s7, wobble@gotpcrel32@hi+12			; GLOBALNESS1-NEXT: s_addc_u32 s7, s7, wobble@gotpcrel32@hi+12
	; GLOBALNESS1-NEXT: s_xor_b64 s[86:87], s[4:5], -1			; GLOBALNESS1-NEXT: s_xor_b64 s[86:87], s[4:5], -1
				; GLOBALNESS1-NEXT: ; implicit-def: $vgpr40
	; GLOBALNESS1-NEXT: s_load_dwordx2 s[66:67], s[6:7], 0x0			; GLOBALNESS1-NEXT: s_load_dwordx2 s[66:67], s[6:7], 0x0
	; GLOBALNESS1-NEXT: s_mov_b32 s98, s16			; GLOBALNESS1-NEXT: s_mov_b32 s98, s16
	; GLOBALNESS1-NEXT: s_mov_b64 s[62:63], s[8:9]			; GLOBALNESS1-NEXT: s_mov_b64 s[62:63], s[8:9]
	; GLOBALNESS1-NEXT: s_mov_b32 s99, s15			; GLOBALNESS1-NEXT: s_mov_b32 s99, s15
	; GLOBALNESS1-NEXT: s_mov_b32 s100, s14			; GLOBALNESS1-NEXT: s_mov_b32 s56, s14
	; GLOBALNESS1-NEXT: s_mov_b64 s[34:35], s[10:11]			; GLOBALNESS1-NEXT: s_mov_b64 s[34:35], s[10:11]
	; GLOBALNESS1-NEXT: s_mov_b64 s[92:93], 0x80			; GLOBALNESS1-NEXT: s_mov_b64 s[92:93], 0x80
	; GLOBALNESS1-NEXT: v_cmp_ne_u32_e64 s[42:43], 1, v1			; GLOBALNESS1-NEXT: v_cmp_ne_u32_e64 s[36:37], 1, v1
	; GLOBALNESS1-NEXT: s_mov_b32 s69, 0x3ff00000			; GLOBALNESS1-NEXT: s_mov_b32 s69, 0x3ff00000
	; GLOBALNESS1-NEXT: s_mov_b32 s32, 0			; GLOBALNESS1-NEXT: s_mov_b32 s32, 0
	; GLOBALNESS1-NEXT: ; implicit-def: $agpr32_agpr33_agpr34_agpr35_agpr36_agpr37_agpr38_agpr39_agpr40_agpr41_agpr42_agpr43_agpr44_agpr45_agpr46_agpr47_agpr48_agpr49_agpr50_agpr51_agpr52_agpr53_agpr54_agpr55_agpr56_agpr57_agpr58_agpr59_agpr60_agpr61_agpr62_agpr63			; GLOBALNESS1-NEXT: ; implicit-def: $agpr32_agpr33_agpr34_agpr35_agpr36_agpr37_agpr38_agpr39_agpr40_agpr41_agpr42_agpr43_agpr44_agpr45_agpr46_agpr47_agpr48_agpr49_agpr50_agpr51_agpr52_agpr53_agpr54_agpr55_agpr56_agpr57_agpr58_agpr59_agpr60_agpr61_agpr62_agpr63
	; GLOBALNESS1-NEXT: s_waitcnt vmcnt(0)			; GLOBALNESS1-NEXT: s_waitcnt vmcnt(0)
	; GLOBALNESS1-NEXT: v_cmp_gt_i32_e64 s[4:5], 0, v0			; GLOBALNESS1-NEXT: v_cmp_gt_i32_e64 s[4:5], 0, v0
	; GLOBALNESS1-NEXT: v_writelane_b32 v42, s4, 0			; GLOBALNESS1-NEXT: v_writelane_b32 v40, s4, 0
	; GLOBALNESS1-NEXT: v_writelane_b32 v42, s5, 1			; GLOBALNESS1-NEXT: v_writelane_b32 v40, s5, 1
	; GLOBALNESS1-NEXT: v_cmp_eq_u32_e64 s[4:5], 1, v0			; GLOBALNESS1-NEXT: v_cmp_eq_u32_e64 s[4:5], 1, v0
	; GLOBALNESS1-NEXT: v_writelane_b32 v42, s4, 2			; GLOBALNESS1-NEXT: v_writelane_b32 v40, s4, 2
	; GLOBALNESS1-NEXT: v_writelane_b32 v42, s5, 3			; GLOBALNESS1-NEXT: v_writelane_b32 v40, s5, 3
	; GLOBALNESS1-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v0			; GLOBALNESS1-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v0
	; GLOBALNESS1-NEXT: v_writelane_b32 v42, s4, 4			; GLOBALNESS1-NEXT: v_writelane_b32 v40, s4, 4
	; GLOBALNESS1-NEXT: v_cmp_gt_i32_e64 s[90:91], 1, v0			; GLOBALNESS1-NEXT: v_cmp_gt_i32_e64 s[90:91], 1, v0
	; GLOBALNESS1-NEXT: v_writelane_b32 v42, s5, 5			; GLOBALNESS1-NEXT: v_writelane_b32 v40, s5, 5
	; GLOBALNESS1-NEXT: s_branch .LBB1_4			; GLOBALNESS1-NEXT: s_branch .LBB1_4
	; GLOBALNESS1-NEXT: .LBB1_1: ; %bb70.i			; GLOBALNESS1-NEXT: .LBB1_1: ; %bb70.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: v_readlane_b32 s6, v42, 4			; GLOBALNESS1-NEXT: v_readlane_b32 s6, v40, 4
	; GLOBALNESS1-NEXT: v_readlane_b32 s7, v42, 5			; GLOBALNESS1-NEXT: v_readlane_b32 s7, v40, 5
	; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[6:7]			; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[6:7]
	; GLOBALNESS1-NEXT: s_cbranch_vccz .LBB1_29			; GLOBALNESS1-NEXT: s_cbranch_vccz .LBB1_29
	; GLOBALNESS1-NEXT: .LBB1_2: ; %Flow6			; GLOBALNESS1-NEXT: .LBB1_2: ; %Flow6
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: s_or_b64 exec, exec, s[4:5]			; GLOBALNESS1-NEXT: s_or_b64 exec, exec, s[4:5]
	; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], 0			; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], 0
	; GLOBALNESS1-NEXT: ; implicit-def: $sgpr4_sgpr5			; GLOBALNESS1-NEXT: ; implicit-def: $sgpr4_sgpr5
	; GLOBALNESS1-NEXT: .LBB1_3: ; %Flow19			; GLOBALNESS1-NEXT: .LBB1_3: ; %Flow19
	; GLOBALNESS1-NEXT: v_accvgpr_write_b32 a32, v0			; GLOBALNESS1-NEXT: v_accvgpr_write_b32 a32, v0
	; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_30			; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_30
	; GLOBALNESS1-NEXT: .LBB1_4: ; %bb5			; GLOBALNESS1-NEXT: .LBB1_4: ; %bb5
	; GLOBALNESS1-NEXT: ; =>This Loop Header: Depth=1			; GLOBALNESS1-NEXT: ; =>This Loop Header: Depth=1
	; GLOBALNESS1-NEXT: ; Child Loop BB1_15 Depth 2			; GLOBALNESS1-NEXT: ; Child Loop BB1_15 Depth 2
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[0:1], s[92:93], s[92:93] op_sel:[0,1]			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[0:1], s[92:93], s[92:93] op_sel:[0,1]
	; GLOBALNESS1-NEXT: flat_load_dword v44, v[0:1]			; GLOBALNESS1-NEXT: flat_load_dword v44, v[0:1]
	; GLOBALNESS1-NEXT: s_add_u32 s8, s62, 40			; GLOBALNESS1-NEXT: s_add_u32 s8, s62, 40
	; GLOBALNESS1-NEXT: buffer_store_dword v40, off, s[0:3], 0			; GLOBALNESS1-NEXT: buffer_store_dword v42, off, s[0:3], 0
	; GLOBALNESS1-NEXT: flat_load_dword v45, v[0:1]			; GLOBALNESS1-NEXT: flat_load_dword v45, v[0:1]
	; GLOBALNESS1-NEXT: s_addc_u32 s9, s63, 0			; GLOBALNESS1-NEXT: s_addc_u32 s9, s63, 0
	; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]			; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]
	; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS1-NEXT: s_mov_b32 s12, s100			; GLOBALNESS1-NEXT: s_mov_b32 s12, s56
	; GLOBALNESS1-NEXT: s_mov_b32 s13, s99			; GLOBALNESS1-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS1-NEXT: s_mov_b32 s14, s98			; GLOBALNESS1-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v43			; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v41
	; GLOBALNESS1-NEXT: s_waitcnt lgkmcnt(0)			; GLOBALNESS1-NEXT: s_waitcnt lgkmcnt(0)
	; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[66:67]			; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[66:67]
	; GLOBALNESS1-NEXT: s_and_b64 vcc, exec, s[42:43]			; GLOBALNESS1-NEXT: s_and_b64 vcc, exec, s[36:37]
	; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], -1			; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], -1
	; GLOBALNESS1-NEXT: ; implicit-def: $sgpr4_sgpr5			; GLOBALNESS1-NEXT: ; implicit-def: $sgpr4_sgpr5
	; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_8			; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_8
	; GLOBALNESS1-NEXT: ; %bb.5: ; %NodeBlock			; GLOBALNESS1-NEXT: ; %bb.5: ; %NodeBlock
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: s_cmp_lt_i32 s39, 1			; GLOBALNESS1-NEXT: s_cmp_lt_i32 s39, 1
	; GLOBALNESS1-NEXT: s_cbranch_scc1 .LBB1_7			; GLOBALNESS1-NEXT: s_cbranch_scc1 .LBB1_7
	; GLOBALNESS1-NEXT: ; %bb.6: ; %LeafBlock3			; GLOBALNESS1-NEXT: ; %bb.6: ; %LeafBlock3
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[26:27], s[94:95], s[94:95] op_sel:[0,1]			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[26:27], s[94:95], s[94:95] op_sel:[0,1]
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[28:29], s[96:97], s[96:97] op_sel:[0,1]			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[28:29], s[96:97], s[96:97] op_sel:[0,1]
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[30:31], s[98:99], s[98:99] op_sel:[0,1]			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[30:31], s[98:99], s[98:99] op_sel:[0,1]
	; GLOBALNESS1-NEXT: s_and_saveexec_b64 s[70:71], s[96:97]			; GLOBALNESS1-NEXT: s_and_saveexec_b64 s[70:71], s[96:97]
	; GLOBALNESS1-NEXT: s_cbranch_execz .LBB1_26			; GLOBALNESS1-NEXT: s_cbranch_execz .LBB1_26
	; GLOBALNESS1-NEXT: ; %bb.10: ; %bb33.i			; GLOBALNESS1-NEXT: ; %bb.10: ; %bb33.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: global_load_dwordx2 v[0:1], v[32:33], off			; GLOBALNESS1-NEXT: global_load_dwordx2 v[0:1], v[32:33], off
	; GLOBALNESS1-NEXT: v_readlane_b32 s4, v42, 0			; GLOBALNESS1-NEXT: v_readlane_b32 s4, v40, 0
	; GLOBALNESS1-NEXT: v_readlane_b32 s5, v42, 1			; GLOBALNESS1-NEXT: v_readlane_b32 s5, v40, 1
				; GLOBALNESS1-NEXT: s_mov_b64 s[72:73], s[36:37]
				; GLOBALNESS1-NEXT: s_mov_b32 s75, s39
	; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[4:5]			; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[4:5]
	; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_12			; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_12
	; GLOBALNESS1-NEXT: ; %bb.11: ; %bb39.i			; GLOBALNESS1-NEXT: ; %bb.11: ; %bb39.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v41, v40			; GLOBALNESS1-NEXT: v_mov_b32_e32 v43, v42
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[2:3], 0, 0			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[2:3], 0, 0
	; GLOBALNESS1-NEXT: global_store_dwordx2 v[2:3], v[40:41], off			; GLOBALNESS1-NEXT: global_store_dwordx2 v[2:3], v[42:43], off
	; GLOBALNESS1-NEXT: .LBB1_12: ; %bb44.lr.ph.i			; GLOBALNESS1-NEXT: .LBB1_12: ; %bb44.lr.ph.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: v_cmp_ne_u32_e32 vcc, 0, v45			; GLOBALNESS1-NEXT: v_cmp_ne_u32_e32 vcc, 0, v45
	; GLOBALNESS1-NEXT: v_cndmask_b32_e32 v2, 0, v44, vcc			; GLOBALNESS1-NEXT: v_cndmask_b32_e32 v2, 0, v44, vcc
	; GLOBALNESS1-NEXT: s_mov_b64 s[72:73], s[42:43]
	; GLOBALNESS1-NEXT: s_mov_b32 s75, s39
	; GLOBALNESS1-NEXT: s_waitcnt vmcnt(0)			; GLOBALNESS1-NEXT: s_waitcnt vmcnt(0)
	; GLOBALNESS1-NEXT: v_cmp_nlt_f64_e64 s[56:57], 0, v[0:1]			; GLOBALNESS1-NEXT: v_cmp_nlt_f64_e64 s[36:37], 0, v[0:1]
	; GLOBALNESS1-NEXT: v_cmp_eq_u32_e64 s[58:59], 0, v2			; GLOBALNESS1-NEXT: v_cmp_eq_u32_e64 s[58:59], 0, v2
	; GLOBALNESS1-NEXT: s_branch .LBB1_15			; GLOBALNESS1-NEXT: s_branch .LBB1_15
	; GLOBALNESS1-NEXT: .LBB1_13: ; %Flow7			; GLOBALNESS1-NEXT: .LBB1_13: ; %Flow7
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS1-NEXT: s_or_b64 exec, exec, s[4:5]			; GLOBALNESS1-NEXT: s_or_b64 exec, exec, s[4:5]
	; GLOBALNESS1-NEXT: .LBB1_14: ; %bb63.i			; GLOBALNESS1-NEXT: .LBB1_14: ; %bb63.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[86:87]			; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[86:87]
	; GLOBALNESS1-NEXT: s_cbranch_vccz .LBB1_25			; GLOBALNESS1-NEXT: s_cbranch_vccz .LBB1_25
	; GLOBALNESS1-NEXT: .LBB1_15: ; %bb44.i			; GLOBALNESS1-NEXT: .LBB1_15: ; %bb44.i
	; GLOBALNESS1-NEXT: ; Parent Loop BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; Parent Loop BB1_4 Depth=1
	; GLOBALNESS1-NEXT: ; => This Inner Loop Header: Depth=2			; GLOBALNESS1-NEXT: ; => This Inner Loop Header: Depth=2
	; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[94:95]			; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[94:95]
	; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_14			; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_14
	; GLOBALNESS1-NEXT: ; %bb.16: ; %bb46.i			; GLOBALNESS1-NEXT: ; %bb.16: ; %bb46.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[88:89]			; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[88:89]
	; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_14			; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_14
	; GLOBALNESS1-NEXT: ; %bb.17: ; %bb50.i			; GLOBALNESS1-NEXT: ; %bb.17: ; %bb50.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[36:37]			; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[40:41]
	; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_20			; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_20
	; GLOBALNESS1-NEXT: ; %bb.18: ; %bb3.i.i			; GLOBALNESS1-NEXT: ; %bb.18: ; %bb3.i.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[40:41]			; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[42:43]
	; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_20			; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_20
	; GLOBALNESS1-NEXT: ; %bb.19: ; %bb6.i.i			; GLOBALNESS1-NEXT: ; %bb.19: ; %bb6.i.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[56:57]			; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[36:37]
	; GLOBALNESS1-NEXT: .LBB1_20: ; %spam.exit.i			; GLOBALNESS1-NEXT: .LBB1_20: ; %spam.exit.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[90:91]			; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[90:91]
	; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_14			; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_14
	; GLOBALNESS1-NEXT: ; %bb.21: ; %bb55.i			; GLOBALNESS1-NEXT: ; %bb.21: ; %bb55.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS1-NEXT: s_add_u32 s60, s62, 40			; GLOBALNESS1-NEXT: s_add_u32 s60, s62, 40
	; GLOBALNESS1-NEXT: s_addc_u32 s61, s63, 0			; GLOBALNESS1-NEXT: s_addc_u32 s61, s63, 0
	; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]			; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]
	; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS1-NEXT: s_mov_b64 s[8:9], s[60:61]			; GLOBALNESS1-NEXT: s_mov_b64 s[8:9], s[60:61]
	; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS1-NEXT: s_mov_b32 s12, s100			; GLOBALNESS1-NEXT: s_mov_b32 s12, s56
	; GLOBALNESS1-NEXT: s_mov_b32 s13, s99			; GLOBALNESS1-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS1-NEXT: s_mov_b32 s14, s98			; GLOBALNESS1-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v43			; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v41
	; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[66:67]			; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[66:67]
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[44:45], 0, 0			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[44:45], 0, 0
	; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]			; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]
	; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS1-NEXT: s_mov_b64 s[8:9], s[60:61]			; GLOBALNESS1-NEXT: s_mov_b64 s[8:9], s[60:61]
	; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS1-NEXT: s_mov_b32 s12, s100			; GLOBALNESS1-NEXT: s_mov_b32 s12, s56
	; GLOBALNESS1-NEXT: s_mov_b32 s13, s99			; GLOBALNESS1-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS1-NEXT: s_mov_b32 s14, s98			; GLOBALNESS1-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v43			; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v41
	; GLOBALNESS1-NEXT: global_store_dwordx2 v[44:45], a[32:33], off			; GLOBALNESS1-NEXT: global_store_dwordx2 v[44:45], a[32:33], off
	; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[66:67]			; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[66:67]
	; GLOBALNESS1-NEXT: s_and_saveexec_b64 s[4:5], s[58:59]			; GLOBALNESS1-NEXT: s_and_saveexec_b64 s[4:5], s[58:59]
	; GLOBALNESS1-NEXT: s_cbranch_execz .LBB1_13			; GLOBALNESS1-NEXT: s_cbranch_execz .LBB1_13
	; GLOBALNESS1-NEXT: ; %bb.22: ; %bb62.i			; GLOBALNESS1-NEXT: ; %bb.22: ; %bb62.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v41, v40			; GLOBALNESS1-NEXT: v_mov_b32_e32 v43, v42
	; GLOBALNESS1-NEXT: global_store_dwordx2 v[44:45], v[40:41], off			; GLOBALNESS1-NEXT: global_store_dwordx2 v[44:45], v[42:43], off
	; GLOBALNESS1-NEXT: s_branch .LBB1_13			; GLOBALNESS1-NEXT: s_branch .LBB1_13
	; GLOBALNESS1-NEXT: .LBB1_23: ; %LeafBlock			; GLOBALNESS1-NEXT: .LBB1_23: ; %LeafBlock
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: s_cmp_lg_u32 s39, 0			; GLOBALNESS1-NEXT: s_cmp_lg_u32 s39, 0
	; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], 0			; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], 0
	; GLOBALNESS1-NEXT: s_cselect_b64 s[6:7], -1, 0			; GLOBALNESS1-NEXT: s_cselect_b64 s[6:7], -1, 0
	; GLOBALNESS1-NEXT: s_and_b64 vcc, exec, s[6:7]			; GLOBALNESS1-NEXT: s_and_b64 vcc, exec, s[6:7]
	; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_9			; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_9
	; GLOBALNESS1-NEXT: .LBB1_24: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: .LBB1_24: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], -1			; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], -1
	; GLOBALNESS1-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31			; GLOBALNESS1-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31
	; GLOBALNESS1-NEXT: s_branch .LBB1_3			; GLOBALNESS1-NEXT: s_branch .LBB1_3
	; GLOBALNESS1-NEXT: .LBB1_25: ; %Flow14			; GLOBALNESS1-NEXT: .LBB1_25: ; %Flow14
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[36:37]
	; GLOBALNESS1-NEXT: s_mov_b32 s36, s93			; GLOBALNESS1-NEXT: s_mov_b32 s36, s93
	; GLOBALNESS1-NEXT: s_mov_b32 s37, s93			; GLOBALNESS1-NEXT: s_mov_b32 s37, s93
	; GLOBALNESS1-NEXT: s_mov_b32 s38, s93			; GLOBALNESS1-NEXT: s_mov_b32 s38, s93
	; GLOBALNESS1-NEXT: s_mov_b32 s39, s93			; GLOBALNESS1-NEXT: s_mov_b32 s39, s93
	; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[40:41]			; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[40:41]
	; GLOBALNESS1-NEXT: s_mov_b32 s40, s93			; GLOBALNESS1-NEXT: s_mov_b32 s40, s93
	; GLOBALNESS1-NEXT: s_mov_b32 s41, s93			; GLOBALNESS1-NEXT: s_mov_b32 s41, s93
				; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[42:43]
	; GLOBALNESS1-NEXT: s_mov_b32 s42, s93			; GLOBALNESS1-NEXT: s_mov_b32 s42, s93
	; GLOBALNESS1-NEXT: s_mov_b32 s43, s93			; GLOBALNESS1-NEXT: s_mov_b32 s43, s93
	; GLOBALNESS1-NEXT: s_mov_b32 s44, s93			; GLOBALNESS1-NEXT: s_mov_b32 s44, s93
	; GLOBALNESS1-NEXT: s_mov_b32 s45, s93			; GLOBALNESS1-NEXT: s_mov_b32 s45, s93
	; GLOBALNESS1-NEXT: s_mov_b32 s46, s93			; GLOBALNESS1-NEXT: s_mov_b32 s46, s93
	; GLOBALNESS1-NEXT: s_mov_b32 s47, s93			; GLOBALNESS1-NEXT: s_mov_b32 s47, s93
	; GLOBALNESS1-NEXT: s_mov_b32 s48, s93			; GLOBALNESS1-NEXT: s_mov_b32 s48, s93
	; GLOBALNESS1-NEXT: s_mov_b32 s49, s93			; GLOBALNESS1-NEXT: s_mov_b32 s49, s93
	[... 12 lines not shown ...]
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[16:17], s[52:53], s[52:53] op_sel:[0,1]			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[16:17], s[52:53], s[52:53] op_sel:[0,1]
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[18:19], s[54:55], s[54:55] op_sel:[0,1]			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[18:19], s[54:55], s[54:55] op_sel:[0,1]
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[20:21], s[56:57], s[56:57] op_sel:[0,1]			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[20:21], s[56:57], s[56:57] op_sel:[0,1]
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[22:23], s[58:59], s[58:59] op_sel:[0,1]			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[22:23], s[58:59], s[58:59] op_sel:[0,1]
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[24:25], s[60:61], s[60:61] op_sel:[0,1]			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[24:25], s[60:61], s[60:61] op_sel:[0,1]
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[26:27], s[62:63], s[62:63] op_sel:[0,1]			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[26:27], s[62:63], s[62:63] op_sel:[0,1]
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[28:29], s[64:65], s[64:65] op_sel:[0,1]			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[28:29], s[64:65], s[64:65] op_sel:[0,1]
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[30:31], s[66:67], s[66:67] op_sel:[0,1]			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[30:31], s[66:67], s[66:67] op_sel:[0,1]
	; GLOBALNESS1-NEXT: s_mov_b64 s[40:41], s[6:7]			; GLOBALNESS1-NEXT: s_mov_b64 s[42:43], s[6:7]
	; GLOBALNESS1-NEXT: s_mov_b64 s[36:37], s[4:5]			; GLOBALNESS1-NEXT: s_mov_b64 s[40:41], s[4:5]
	; GLOBALNESS1-NEXT: s_mov_b32 s39, s75			; GLOBALNESS1-NEXT: s_mov_b32 s39, s75
	; GLOBALNESS1-NEXT: s_mov_b64 s[42:43], s[72:73]			; GLOBALNESS1-NEXT: s_mov_b64 s[36:37], s[72:73]
	; GLOBALNESS1-NEXT: .LBB1_26: ; %Flow15			; GLOBALNESS1-NEXT: .LBB1_26: ; %Flow15
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: s_or_b64 exec, exec, s[70:71]			; GLOBALNESS1-NEXT: s_or_b64 exec, exec, s[70:71]
	; GLOBALNESS1-NEXT: s_and_saveexec_b64 s[4:5], s[96:97]			; GLOBALNESS1-NEXT: s_and_saveexec_b64 s[4:5], s[96:97]
	; GLOBALNESS1-NEXT: s_cbranch_execz .LBB1_2			; GLOBALNESS1-NEXT: s_cbranch_execz .LBB1_2
	; GLOBALNESS1-NEXT: ; %bb.27: ; %bb67.i			; GLOBALNESS1-NEXT: ; %bb.27: ; %bb67.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: v_readlane_b32 s6, v42, 2			; GLOBALNESS1-NEXT: v_readlane_b32 s6, v40, 2
	; GLOBALNESS1-NEXT: v_readlane_b32 s7, v42, 3			; GLOBALNESS1-NEXT: v_readlane_b32 s7, v40, 3
	; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[6:7]			; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[6:7]
	; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_1			; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_1
	; GLOBALNESS1-NEXT: ; %bb.28: ; %bb69.i			; GLOBALNESS1-NEXT: ; %bb.28: ; %bb69.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v41, v40			; GLOBALNESS1-NEXT: v_mov_b32_e32 v43, v42
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[32:33], 0, 0			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[32:33], 0, 0
	; GLOBALNESS1-NEXT: global_store_dwordx2 v[32:33], v[40:41], off			; GLOBALNESS1-NEXT: global_store_dwordx2 v[32:33], v[42:43], off
	; GLOBALNESS1-NEXT: s_branch .LBB1_1			; GLOBALNESS1-NEXT: s_branch .LBB1_1
	; GLOBALNESS1-NEXT: .LBB1_29: ; %bb73.i			; GLOBALNESS1-NEXT: .LBB1_29: ; %bb73.i
	; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS1-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v41, v40			; GLOBALNESS1-NEXT: v_mov_b32_e32 v43, v42
	; GLOBALNESS1-NEXT: v_pk_mov_b32 v[32:33], 0, 0			; GLOBALNESS1-NEXT: v_pk_mov_b32 v[32:33], 0, 0
	; GLOBALNESS1-NEXT: global_store_dwordx2 v[32:33], v[40:41], off			; GLOBALNESS1-NEXT: global_store_dwordx2 v[32:33], v[42:43], off
	; GLOBALNESS1-NEXT: s_branch .LBB1_2			; GLOBALNESS1-NEXT: s_branch .LBB1_2
	; GLOBALNESS1-NEXT: .LBB1_30: ; %loop.exit.guard			; GLOBALNESS1-NEXT: .LBB1_30: ; %loop.exit.guard
	; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[4:5]			; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[4:5]
	; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], -1			; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], -1
	; GLOBALNESS1-NEXT: s_cbranch_vccz .LBB1_32			; GLOBALNESS1-NEXT: s_cbranch_vccz .LBB1_32
	; GLOBALNESS1-NEXT: ; %bb.31: ; %bb7.i.i			; GLOBALNESS1-NEXT: ; %bb.31: ; %bb7.i.i
	; GLOBALNESS1-NEXT: s_add_u32 s8, s62, 40			; GLOBALNESS1-NEXT: s_add_u32 s8, s62, 40
	; GLOBALNESS1-NEXT: s_addc_u32 s9, s63, 0			; GLOBALNESS1-NEXT: s_addc_u32 s9, s63, 0
	; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]			; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]
	; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS1-NEXT: s_mov_b32 s12, s100			; GLOBALNESS1-NEXT: s_mov_b32 s12, s56
	; GLOBALNESS1-NEXT: s_mov_b32 s13, s99			; GLOBALNESS1-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS1-NEXT: s_mov_b32 s14, s98			; GLOBALNESS1-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v43			; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v41
	; GLOBALNESS1-NEXT: s_getpc_b64 s[16:17]			; GLOBALNESS1-NEXT: s_getpc_b64 s[16:17]
	; GLOBALNESS1-NEXT: s_add_u32 s16, s16, widget@rel32@lo+4			; GLOBALNESS1-NEXT: s_add_u32 s16, s16, widget@rel32@lo+4
	; GLOBALNESS1-NEXT: s_addc_u32 s17, s17, widget@rel32@hi+12			; GLOBALNESS1-NEXT: s_addc_u32 s17, s17, widget@rel32@hi+12
	; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], 0			; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], 0
	; GLOBALNESS1-NEXT: .LBB1_32: ; %Flow			; GLOBALNESS1-NEXT: .LBB1_32: ; %Flow
	; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[4:5]			; GLOBALNESS1-NEXT: s_andn2_b64 vcc, exec, s[4:5]
	; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_34			; GLOBALNESS1-NEXT: s_cbranch_vccnz .LBB1_34
	; GLOBALNESS1-NEXT: ; %bb.33: ; %bb11.i.i			; GLOBALNESS1-NEXT: ; %bb.33: ; %bb11.i.i
	; GLOBALNESS1-NEXT: s_add_u32 s8, s62, 40			; GLOBALNESS1-NEXT: s_add_u32 s8, s62, 40
	; GLOBALNESS1-NEXT: s_addc_u32 s9, s63, 0			; GLOBALNESS1-NEXT: s_addc_u32 s9, s63, 0
	; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]			; GLOBALNESS1-NEXT: s_mov_b64 s[4:5], s[64:65]
	; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS1-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS1-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS1-NEXT: s_mov_b32 s12, s100			; GLOBALNESS1-NEXT: s_mov_b32 s12, s56
	; GLOBALNESS1-NEXT: s_mov_b32 s13, s99			; GLOBALNESS1-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS1-NEXT: s_mov_b32 s14, s98			; GLOBALNESS1-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v43			; GLOBALNESS1-NEXT: v_mov_b32_e32 v31, v41
	; GLOBALNESS1-NEXT: s_getpc_b64 s[16:17]			; GLOBALNESS1-NEXT: s_getpc_b64 s[16:17]
	; GLOBALNESS1-NEXT: s_add_u32 s16, s16, widget@rel32@lo+4			; GLOBALNESS1-NEXT: s_add_u32 s16, s16, widget@rel32@lo+4
	; GLOBALNESS1-NEXT: s_addc_u32 s17, s17, widget@rel32@hi+12			; GLOBALNESS1-NEXT: s_addc_u32 s17, s17, widget@rel32@hi+12
	; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GLOBALNESS1-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GLOBALNESS1-NEXT: .LBB1_34: ; %UnifiedUnreachableBlock			; GLOBALNESS1-NEXT: .LBB1_34: ; %UnifiedUnreachableBlock
	;			;
	; GLOBALNESS0-LABEL: kernel:			; GLOBALNESS0-LABEL: kernel:
	; GLOBALNESS0: ; %bb.0: ; %bb			; GLOBALNESS0: ; %bb.0: ; %bb
	; GLOBALNESS0-NEXT: s_mov_b64 s[54:55], s[6:7]			; GLOBALNESS0-NEXT: s_mov_b64 s[54:55], s[6:7]
	; GLOBALNESS0-NEXT: s_load_dwordx4 s[36:39], s[8:9], 0x0			; GLOBALNESS0-NEXT: s_load_dwordx4 s[36:39], s[8:9], 0x0
	; GLOBALNESS0-NEXT: s_load_dword s6, s[8:9], 0x14			; GLOBALNESS0-NEXT: s_load_dword s6, s[8:9], 0x14
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v43, v0			; GLOBALNESS0-NEXT: v_mov_b32_e32 v41, v0
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v40, 0			; GLOBALNESS0-NEXT: v_mov_b32_e32 v42, 0
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[0:1], 0, 0			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[0:1], 0, 0
	; GLOBALNESS0-NEXT: global_store_dword v[0:1], v40, off			; GLOBALNESS0-NEXT: global_store_dword v[0:1], v42, off
	; GLOBALNESS0-NEXT: s_waitcnt lgkmcnt(0)			; GLOBALNESS0-NEXT: s_waitcnt lgkmcnt(0)
	; GLOBALNESS0-NEXT: global_load_dword v0, v40, s[36:37]			; GLOBALNESS0-NEXT: global_load_dword v0, v42, s[36:37]
	; GLOBALNESS0-NEXT: s_add_u32 flat_scratch_lo, s12, s17			; GLOBALNESS0-NEXT: s_add_u32 flat_scratch_lo, s12, s17
	; GLOBALNESS0-NEXT: s_mov_b64 s[62:63], s[4:5]			; GLOBALNESS0-NEXT: s_mov_b64 s[62:63], s[4:5]
	; GLOBALNESS0-NEXT: s_load_dwordx2 s[4:5], s[8:9], 0x18			; GLOBALNESS0-NEXT: s_load_dwordx2 s[4:5], s[8:9], 0x18
	; GLOBALNESS0-NEXT: s_load_dword s7, s[8:9], 0x20			; GLOBALNESS0-NEXT: s_load_dword s7, s[8:9], 0x20
	; GLOBALNESS0-NEXT: s_addc_u32 flat_scratch_hi, s13, 0			; GLOBALNESS0-NEXT: s_addc_u32 flat_scratch_hi, s13, 0
	; GLOBALNESS0-NEXT: s_add_u32 s0, s0, s17			; GLOBALNESS0-NEXT: s_add_u32 s0, s0, s17
	; GLOBALNESS0-NEXT: s_addc_u32 s1, s1, 0			; GLOBALNESS0-NEXT: s_addc_u32 s1, s1, 0
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v41, 0x40994400			; GLOBALNESS0-NEXT: v_mov_b32_e32 v43, 0x40994400
	; GLOBALNESS0-NEXT: s_bitcmp1_b32 s38, 0			; GLOBALNESS0-NEXT: s_bitcmp1_b32 s38, 0
	; GLOBALNESS0-NEXT: s_waitcnt lgkmcnt(0)			; GLOBALNESS0-NEXT: s_waitcnt lgkmcnt(0)
	; GLOBALNESS0-NEXT: v_cmp_ngt_f64_e64 s[36:37], s[4:5], v[40:41]			; GLOBALNESS0-NEXT: v_cmp_ngt_f64_e64 s[40:41], s[4:5], v[42:43]
	; GLOBALNESS0-NEXT: v_cmp_ngt_f64_e64 s[40:41], s[4:5], 0			; GLOBALNESS0-NEXT: v_cmp_ngt_f64_e64 s[42:43], s[4:5], 0
	; GLOBALNESS0-NEXT: s_cselect_b64 s[4:5], -1, 0			; GLOBALNESS0-NEXT: s_cselect_b64 s[4:5], -1, 0
	; GLOBALNESS0-NEXT: s_xor_b64 s[94:95], s[4:5], -1			; GLOBALNESS0-NEXT: s_xor_b64 s[94:95], s[4:5], -1
	; GLOBALNESS0-NEXT: s_bitcmp1_b32 s6, 0			; GLOBALNESS0-NEXT: s_bitcmp1_b32 s6, 0
	; GLOBALNESS0-NEXT: v_cndmask_b32_e64 v1, 0, 1, s[4:5]			; GLOBALNESS0-NEXT: v_cndmask_b32_e64 v1, 0, 1, s[4:5]
	; GLOBALNESS0-NEXT: s_cselect_b64 s[4:5], -1, 0			; GLOBALNESS0-NEXT: s_cselect_b64 s[4:5], -1, 0
	; GLOBALNESS0-NEXT: s_xor_b64 s[88:89], s[4:5], -1			; GLOBALNESS0-NEXT: s_xor_b64 s[88:89], s[4:5], -1
	; GLOBALNESS0-NEXT: s_bitcmp1_b32 s7, 0			; GLOBALNESS0-NEXT: s_bitcmp1_b32 s7, 0
	; GLOBALNESS0-NEXT: s_cselect_b64 s[4:5], -1, 0			; GLOBALNESS0-NEXT: s_cselect_b64 s[4:5], -1, 0
	; GLOBALNESS0-NEXT: s_getpc_b64 s[6:7]			; GLOBALNESS0-NEXT: s_getpc_b64 s[6:7]
	; GLOBALNESS0-NEXT: s_add_u32 s6, s6, wobble@gotpcrel32@lo+4			; GLOBALNESS0-NEXT: s_add_u32 s6, s6, wobble@gotpcrel32@lo+4
	; GLOBALNESS0-NEXT: s_addc_u32 s7, s7, wobble@gotpcrel32@hi+12			; GLOBALNESS0-NEXT: s_addc_u32 s7, s7, wobble@gotpcrel32@hi+12
	; GLOBALNESS0-NEXT: s_xor_b64 s[86:87], s[4:5], -1			; GLOBALNESS0-NEXT: s_xor_b64 s[86:87], s[4:5], -1
				; GLOBALNESS0-NEXT: ; implicit-def: $vgpr40
	; GLOBALNESS0-NEXT: s_load_dwordx2 s[66:67], s[6:7], 0x0			; GLOBALNESS0-NEXT: s_load_dwordx2 s[66:67], s[6:7], 0x0
	; GLOBALNESS0-NEXT: s_mov_b32 s98, s16			; GLOBALNESS0-NEXT: s_mov_b32 s98, s16
	; GLOBALNESS0-NEXT: s_mov_b64 s[60:61], s[8:9]			; GLOBALNESS0-NEXT: s_mov_b64 s[60:61], s[8:9]
	; GLOBALNESS0-NEXT: s_mov_b32 s99, s15			; GLOBALNESS0-NEXT: s_mov_b32 s99, s15
	; GLOBALNESS0-NEXT: s_mov_b32 s100, s14			; GLOBALNESS0-NEXT: s_mov_b32 s56, s14
	; GLOBALNESS0-NEXT: s_mov_b64 s[34:35], s[10:11]			; GLOBALNESS0-NEXT: s_mov_b64 s[34:35], s[10:11]
	; GLOBALNESS0-NEXT: s_mov_b64 s[92:93], 0x80			; GLOBALNESS0-NEXT: s_mov_b64 s[92:93], 0x80
	; GLOBALNESS0-NEXT: v_cmp_ne_u32_e64 s[42:43], 1, v1			; GLOBALNESS0-NEXT: v_cmp_ne_u32_e64 s[36:37], 1, v1
	; GLOBALNESS0-NEXT: s_mov_b32 s69, 0x3ff00000			; GLOBALNESS0-NEXT: s_mov_b32 s69, 0x3ff00000
	; GLOBALNESS0-NEXT: s_mov_b32 s32, 0			; GLOBALNESS0-NEXT: s_mov_b32 s32, 0
	; GLOBALNESS0-NEXT: ; implicit-def: $agpr32_agpr33_agpr34_agpr35_agpr36_agpr37_agpr38_agpr39_agpr40_agpr41_agpr42_agpr43_agpr44_agpr45_agpr46_agpr47_agpr48_agpr49_agpr50_agpr51_agpr52_agpr53_agpr54_agpr55_agpr56_agpr57_agpr58_agpr59_agpr60_agpr61_agpr62_agpr63			; GLOBALNESS0-NEXT: ; implicit-def: $agpr32_agpr33_agpr34_agpr35_agpr36_agpr37_agpr38_agpr39_agpr40_agpr41_agpr42_agpr43_agpr44_agpr45_agpr46_agpr47_agpr48_agpr49_agpr50_agpr51_agpr52_agpr53_agpr54_agpr55_agpr56_agpr57_agpr58_agpr59_agpr60_agpr61_agpr62_agpr63
	; GLOBALNESS0-NEXT: s_waitcnt vmcnt(0)			; GLOBALNESS0-NEXT: s_waitcnt vmcnt(0)
	; GLOBALNESS0-NEXT: v_cmp_gt_i32_e64 s[4:5], 0, v0			; GLOBALNESS0-NEXT: v_cmp_gt_i32_e64 s[4:5], 0, v0
	; GLOBALNESS0-NEXT: v_writelane_b32 v42, s4, 0			; GLOBALNESS0-NEXT: v_writelane_b32 v40, s4, 0
	; GLOBALNESS0-NEXT: v_writelane_b32 v42, s5, 1			; GLOBALNESS0-NEXT: v_writelane_b32 v40, s5, 1
	; GLOBALNESS0-NEXT: v_cmp_eq_u32_e64 s[4:5], 1, v0			; GLOBALNESS0-NEXT: v_cmp_eq_u32_e64 s[4:5], 1, v0
	; GLOBALNESS0-NEXT: v_writelane_b32 v42, s4, 2			; GLOBALNESS0-NEXT: v_writelane_b32 v40, s4, 2
	; GLOBALNESS0-NEXT: v_writelane_b32 v42, s5, 3			; GLOBALNESS0-NEXT: v_writelane_b32 v40, s5, 3
	; GLOBALNESS0-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v0			; GLOBALNESS0-NEXT: v_cmp_eq_u32_e64 s[4:5], 0, v0
	; GLOBALNESS0-NEXT: v_writelane_b32 v42, s4, 4			; GLOBALNESS0-NEXT: v_writelane_b32 v40, s4, 4
	; GLOBALNESS0-NEXT: v_cmp_gt_i32_e64 s[90:91], 1, v0			; GLOBALNESS0-NEXT: v_cmp_gt_i32_e64 s[90:91], 1, v0
	; GLOBALNESS0-NEXT: v_writelane_b32 v42, s5, 5			; GLOBALNESS0-NEXT: v_writelane_b32 v40, s5, 5
	; GLOBALNESS0-NEXT: s_branch .LBB1_4			; GLOBALNESS0-NEXT: s_branch .LBB1_4
	; GLOBALNESS0-NEXT: .LBB1_1: ; %bb70.i			; GLOBALNESS0-NEXT: .LBB1_1: ; %bb70.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: v_readlane_b32 s6, v42, 4			; GLOBALNESS0-NEXT: v_readlane_b32 s6, v40, 4
	; GLOBALNESS0-NEXT: v_readlane_b32 s7, v42, 5			; GLOBALNESS0-NEXT: v_readlane_b32 s7, v40, 5
	; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[6:7]			; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[6:7]
	; GLOBALNESS0-NEXT: s_cbranch_vccz .LBB1_29			; GLOBALNESS0-NEXT: s_cbranch_vccz .LBB1_29
	; GLOBALNESS0-NEXT: .LBB1_2: ; %Flow6			; GLOBALNESS0-NEXT: .LBB1_2: ; %Flow6
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: s_or_b64 exec, exec, s[4:5]			; GLOBALNESS0-NEXT: s_or_b64 exec, exec, s[4:5]
	; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], 0			; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], 0
	; GLOBALNESS0-NEXT: ; implicit-def: $sgpr4_sgpr5			; GLOBALNESS0-NEXT: ; implicit-def: $sgpr4_sgpr5
	; GLOBALNESS0-NEXT: .LBB1_3: ; %Flow19			; GLOBALNESS0-NEXT: .LBB1_3: ; %Flow19
	[... 33 lines not shown ...]
	; GLOBALNESS0-NEXT: v_accvgpr_write_b32 a32, v0			; GLOBALNESS0-NEXT: v_accvgpr_write_b32 a32, v0
	; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_30			; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_30
	; GLOBALNESS0-NEXT: .LBB1_4: ; %bb5			; GLOBALNESS0-NEXT: .LBB1_4: ; %bb5
	; GLOBALNESS0-NEXT: ; =>This Loop Header: Depth=1			; GLOBALNESS0-NEXT: ; =>This Loop Header: Depth=1
	; GLOBALNESS0-NEXT: ; Child Loop BB1_15 Depth 2			; GLOBALNESS0-NEXT: ; Child Loop BB1_15 Depth 2
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[0:1], s[92:93], s[92:93] op_sel:[0,1]			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[0:1], s[92:93], s[92:93] op_sel:[0,1]
	; GLOBALNESS0-NEXT: flat_load_dword v44, v[0:1]			; GLOBALNESS0-NEXT: flat_load_dword v44, v[0:1]
	; GLOBALNESS0-NEXT: s_add_u32 s8, s60, 40			; GLOBALNESS0-NEXT: s_add_u32 s8, s60, 40
	; GLOBALNESS0-NEXT: buffer_store_dword v40, off, s[0:3], 0			; GLOBALNESS0-NEXT: buffer_store_dword v42, off, s[0:3], 0
	; GLOBALNESS0-NEXT: flat_load_dword v45, v[0:1]			; GLOBALNESS0-NEXT: flat_load_dword v45, v[0:1]
	; GLOBALNESS0-NEXT: s_addc_u32 s9, s61, 0			; GLOBALNESS0-NEXT: s_addc_u32 s9, s61, 0
	; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]			; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]
	; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS0-NEXT: s_mov_b32 s12, s100			; GLOBALNESS0-NEXT: s_mov_b32 s12, s56
	; GLOBALNESS0-NEXT: s_mov_b32 s13, s99			; GLOBALNESS0-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS0-NEXT: s_mov_b32 s14, s98			; GLOBALNESS0-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v43			; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v41
	; GLOBALNESS0-NEXT: s_waitcnt lgkmcnt(0)			; GLOBALNESS0-NEXT: s_waitcnt lgkmcnt(0)
	; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[66:67]			; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[66:67]
	; GLOBALNESS0-NEXT: s_and_b64 vcc, exec, s[42:43]			; GLOBALNESS0-NEXT: s_and_b64 vcc, exec, s[36:37]
	; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], -1			; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], -1
	; GLOBALNESS0-NEXT: ; implicit-def: $sgpr4_sgpr5			; GLOBALNESS0-NEXT: ; implicit-def: $sgpr4_sgpr5
	; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_8			; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_8
	; GLOBALNESS0-NEXT: ; %bb.5: ; %NodeBlock			; GLOBALNESS0-NEXT: ; %bb.5: ; %NodeBlock
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: s_cmp_lt_i32 s39, 1			; GLOBALNESS0-NEXT: s_cmp_lt_i32 s39, 1
	; GLOBALNESS0-NEXT: s_cbranch_scc1 .LBB1_7			; GLOBALNESS0-NEXT: s_cbranch_scc1 .LBB1_7
	; GLOBALNESS0-NEXT: ; %bb.6: ; %LeafBlock3			; GLOBALNESS0-NEXT: ; %bb.6: ; %LeafBlock3
	[... 50 lines not shown ...]
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[26:27], s[94:95], s[94:95] op_sel:[0,1]			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[26:27], s[94:95], s[94:95] op_sel:[0,1]
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[28:29], s[96:97], s[96:97] op_sel:[0,1]			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[28:29], s[96:97], s[96:97] op_sel:[0,1]
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[30:31], s[98:99], s[98:99] op_sel:[0,1]			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[30:31], s[98:99], s[98:99] op_sel:[0,1]
	; GLOBALNESS0-NEXT: s_and_saveexec_b64 s[70:71], s[96:97]			; GLOBALNESS0-NEXT: s_and_saveexec_b64 s[70:71], s[96:97]
	; GLOBALNESS0-NEXT: s_cbranch_execz .LBB1_26			; GLOBALNESS0-NEXT: s_cbranch_execz .LBB1_26
	; GLOBALNESS0-NEXT: ; %bb.10: ; %bb33.i			; GLOBALNESS0-NEXT: ; %bb.10: ; %bb33.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: global_load_dwordx2 v[0:1], v[32:33], off			; GLOBALNESS0-NEXT: global_load_dwordx2 v[0:1], v[32:33], off
	; GLOBALNESS0-NEXT: v_readlane_b32 s4, v42, 0			; GLOBALNESS0-NEXT: v_readlane_b32 s4, v40, 0
	; GLOBALNESS0-NEXT: v_readlane_b32 s5, v42, 1			; GLOBALNESS0-NEXT: v_readlane_b32 s5, v40, 1
				; GLOBALNESS0-NEXT: s_mov_b64 s[72:73], s[36:37]
				; GLOBALNESS0-NEXT: s_mov_b32 s75, s39
	; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[4:5]			; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[4:5]
	; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_12			; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_12
	; GLOBALNESS0-NEXT: ; %bb.11: ; %bb39.i			; GLOBALNESS0-NEXT: ; %bb.11: ; %bb39.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v41, v40			; GLOBALNESS0-NEXT: v_mov_b32_e32 v43, v42
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[2:3], 0, 0			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[2:3], 0, 0
	; GLOBALNESS0-NEXT: global_store_dwordx2 v[2:3], v[40:41], off			; GLOBALNESS0-NEXT: global_store_dwordx2 v[2:3], v[42:43], off
	; GLOBALNESS0-NEXT: .LBB1_12: ; %bb44.lr.ph.i			; GLOBALNESS0-NEXT: .LBB1_12: ; %bb44.lr.ph.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: v_cmp_ne_u32_e32 vcc, 0, v45			; GLOBALNESS0-NEXT: v_cmp_ne_u32_e32 vcc, 0, v45
	; GLOBALNESS0-NEXT: v_cndmask_b32_e32 v2, 0, v44, vcc			; GLOBALNESS0-NEXT: v_cndmask_b32_e32 v2, 0, v44, vcc
	; GLOBALNESS0-NEXT: s_mov_b64 s[72:73], s[42:43]
	; GLOBALNESS0-NEXT: s_mov_b32 s75, s39
	; GLOBALNESS0-NEXT: s_waitcnt vmcnt(0)			; GLOBALNESS0-NEXT: s_waitcnt vmcnt(0)
	; GLOBALNESS0-NEXT: v_cmp_nlt_f64_e64 s[56:57], 0, v[0:1]			; GLOBALNESS0-NEXT: v_cmp_nlt_f64_e64 s[36:37], 0, v[0:1]
	; GLOBALNESS0-NEXT: v_cmp_eq_u32_e64 s[58:59], 0, v2			; GLOBALNESS0-NEXT: v_cmp_eq_u32_e64 s[58:59], 0, v2
	; GLOBALNESS0-NEXT: s_branch .LBB1_15			; GLOBALNESS0-NEXT: s_branch .LBB1_15
	; GLOBALNESS0-NEXT: .LBB1_13: ; %Flow7			; GLOBALNESS0-NEXT: .LBB1_13: ; %Flow7
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS0-NEXT: s_or_b64 exec, exec, s[4:5]			; GLOBALNESS0-NEXT: s_or_b64 exec, exec, s[4:5]
	; GLOBALNESS0-NEXT: .LBB1_14: ; %bb63.i			; GLOBALNESS0-NEXT: .LBB1_14: ; %bb63.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[86:87]			; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[86:87]
	; GLOBALNESS0-NEXT: s_cbranch_vccz .LBB1_25			; GLOBALNESS0-NEXT: s_cbranch_vccz .LBB1_25
	; GLOBALNESS0-NEXT: .LBB1_15: ; %bb44.i			; GLOBALNESS0-NEXT: .LBB1_15: ; %bb44.i
	; GLOBALNESS0-NEXT: ; Parent Loop BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; Parent Loop BB1_4 Depth=1
	; GLOBALNESS0-NEXT: ; => This Inner Loop Header: Depth=2			; GLOBALNESS0-NEXT: ; => This Inner Loop Header: Depth=2
	; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[94:95]			; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[94:95]
	; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_14			; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_14
	; GLOBALNESS0-NEXT: ; %bb.16: ; %bb46.i			; GLOBALNESS0-NEXT: ; %bb.16: ; %bb46.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[88:89]			; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[88:89]
	; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_14			; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_14
	; GLOBALNESS0-NEXT: ; %bb.17: ; %bb50.i			; GLOBALNESS0-NEXT: ; %bb.17: ; %bb50.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[36:37]			; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[40:41]
	; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_20			; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_20
	; GLOBALNESS0-NEXT: ; %bb.18: ; %bb3.i.i			; GLOBALNESS0-NEXT: ; %bb.18: ; %bb3.i.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[40:41]			; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[42:43]
	; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_20			; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_20
	; GLOBALNESS0-NEXT: ; %bb.19: ; %bb6.i.i			; GLOBALNESS0-NEXT: ; %bb.19: ; %bb6.i.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[56:57]			; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[36:37]
	; GLOBALNESS0-NEXT: .LBB1_20: ; %spam.exit.i			; GLOBALNESS0-NEXT: .LBB1_20: ; %spam.exit.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[90:91]			; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[90:91]
	; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_14			; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_14
	; GLOBALNESS0-NEXT: ; %bb.21: ; %bb55.i			; GLOBALNESS0-NEXT: ; %bb.21: ; %bb55.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS0-NEXT: s_add_u32 s64, s60, 40			; GLOBALNESS0-NEXT: s_add_u32 s64, s60, 40
	; GLOBALNESS0-NEXT: s_addc_u32 s65, s61, 0			; GLOBALNESS0-NEXT: s_addc_u32 s65, s61, 0
	; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]			; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]
	; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS0-NEXT: s_mov_b64 s[8:9], s[64:65]			; GLOBALNESS0-NEXT: s_mov_b64 s[8:9], s[64:65]
	; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS0-NEXT: s_mov_b32 s12, s100			; GLOBALNESS0-NEXT: s_mov_b32 s12, s56
	; GLOBALNESS0-NEXT: s_mov_b32 s13, s99			; GLOBALNESS0-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS0-NEXT: s_mov_b32 s14, s98			; GLOBALNESS0-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v43			; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v41
	; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[66:67]			; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[66:67]
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[44:45], 0, 0			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[44:45], 0, 0
	; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]			; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]
	; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS0-NEXT: s_mov_b64 s[8:9], s[64:65]			; GLOBALNESS0-NEXT: s_mov_b64 s[8:9], s[64:65]
	; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS0-NEXT: s_mov_b32 s12, s100			; GLOBALNESS0-NEXT: s_mov_b32 s12, s56
	; GLOBALNESS0-NEXT: s_mov_b32 s13, s99			; GLOBALNESS0-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS0-NEXT: s_mov_b32 s14, s98			; GLOBALNESS0-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v43			; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v41
	; GLOBALNESS0-NEXT: global_store_dwordx2 v[44:45], a[32:33], off			; GLOBALNESS0-NEXT: global_store_dwordx2 v[44:45], a[32:33], off
	; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[66:67]			; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[66:67]
	; GLOBALNESS0-NEXT: s_and_saveexec_b64 s[4:5], s[58:59]			; GLOBALNESS0-NEXT: s_and_saveexec_b64 s[4:5], s[58:59]
	; GLOBALNESS0-NEXT: s_cbranch_execz .LBB1_13			; GLOBALNESS0-NEXT: s_cbranch_execz .LBB1_13
	; GLOBALNESS0-NEXT: ; %bb.22: ; %bb62.i			; GLOBALNESS0-NEXT: ; %bb.22: ; %bb62.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_15 Depth=2
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v41, v40			; GLOBALNESS0-NEXT: v_mov_b32_e32 v43, v42
	; GLOBALNESS0-NEXT: global_store_dwordx2 v[44:45], v[40:41], off			; GLOBALNESS0-NEXT: global_store_dwordx2 v[44:45], v[42:43], off
	; GLOBALNESS0-NEXT: s_branch .LBB1_13			; GLOBALNESS0-NEXT: s_branch .LBB1_13
	; GLOBALNESS0-NEXT: .LBB1_23: ; %LeafBlock			; GLOBALNESS0-NEXT: .LBB1_23: ; %LeafBlock
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: s_cmp_lg_u32 s39, 0			; GLOBALNESS0-NEXT: s_cmp_lg_u32 s39, 0
	; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], 0			; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], 0
	; GLOBALNESS0-NEXT: s_cselect_b64 s[6:7], -1, 0			; GLOBALNESS0-NEXT: s_cselect_b64 s[6:7], -1, 0
	; GLOBALNESS0-NEXT: s_and_b64 vcc, exec, s[6:7]			; GLOBALNESS0-NEXT: s_and_b64 vcc, exec, s[6:7]
	; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_9			; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_9
	; GLOBALNESS0-NEXT: .LBB1_24: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: .LBB1_24: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], -1			; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], -1
	; GLOBALNESS0-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31			; GLOBALNESS0-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31
	; GLOBALNESS0-NEXT: s_branch .LBB1_3			; GLOBALNESS0-NEXT: s_branch .LBB1_3
	; GLOBALNESS0-NEXT: .LBB1_25: ; %Flow14			; GLOBALNESS0-NEXT: .LBB1_25: ; %Flow14
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[36:37]
	; GLOBALNESS0-NEXT: s_mov_b32 s36, s93			; GLOBALNESS0-NEXT: s_mov_b32 s36, s93
	; GLOBALNESS0-NEXT: s_mov_b32 s37, s93			; GLOBALNESS0-NEXT: s_mov_b32 s37, s93
	; GLOBALNESS0-NEXT: s_mov_b32 s38, s93			; GLOBALNESS0-NEXT: s_mov_b32 s38, s93
	; GLOBALNESS0-NEXT: s_mov_b32 s39, s93			; GLOBALNESS0-NEXT: s_mov_b32 s39, s93
	; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[40:41]			; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[40:41]
	; GLOBALNESS0-NEXT: s_mov_b32 s40, s93			; GLOBALNESS0-NEXT: s_mov_b32 s40, s93
	; GLOBALNESS0-NEXT: s_mov_b32 s41, s93			; GLOBALNESS0-NEXT: s_mov_b32 s41, s93
				; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[42:43]
	; GLOBALNESS0-NEXT: s_mov_b32 s42, s93			; GLOBALNESS0-NEXT: s_mov_b32 s42, s93
	; GLOBALNESS0-NEXT: s_mov_b32 s43, s93			; GLOBALNESS0-NEXT: s_mov_b32 s43, s93
	; GLOBALNESS0-NEXT: s_mov_b32 s44, s93			; GLOBALNESS0-NEXT: s_mov_b32 s44, s93
	; GLOBALNESS0-NEXT: s_mov_b32 s45, s93			; GLOBALNESS0-NEXT: s_mov_b32 s45, s93
	; GLOBALNESS0-NEXT: s_mov_b32 s46, s93			; GLOBALNESS0-NEXT: s_mov_b32 s46, s93
	; GLOBALNESS0-NEXT: s_mov_b32 s47, s93			; GLOBALNESS0-NEXT: s_mov_b32 s47, s93
	; GLOBALNESS0-NEXT: s_mov_b32 s48, s93			; GLOBALNESS0-NEXT: s_mov_b32 s48, s93
	; GLOBALNESS0-NEXT: s_mov_b32 s49, s93			; GLOBALNESS0-NEXT: s_mov_b32 s49, s93
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[16:17], s[52:53], s[52:53] op_sel:[0,1]			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[16:17], s[52:53], s[52:53] op_sel:[0,1]
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[18:19], s[54:55], s[54:55] op_sel:[0,1]			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[18:19], s[54:55], s[54:55] op_sel:[0,1]
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[20:21], s[56:57], s[56:57] op_sel:[0,1]			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[20:21], s[56:57], s[56:57] op_sel:[0,1]
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[22:23], s[58:59], s[58:59] op_sel:[0,1]			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[22:23], s[58:59], s[58:59] op_sel:[0,1]
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[24:25], s[60:61], s[60:61] op_sel:[0,1]			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[24:25], s[60:61], s[60:61] op_sel:[0,1]
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[26:27], s[62:63], s[62:63] op_sel:[0,1]			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[26:27], s[62:63], s[62:63] op_sel:[0,1]
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[28:29], s[64:65], s[64:65] op_sel:[0,1]			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[28:29], s[64:65], s[64:65] op_sel:[0,1]
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[30:31], s[66:67], s[66:67] op_sel:[0,1]			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[30:31], s[66:67], s[66:67] op_sel:[0,1]
	; GLOBALNESS0-NEXT: s_mov_b64 s[40:41], s[6:7]			; GLOBALNESS0-NEXT: s_mov_b64 s[42:43], s[6:7]
	; GLOBALNESS0-NEXT: s_mov_b64 s[36:37], s[4:5]			; GLOBALNESS0-NEXT: s_mov_b64 s[40:41], s[4:5]
	; GLOBALNESS0-NEXT: s_mov_b32 s39, s75			; GLOBALNESS0-NEXT: s_mov_b32 s39, s75
	; GLOBALNESS0-NEXT: s_mov_b64 s[42:43], s[72:73]			; GLOBALNESS0-NEXT: s_mov_b64 s[36:37], s[72:73]
	; GLOBALNESS0-NEXT: .LBB1_26: ; %Flow15			; GLOBALNESS0-NEXT: .LBB1_26: ; %Flow15
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: s_or_b64 exec, exec, s[70:71]			; GLOBALNESS0-NEXT: s_or_b64 exec, exec, s[70:71]
	; GLOBALNESS0-NEXT: s_and_saveexec_b64 s[4:5], s[96:97]			; GLOBALNESS0-NEXT: s_and_saveexec_b64 s[4:5], s[96:97]
	; GLOBALNESS0-NEXT: s_cbranch_execz .LBB1_2			; GLOBALNESS0-NEXT: s_cbranch_execz .LBB1_2
	; GLOBALNESS0-NEXT: ; %bb.27: ; %bb67.i			; GLOBALNESS0-NEXT: ; %bb.27: ; %bb67.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: v_readlane_b32 s6, v42, 2			; GLOBALNESS0-NEXT: v_readlane_b32 s6, v40, 2
	; GLOBALNESS0-NEXT: v_readlane_b32 s7, v42, 3			; GLOBALNESS0-NEXT: v_readlane_b32 s7, v40, 3
	; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[6:7]			; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[6:7]
	; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_1			; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_1
	; GLOBALNESS0-NEXT: ; %bb.28: ; %bb69.i			; GLOBALNESS0-NEXT: ; %bb.28: ; %bb69.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v41, v40			; GLOBALNESS0-NEXT: v_mov_b32_e32 v43, v42
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[32:33], 0, 0			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[32:33], 0, 0
	; GLOBALNESS0-NEXT: global_store_dwordx2 v[32:33], v[40:41], off			; GLOBALNESS0-NEXT: global_store_dwordx2 v[32:33], v[42:43], off
	; GLOBALNESS0-NEXT: s_branch .LBB1_1			; GLOBALNESS0-NEXT: s_branch .LBB1_1
	; GLOBALNESS0-NEXT: .LBB1_29: ; %bb73.i			; GLOBALNESS0-NEXT: .LBB1_29: ; %bb73.i
	; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1			; GLOBALNESS0-NEXT: ; in Loop: Header=BB1_4 Depth=1
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v41, v40			; GLOBALNESS0-NEXT: v_mov_b32_e32 v43, v42
	; GLOBALNESS0-NEXT: v_pk_mov_b32 v[32:33], 0, 0			; GLOBALNESS0-NEXT: v_pk_mov_b32 v[32:33], 0, 0
	; GLOBALNESS0-NEXT: global_store_dwordx2 v[32:33], v[40:41], off			; GLOBALNESS0-NEXT: global_store_dwordx2 v[32:33], v[42:43], off
	; GLOBALNESS0-NEXT: s_branch .LBB1_2			; GLOBALNESS0-NEXT: s_branch .LBB1_2
	; GLOBALNESS0-NEXT: .LBB1_30: ; %loop.exit.guard			; GLOBALNESS0-NEXT: .LBB1_30: ; %loop.exit.guard
	; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[4:5]			; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[4:5]
	; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], -1			; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], -1
	; GLOBALNESS0-NEXT: s_cbranch_vccz .LBB1_32			; GLOBALNESS0-NEXT: s_cbranch_vccz .LBB1_32
	; GLOBALNESS0-NEXT: ; %bb.31: ; %bb7.i.i			; GLOBALNESS0-NEXT: ; %bb.31: ; %bb7.i.i
	; GLOBALNESS0-NEXT: s_add_u32 s8, s60, 40			; GLOBALNESS0-NEXT: s_add_u32 s8, s60, 40
	; GLOBALNESS0-NEXT: s_addc_u32 s9, s61, 0			; GLOBALNESS0-NEXT: s_addc_u32 s9, s61, 0
	; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]			; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]
	; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS0-NEXT: s_mov_b32 s12, s100			; GLOBALNESS0-NEXT: s_mov_b32 s12, s56
	; GLOBALNESS0-NEXT: s_mov_b32 s13, s99			; GLOBALNESS0-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS0-NEXT: s_mov_b32 s14, s98			; GLOBALNESS0-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v43			; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v41
	; GLOBALNESS0-NEXT: s_getpc_b64 s[16:17]			; GLOBALNESS0-NEXT: s_getpc_b64 s[16:17]
	; GLOBALNESS0-NEXT: s_add_u32 s16, s16, widget@rel32@lo+4			; GLOBALNESS0-NEXT: s_add_u32 s16, s16, widget@rel32@lo+4
	; GLOBALNESS0-NEXT: s_addc_u32 s17, s17, widget@rel32@hi+12			; GLOBALNESS0-NEXT: s_addc_u32 s17, s17, widget@rel32@hi+12
	; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], 0			; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], 0
	; GLOBALNESS0-NEXT: .LBB1_32: ; %Flow			; GLOBALNESS0-NEXT: .LBB1_32: ; %Flow
	; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[4:5]			; GLOBALNESS0-NEXT: s_andn2_b64 vcc, exec, s[4:5]
	; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_34			; GLOBALNESS0-NEXT: s_cbranch_vccnz .LBB1_34
	; GLOBALNESS0-NEXT: ; %bb.33: ; %bb11.i.i			; GLOBALNESS0-NEXT: ; %bb.33: ; %bb11.i.i
	; GLOBALNESS0-NEXT: s_add_u32 s8, s60, 40			; GLOBALNESS0-NEXT: s_add_u32 s8, s60, 40
	; GLOBALNESS0-NEXT: s_addc_u32 s9, s61, 0			; GLOBALNESS0-NEXT: s_addc_u32 s9, s61, 0
	; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]			; GLOBALNESS0-NEXT: s_mov_b64 s[4:5], s[62:63]
	; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]			; GLOBALNESS0-NEXT: s_mov_b64 s[6:7], s[54:55]
	; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]			; GLOBALNESS0-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GLOBALNESS0-NEXT: s_mov_b32 s12, s100			; GLOBALNESS0-NEXT: s_mov_b32 s12, s56
	; GLOBALNESS0-NEXT: s_mov_b32 s13, s99			; GLOBALNESS0-NEXT: s_mov_b32 s13, s99
	; GLOBALNESS0-NEXT: s_mov_b32 s14, s98			; GLOBALNESS0-NEXT: s_mov_b32 s14, s98
	; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v43			; GLOBALNESS0-NEXT: v_mov_b32_e32 v31, v41
	; GLOBALNESS0-NEXT: s_getpc_b64 s[16:17]			; GLOBALNESS0-NEXT: s_getpc_b64 s[16:17]
	; GLOBALNESS0-NEXT: s_add_u32 s16, s16, widget@rel32@lo+4			; GLOBALNESS0-NEXT: s_add_u32 s16, s16, widget@rel32@lo+4
	; GLOBALNESS0-NEXT: s_addc_u32 s17, s17, widget@rel32@hi+12			; GLOBALNESS0-NEXT: s_addc_u32 s17, s17, widget@rel32@hi+12
	; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GLOBALNESS0-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GLOBALNESS0-NEXT: .LBB1_34: ; %UnifiedUnreachableBlock			; GLOBALNESS0-NEXT: .LBB1_34: ; %UnifiedUnreachableBlock
	bb:			bb:
	store i32 0, i32 addrspace(1)* null, align 4			store i32 0, i32 addrspace(1)* null, align 4
	%tmp4 = load i32, i32 addrspace(1)* %arg1.global, align 4			%tmp4 = load i32, i32 addrspace(1)* %arg1.global, align 4

llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -check-prefix=GCN %s
	; RUN: opt -S -si-annotate-control-flow -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -check-prefix=SI-OPT %s			; RUN: opt -S -si-annotate-control-flow -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -check-prefix=SI-OPT %s

	define hidden void @widget() {			define hidden void @widget() {
	; GCN-LABEL: widget:			; GCN-LABEL: widget:
	; GCN: ; %bb.0: ; %bb			; GCN: ; %bb.0: ; %bb
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_mov_b32 s16, s33			; GCN-NEXT: s_mov_b32 s16, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1			; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[18:19]			; GCN-NEXT: s_mov_b64 exec, s[18:19]
	; GCN-NEXT: v_writelane_b32 v41, s16, 0			; GCN-NEXT: v_writelane_b32 v41, s16, 0
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
				; GCN-NEXT: ; implicit-def: $vgpr40
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: v_mov_b32_e32 v1, 0			; GCN-NEXT: v_mov_b32_e32 v1, 0
	; GCN-NEXT: flat_load_dword v0, v[0:1]			; GCN-NEXT: flat_load_dword v0, v[0:1]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 21, v0			; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 21, v0
	; GCN-NEXT: v_readfirstlane_b32 s16, v0			; GCN-NEXT: v_readfirstlane_b32 s16, v0
	; SI-OPT-NEXT: store float 0x7FF8000000000000, float addrspace(5)* null, align 4			; SI-OPT-NEXT: store float 0x7FF8000000000000, float addrspace(5)* null, align 4
	; SI-OPT-NEXT: br label [[BB2]]			; SI-OPT-NEXT: br label [[BB2]]
	;			;
	; GCN-LABEL: blam:			; GCN-LABEL: blam:
	; GCN: ; %bb.0: ; %bb			; GCN: ; %bb.0: ; %bb
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_mov_b32 s16, s33			; GCN-NEXT: s_mov_b32 s16, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_or_saveexec_b64 s[18:19], -1			; GCN-NEXT: s_xor_saveexec_b64 s[18:19], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill			; GCN-NEXT: s_mov_b64 exec, -1
				; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[18:19]			; GCN-NEXT: s_mov_b64 exec, s[18:19]
	; GCN-NEXT: v_writelane_b32 v46, s16, 0			; GCN-NEXT: v_writelane_b32 v45, s16, 0
	; GCN-NEXT: s_addk_i32 s32, 0x800			; GCN-NEXT: s_addk_i32 s32, 0x800
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: ; implicit-def: $vgpr0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v0, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v0, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v0, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v0, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s37, 5			; GCN-NEXT: v_writelane_b32 v0, s36, 4
	; GCN-NEXT: v_writelane_b32 v40, s38, 6			; GCN-NEXT: v_writelane_b32 v0, s37, 5
	; GCN-NEXT: v_writelane_b32 v40, s39, 7			; GCN-NEXT: v_writelane_b32 v0, s38, 6
	; GCN-NEXT: v_writelane_b32 v40, s40, 8			; GCN-NEXT: v_writelane_b32 v0, s39, 7
	; GCN-NEXT: v_writelane_b32 v40, s41, 9			; GCN-NEXT: v_writelane_b32 v0, s40, 8
	; GCN-NEXT: v_writelane_b32 v40, s42, 10			; GCN-NEXT: v_writelane_b32 v0, s41, 9
	; GCN-NEXT: v_writelane_b32 v40, s43, 11			; GCN-NEXT: v_writelane_b32 v0, s42, 10
	; GCN-NEXT: v_writelane_b32 v40, s44, 12			; GCN-NEXT: v_writelane_b32 v0, s43, 11
	; GCN-NEXT: v_writelane_b32 v40, s45, 13			; GCN-NEXT: v_writelane_b32 v0, s44, 12
	; GCN-NEXT: v_writelane_b32 v40, s46, 14			; GCN-NEXT: v_writelane_b32 v0, s45, 13
	; GCN-NEXT: v_writelane_b32 v40, s47, 15			; GCN-NEXT: v_writelane_b32 v0, s46, 14
	; GCN-NEXT: v_writelane_b32 v40, s48, 16			; GCN-NEXT: v_writelane_b32 v0, s47, 15
	; GCN-NEXT: v_writelane_b32 v40, s49, 17			; GCN-NEXT: v_writelane_b32 v0, s48, 16
	; GCN-NEXT: v_mov_b32_e32 v41, v31			; GCN-NEXT: v_writelane_b32 v0, s49, 17
				; GCN-NEXT: v_mov_b32_e32 v40, v31
	; GCN-NEXT: s_mov_b32 s44, s15			; GCN-NEXT: s_mov_b32 s44, s15
	; GCN-NEXT: s_mov_b32 s45, s14			; GCN-NEXT: s_mov_b32 s45, s14
	; GCN-NEXT: s_mov_b32 s46, s13			; GCN-NEXT: s_mov_b32 s46, s13
	; GCN-NEXT: s_mov_b32 s47, s12			; GCN-NEXT: s_mov_b32 s47, s12
	; GCN-NEXT: s_mov_b64 s[34:35], s[10:11]			; GCN-NEXT: s_mov_b64 s[34:35], s[10:11]
	; GCN-NEXT: s_mov_b64 s[36:37], s[8:9]			; GCN-NEXT: s_mov_b64 s[36:37], s[8:9]
	; GCN-NEXT: s_mov_b64 s[38:39], s[6:7]			; GCN-NEXT: s_mov_b64 s[38:39], s[6:7]
	; GCN-NEXT: s_mov_b64 s[40:41], s[4:5]			; GCN-NEXT: s_mov_b64 s[40:41], s[4:5]
	; GCN-NEXT: s_mov_b64 s[4:5], 0			; GCN-NEXT: s_mov_b64 s[4:5], 0
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: v_mov_b32_e32 v1, 0			; GCN-NEXT: v_mov_b32_e32 v1, 0
	; GCN-NEXT: v_and_b32_e32 v2, 0x3ff, v41			; GCN-NEXT: v_and_b32_e32 v2, 0x3ff, v40
	; GCN-NEXT: v_mov_b32_e32 v43, 0			; GCN-NEXT: v_mov_b32_e32 v42, 0
	; GCN-NEXT: flat_load_dword v44, v[0:1]			; GCN-NEXT: flat_load_dword v43, v[0:1]
	; GCN-NEXT: v_mov_b32_e32 v45, 0x7fc00000			; GCN-NEXT: v_mov_b32_e32 v44, 0x7fc00000
	; GCN-NEXT: s_getpc_b64 s[48:49]			; GCN-NEXT: s_getpc_b64 s[48:49]
	; GCN-NEXT: s_add_u32 s48, s48, spam@rel32@lo+4			; GCN-NEXT: s_add_u32 s48, s48, spam@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s49, s49, spam@rel32@hi+12			; GCN-NEXT: s_addc_u32 s49, s49, spam@rel32@hi+12
	; GCN-NEXT: v_lshlrev_b32_e32 v42, 2, v2			; GCN-NEXT: v_lshlrev_b32_e32 v41, 2, v2
	; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_cmp_eq_f32_e64 s[42:43], 0, v44			; GCN-NEXT: v_cmp_eq_f32_e64 s[42:43], 0, v43
	; GCN-NEXT: s_branch .LBB1_3			; GCN-NEXT: s_branch .LBB1_3
	; GCN-NEXT: .LBB1_1: ; %bb10			; GCN-NEXT: .LBB1_1: ; %bb10
	; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1			; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1
	; GCN-NEXT: s_or_b64 exec, exec, s[6:7]			; GCN-NEXT: s_or_b64 exec, exec, s[6:7]
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], 0			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], 0
	; GCN-NEXT: .LBB1_2: ; %bb18			; GCN-NEXT: .LBB1_2: ; %bb18
	; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1			; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], 0			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], 0
	; GCN-NEXT: s_mov_b64 s[4:5], 0			; GCN-NEXT: s_mov_b64 s[4:5], 0
	; GCN-NEXT: .LBB1_3: ; %bb2			; GCN-NEXT: .LBB1_3: ; %bb2
	; GCN-NEXT: ; =>This Loop Header: Depth=1			; GCN-NEXT: ; =>This Loop Header: Depth=1
	; GCN-NEXT: ; Child Loop BB1_4 Depth 2			; GCN-NEXT: ; Child Loop BB1_4 Depth 2
	; GCN-NEXT: s_mov_b64 s[6:7], 0			; GCN-NEXT: s_mov_b64 s[6:7], 0
	; GCN-NEXT: .LBB1_4: ; %bb2			; GCN-NEXT: .LBB1_4: ; %bb2
	; GCN-NEXT: ; Parent Loop BB1_3 Depth=1			; GCN-NEXT: ; Parent Loop BB1_3 Depth=1
	; GCN-NEXT: ; => This Inner Loop Header: Depth=2			; GCN-NEXT: ; => This Inner Loop Header: Depth=2
	; GCN-NEXT: flat_load_dword v0, v[42:43]			; GCN-NEXT: flat_load_dword v0, v[41:42]
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], 0			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], 0
	; GCN-NEXT: s_waitcnt vmcnt(1)			; GCN-NEXT: s_waitcnt vmcnt(1)
	; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 3, v0			; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 3, v0
	; GCN-NEXT: s_and_saveexec_b64 s[8:9], vcc			; GCN-NEXT: s_and_saveexec_b64 s[8:9], vcc
	; GCN-NEXT: s_cbranch_execz .LBB1_6			; GCN-NEXT: s_cbranch_execz .LBB1_6
	; GCN-NEXT: ; %bb.5: ; %bb8			; GCN-NEXT: ; %bb.5: ; %bb8
	; GCN-NEXT: ; in Loop: Header=BB1_4 Depth=2			; GCN-NEXT: ; in Loop: Header=BB1_4 Depth=2
	; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 1, v0			; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 1, v0
	; GCN-NEXT: s_or_b64 s[6:7], vcc, s[6:7]			; GCN-NEXT: s_or_b64 s[6:7], vcc, s[6:7]
	; GCN-NEXT: s_mov_b64 s[4:5], s[40:41]			; GCN-NEXT: s_mov_b64 s[4:5], s[40:41]
	; GCN-NEXT: s_mov_b64 s[6:7], s[38:39]			; GCN-NEXT: s_mov_b64 s[6:7], s[38:39]
	; GCN-NEXT: s_mov_b64 s[8:9], s[36:37]			; GCN-NEXT: s_mov_b64 s[8:9], s[36:37]
	; GCN-NEXT: s_mov_b64 s[10:11], s[34:35]			; GCN-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GCN-NEXT: s_mov_b32 s12, s47			; GCN-NEXT: s_mov_b32 s12, s47
	; GCN-NEXT: s_mov_b32 s13, s46			; GCN-NEXT: s_mov_b32 s13, s46
	; GCN-NEXT: s_mov_b32 s14, s45			; GCN-NEXT: s_mov_b32 s14, s45
	; GCN-NEXT: s_mov_b32 s15, s44			; GCN-NEXT: s_mov_b32 s15, s44
	; GCN-NEXT: v_mov_b32_e32 v31, v41			; GCN-NEXT: v_mov_b32_e32 v31, v40
	; GCN-NEXT: s_swappc_b64 s[30:31], s[48:49]			; GCN-NEXT: s_swappc_b64 s[30:31], s[48:49]
	; GCN-NEXT: v_cmp_eq_f32_e32 vcc, 0, v0			; GCN-NEXT: v_cmp_eq_f32_e32 vcc, 0, v0
	; GCN-NEXT: s_mov_b64 s[4:5], 0			; GCN-NEXT: s_mov_b64 s[4:5], 0
	; GCN-NEXT: s_mov_b64 s[6:7], 0			; GCN-NEXT: s_mov_b64 s[6:7], 0
	; GCN-NEXT: s_and_saveexec_b64 s[8:9], vcc			; GCN-NEXT: s_and_saveexec_b64 s[8:9], vcc
	; GCN-NEXT: s_cbranch_execnz .LBB1_4			; GCN-NEXT: s_cbranch_execnz .LBB1_4
	; GCN-NEXT: ; %bb.8: ; %bb14			; GCN-NEXT: ; %bb.8: ; %bb14
	; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1			; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1
	; GCN-NEXT: s_or_b64 exec, exec, s[8:9]			; GCN-NEXT: s_or_b64 exec, exec, s[8:9]
	; GCN-NEXT: s_and_saveexec_b64 s[4:5], s[42:43]			; GCN-NEXT: s_and_saveexec_b64 s[4:5], s[42:43]
	; GCN-NEXT: s_cbranch_execnz .LBB1_10			; GCN-NEXT: s_cbranch_execnz .LBB1_10
	; GCN-NEXT: ; %bb.9: ; %bb16			; GCN-NEXT: ; %bb.9: ; %bb16
	; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1			; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1
	; GCN-NEXT: s_or_b64 exec, exec, s[4:5]			; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], 0			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], 0
	; GCN-NEXT: .LBB1_10: ; %bb17			; GCN-NEXT: .LBB1_10: ; %bb17
	; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1			; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], 0			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], 0
	; GCN-NEXT: s_branch .LBB1_2			; GCN-NEXT: s_branch .LBB1_2
	bb:			bb:
	%tmp = load float, float* null, align 16			%tmp = load float, float* null, align 16
	br label %bb2			br label %bb2

	bb1: ; preds = %bb8, %bb6			bb1: ; preds = %bb8, %bb6
	br label %bb2			br label %bb2


llvm/test/CodeGen/AMDGPU/vgpr-tuple-allocation.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX11 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX11 %s

	declare void @extern_func() #2			declare void @extern_func() #2

	define <4 x float> @non_preserved_vgpr_tuple8(<8 x i32> %rsrc, <4 x i32> %samp, float %bias, float %zcompare, float %s, float %t, float %clamp) {			define <4 x float> @non_preserved_vgpr_tuple8(<8 x i32> %rsrc, <4 x i32> %samp, float %bias, float %zcompare, float %s, float %t, float %clamp) {
	; The vgpr tuple8 operand in image_gather4_c_b_cl instruction needs not be			; The vgpr tuple8 operand in image_gather4_c_b_cl instruction needs not be
	; preserved across the call and should get 8 scratch registers.			; preserved across the call and should get 8 scratch registers.
	; GFX9-LABEL: non_preserved_vgpr_tuple8:			; GFX9-LABEL: non_preserved_vgpr_tuple8:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_mov_b32 s4, s33			; GFX9-NEXT: s_mov_b32 s4, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: v_mov_b32_e32 v36, v16			; GFX9-NEXT: v_mov_b32_e32 v36, v16
	; GFX9-NEXT: v_mov_b32_e32 v35, v15			; GFX9-NEXT: v_mov_b32_e32 v35, v15
	; GFX9-NEXT: v_mov_b32_e32 v34, v14			; GFX9-NEXT: v_mov_b32_e32 v34, v14
	; GFX9-NEXT: v_mov_b32_e32 v33, v13			; GFX9-NEXT: v_mov_b32_e32 v33, v13
	; GFX9-NEXT: v_mov_b32_e32 v32, v12			; GFX9-NEXT: v_mov_b32_e32 v32, v12
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v44, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: image_gather4_c_b_cl v[41:44], v[32:36], s[4:11], s[4:7] dmask:0x1			; GFX9-NEXT: image_gather4_c_b_cl v[40:43], v[32:36], s[4:11], s[4:7] dmask:0x1
	; GFX9-NEXT: s_addk_i32 s32, 0x800			; GFX9-NEXT: s_addk_i32 s32, 0x800
	; GFX9-NEXT: v_writelane_b32 v45, s4, 0			; GFX9-NEXT: v_writelane_b32 v45, s4, 0
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: ; implicit-def: $vgpr44
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v44, s30, 0
				; GFX9-NEXT: v_writelane_b32 v44, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_mov_b32_e32 v0, v41			; GFX9-NEXT: v_mov_b32_e32 v0, v40
	; GFX9-NEXT: v_mov_b32_e32 v1, v42			; GFX9-NEXT: v_mov_b32_e32 v1, v41
	; GFX9-NEXT: v_mov_b32_e32 v2, v43			; GFX9-NEXT: v_mov_b32_e32 v2, v42
	; GFX9-NEXT: v_mov_b32_e32 v3, v44			; GFX9-NEXT: v_mov_b32_e32 v3, v43
	; GFX9-NEXT: buffer_load_dword v44, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v44, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v44, 0
	; GFX9-NEXT: v_readlane_b32 s4, v45, 0			; GFX9-NEXT: v_readlane_b32 s4, v45, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: s_mov_b32 s33, s4			; GFX9-NEXT: s_mov_b32 s33, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: non_preserved_vgpr_tuple8:			; GFX10-LABEL: non_preserved_vgpr_tuple8:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_mov_b32 s4, s33			; GFX10-NEXT: s_mov_b32 s4, s33
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_or_saveexec_b32 s5, -1			; GFX10-NEXT: s_or_saveexec_b32 s5, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s5			; GFX10-NEXT: s_mov_b32 exec_lo, s5
	; GFX10-NEXT: v_mov_b32_e32 v36, v16			; GFX10-NEXT: v_mov_b32_e32 v36, v16
	; GFX10-NEXT: v_mov_b32_e32 v35, v15			; GFX10-NEXT: v_mov_b32_e32 v35, v15
	; GFX10-NEXT: v_mov_b32_e32 v34, v14			; GFX10-NEXT: v_mov_b32_e32 v34, v14
	; GFX10-NEXT: v_mov_b32_e32 v33, v13			; GFX10-NEXT: v_mov_b32_e32 v33, v13
	; GFX10-NEXT: v_mov_b32_e32 v32, v12			; GFX10-NEXT: v_mov_b32_e32 v32, v12
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v44, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: image_gather4_c_b_cl v[41:44], v[32:36], s[4:11], s[4:7] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10-NEXT: image_gather4_c_b_cl v[40:43], v[32:36], s[4:11], s[4:7] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10-NEXT: s_addk_i32 s32, 0x400			; GFX10-NEXT: s_addk_i32 s32, 0x400
	; GFX10-NEXT: v_writelane_b32 v45, s4, 0			; GFX10-NEXT: v_writelane_b32 v45, s4, 0
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: ; implicit-def: $vgpr44
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v44, s30, 0
				; GFX10-NEXT: v_writelane_b32 v44, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_mov_b32_e32 v0, v41			; GFX10-NEXT: v_mov_b32_e32 v0, v40
	; GFX10-NEXT: v_mov_b32_e32 v1, v42			; GFX10-NEXT: v_mov_b32_e32 v1, v41
	; GFX10-NEXT: v_mov_b32_e32 v2, v43			; GFX10-NEXT: v_mov_b32_e32 v2, v42
	; GFX10-NEXT: v_mov_b32_e32 v3, v44			; GFX10-NEXT: v_mov_b32_e32 v3, v43
	; GFX10-NEXT: s_clause 0x3			; GFX10-NEXT: s_clause 0x3
	; GFX10-NEXT: buffer_load_dword v44, off, s[0:3], s33			; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s33
	; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:4
	; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:8
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:12
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v44, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v44, 0
	; GFX10-NEXT: v_readlane_b32 s4, v45, 0			; GFX10-NEXT: v_readlane_b32 s4, v45, 0
	; GFX10-NEXT: s_or_saveexec_b32 s5, -1			; GFX10-NEXT: s_or_saveexec_b32 s5, -1
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:16			; GFX10-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:16
	; GFX10-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:20			; GFX10-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:20
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s5			; GFX10-NEXT: s_mov_b32 exec_lo, s5
	; GFX10-NEXT: s_addk_i32 s32, 0xfc00			; GFX10-NEXT: s_addk_i32 s32, 0xfc00
	; GFX10-NEXT: s_mov_b32 s33, s4			; GFX10-NEXT: s_mov_b32 s33, s4
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: non_preserved_vgpr_tuple8:			; GFX11-LABEL: non_preserved_vgpr_tuple8:
	; GFX11: ; %bb.0: ; %main_body			; GFX11: ; %bb.0: ; %main_body
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: scratch_store_b32 off, v40, s33 offset:16			; GFX11-NEXT: scratch_store_b32 off, v44, s33 offset:16
	; GFX11-NEXT: scratch_store_b32 off, v45, s33 offset:20			; GFX11-NEXT: scratch_store_b32 off, v45, s33 offset:20
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: v_dual_mov_b32 v36, v16 :: v_dual_mov_b32 v35, v15			; GFX11-NEXT: v_dual_mov_b32 v36, v16 :: v_dual_mov_b32 v35, v15
	; GFX11-NEXT: v_dual_mov_b32 v34, v14 :: v_dual_mov_b32 v33, v13			; GFX11-NEXT: v_dual_mov_b32 v34, v14 :: v_dual_mov_b32 v33, v13
	; GFX11-NEXT: v_mov_b32_e32 v32, v12			; GFX11-NEXT: v_mov_b32_e32 v32, v12
	; GFX11-NEXT: s_clause 0x3			; GFX11-NEXT: s_clause 0x3
	; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:12			; GFX11-NEXT: scratch_store_b32 off, v40, s33 offset:12
	; GFX11-NEXT: scratch_store_b32 off, v42, s33 offset:8			; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:8
	; GFX11-NEXT: scratch_store_b32 off, v43, s33 offset:4			; GFX11-NEXT: scratch_store_b32 off, v42, s33 offset:4
	; GFX11-NEXT: scratch_store_b32 off, v44, s33			; GFX11-NEXT: scratch_store_b32 off, v43, s33
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: image_gather4_c_b_cl v[41:44], v[32:36], s[0:7], s[0:3] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX11-NEXT: image_gather4_c_b_cl v[40:43], v[32:36], s[0:7], s[0:3] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX11-NEXT: s_add_i32 s32, s32, 32			; GFX11-NEXT: s_add_i32 s32, s32, 32
	; GFX11-NEXT: v_writelane_b32 v45, s0, 0			; GFX11-NEXT: v_writelane_b32 v45, s0, 0
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, extern_func@gotpcrel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, extern_func@gotpcrel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, extern_func@gotpcrel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, extern_func@gotpcrel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: ; implicit-def: $vgpr44
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_writelane_b32 v44, s30, 0
				; GFX11-NEXT: v_writelane_b32 v44, s31, 1
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_dual_mov_b32 v0, v41 :: v_dual_mov_b32 v1, v42			; GFX11-NEXT: v_dual_mov_b32 v0, v40 :: v_dual_mov_b32 v1, v41
	; GFX11-NEXT: v_dual_mov_b32 v2, v43 :: v_dual_mov_b32 v3, v44			; GFX11-NEXT: v_dual_mov_b32 v2, v42 :: v_dual_mov_b32 v3, v43
	; GFX11-NEXT: s_clause 0x3			; GFX11-NEXT: s_clause 0x3
	; GFX11-NEXT: scratch_load_b32 v44, off, s33			; GFX11-NEXT: scratch_load_b32 v43, off, s33
	; GFX11-NEXT: scratch_load_b32 v43, off, s33 offset:4			; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:4
	; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:8			; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:8
	; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:12			; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:12
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v44, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v44, 0
	; GFX11-NEXT: v_readlane_b32 s0, v45, 0			; GFX11-NEXT: v_readlane_b32 s0, v45, 0
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:16			; GFX11-NEXT: scratch_load_b32 v44, off, s33 offset:16
	; GFX11-NEXT: scratch_load_b32 v45, off, s33 offset:20			; GFX11-NEXT: scratch_load_b32 v45, off, s33 offset:20
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_addk_i32 s32, 0xffe0			; GFX11-NEXT: s_addk_i32 s32, 0xffe0
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]


	; Only the lower 5 sub-registers of the tuple are preserved.			; Only the lower 5 sub-registers of the tuple are preserved.
	; The upper 3 sub-registers are unused.			; The upper 3 sub-registers are unused.
	; GFX9-LABEL: call_preserved_vgpr_tuple8:			; GFX9-LABEL: call_preserved_vgpr_tuple8:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_mov_b32 s4, s33			; GFX9-NEXT: s_mov_b32 s4, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v45, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v44, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: v_mov_b32_e32 v45, v16			; GFX9-NEXT: v_mov_b32_e32 v44, v16
	; GFX9-NEXT: v_mov_b32_e32 v44, v15			; GFX9-NEXT: v_mov_b32_e32 v43, v15
	; GFX9-NEXT: v_mov_b32_e32 v43, v14			; GFX9-NEXT: v_mov_b32_e32 v42, v14
	; GFX9-NEXT: v_mov_b32_e32 v42, v13			; GFX9-NEXT: v_mov_b32_e32 v41, v13
	; GFX9-NEXT: v_mov_b32_e32 v41, v12			; GFX9-NEXT: v_mov_b32_e32 v40, v12
	; GFX9-NEXT: image_gather4_c_b_cl v[0:3], v[41:45], s[4:11], s[4:7] dmask:0x1			; GFX9-NEXT: image_gather4_c_b_cl v[0:3], v[40:44], s[4:11], s[4:7] dmask:0x1
	; GFX9-NEXT: s_addk_i32 s32, 0x800			; GFX9-NEXT: s_addk_i32 s32, 0x800
	; GFX9-NEXT: v_writelane_b32 v46, s4, 0			; GFX9-NEXT: v_writelane_b32 v46, s4, 0
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: ; implicit-def: $vgpr45
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v45, s30, 0
				; GFX9-NEXT: v_writelane_b32 v45, s31, 1
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: global_store_dwordx4 v[0:1], v[0:3], off			; GFX9-NEXT: global_store_dwordx4 v[0:1], v[0:3], off
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: image_gather4_c_b_cl v[0:3], v[41:45], s[4:11], s[4:7] dmask:0x1			; GFX9-NEXT: image_gather4_c_b_cl v[0:3], v[40:44], s[4:11], s[4:7] dmask:0x1
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: buffer_load_dword v45, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v44, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v45, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v45, 0
	; GFX9-NEXT: v_readlane_b32 s4, v46, 0			; GFX9-NEXT: v_readlane_b32 s4, v46, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: s_mov_b32 s33, s4			; GFX9-NEXT: s_mov_b32 s33, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_preserved_vgpr_tuple8:			; GFX10-LABEL: call_preserved_vgpr_tuple8:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_mov_b32 s4, s33			; GFX10-NEXT: s_mov_b32 s4, s33
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_or_saveexec_b32 s5, -1			; GFX10-NEXT: s_or_saveexec_b32 s5, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s5			; GFX10-NEXT: s_mov_b32 exec_lo, s5
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v45, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v44, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX10-NEXT: image_gather4_c_b_cl v[0:3], v[12:16], s[4:11], s[4:7] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10-NEXT: image_gather4_c_b_cl v[0:3], v[12:16], s[4:11], s[4:7] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10-NEXT: s_addk_i32 s32, 0x400			; GFX10-NEXT: s_addk_i32 s32, 0x400
	; GFX10-NEXT: v_writelane_b32 v46, s4, 0			; GFX10-NEXT: v_writelane_b32 v46, s4, 0
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: ; implicit-def: $vgpr45
				; GFX10-NEXT: v_mov_b32_e32 v40, v16
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v41, v16			; GFX10-NEXT: v_writelane_b32 v45, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v42, v15			; GFX10-NEXT: v_mov_b32_e32 v41, v15
	; GFX10-NEXT: v_mov_b32_e32 v43, v14			; GFX10-NEXT: v_mov_b32_e32 v42, v14
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_mov_b32_e32 v43, v13
	; GFX10-NEXT: v_mov_b32_e32 v44, v13			; GFX10-NEXT: v_mov_b32_e32 v44, v12
	; GFX10-NEXT: v_mov_b32_e32 v45, v12			; GFX10-NEXT: v_writelane_b32 v45, s31, 1
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: global_store_dwordx4 v[0:1], v[0:3], off			; GFX10-NEXT: global_store_dwordx4 v[0:1], v[0:3], off
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: image_gather4_c_b_cl v[0:3], [v45, v44, v43, v42, v41], s[4:11], s[4:7] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10-NEXT: image_gather4_c_b_cl v[0:3], [v44, v43, v42, v41, v40], s[4:11], s[4:7] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10-NEXT: s_clause 0x4			; GFX10-NEXT: s_clause 0x4
	; GFX10-NEXT: buffer_load_dword v45, off, s[0:3], s33			; GFX10-NEXT: buffer_load_dword v44, off, s[0:3], s33
	; GFX10-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4
	; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:8			; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8
	; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:12			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:16			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:16
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v45, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v45, 0
	; GFX10-NEXT: v_readlane_b32 s4, v46, 0			; GFX10-NEXT: v_readlane_b32 s4, v46, 0
	; GFX10-NEXT: s_or_saveexec_b32 s5, -1			; GFX10-NEXT: s_or_saveexec_b32 s5, -1
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:20			; GFX10-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:20
	; GFX10-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:24			; GFX10-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:24
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s5			; GFX10-NEXT: s_mov_b32 exec_lo, s5
	; GFX10-NEXT: s_addk_i32 s32, 0xfc00			; GFX10-NEXT: s_addk_i32 s32, 0xfc00
	; GFX10-NEXT: s_mov_b32 s33, s4			; GFX10-NEXT: s_mov_b32 s33, s4
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_preserved_vgpr_tuple8:			; GFX11-LABEL: call_preserved_vgpr_tuple8:
	; GFX11: ; %bb.0: ; %main_body			; GFX11: ; %bb.0: ; %main_body
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: scratch_store_b32 off, v40, s33 offset:20			; GFX11-NEXT: scratch_store_b32 off, v45, s33 offset:20
	; GFX11-NEXT: scratch_store_b32 off, v46, s33 offset:24			; GFX11-NEXT: scratch_store_b32 off, v46, s33 offset:24
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_clause 0x4			; GFX11-NEXT: s_clause 0x4
	; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:16			; GFX11-NEXT: scratch_store_b32 off, v40, s33 offset:16
	; GFX11-NEXT: scratch_store_b32 off, v42, s33 offset:12			; GFX11-NEXT: scratch_store_b32 off, v41, s33 offset:12
	; GFX11-NEXT: scratch_store_b32 off, v43, s33 offset:8			; GFX11-NEXT: scratch_store_b32 off, v42, s33 offset:8
	; GFX11-NEXT: scratch_store_b32 off, v44, s33 offset:4			; GFX11-NEXT: scratch_store_b32 off, v43, s33 offset:4
	; GFX11-NEXT: scratch_store_b32 off, v45, s33			; GFX11-NEXT: scratch_store_b32 off, v44, s33
	; GFX11-NEXT: image_gather4_c_b_cl v[0:3], v[12:16], s[0:7], s[0:3] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX11-NEXT: image_gather4_c_b_cl v[0:3], v[12:16], s[0:7], s[0:3] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX11-NEXT: s_add_i32 s32, s32, 32			; GFX11-NEXT: s_add_i32 s32, s32, 32
	; GFX11-NEXT: v_writelane_b32 v46, s0, 0			; GFX11-NEXT: v_writelane_b32 v46, s0, 0
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, extern_func@gotpcrel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, extern_func@gotpcrel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, extern_func@gotpcrel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, extern_func@gotpcrel32@hi+12
	; GFX11-NEXT: v_writelane_b32 v40, s30, 0			; GFX11-NEXT: ; implicit-def: $vgpr45
				; GFX11-NEXT: v_dual_mov_b32 v40, v16 :: v_dual_mov_b32 v41, v15
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: v_dual_mov_b32 v41, v16 :: v_dual_mov_b32 v42, v15			; GFX11-NEXT: v_writelane_b32 v45, s30, 0
	; GFX11-NEXT: v_dual_mov_b32 v43, v14 :: v_dual_mov_b32 v44, v13			; GFX11-NEXT: v_dual_mov_b32 v42, v14 :: v_dual_mov_b32 v43, v13
	; GFX11-NEXT: v_writelane_b32 v40, s31, 1			; GFX11-NEXT: v_mov_b32_e32 v44, v12
	; GFX11-NEXT: v_mov_b32_e32 v45, v12			; GFX11-NEXT: v_writelane_b32 v45, s31, 1
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: global_store_b128 v[0:1], v[0:3], off			; GFX11-NEXT: global_store_b128 v[0:1], v[0:3], off
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: image_gather4_c_b_cl v[0:3], [v45, v44, v43, v42, v41], s[0:7], s[0:3] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX11-NEXT: image_gather4_c_b_cl v[0:3], [v44, v43, v42, v41, v40], s[0:7], s[0:3] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX11-NEXT: s_clause 0x4			; GFX11-NEXT: s_clause 0x4
	; GFX11-NEXT: scratch_load_b32 v45, off, s33			; GFX11-NEXT: scratch_load_b32 v44, off, s33
	; GFX11-NEXT: scratch_load_b32 v44, off, s33 offset:4			; GFX11-NEXT: scratch_load_b32 v43, off, s33 offset:4
	; GFX11-NEXT: scratch_load_b32 v43, off, s33 offset:8			; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:8
	; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:12			; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:12
	; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:16			; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:16
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v45, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v45, 0
	; GFX11-NEXT: v_readlane_b32 s0, v46, 0			; GFX11-NEXT: v_readlane_b32 s0, v46, 0
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:20			; GFX11-NEXT: scratch_load_b32 v45, off, s33 offset:20
	; GFX11-NEXT: scratch_load_b32 v46, off, s33 offset:24			; GFX11-NEXT: scratch_load_b32 v46, off, s33 offset:24
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_addk_i32 s32, 0xffe0			; GFX11-NEXT: s_addk_i32 s32, 0xffe0
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]



llvm/test/CodeGen/AMDGPU/wwm-register-spill-during-regalloc.ll

This file was added.

				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 -stop-after=virtregrewriter,1 --verify-machineinstrs -o - %s | FileCheck -check-prefix=WWM-SPILL %s
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 -O0 -stop-after=regallocfast,1 --verify-machineinstrs -o - %s | FileCheck -check-prefix=WWM-SPILL-O0 %s
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 --verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 -O0 --verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN-O0 %s

				; Test whole-wave register spilling.

				; In the testcase, the return address registers (SGPR30_SGPR31) should be preserved across the call.
				; Since the test limits the VGPR numbers, they are all in the call-clobber (scratch) range and RA should
				; spill any VGPR borrowed for spilling SGPRs. The writelane/readlane instructions that spill/restore
				; SGPRs into/from VGPR are whole-wave operations and hence the VGPRs involved in such operations require
				; whole-wave spilling.

				define void @test() #0 {
				; WWM-SPILL-LABEL: name: test
				; WWM-SPILL: bb.0 (%ir-block.0):
				; WWM-SPILL-NEXT: liveins: $sgpr12, $sgpr13, $sgpr14, $sgpr15, $sgpr30, $sgpr31, $vgpr31, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9, $sgpr10_sgpr11
				; WWM-SPILL-NEXT: {{ $}}
				; WWM-SPILL-NEXT: renamable $vgpr0 = IMPLICIT_DEF
				; WWM-SPILL-NEXT: renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr30, 0, killed $vgpr0
				; WWM-SPILL-NEXT: renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr31, 1, killed $vgpr0
				; WWM-SPILL-NEXT: SI_SPILL_WWM_V32_SAVE killed $vgpr0, %stack.2, $sgpr32, 0, implicit $exec :: (store (s32) into %stack.2, addrspace 5)
				; WWM-SPILL-NEXT: ADJCALLSTACKUP 0, 0, implicit-def dead $scc, implicit-def $sgpr32, implicit $sgpr32
				; WWM-SPILL-NEXT: renamable $sgpr16_sgpr17 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-gotprel32-lo) @ext_func + 4, target-flags(amdgpu-gotprel32-hi) @ext_func + 12, implicit-def dead $scc
				; WWM-SPILL-NEXT: renamable $sgpr16_sgpr17 = S_LOAD_DWORDX2_IMM killed renamable $sgpr16_sgpr17, 0, 0 :: (dereferenceable invariant load (s64) from got, addrspace 4)
				; WWM-SPILL-NEXT: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr16_sgpr17, @ext_func, csr_amdgpu, implicit $sgpr4_sgpr5, implicit $sgpr6_sgpr7, implicit $sgpr8_sgpr9, implicit $sgpr10_sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $vgpr31, implicit $sgpr0_sgpr1_sgpr2_sgpr3
				; WWM-SPILL-NEXT: ADJCALLSTACKDOWN 0, 0, implicit-def dead $scc, implicit-def $sgpr32, implicit $sgpr32
				; WWM-SPILL-NEXT: renamable $vgpr0 = SI_SPILL_WWM_V32_RESTORE %stack.2, $sgpr32, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)
				; WWM-SPILL-NEXT: $sgpr31 = V_READLANE_B32 $vgpr0, 1
				; WWM-SPILL-NEXT: $sgpr30 = V_READLANE_B32 killed $vgpr0, 0
				; WWM-SPILL-NEXT: SI_RETURN
				;
				; WWM-SPILL-O0-LABEL: name: test
				; WWM-SPILL-O0: bb.0 (%ir-block.0):
				; WWM-SPILL-O0-NEXT: liveins: $sgpr12, $sgpr13, $sgpr14, $sgpr15, $sgpr30, $sgpr31, $vgpr31, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9, $sgpr10_sgpr11
				; WWM-SPILL-O0-NEXT: {{ $}}
				; WWM-SPILL-O0-NEXT: renamable $vgpr0 = IMPLICIT_DEF
				; WWM-SPILL-O0-NEXT: renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr30, 0, $vgpr0
				; WWM-SPILL-O0-NEXT: renamable $vgpr0 = V_WRITELANE_B32 killed $sgpr31, 1, $vgpr0
				; WWM-SPILL-O0-NEXT: SI_SPILL_WWM_V32_SAVE $vgpr0, %stack.2, $sgpr32, 0, implicit $exec :: (store (s32) into %stack.2, addrspace 5)
				; WWM-SPILL-O0-NEXT: renamable $vgpr0 = COPY $vgpr31
				; WWM-SPILL-O0-NEXT: ADJCALLSTACKUP 0, 0, implicit-def dead $scc, implicit-def $sgpr32, implicit $sgpr32
				; WWM-SPILL-O0-NEXT: renamable $sgpr16_sgpr17 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-gotprel32-lo) @ext_func + 4, target-flags(amdgpu-gotprel32-hi) @ext_func + 12, implicit-def dead $scc
				; WWM-SPILL-O0-NEXT: renamable $sgpr16_sgpr17 = S_LOAD_DWORDX2_IMM killed renamable $sgpr16_sgpr17, 0, 0 :: (dereferenceable invariant load (s64) from got, addrspace 4)
				; WWM-SPILL-O0-NEXT: renamable $sgpr20_sgpr21_sgpr22_sgpr23 = COPY $sgpr0_sgpr1_sgpr2_sgpr3
				; WWM-SPILL-O0-NEXT: $vgpr31 = COPY killed renamable $vgpr0
				; WWM-SPILL-O0-NEXT: $sgpr0_sgpr1_sgpr2_sgpr3 = COPY killed renamable $sgpr20_sgpr21_sgpr22_sgpr23
				; WWM-SPILL-O0-NEXT: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr16_sgpr17, @ext_func, csr_amdgpu, implicit killed $sgpr4_sgpr5, implicit killed $sgpr6_sgpr7, implicit killed $sgpr8_sgpr9, implicit killed $sgpr10_sgpr11, implicit killed $sgpr12, implicit killed $sgpr13, implicit killed $sgpr14, implicit killed $sgpr15, implicit $vgpr31, implicit $sgpr0_sgpr1_sgpr2_sgpr3
				; WWM-SPILL-O0-NEXT: $vgpr0 = SI_SPILL_WWM_V32_RESTORE %stack.2, $sgpr32, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)
				; WWM-SPILL-O0-NEXT: ADJCALLSTACKDOWN 0, 0, implicit-def dead $scc, implicit-def $sgpr32, implicit $sgpr32
				; WWM-SPILL-O0-NEXT: dead $sgpr31 = V_READLANE_B32 $vgpr0, 1
				; WWM-SPILL-O0-NEXT: dead $sgpr30 = V_READLANE_B32 killed $vgpr0, 0
				; WWM-SPILL-O0-NEXT: SI_RETURN
				;
				; GCN-LABEL: test:
				; GCN: ; %bb.0:
				; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GCN-NEXT: s_mov_b32 s16, s33
				; GCN-NEXT: s_mov_b32 s33, s32
				; GCN-NEXT: s_xor_saveexec_b64 s[18:19], -1
				; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
				; GCN-NEXT: s_mov_b64 exec, s[18:19]
				; GCN-NEXT: v_mov_b32_e32 v1, s34
				; GCN-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
				; GCN-NEXT: v_mov_b32_e32 v1, s35
				; GCN-NEXT: ; implicit-def: $vgpr0
				; GCN-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
				; GCN-NEXT: v_mov_b32_e32 v1, s16
				; GCN-NEXT: v_writelane_b32 v0, s30, 0
				; GCN-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
				; GCN-NEXT: s_addk_i32 s32, 0x800
				; GCN-NEXT: v_writelane_b32 v0, s31, 1
				; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
				; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 ; 4-byte Folded Spill
				; GCN-NEXT: s_mov_b64 exec, s[34:35]
				; GCN-NEXT: s_getpc_b64 s[16:17]
				; GCN-NEXT: s_add_u32 s16, s16, ext_func@gotpcrel32@lo+4
				; GCN-NEXT: s_addc_u32 s17, s17, ext_func@gotpcrel32@hi+12
				; GCN-NEXT: s_load_dwordx2 s[16:17], s[16:17], 0x0
				; GCN-NEXT: s_waitcnt lgkmcnt(0)
				; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
				; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
				; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s33 ; 4-byte Folded Reload
				; GCN-NEXT: s_mov_b64 exec, s[34:35]
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: v_readlane_b32 s31, v0, 1
				; GCN-NEXT: v_readlane_b32 s30, v0, 0
				; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: v_readfirstlane_b32 s34, v0
				; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: v_readfirstlane_b32 s35, v0
				; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: v_readfirstlane_b32 s4, v0
				; GCN-NEXT: s_xor_saveexec_b64 s[6:7], -1
				; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
				; GCN-NEXT: s_mov_b64 exec, s[6:7]
				; GCN-NEXT: s_addk_i32 s32, 0xf800
				; GCN-NEXT: s_mov_b32 s33, s4
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_setpc_b64 s[30:31]
				;
				; GCN-O0-LABEL: test:
				; GCN-O0: ; %bb.0:
				; GCN-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GCN-O0-NEXT: s_mov_b32 s16, s33
				; GCN-O0-NEXT: s_mov_b32 s33, s32
				; GCN-O0-NEXT: s_xor_saveexec_b64 s[18:19], -1
				; GCN-O0-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
				; GCN-O0-NEXT: s_mov_b64 exec, s[18:19]
				; GCN-O0-NEXT: v_mov_b32_e32 v1, s34
				; GCN-O0-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
				; GCN-O0-NEXT: v_mov_b32_e32 v1, s35
				; GCN-O0-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
				; GCN-O0-NEXT: v_mov_b32_e32 v1, s16
				; GCN-O0-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
				; GCN-O0-NEXT: s_add_i32 s32, s32, 0x800
				; GCN-O0-NEXT: ; implicit-def: $vgpr0
				; GCN-O0-NEXT: v_writelane_b32 v0, s30, 0
				; GCN-O0-NEXT: v_writelane_b32 v0, s31, 1
				; GCN-O0-NEXT: s_or_saveexec_b64 s[34:35], -1
				; GCN-O0-NEXT: buffer_store_dword v0, off, s[0:3], s33 ; 4-byte Folded Spill
				; GCN-O0-NEXT: s_mov_b64 exec, s[34:35]
				; GCN-O0-NEXT: v_mov_b32_e32 v0, v31
				; GCN-O0-NEXT: s_getpc_b64 s[16:17]
				; GCN-O0-NEXT: s_add_u32 s16, s16, ext_func@gotpcrel32@lo+4
				; GCN-O0-NEXT: s_addc_u32 s17, s17, ext_func@gotpcrel32@hi+12
				; GCN-O0-NEXT: s_load_dwordx2 s[16:17], s[16:17], 0x0
				; GCN-O0-NEXT: s_mov_b64 s[22:23], s[2:3]
				; GCN-O0-NEXT: s_mov_b64 s[20:21], s[0:1]
				; GCN-O0-NEXT: v_mov_b32_e32 v31, v0
				; GCN-O0-NEXT: s_mov_b64 s[0:1], s[20:21]
				; GCN-O0-NEXT: s_mov_b64 s[2:3], s[22:23]
				; GCN-O0-NEXT: s_waitcnt lgkmcnt(0)
				; GCN-O0-NEXT: s_swappc_b64 s[30:31], s[16:17]
				; GCN-O0-NEXT: s_or_saveexec_b64 s[34:35], -1
				; GCN-O0-NEXT: buffer_load_dword v0, off, s[0:3], s33 ; 4-byte Folded Reload
				; GCN-O0-NEXT: s_mov_b64 exec, s[34:35]
				; GCN-O0-NEXT: s_waitcnt vmcnt(0)
				; GCN-O0-NEXT: v_readlane_b32 s31, v0, 1
				; GCN-O0-NEXT: v_readlane_b32 s30, v0, 0
				; GCN-O0-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
				; GCN-O0-NEXT: s_waitcnt vmcnt(0)
				; GCN-O0-NEXT: v_readfirstlane_b32 s34, v0
				; GCN-O0-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
				; GCN-O0-NEXT: s_waitcnt vmcnt(0)
				; GCN-O0-NEXT: v_readfirstlane_b32 s35, v0
				; GCN-O0-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
				; GCN-O0-NEXT: s_waitcnt vmcnt(0)
				; GCN-O0-NEXT: v_readfirstlane_b32 s4, v0
				; GCN-O0-NEXT: s_xor_saveexec_b64 s[6:7], -1
				; GCN-O0-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
				; GCN-O0-NEXT: s_mov_b64 exec, s[6:7]
				; GCN-O0-NEXT: s_add_i32 s32, s32, 0xfffff800
				; GCN-O0-NEXT: s_mov_b32 s33, s4
				; GCN-O0-NEXT: s_waitcnt vmcnt(0)
				; GCN-O0-NEXT: s_setpc_b64 s[30:31]
				call void @ext_func()
				ret void
				}

				declare void @ext_func();

				attributes #0 = { nounwind "amdgpu-num-vgpr"="4" }
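
The test comment above states that a VGPR carrying SGPR values in its lanes needs whole-wave spilling. The following is a minimal Python sketch of why, not part of the patch: it models a 64-lane VGPR as a list and assumes that writelane/readlane ignore the exec mask while an ordinary VGPR store or load touches only the lanes enabled in exec. All names here (`spill_roundtrip`, `CLOBBER`, the sample values) are hypothetical and chosen for illustration only.

```python
WAVE_SIZE = 64                      # assumed wave64, matching the s_*_b64 exec ops in the checks
ALL_LANES = (1 << WAVE_SIZE) - 1    # exec mask with every lane enabled
CLOBBER = 0xDEAD                    # hypothetical value a callee leaves in the VGPR


def writelane(vgpr, value, lane):
    """Model v_writelane_b32: write one lane, independent of exec."""
    out = list(vgpr)
    out[lane] = value
    return out


def readlane(vgpr, lane):
    """Model v_readlane_b32: read one lane, independent of exec."""
    return vgpr[lane]


def store(vgpr, slot, exec_mask):
    """Model a VGPR store to a stack slot: only active lanes reach memory."""
    for lane in range(WAVE_SIZE):
        if (exec_mask >> lane) & 1:
            slot[lane] = vgpr[lane]


def load(vgpr, slot, exec_mask):
    """Model a VGPR reload: only active lanes are overwritten."""
    out = list(vgpr)
    for lane in range(WAVE_SIZE):
        if (exec_mask >> lane) & 1:
            out[lane] = slot[lane]
    return out


def spill_roundtrip(exec_mask, whole_wave):
    """Spill s30/s31 into lanes 0 and 1, save the VGPR across a call, restore."""
    s30, s31 = 0x1111, 0x2222
    vgpr = writelane([0] * WAVE_SIZE, s30, 0)
    vgpr = writelane(vgpr, s31, 1)
    # A whole-wave spill temporarily forces exec to all ones around the
    # store and the reload (the s_or_saveexec_b64 ..., -1 pattern).
    mask = ALL_LANES if whole_wave else exec_mask
    slot = [0] * WAVE_SIZE
    store(vgpr, slot, mask)
    vgpr = [CLOBBER] * WAVE_SIZE    # the call clobbers the scratch VGPR
    vgpr = load(vgpr, slot, mask)
    return readlane(vgpr, 0), readlane(vgpr, 1)


if __name__ == "__main__":
    # Only lane 2 is active, so lanes 0 and 1 are inactive during the call.
    print(spill_roundtrip(0b100, whole_wave=True))
    print(spill_roundtrip(0b100, whole_wave=False))
```

In this model, a spill performed under a partial exec mask never saves lanes 0 and 1, so the return address read back after the call is whatever the callee left there. Forcing exec to all ones around the store and reload keeps every lane intact.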

llvm/test/CodeGen/AMDGPU/wwm-reserved-spill.ll

	; GFX9-O3-NEXT: s_setpc_b64 s[30:31]
ret void		ret void
}		}

define amdgpu_gfx void @strict_wwm_cfg(<4 x i32> inreg %tmp14, i32 %arg) {		define amdgpu_gfx void @strict_wwm_cfg(<4 x i32> inreg %tmp14, i32 %arg) {
; GFX9-O0-LABEL: strict_wwm_cfg:		; GFX9-O0-LABEL: strict_wwm_cfg:
; GFX9-O0: ; %bb.0: ; %entry		; GFX9-O0: ; %bb.0: ; %entry
; GFX9-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-O0-NEXT: s_xor_saveexec_b64 s[34:35], -1		; GFX9-O0-NEXT: s_xor_saveexec_b64 s[34:35], -1
; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
; GFX9-O0-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
		; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill
; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]		; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
		; GFX9-O0-NEXT: v_mov_b32_e32 v3, v0
; GFX9-O0-NEXT: s_mov_b32 s36, s4		; GFX9-O0-NEXT: s_mov_b32 s36, s4
; GFX9-O0-NEXT: ; kill: def $sgpr36 killed $sgpr36 def $sgpr36_sgpr37_sgpr38_sgpr39		; GFX9-O0-NEXT: ; kill: def $sgpr36 killed $sgpr36 def $sgpr36_sgpr37_sgpr38_sgpr39
; GFX9-O0-NEXT: s_mov_b32 s37, s5		; GFX9-O0-NEXT: s_mov_b32 s37, s5
; GFX9-O0-NEXT: s_mov_b32 s38, s6		; GFX9-O0-NEXT: s_mov_b32 s38, s6
; GFX9-O0-NEXT: s_mov_b32 s39, s7		; GFX9-O0-NEXT: s_mov_b32 s39, s7
; GFX9-O0-NEXT: s_mov_b64 s[42:43], s[38:39]		; GFX9-O0-NEXT: s_mov_b64 s[42:43], s[38:39]
; GFX9-O0-NEXT: s_mov_b64 s[40:41], s[36:37]		; GFX9-O0-NEXT: s_mov_b64 s[40:41], s[36:37]
; GFX9-O0-NEXT: v_writelane_b32 v3, s40, 0		; GFX9-O0-NEXT: ; implicit-def: $vgpr0
; GFX9-O0-NEXT: v_writelane_b32 v3, s41, 1		; GFX9-O0-NEXT: v_writelane_b32 v0, s40, 0
; GFX9-O0-NEXT: v_writelane_b32 v3, s42, 2		; GFX9-O0-NEXT: v_writelane_b32 v0, s41, 1
; GFX9-O0-NEXT: v_writelane_b32 v3, s43, 3		; GFX9-O0-NEXT: v_writelane_b32 v0, s42, 2
		; GFX9-O0-NEXT: v_writelane_b32 v0, s43, 3
; GFX9-O0-NEXT: s_mov_b32 s34, 0		; GFX9-O0-NEXT: s_mov_b32 s34, 0
; GFX9-O0-NEXT: buffer_load_dwordx2 v[4:5], off, s[36:39], s34		; GFX9-O0-NEXT: buffer_load_dwordx2 v[4:5], off, s[36:39], s34
; GFX9-O0-NEXT: s_waitcnt vmcnt(0)		; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
; GFX9-O0-NEXT: s_waitcnt vmcnt(0)		; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
; GFX9-O0-NEXT: buffer_store_dword v5, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v5, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
; GFX9-O0-NEXT: ; implicit-def: $sgpr36_sgpr37		; GFX9-O0-NEXT: ; implicit-def: $sgpr36_sgpr37
; GFX9-O0-NEXT: v_mov_b32_e32 v1, v4		; GFX9-O0-NEXT: v_mov_b32_e32 v1, v4
; GFX9-O0-NEXT: s_not_b64 exec, exec		; GFX9-O0-NEXT: s_not_b64 exec, exec
; GFX9-O0-NEXT: v_mov_b32_e32 v1, s34		; GFX9-O0-NEXT: v_mov_b32_e32 v1, s34
; GFX9-O0-NEXT: s_not_b64 exec, exec		; GFX9-O0-NEXT: s_not_b64 exec, exec
; GFX9-O0-NEXT: s_or_saveexec_b64 s[36:37], -1		; GFX9-O0-NEXT: s_or_saveexec_b64 s[36:37], -1
; GFX9-O0-NEXT: v_mov_b32_e32 v2, s34		; GFX9-O0-NEXT: v_mov_b32_e32 v2, s34
; GFX9-O0-NEXT: s_nop 1		; GFX9-O0-NEXT: s_nop 1
; GFX9-O0-NEXT: v_mov_b32_dpp v2, v1 row_bcast:31 row_mask:0xc bank_mask:0xf		; GFX9-O0-NEXT: v_mov_b32_dpp v2, v1 row_bcast:31 row_mask:0xc bank_mask:0xf
; GFX9-O0-NEXT: v_add_u32_e64 v1, v1, v2		; GFX9-O0-NEXT: v_add_u32_e64 v1, v1, v2
; GFX9-O0-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-O0-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-O0-NEXT: v_mov_b32_e32 v4, v1		; GFX9-O0-NEXT: v_mov_b32_e32 v4, v1
; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
; GFX9-O0-NEXT: v_cmp_eq_u32_e64 s[36:37], v0, s34		; GFX9-O0-NEXT: v_cmp_eq_u32_e64 s[36:37], v3, s34
; GFX9-O0-NEXT: v_mov_b32_e32 v0, s34		; GFX9-O0-NEXT: v_mov_b32_e32 v3, s34
; GFX9-O0-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
; GFX9-O0-NEXT: s_mov_b64 s[34:35], exec		; GFX9-O0-NEXT: s_mov_b64 s[34:35], exec
; GFX9-O0-NEXT: v_writelane_b32 v3, s34, 4		; GFX9-O0-NEXT: v_writelane_b32 v0, s34, 4
; GFX9-O0-NEXT: v_writelane_b32 v3, s35, 5		; GFX9-O0-NEXT: v_writelane_b32 v0, s35, 5
		; GFX9-O0-NEXT: s_or_saveexec_b64 s[44:45], -1
		; GFX9-O0-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill
		; GFX9-O0-NEXT: s_mov_b64 exec, s[44:45]
; GFX9-O0-NEXT: s_and_b64 s[34:35], s[34:35], s[36:37]		; GFX9-O0-NEXT: s_and_b64 s[34:35], s[34:35], s[36:37]
; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]		; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
; GFX9-O0-NEXT: s_cbranch_execz .LBB1_2		; GFX9-O0-NEXT: s_cbranch_execz .LBB1_2
; GFX9-O0-NEXT: ; %bb.1: ; %if		; GFX9-O0-NEXT: ; %bb.1: ; %if
; GFX9-O0-NEXT: buffer_load_dword v4, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
; GFX9-O0-NEXT: s_nop 0		; GFX9-O0-NEXT: s_nop 0
; GFX9-O0-NEXT: buffer_load_dword v5, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v4, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
; GFX9-O0-NEXT: s_waitcnt vmcnt(0)		; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
; GFX9-O0-NEXT: v_mov_b32_e32 v0, v5		; GFX9-O0-NEXT: v_mov_b32_e32 v0, v4
; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1		; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1
; GFX9-O0-NEXT: v_mov_b32_e32 v1, 0		; GFX9-O0-NEXT: v_mov_b32_e32 v1, 0
; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]		; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
; GFX9-O0-NEXT: v_mov_b32_e32 v2, v0		; GFX9-O0-NEXT: v_mov_b32_e32 v2, v0
; GFX9-O0-NEXT: s_not_b64 exec, exec		; GFX9-O0-NEXT: s_not_b64 exec, exec
; GFX9-O0-NEXT: v_mov_b32_e32 v2, v1		; GFX9-O0-NEXT: v_mov_b32_e32 v2, v1
; GFX9-O0-NEXT: s_not_b64 exec, exec		; GFX9-O0-NEXT: s_not_b64 exec, exec
; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1		; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1
; GFX9-O0-NEXT: v_mov_b32_dpp v1, v2 row_bcast:31 row_mask:0xc bank_mask:0xf		; GFX9-O0-NEXT: v_mov_b32_dpp v1, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
; GFX9-O0-NEXT: v_add_u32_e64 v1, v2, v1		; GFX9-O0-NEXT: v_add_u32_e64 v1, v2, v1
; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]		; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
; GFX9-O0-NEXT: v_mov_b32_e32 v0, v1		; GFX9-O0-NEXT: v_mov_b32_e32 v0, v1
; GFX9-O0-NEXT: buffer_store_dword v0, off, s[0:3], s32 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
; GFX9-O0-NEXT: .LBB1_2: ; %merge		; GFX9-O0-NEXT: .LBB1_2: ; %merge
; GFX9-O0-NEXT: buffer_load_dword v0, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v0, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
; GFX9-O0-NEXT: s_nop 0		; GFX9-O0-NEXT: s_nop 0
		; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
		; GFX9-O0-NEXT: s_or_saveexec_b64 s[44:45], -1
; GFX9-O0-NEXT: buffer_load_dword v4, off, s[0:3], s32 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v4, off, s[0:3], s32 ; 4-byte Folded Reload
; GFX9-O0-NEXT: v_readlane_b32 s34, v3, 4		; GFX9-O0-NEXT: s_mov_b64 exec, s[44:45]
; GFX9-O0-NEXT: v_readlane_b32 s35, v3, 5
; GFX9-O0-NEXT: s_or_b64 exec, exec, s[34:35]
; GFX9-O0-NEXT: v_readlane_b32 s36, v3, 0
; GFX9-O0-NEXT: v_readlane_b32 s37, v3, 1
; GFX9-O0-NEXT: v_readlane_b32 s38, v3, 2
; GFX9-O0-NEXT: v_readlane_b32 s39, v3, 3
; GFX9-O0-NEXT: s_waitcnt vmcnt(0)		; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
; GFX9-O0-NEXT: v_cmp_eq_u32_e64 s[34:35], v0, v4		; GFX9-O0-NEXT: v_readlane_b32 s34, v4, 4
		; GFX9-O0-NEXT: v_readlane_b32 s35, v4, 5
		; GFX9-O0-NEXT: s_or_b64 exec, exec, s[34:35]
		; GFX9-O0-NEXT: v_readlane_b32 s36, v4, 0
		; GFX9-O0-NEXT: v_readlane_b32 s37, v4, 1
		; GFX9-O0-NEXT: v_readlane_b32 s38, v4, 2
		; GFX9-O0-NEXT: v_readlane_b32 s39, v4, 3
		; GFX9-O0-NEXT: v_cmp_eq_u32_e64 s[34:35], v0, v3
; GFX9-O0-NEXT: v_cndmask_b32_e64 v0, 0, 1, s[34:35]		; GFX9-O0-NEXT: v_cndmask_b32_e64 v0, 0, 1, s[34:35]
; GFX9-O0-NEXT: s_mov_b32 s34, 1		; GFX9-O0-NEXT: s_mov_b32 s34, 1
; GFX9-O0-NEXT: v_lshlrev_b32_e64 v0, s34, v0		; GFX9-O0-NEXT: v_lshlrev_b32_e64 v0, s34, v0
; GFX9-O0-NEXT: s_mov_b32 s34, 2		; GFX9-O0-NEXT: s_mov_b32 s34, 2
; GFX9-O0-NEXT: v_and_b32_e64 v0, v0, s34		; GFX9-O0-NEXT: v_and_b32_e64 v0, v0, s34
; GFX9-O0-NEXT: s_mov_b32 s34, 0		; GFX9-O0-NEXT: s_mov_b32 s34, 0
; GFX9-O0-NEXT: buffer_store_dword v0, off, s[36:39], s34 offset:4		; GFX9-O0-NEXT: buffer_store_dword v0, off, s[36:39], s34 offset:4
; GFX9-O0-NEXT: s_xor_saveexec_b64 s[34:35], -1		; GFX9-O0-NEXT: s_xor_saveexec_b64 s[34:35], -1
; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v0, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
; GFX9-O0-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v4, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:28 ; 4-byte Folded Reload
		; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:32 ; 4-byte Folded Reload
; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]		; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
; GFX9-O0-NEXT: s_waitcnt vmcnt(0)		; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
; GFX9-O0-NEXT: s_setpc_b64 s[30:31]		; GFX9-O0-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX9-O3-LABEL: strict_wwm_cfg:		; GFX9-O3-LABEL: strict_wwm_cfg:
; GFX9-O3: ; %bb.0: ; %entry		; GFX9-O3: ; %bb.0: ; %entry
; GFX9-O3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-O3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-O3-NEXT: s_xor_saveexec_b64 s[34:35], -1		; GFX9-O3-NEXT: s_xor_saveexec_b64 s[34:35], -1

define amdgpu_gfx void @strict_wwm_call(<4 x i32> inreg %tmp14, i32 inreg %arg) {		define amdgpu_gfx void @strict_wwm_call(<4 x i32> inreg %tmp14, i32 inreg %arg) {
; GFX9-O0-LABEL: strict_wwm_call:		; GFX9-O0-LABEL: strict_wwm_call:
; GFX9-O0: ; %bb.0:		; GFX9-O0: ; %bb.0:
; GFX9-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-O0-NEXT: s_mov_b32 s35, s33		; GFX9-O0-NEXT: s_mov_b32 s35, s33
; GFX9-O0-NEXT: s_mov_b32 s33, s32		; GFX9-O0-NEXT: s_mov_b32 s33, s32
; GFX9-O0-NEXT: s_xor_saveexec_b64 s[36:37], -1		; GFX9-O0-NEXT: s_xor_saveexec_b64 s[36:37], -1
; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GFX9-O0-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
; GFX9-O0-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-O0-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-O0-NEXT: s_add_i32 s32, s32, 0x400		; GFX9-O0-NEXT: s_add_i32 s32, s32, 0x800
; GFX9-O0-NEXT: v_writelane_b32 v3, s30, 0		; GFX9-O0-NEXT: ; implicit-def: $vgpr0
; GFX9-O0-NEXT: v_writelane_b32 v3, s31, 1		; GFX9-O0-NEXT: v_writelane_b32 v0, s30, 0
		; GFX9-O0-NEXT: v_writelane_b32 v0, s31, 1
		; GFX9-O0-NEXT: s_or_saveexec_b64 s[48:49], -1
		; GFX9-O0-NEXT: buffer_store_dword v0, off, s[0:3], s33 ; 4-byte Folded Spill
		; GFX9-O0-NEXT: s_mov_b64 exec, s[48:49]
; GFX9-O0-NEXT: s_mov_b32 s36, s4		; GFX9-O0-NEXT: s_mov_b32 s36, s4
; GFX9-O0-NEXT: ; kill: def $sgpr36 killed $sgpr36 def $sgpr36_sgpr37_sgpr38_sgpr39		; GFX9-O0-NEXT: ; kill: def $sgpr36 killed $sgpr36 def $sgpr36_sgpr37_sgpr38_sgpr39
; GFX9-O0-NEXT: s_mov_b32 s37, s5		; GFX9-O0-NEXT: s_mov_b32 s37, s5
; GFX9-O0-NEXT: s_mov_b32 s38, s6		; GFX9-O0-NEXT: s_mov_b32 s38, s6
; GFX9-O0-NEXT: s_mov_b32 s39, s7		; GFX9-O0-NEXT: s_mov_b32 s39, s7
; GFX9-O0-NEXT: ; kill: def $sgpr40_sgpr41_sgpr42_sgpr43 killed $sgpr36_sgpr37_sgpr38_sgpr39		; GFX9-O0-NEXT: ; kill: def $sgpr40_sgpr41_sgpr42_sgpr43 killed $sgpr36_sgpr37_sgpr38_sgpr39
; GFX9-O0-NEXT: s_mov_b32 s34, 0		; GFX9-O0-NEXT: s_mov_b32 s34, 0
; GFX9-O0-NEXT: v_mov_b32_e32 v2, s8		; GFX9-O0-NEXT: v_mov_b32_e32 v2, s8
; GFX9-O0-NEXT: s_not_b64 exec, exec		; GFX9-O0-NEXT: s_not_b64 exec, exec
; GFX9-O0-NEXT: v_mov_b32_e32 v2, s34		; GFX9-O0-NEXT: v_mov_b32_e32 v2, s34
; GFX9-O0-NEXT: s_not_b64 exec, exec		; GFX9-O0-NEXT: s_not_b64 exec, exec
; GFX9-O0-NEXT: s_or_saveexec_b64 s[40:41], -1		; GFX9-O0-NEXT: s_or_saveexec_b64 s[40:41], -1
; GFX9-O0-NEXT: s_getpc_b64 s[42:43]		; GFX9-O0-NEXT: s_getpc_b64 s[42:43]
; GFX9-O0-NEXT: s_add_u32 s42, s42, strict_wwm_called@rel32@lo+4		; GFX9-O0-NEXT: s_add_u32 s42, s42, strict_wwm_called@rel32@lo+4
; GFX9-O0-NEXT: s_addc_u32 s43, s43, strict_wwm_called@rel32@hi+12		; GFX9-O0-NEXT: s_addc_u32 s43, s43, strict_wwm_called@rel32@hi+12
; GFX9-O0-NEXT: s_mov_b64 s[46:47], s[2:3]		; GFX9-O0-NEXT: s_mov_b64 s[46:47], s[2:3]
; GFX9-O0-NEXT: s_mov_b64 s[44:45], s[0:1]		; GFX9-O0-NEXT: s_mov_b64 s[44:45], s[0:1]
; GFX9-O0-NEXT: s_mov_b64 s[0:1], s[44:45]		; GFX9-O0-NEXT: s_mov_b64 s[0:1], s[44:45]
; GFX9-O0-NEXT: s_mov_b64 s[2:3], s[46:47]		; GFX9-O0-NEXT: s_mov_b64 s[2:3], s[46:47]
; GFX9-O0-NEXT: v_mov_b32_e32 v0, v2		; GFX9-O0-NEXT: v_mov_b32_e32 v0, v2
; GFX9-O0-NEXT: s_swappc_b64 s[30:31], s[42:43]		; GFX9-O0-NEXT: s_swappc_b64 s[30:31], s[42:43]
; GFX9-O0-NEXT: v_mov_b32_e32 v1, v0		; GFX9-O0-NEXT: v_mov_b32_e32 v1, v0
		; GFX9-O0-NEXT: s_or_saveexec_b64 s[48:49], -1
		; GFX9-O0-NEXT: buffer_load_dword v0, off, s[0:3], s33 ; 4-byte Folded Reload
		; GFX9-O0-NEXT: s_mov_b64 exec, s[48:49]
; GFX9-O0-NEXT: v_add_u32_e64 v1, v1, v2		; GFX9-O0-NEXT: v_add_u32_e64 v1, v1, v2
; GFX9-O0-NEXT: s_mov_b64 exec, s[40:41]		; GFX9-O0-NEXT: s_mov_b64 exec, s[40:41]
; GFX9-O0-NEXT: v_mov_b32_e32 v0, v1		; GFX9-O0-NEXT: v_mov_b32_e32 v3, v1
; GFX9-O0-NEXT: buffer_store_dword v0, off, s[36:39], s34 offset:4		; GFX9-O0-NEXT: buffer_store_dword v3, off, s[36:39], s34 offset:4
; GFX9-O0-NEXT: v_readlane_b32 s31, v3, 1		; GFX9-O0-NEXT: s_waitcnt vmcnt(1)
; GFX9-O0-NEXT: v_readlane_b32 s30, v3, 0		; GFX9-O0-NEXT: v_readlane_b32 s31, v0, 1
		; GFX9-O0-NEXT: v_readlane_b32 s30, v0, 0
; GFX9-O0-NEXT: s_xor_saveexec_b64 s[36:37], -1		; GFX9-O0-NEXT: s_xor_saveexec_b64 s[36:37], -1
; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
; GFX9-O0-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
; GFX9-O0-NEXT: s_mov_b64 exec, s[36:37]		; GFX9-O0-NEXT: s_mov_b64 exec, s[36:37]
; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xfffffc00		; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xfffff800
; GFX9-O0-NEXT: s_mov_b32 s33, s35		; GFX9-O0-NEXT: s_mov_b32 s33, s35
; GFX9-O0-NEXT: s_waitcnt vmcnt(0)		; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
; GFX9-O0-NEXT: s_setpc_b64 s[30:31]		; GFX9-O0-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX9-O3-LABEL: strict_wwm_call:		; GFX9-O3-LABEL: strict_wwm_call:
; GFX9-O3: ; %bb.0:		; GFX9-O3: ; %bb.0:
; GFX9-O3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-O3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-O3-NEXT: s_mov_b32 s38, s33		; GFX9-O3-NEXT: s_mov_b32 s38, s33
; GFX9-O3-NEXT: s_mov_b32 s33, s32		; GFX9-O3-NEXT: s_mov_b32 s33, s32
; GFX9-O3-NEXT: s_xor_saveexec_b64 s[34:35], -1		; GFX9-O3-NEXT: s_xor_saveexec_b64 s[34:35], -1
; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-O3-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GFX9-O3-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]		; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]
; GFX9-O3-NEXT: v_writelane_b32 v3, s30, 0		; GFX9-O3-NEXT: ; implicit-def: $vgpr3
; GFX9-O3-NEXT: s_addk_i32 s32, 0x400		; GFX9-O3-NEXT: s_addk_i32 s32, 0x400
		; GFX9-O3-NEXT: v_writelane_b32 v3, s30, 0
; GFX9-O3-NEXT: v_writelane_b32 v3, s31, 1		; GFX9-O3-NEXT: v_writelane_b32 v3, s31, 1
; GFX9-O3-NEXT: v_mov_b32_e32 v2, s8		; GFX9-O3-NEXT: v_mov_b32_e32 v2, s8
; GFX9-O3-NEXT: s_not_b64 exec, exec		; GFX9-O3-NEXT: s_not_b64 exec, exec
; GFX9-O3-NEXT: v_mov_b32_e32 v2, 0		; GFX9-O3-NEXT: v_mov_b32_e32 v2, 0
; GFX9-O3-NEXT: s_not_b64 exec, exec		; GFX9-O3-NEXT: s_not_b64 exec, exec
; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1		; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1
; GFX9-O3-NEXT: v_mov_b32_e32 v0, v2		; GFX9-O3-NEXT: v_mov_b32_e32 v0, v2
; GFX9-O3-NEXT: s_getpc_b64 s[36:37]		; GFX9-O3-NEXT: s_getpc_b64 s[36:37]
	; GFX9-O3-NEXT: s_setpc_b64 s[30:31]
%sub = sub i64 %mul, %add		%sub = sub i64 %mul, %add
ret i64 %sub		ret i64 %sub
}		}

define amdgpu_gfx void @strict_wwm_call_i64(<4 x i32> inreg %tmp14, i64 inreg %arg) {		define amdgpu_gfx void @strict_wwm_call_i64(<4 x i32> inreg %tmp14, i64 inreg %arg) {
; GFX9-O0-LABEL: strict_wwm_call_i64:		; GFX9-O0-LABEL: strict_wwm_call_i64:
; GFX9-O0: ; %bb.0:		; GFX9-O0: ; %bb.0:
; GFX9-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-O0-NEXT: s_mov_b32 s42, s33		; GFX9-O0-NEXT: s_mov_b32 s44, s33
; GFX9-O0-NEXT: s_mov_b32 s33, s32		; GFX9-O0-NEXT: s_mov_b32 s33, s32
; GFX9-O0-NEXT: s_xor_saveexec_b64 s[34:35], -1		; GFX9-O0-NEXT: s_xor_saveexec_b64 s[34:35], -1
; GFX9-O0-NEXT: buffer_store_dword v10, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-O0-NEXT: buffer_store_dword v8, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v6, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
		; GFX9-O0-NEXT: buffer_store_dword v8, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
; GFX9-O0-NEXT: s_waitcnt vmcnt(0)		; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
; GFX9-O0-NEXT: buffer_store_dword v9, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v9, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill
; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill
; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill
; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:28 ; 4-byte Folded Spill
; GFX9-O0-NEXT: s_waitcnt vmcnt(0)		; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
		; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:28 ; 4-byte Folded Spill
; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:32 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:32 ; 4-byte Folded Spill
; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:36 ; 4-byte Folded Spill		; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:36 ; 4-byte Folded Spill
; GFX9-O0-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:40 ; 4-byte Folded Spill		; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]		; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:40 ; 4-byte Folded Spill
; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xc00		; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:44 ; 4-byte Folded Spill
; GFX9-O0-NEXT: v_writelane_b32 v10, s30, 0		; GFX9-O0-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:48 ; 4-byte Folded Spill
; GFX9-O0-NEXT: v_writelane_b32 v10, s31, 1		; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
		; GFX9-O0-NEXT: s_add_i32 s32, s32, 0x1000
		; GFX9-O0-NEXT: ; implicit-def: $vgpr0
		; GFX9-O0-NEXT: v_writelane_b32 v0, s30, 0
		; GFX9-O0-NEXT: v_writelane_b32 v0, s31, 1
; GFX9-O0-NEXT: s_mov_b32 s34, s8		; GFX9-O0-NEXT: s_mov_b32 s34, s8
; GFX9-O0-NEXT: s_mov_b32 s36, s4		; GFX9-O0-NEXT: s_mov_b32 s36, s4
; GFX9-O0-NEXT: ; kill: def $sgpr36 killed $sgpr36 def $sgpr36_sgpr37_sgpr38_sgpr39		; GFX9-O0-NEXT: ; kill: def $sgpr36 killed $sgpr36 def $sgpr36_sgpr37_sgpr38_sgpr39
; GFX9-O0-NEXT: s_mov_b32 s37, s5		; GFX9-O0-NEXT: s_mov_b32 s37, s5
; GFX9-O0-NEXT: s_mov_b32 s38, s6		; GFX9-O0-NEXT: s_mov_b32 s38, s6
; GFX9-O0-NEXT: s_mov_b32 s39, s7		; GFX9-O0-NEXT: s_mov_b32 s39, s7
; GFX9-O0-NEXT: v_writelane_b32 v10, s36, 2		; GFX9-O0-NEXT: v_writelane_b32 v0, s36, 2
; GFX9-O0-NEXT: v_writelane_b32 v10, s37, 3		; GFX9-O0-NEXT: v_writelane_b32 v0, s37, 3
; GFX9-O0-NEXT: v_writelane_b32 v10, s38, 4		; GFX9-O0-NEXT: v_writelane_b32 v0, s38, 4
; GFX9-O0-NEXT: v_writelane_b32 v10, s39, 5		; GFX9-O0-NEXT: v_writelane_b32 v0, s39, 5
; GFX9-O0-NEXT: ; kill: def $sgpr34 killed $sgpr34 def $sgpr34_sgpr35		; GFX9-O0-NEXT: ; kill: def $sgpr34 killed $sgpr34 def $sgpr34_sgpr35
; GFX9-O0-NEXT: s_mov_b32 s35, s9		; GFX9-O0-NEXT: s_mov_b32 s35, s9
; GFX9-O0-NEXT: ; kill: def $sgpr40_sgpr41 killed $sgpr34_sgpr35		; GFX9-O0-NEXT: ; kill: def $sgpr40_sgpr41 killed $sgpr34_sgpr35
; GFX9-O0-NEXT: s_mov_b64 s[36:37], 0		; GFX9-O0-NEXT: s_mov_b64 s[36:37], 0
; GFX9-O0-NEXT: v_mov_b32_e32 v8, s34		; GFX9-O0-NEXT: v_mov_b32_e32 v8, s34
; GFX9-O0-NEXT: v_mov_b32_e32 v9, s35		; GFX9-O0-NEXT: v_mov_b32_e32 v9, s35
; GFX9-O0-NEXT: s_not_b64 exec, exec		; GFX9-O0-NEXT: s_not_b64 exec, exec
; GFX9-O0-NEXT: v_mov_b32_e32 v8, s36		; GFX9-O0-NEXT: v_mov_b32_e32 v8, s36
; GFX9-O0-NEXT: v_mov_b32_e32 v9, s37		; GFX9-O0-NEXT: v_mov_b32_e32 v9, s37
; GFX9-O0-NEXT: s_not_b64 exec, exec		; GFX9-O0-NEXT: s_not_b64 exec, exec
; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1		; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1
; GFX9-O0-NEXT: v_writelane_b32 v10, s34, 6		; GFX9-O0-NEXT: v_writelane_b32 v0, s34, 6
; GFX9-O0-NEXT: v_writelane_b32 v10, s35, 7		; GFX9-O0-NEXT: v_writelane_b32 v0, s35, 7
		; GFX9-O0-NEXT: s_or_saveexec_b64 s[42:43], -1
		; GFX9-O0-NEXT: buffer_store_dword v0, off, s[0:3], s33 ; 4-byte Folded Spill
		; GFX9-O0-NEXT: s_mov_b64 exec, s[42:43]
; GFX9-O0-NEXT: v_mov_b32_e32 v2, v8		; GFX9-O0-NEXT: v_mov_b32_e32 v2, v8
; GFX9-O0-NEXT: s_mov_b32 s34, 32		; GFX9-O0-NEXT: s_mov_b32 s34, 32
; GFX9-O0-NEXT: ; implicit-def: $sgpr36_sgpr37		; GFX9-O0-NEXT: ; implicit-def: $sgpr36_sgpr37
; GFX9-O0-NEXT: v_lshrrev_b64 v[3:4], s34, v[8:9]		; GFX9-O0-NEXT: v_lshrrev_b64 v[3:4], s34, v[8:9]
; GFX9-O0-NEXT: s_getpc_b64 s[34:35]		; GFX9-O0-NEXT: s_getpc_b64 s[34:35]
; GFX9-O0-NEXT: s_add_u32 s34, s34, strict_wwm_called_i64@gotpcrel32@lo+4		; GFX9-O0-NEXT: s_add_u32 s34, s34, strict_wwm_called_i64@gotpcrel32@lo+4
; GFX9-O0-NEXT: s_addc_u32 s35, s35, strict_wwm_called_i64@gotpcrel32@hi+12		; GFX9-O0-NEXT: s_addc_u32 s35, s35, strict_wwm_called_i64@gotpcrel32@hi+12
; GFX9-O0-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0		; GFX9-O0-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
; GFX9-O0-NEXT: s_mov_b64 s[38:39], s[2:3]		; GFX9-O0-NEXT: s_mov_b64 s[38:39], s[2:3]
; GFX9-O0-NEXT: s_mov_b64 s[36:37], s[0:1]		; GFX9-O0-NEXT: s_mov_b64 s[36:37], s[0:1]
; GFX9-O0-NEXT: s_mov_b64 s[0:1], s[36:37]		; GFX9-O0-NEXT: s_mov_b64 s[0:1], s[36:37]
; GFX9-O0-NEXT: s_mov_b64 s[2:3], s[38:39]		; GFX9-O0-NEXT: s_mov_b64 s[2:3], s[38:39]
; GFX9-O0-NEXT: v_mov_b32_e32 v0, v2		; GFX9-O0-NEXT: v_mov_b32_e32 v0, v2
; GFX9-O0-NEXT: v_mov_b32_e32 v1, v3		; GFX9-O0-NEXT: v_mov_b32_e32 v1, v3
; GFX9-O0-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-O0-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-O0-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-O0-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX9-O0-NEXT: v_readlane_b32 s34, v10, 6		; GFX9-O0-NEXT: s_or_saveexec_b64 s[42:43], -1
; GFX9-O0-NEXT: v_readlane_b32 s35, v10, 7		; GFX9-O0-NEXT: buffer_load_dword v6, off, s[0:3], s33 ; 4-byte Folded Reload
; GFX9-O0-NEXT: v_readlane_b32 s36, v10, 2		; GFX9-O0-NEXT: s_mov_b64 exec, s[42:43]
; GFX9-O0-NEXT: v_readlane_b32 s37, v10, 3		; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
; GFX9-O0-NEXT: v_readlane_b32 s38, v10, 4		; GFX9-O0-NEXT: v_readlane_b32 s34, v6, 6
; GFX9-O0-NEXT: v_readlane_b32 s39, v10, 5		; GFX9-O0-NEXT: v_readlane_b32 s35, v6, 7
		; GFX9-O0-NEXT: v_readlane_b32 s36, v6, 2
		; GFX9-O0-NEXT: v_readlane_b32 s37, v6, 3
		; GFX9-O0-NEXT: v_readlane_b32 s38, v6, 4
		; GFX9-O0-NEXT: v_readlane_b32 s39, v6, 5
; GFX9-O0-NEXT: v_mov_b32_e32 v2, v0		; GFX9-O0-NEXT: v_mov_b32_e32 v2, v0
		; GFX9-O0-NEXT: s_or_saveexec_b64 s[42:43], -1
		; GFX9-O0-NEXT: buffer_load_dword v0, off, s[0:3], s33 ; 4-byte Folded Reload
		; GFX9-O0-NEXT: s_mov_b64 exec, s[42:43]
; GFX9-O0-NEXT: v_mov_b32_e32 v3, v1		; GFX9-O0-NEXT: v_mov_b32_e32 v3, v1
; GFX9-O0-NEXT: ; implicit-def: $sgpr40		; GFX9-O0-NEXT: ; implicit-def: $sgpr40
; GFX9-O0-NEXT: ; implicit-def: $sgpr40		; GFX9-O0-NEXT: ; implicit-def: $sgpr40
; GFX9-O0-NEXT: ; kill: def $vgpr2 killed $vgpr2 killed $exec		; GFX9-O0-NEXT: ; kill: def $vgpr2 killed $vgpr2 killed $exec
; GFX9-O0-NEXT: v_mov_b32_e32 v4, v8		; GFX9-O0-NEXT: v_mov_b32_e32 v4, v8
; GFX9-O0-NEXT: v_mov_b32_e32 v5, v9		; GFX9-O0-NEXT: v_mov_b32_e32 v5, v9
; GFX9-O0-NEXT: v_add_co_u32_e64 v2, s[40:41], v2, v4		; GFX9-O0-NEXT: v_add_co_u32_e64 v2, s[40:41], v2, v4
; GFX9-O0-NEXT: v_addc_co_u32_e64 v3, s[40:41], v3, v5, s[40:41]		; GFX9-O0-NEXT: v_addc_co_u32_e64 v3, s[40:41], v3, v5, s[40:41]
; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]		; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
; GFX9-O0-NEXT: v_mov_b32_e32 v0, v2		; GFX9-O0-NEXT: v_mov_b32_e32 v6, v2
; GFX9-O0-NEXT: v_mov_b32_e32 v1, v3		; GFX9-O0-NEXT: v_mov_b32_e32 v7, v3
; GFX9-O0-NEXT: s_mov_b32 s34, 0		; GFX9-O0-NEXT: s_mov_b32 s34, 0
; GFX9-O0-NEXT: buffer_store_dwordx2 v[0:1], off, s[36:39], s34 offset:4		; GFX9-O0-NEXT: buffer_store_dwordx2 v[6:7], off, s[36:39], s34 offset:4
; GFX9-O0-NEXT: v_readlane_b32 s31, v10, 1		; GFX9-O0-NEXT: s_waitcnt vmcnt(1)
; GFX9-O0-NEXT: v_readlane_b32 s30, v10, 0		; GFX9-O0-NEXT: v_readlane_b32 s31, v0, 1
		; GFX9-O0-NEXT: v_readlane_b32 s30, v0, 0
; GFX9-O0-NEXT: s_xor_saveexec_b64 s[34:35], -1		; GFX9-O0-NEXT: s_xor_saveexec_b64 s[34:35], -1
; GFX9-O0-NEXT: buffer_load_dword v10, off, s[0:3], s33 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GFX9-O0-NEXT: s_nop 0		; GFX9-O0-NEXT: buffer_load_dword v6, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
; GFX9-O0-NEXT: buffer_load_dword v8, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GFX9-O0-NEXT: s_nop 0		; GFX9-O0-NEXT: s_nop 0
; GFX9-O0-NEXT: buffer_load_dword v9, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v8, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
; GFX9-O0-NEXT: s_nop 0		; GFX9-O0-NEXT: s_nop 0
; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v9, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
; GFX9-O0-NEXT: s_nop 0		; GFX9-O0-NEXT: s_nop 0
; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload
; GFX9-O0-NEXT: s_nop 0
; GFX9-O0-NEXT: buffer_load_dword v4, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload
; GFX9-O0-NEXT: s_nop 0		; GFX9-O0-NEXT: s_nop 0
; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload
; GFX9-O0-NEXT: s_nop 0		; GFX9-O0-NEXT: s_nop 0
; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:28 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v4, off, s[0:3], s33 offset:28 ; 4-byte Folded Reload
; GFX9-O0-NEXT: s_nop 0		; GFX9-O0-NEXT: s_nop 0
; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:32 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:32 ; 4-byte Folded Reload
; GFX9-O0-NEXT: s_nop 0		; GFX9-O0-NEXT: s_nop 0
; GFX9-O0-NEXT: buffer_load_dword v4, off, s[0:3], s33 offset:36 ; 4-byte Folded Reload		; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:36 ; 4-byte Folded Reload
; GFX9-O0-NEXT: buffer_load_dword v5, off, s[0:3], s33 offset:40 ; 4-byte Folded Reload		; GFX9-O0-NEXT: s_nop 0
		; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:40 ; 4-byte Folded Reload
		; GFX9-O0-NEXT: s_nop 0
		; GFX9-O0-NEXT: buffer_load_dword v4, off, s[0:3], s33 offset:44 ; 4-byte Folded Reload
		; GFX9-O0-NEXT: buffer_load_dword v5, off, s[0:3], s33 offset:48 ; 4-byte Folded Reload
; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]		; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xfffff400		; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xfffff000
; GFX9-O0-NEXT: s_mov_b32 s33, s42		; GFX9-O0-NEXT: s_mov_b32 s33, s44
; GFX9-O0-NEXT: s_waitcnt vmcnt(0)		; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
; GFX9-O0-NEXT: s_setpc_b64 s[30:31]		; GFX9-O0-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX9-O3-LABEL: strict_wwm_call_i64:		; GFX9-O3-LABEL: strict_wwm_call_i64:
; GFX9-O3: ; %bb.0:		; GFX9-O3: ; %bb.0:
; GFX9-O3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-O3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-O3-NEXT: s_mov_b32 s40, s33		; GFX9-O3-NEXT: s_mov_b32 s40, s33
; GFX9-O3-NEXT: s_mov_b32 s33, s32		; GFX9-O3-NEXT: s_mov_b32 s33, s32
; GFX9-O3-NEXT: s_xor_saveexec_b64 s[34:35], -1		; GFX9-O3-NEXT: s_xor_saveexec_b64 s[34:35], -1
; GFX9-O3-NEXT: buffer_store_dword v8, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-O3-NEXT: buffer_store_dword v8, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-O3-NEXT: buffer_store_dword v6, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-O3-NEXT: buffer_store_dword v6, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-O3-NEXT: s_waitcnt vmcnt(0)		; GFX9-O3-NEXT: s_waitcnt vmcnt(0)
; GFX9-O3-NEXT: buffer_store_dword v7, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GFX9-O3-NEXT: buffer_store_dword v7, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill		; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill		; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill		; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill
; GFX9-O3-NEXT: s_waitcnt vmcnt(0)		; GFX9-O3-NEXT: s_waitcnt vmcnt(0)
; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill		; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill
; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]		; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]
; GFX9-O3-NEXT: v_writelane_b32 v8, s30, 0		; GFX9-O3-NEXT: ; implicit-def: $vgpr8
; GFX9-O3-NEXT: s_addk_i32 s32, 0x800		; GFX9-O3-NEXT: s_addk_i32 s32, 0x800
		; GFX9-O3-NEXT: v_writelane_b32 v8, s30, 0
; GFX9-O3-NEXT: v_writelane_b32 v8, s31, 1		; GFX9-O3-NEXT: v_writelane_b32 v8, s31, 1
; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1		; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1
; GFX9-O3-NEXT: s_getpc_b64 s[36:37]		; GFX9-O3-NEXT: s_getpc_b64 s[36:37]
; GFX9-O3-NEXT: s_add_u32 s36, s36, strict_wwm_called_i64@gotpcrel32@lo+4		; GFX9-O3-NEXT: s_add_u32 s36, s36, strict_wwm_called_i64@gotpcrel32@lo+4
; GFX9-O3-NEXT: s_addc_u32 s37, s37, strict_wwm_called_i64@gotpcrel32@hi+12		; GFX9-O3-NEXT: s_addc_u32 s37, s37, strict_wwm_called_i64@gotpcrel32@hi+12
; GFX9-O3-NEXT: s_load_dwordx2 s[36:37], s[36:37], 0x0		; GFX9-O3-NEXT: s_load_dwordx2 s[36:37], s[36:37], 0x0
; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]		; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]
; GFX9-O3-NEXT: v_mov_b32_e32 v6, s8		; GFX9-O3-NEXT: v_mov_b32_e32 v6, s8
[... 234 lines not shown ...]

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-after-pei.ll

	[... 31 lines not shown ...]
	; AFTER-PEI-NEXT: fp32-input-denormals: true			; AFTER-PEI-NEXT: fp32-input-denormals: true
	; AFTER-PEI-NEXT: fp32-output-denormals: true			; AFTER-PEI-NEXT: fp32-output-denormals: true
	; AFTER-PEI-NEXT: fp64-fp16-input-denormals: true			; AFTER-PEI-NEXT: fp64-fp16-input-denormals: true
	; AFTER-PEI-NEXT: fp64-fp16-output-denormals: true			; AFTER-PEI-NEXT: fp64-fp16-output-denormals: true
	; AFTER-PEI-NEXT: highBitsOf32BitAddress: 0			; AFTER-PEI-NEXT: highBitsOf32BitAddress: 0
	; AFTER-PEI-NEXT: occupancy: 5			; AFTER-PEI-NEXT: occupancy: 5
	; AFTER-PEI-NEXT: scavengeFI: '%fixed-stack.0'			; AFTER-PEI-NEXT: scavengeFI: '%fixed-stack.0'
	; AFTER-PEI-NEXT: vgprForAGPRCopy: ''			; AFTER-PEI-NEXT: vgprForAGPRCopy: ''
				; AFTER-PEI-NEXT: sgprForEXECCopy: ''
	; AFTER-PEI-NEXT: body:			; AFTER-PEI-NEXT: body:
	define amdgpu_kernel void @scavenge_fi(i32 addrspace(1)* %out, i32 %in) #0 {			define amdgpu_kernel void @scavenge_fi(i32 addrspace(1)* %out, i32 %in) #0 {
	%wide.sgpr0 = call <32 x i32> asm sideeffect "; def $0", "=s" () #0			%wide.sgpr0 = call <32 x i32> asm sideeffect "; def $0", "=s" () #0
	%wide.sgpr1 = call <32 x i32> asm sideeffect "; def $0", "=s" () #0			%wide.sgpr1 = call <32 x i32> asm sideeffect "; def $0", "=s" () #0
	%wide.sgpr2 = call <32 x i32> asm sideeffect "; def $0", "=s" () #0			%wide.sgpr2 = call <32 x i32> asm sideeffect "; def $0", "=s" () #0
	%wide.sgpr3 = call <32 x i32> asm sideeffect "; def $0", "=s" () #0			%wide.sgpr3 = call <32 x i32> asm sideeffect "; def $0", "=s" () #0

	call void asm sideeffect "; use $0", "s"(<32 x i32> %wide.sgpr0) #0			call void asm sideeffect "; use $0", "s"(<32 x i32> %wide.sgpr0) #0
	call void asm sideeffect "; use $0", "s"(<32 x i32> %wide.sgpr1) #0			call void asm sideeffect "; use $0", "s"(<32 x i32> %wide.sgpr1) #0
	call void asm sideeffect "; use $0", "s"(<32 x i32> %wide.sgpr2) #0			call void asm sideeffect "; use $0", "s"(<32 x i32> %wide.sgpr2) #0
	call void asm sideeffect "; use $0", "s"(<32 x i32> %wide.sgpr3) #0			call void asm sideeffect "; use $0", "s"(<32 x i32> %wide.sgpr3) #0
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-no-ir.mir

	[... 40 lines not shown ...]
	# FULL-NEXT: dx10-clamp: true			# FULL-NEXT: dx10-clamp: true
	# FULL-NEXT: fp32-input-denormals: true			# FULL-NEXT: fp32-input-denormals: true
	# FULL-NEXT: fp32-output-denormals: true			# FULL-NEXT: fp32-output-denormals: true
	# FULL-NEXT: fp64-fp16-input-denormals: true			# FULL-NEXT: fp64-fp16-input-denormals: true
	# FULL-NEXT: fp64-fp16-output-denormals: true			# FULL-NEXT: fp64-fp16-output-denormals: true
	# FULL-NEXT: highBitsOf32BitAddress: 0			# FULL-NEXT: highBitsOf32BitAddress: 0
	# FULL-NEXT: occupancy: 10			# FULL-NEXT: occupancy: 10
	# FULL-NEXT: vgprForAGPRCopy: ''			# FULL-NEXT: vgprForAGPRCopy: ''
				# FULL-NEXT: sgprForEXECCopy: ''
	# FULL-NEXT: body:			# FULL-NEXT: body:

	# SIMPLE: machineFunctionInfo:			# SIMPLE: machineFunctionInfo:
	# SIMPLE-NEXT: explicitKernArgSize: 128			# SIMPLE-NEXT: explicitKernArgSize: 128
	# SIMPLE-NEXT: maxKernArgAlign: 64			# SIMPLE-NEXT: maxKernArgAlign: 64
	# SIMPLE-NEXT: ldsSize: 2048			# SIMPLE-NEXT: ldsSize: 2048
	# SIMPLE-NEXT: gdsSize: 256			# SIMPLE-NEXT: gdsSize: 256
	# SIMPLE-NEXT: isEntryFunction: true			# SIMPLE-NEXT: isEntryFunction: true
	[... 82 lines not shown ...]
	# FULL-NEXT: dx10-clamp: true			# FULL-NEXT: dx10-clamp: true
	# FULL-NEXT: fp32-input-denormals: true			# FULL-NEXT: fp32-input-denormals: true
	# FULL-NEXT: fp32-output-denormals: true			# FULL-NEXT: fp32-output-denormals: true
	# FULL-NEXT: fp64-fp16-input-denormals: true			# FULL-NEXT: fp64-fp16-input-denormals: true
	# FULL-NEXT: fp64-fp16-output-denormals: true			# FULL-NEXT: fp64-fp16-output-denormals: true
	# FULL-NEXT: highBitsOf32BitAddress: 0			# FULL-NEXT: highBitsOf32BitAddress: 0
	# FULL-NEXT: occupancy: 10			# FULL-NEXT: occupancy: 10
	# FULL-NEXT: vgprForAGPRCopy: ''			# FULL-NEXT: vgprForAGPRCopy: ''
				# FULL-NEXT: sgprForEXECCopy: ''
	# FULL-NEXT: body:			# FULL-NEXT: body:

	# SIMPLE: machineFunctionInfo:			# SIMPLE: machineFunctionInfo:
	# SIMPLE-NEXT: maxKernArgAlign: 1			# SIMPLE-NEXT: maxKernArgAlign: 1
	# SIMPLE-NEXT: argumentInfo:			# SIMPLE-NEXT: argumentInfo:
	# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# SIMPLE-NEXT: dispatchPtr: { reg: '$sgpr4_sgpr5' }			# SIMPLE-NEXT: dispatchPtr: { reg: '$sgpr4_sgpr5' }
	# SIMPLE-NEXT: queuePtr: { reg: '$sgpr6_sgpr7' }			# SIMPLE-NEXT: queuePtr: { reg: '$sgpr6_sgpr7' }
	[... 53 lines not shown ...]
	# FULL-NEXT: dx10-clamp: true			# FULL-NEXT: dx10-clamp: true
	# FULL-NEXT: fp32-input-denormals: true			# FULL-NEXT: fp32-input-denormals: true
	# FULL-NEXT: fp32-output-denormals: true			# FULL-NEXT: fp32-output-denormals: true
	# FULL-NEXT: fp64-fp16-input-denormals: true			# FULL-NEXT: fp64-fp16-input-denormals: true
	# FULL-NEXT: fp64-fp16-output-denormals: true			# FULL-NEXT: fp64-fp16-output-denormals: true
	# FULL-NEXT: highBitsOf32BitAddress: 0			# FULL-NEXT: highBitsOf32BitAddress: 0
	# FULL-NEXT: occupancy: 10			# FULL-NEXT: occupancy: 10
	# FULL-NEXT: vgprForAGPRCopy: ''			# FULL-NEXT: vgprForAGPRCopy: ''
				# FULL-NEXT: sgprForEXECCopy: ''
	# FULL-NEXT: body:			# FULL-NEXT: body:

	# SIMPLE: machineFunctionInfo:			# SIMPLE: machineFunctionInfo:
	# SIMPLE-NEXT: maxKernArgAlign: 1			# SIMPLE-NEXT: maxKernArgAlign: 1
	# SIMPLE-NEXT: argumentInfo:			# SIMPLE-NEXT: argumentInfo:
	# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# SIMPLE-NEXT: dispatchPtr: { reg: '$sgpr4_sgpr5' }			# SIMPLE-NEXT: dispatchPtr: { reg: '$sgpr4_sgpr5' }
	# SIMPLE-NEXT: queuePtr: { reg: '$sgpr6_sgpr7' }			# SIMPLE-NEXT: queuePtr: { reg: '$sgpr6_sgpr7' }
	[... 54 lines not shown ...]
	# FULL-NEXT: dx10-clamp: true			# FULL-NEXT: dx10-clamp: true
	# FULL-NEXT: fp32-input-denormals: true			# FULL-NEXT: fp32-input-denormals: true
	# FULL-NEXT: fp32-output-denormals: true			# FULL-NEXT: fp32-output-denormals: true
	# FULL-NEXT: fp64-fp16-input-denormals: true			# FULL-NEXT: fp64-fp16-input-denormals: true
	# FULL-NEXT: fp64-fp16-output-denormals: true			# FULL-NEXT: fp64-fp16-output-denormals: true
	# FULL-NEXT: highBitsOf32BitAddress: 0			# FULL-NEXT: highBitsOf32BitAddress: 0
	# FULL-NEXT: occupancy: 10			# FULL-NEXT: occupancy: 10
	# FULL-NEXT: vgprForAGPRCopy: ''			# FULL-NEXT: vgprForAGPRCopy: ''
				# FULL-NEXT: sgprForEXECCopy: ''
	# FULL-NEXT: body:			# FULL-NEXT: body:

	# SIMPLE: machineFunctionInfo:			# SIMPLE: machineFunctionInfo:
	# SIMPLE-NEXT: maxKernArgAlign: 1			# SIMPLE-NEXT: maxKernArgAlign: 1
	# SIMPLE-NEXT: isEntryFunction: true			# SIMPLE-NEXT: isEntryFunction: true
	# SIMPLE-NEXT: argumentInfo:			# SIMPLE-NEXT: argumentInfo:
	# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# SIMPLE-NEXT: dispatchPtr: { reg: '$sgpr4_sgpr5' }			# SIMPLE-NEXT: dispatchPtr: { reg: '$sgpr4_sgpr5' }
	[... 230 lines not shown ...]
	name: vgpr_for_agpr_copy_noreg			name: vgpr_for_agpr_copy_noreg
	machineFunctionInfo:			machineFunctionInfo:
	vgprForAGPRCopy: '$noreg'			vgprForAGPRCopy: '$noreg'
	body: |			body: |
	bb.0:			bb.0:
	SI_RETURN			SI_RETURN

	...			...

				---
				# ALL-LABEL: name: sgpr_for_exec_copy
				# ALL: sgprForEXECCopy: '$sgpr2_sgpr3'
				name: sgpr_for_exec_copy
				machineFunctionInfo:
				sgprForEXECCopy: '$sgpr2_sgpr3'
				body: |
				bb.0:
				SI_RETURN

				...

				---
				# ALL-LABEL: name: sgpr_for_exec_copy_noreg
				# FULL: sgprForEXECCopy: ''
				# SIMPLE-NOT: sgprForEXECCopy
				name: sgpr_for_exec_copy_noreg
				machineFunctionInfo:
				sgprForEXECCopy: '$noreg'
				body: |
				bb.0:
				SI_RETURN

				...

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info.ll

	[... 34 lines not shown ...]
	; CHECK-NEXT: dx10-clamp: true			; CHECK-NEXT: dx10-clamp: true
	; CHECK-NEXT: fp32-input-denormals: true			; CHECK-NEXT: fp32-input-denormals: true
	; CHECK-NEXT: fp32-output-denormals: true			; CHECK-NEXT: fp32-output-denormals: true
	; CHECK-NEXT: fp64-fp16-input-denormals: true			; CHECK-NEXT: fp64-fp16-input-denormals: true
	; CHECK-NEXT: fp64-fp16-output-denormals: true			; CHECK-NEXT: fp64-fp16-output-denormals: true
	; CHECK-NEXT: highBitsOf32BitAddress: 0			; CHECK-NEXT: highBitsOf32BitAddress: 0
	; CHECK-NEXT: occupancy: 10			; CHECK-NEXT: occupancy: 10
	; CHECK-NEXT: vgprForAGPRCopy: ''			; CHECK-NEXT: vgprForAGPRCopy: ''
				; CHECK-NEXT: sgprForEXECCopy: '$sgpr100_sgpr101'
	; CHECK-NEXT: body:			; CHECK-NEXT: body:
	define amdgpu_kernel void @kernel(i32 %arg0, i64 %arg1, <16 x i32> %arg2) {			define amdgpu_kernel void @kernel(i32 %arg0, i64 %arg1, <16 x i32> %arg2) {
	%gep = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds, i32 0, i32 %arg0			%gep = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds, i32 0, i32 %arg0
	store float 0.0, float addrspace(3)* %gep, align 4			store float 0.0, float addrspace(3)* %gep, align 4
	ret void			ret void
	}			}

	@gds = addrspace(2) global [128 x i32] undef, align 4			@gds = addrspace(2) global [128 x i32] undef, align 4
	[... 24 lines not shown ...]
	; CHECK-NEXT: dx10-clamp: true			; CHECK-NEXT: dx10-clamp: true
	; CHECK-NEXT: fp32-input-denormals: true			; CHECK-NEXT: fp32-input-denormals: true
	; CHECK-NEXT: fp32-output-denormals: true			; CHECK-NEXT: fp32-output-denormals: true
	; CHECK-NEXT: fp64-fp16-input-denormals: true			; CHECK-NEXT: fp64-fp16-input-denormals: true
	; CHECK-NEXT: fp64-fp16-output-denormals: true			; CHECK-NEXT: fp64-fp16-output-denormals: true
	; CHECK-NEXT: highBitsOf32BitAddress: 0			; CHECK-NEXT: highBitsOf32BitAddress: 0
	; CHECK-NEXT: occupancy: 10			; CHECK-NEXT: occupancy: 10
	; CHECK-NEXT: vgprForAGPRCopy: ''			; CHECK-NEXT: vgprForAGPRCopy: ''
				; CHECK-NEXT: sgprForEXECCopy: '$sgpr100_sgpr101'
	; CHECK-NEXT: body:			; CHECK-NEXT: body:
	define amdgpu_ps void @ps_shader(i32 %arg0, i32 inreg %arg1) {			define amdgpu_ps void @ps_shader(i32 %arg0, i32 inreg %arg1) {
	%gep = getelementptr inbounds [128 x i32], [128 x i32] addrspace(2)* @gds, i32 0, i32 %arg0			%gep = getelementptr inbounds [128 x i32], [128 x i32] addrspace(2)* @gds, i32 0, i32 %arg0
	atomicrmw add i32 addrspace(2)* %gep, i32 8 seq_cst			atomicrmw add i32 addrspace(2)* %gep, i32 8 seq_cst
	ret void			ret void
	}			}

	; CHECK-LABEL: {{^}}name: gds_size_shader			; CHECK-LABEL: {{^}}name: gds_size_shader
	[... 38 lines not shown ...]
	; CHECK-NEXT: dx10-clamp: true			; CHECK-NEXT: dx10-clamp: true
	; CHECK-NEXT: fp32-input-denormals: true			; CHECK-NEXT: fp32-input-denormals: true
	; CHECK-NEXT: fp32-output-denormals: true			; CHECK-NEXT: fp32-output-denormals: true
	; CHECK-NEXT: fp64-fp16-input-denormals: true			; CHECK-NEXT: fp64-fp16-input-denormals: true
	; CHECK-NEXT: fp64-fp16-output-denormals: true			; CHECK-NEXT: fp64-fp16-output-denormals: true
	; CHECK-NEXT: highBitsOf32BitAddress: 0			; CHECK-NEXT: highBitsOf32BitAddress: 0
	; CHECK-NEXT: occupancy: 10			; CHECK-NEXT: occupancy: 10
	; CHECK-NEXT: vgprForAGPRCopy: ''			; CHECK-NEXT: vgprForAGPRCopy: ''
				; CHECK-NEXT: sgprForEXECCopy: '$sgpr100_sgpr101'
	; CHECK-NEXT: body:			; CHECK-NEXT: body:
	define void @function() {			define void @function() {
	ret void			ret void
	}			}

	; CHECK-LABEL: {{^}}name: function_nsz			; CHECK-LABEL: {{^}}name: function_nsz
	; CHECK: machineFunctionInfo:			; CHECK: machineFunctionInfo:
	; CHECK-NEXT: explicitKernArgSize: 0			; CHECK-NEXT: explicitKernArgSize: 0
	[... 30 lines not shown ...]
	; CHECK-NEXT: dx10-clamp: true			; CHECK-NEXT: dx10-clamp: true
	; CHECK-NEXT: fp32-input-denormals: true			; CHECK-NEXT: fp32-input-denormals: true
	; CHECK-NEXT: fp32-output-denormals: true			; CHECK-NEXT: fp32-output-denormals: true
	; CHECK-NEXT: fp64-fp16-input-denormals: true			; CHECK-NEXT: fp64-fp16-input-denormals: true
	; CHECK-NEXT: fp64-fp16-output-denormals: true			; CHECK-NEXT: fp64-fp16-output-denormals: true
	; CHECK-NEXT: highBitsOf32BitAddress: 0			; CHECK-NEXT: highBitsOf32BitAddress: 0
	; CHECK-NEXT: occupancy: 10			; CHECK-NEXT: occupancy: 10
	; CHECK-NEXT: vgprForAGPRCopy: ''			; CHECK-NEXT: vgprForAGPRCopy: ''
				; CHECK-NEXT: sgprForEXECCopy: '$sgpr100_sgpr101'
	; CHECK-NEXT: body:			; CHECK-NEXT: body:
	define void @function_nsz() #0 {			define void @function_nsz() #0 {
	ret void			ret void
	}			}

	; CHECK-LABEL: {{^}}name: function_dx10_clamp_off			; CHECK-LABEL: {{^}}name: function_dx10_clamp_off
	; CHECK: mode:			; CHECK: mode:
	; CHECK-NEXT: ieee: true			; CHECK-NEXT: ieee: true
	[... 63 lines not shown ...]

llvm/test/CodeGen/MIR/AMDGPU/sgpr-for-exec-copy-invalid-reg.mir

This file was added.

				# RUN: not llc -mtriple=amdgcn-amd-amdhsa -run-pass=none -verify-machineinstrs %s -o /dev/null 2>&1 | FileCheck -check-prefix=ERR %s

				---
				name: invalid_reg
				machineFunctionInfo:
				# ERR: [[@LINE+1]]:21: unknown register name 'srst'
				sgprForEXECCopy: '$srst'
				body: |
				bb.0:
				S_ENDPGM 0

				...

llvm/test/CodeGen/MIR/AMDGPU/stack-id-assert.mir

	# This test used to crash MIRPrinter::convertStackObjects():			# This test used to crash MIRPrinter::convertStackObjects():
	# MFI can contain some dead stack objects after PEI pass, but objects storage			# MFI can contain some dead stack objects after PEI pass, but objects storage
	# contains not dead objects only. So using objects IDs as offset in the storage			# contains not dead objects only. So using objects IDs as offset in the storage
	# caused out of bounds access.			# caused out of bounds access.

	# RUN: llc -march=amdgcn -run-pass=si-lower-sgpr-spills,prologepilog -verify-machineinstrs -o - %s | FileCheck %s			# RUN: llc -march=amdgcn -start-before=si-lower-sgpr-spills -stop-after=prologepilog -verify-machineinstrs -o - %s | FileCheck %s

	# CHECK-LABEL: name: foo			# CHECK-LABEL: name: foo
	# CHECK: {{^}}fixedStack: []			# CHECK: {{^}}fixedStack: []
	# CHECK: stack: []			# CHECK: stack: []

	# CHECK-LABEL: name: bar			# CHECK-LABEL: name: bar
	# CHECK: fixedStack: []			# CHECK: fixedStack: []
	# CHECK-NEXT: {{^}}stack:			# CHECK-NEXT: {{^}}stack:
	[... 44 lines not shown ...]

[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Accepted, Public

Revision Contents

Diff 477532

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

llvm/lib/Target/AMDGPU/SIDefines.h

llvm/lib/Target/AMDGPU/SIFrameLowering.h

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

llvm/lib/Target/AMDGPU/SIInstrInfo.h

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

llvm/lib/Target/AMDGPU/SIInstructions.td

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/call-outgoing-stack-args.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/image-waterfall-loop-O0.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/localizer.ll

llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll

llvm/test/CodeGen/AMDGPU/branch-relax-spill.ll

llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll

llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll

llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll

llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll

llvm/test/CodeGen/AMDGPU/collapse-endcf.ll

llvm/test/CodeGen/AMDGPU/control-flow-fastregalloc.ll

llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll

llvm/test/CodeGen/AMDGPU/csr-sgpr-spill-live-ins.mir

llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll

llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll

llvm/test/CodeGen/AMDGPU/flat-scratch-init.ll

llvm/test/CodeGen/AMDGPU/fold-reload-into-exec.mir

llvm/test/CodeGen/AMDGPU/fold-reload-into-m0.mir

llvm/test/CodeGen/AMDGPU/frame-setup-without-sgpr-to-vgpr-spills.ll

llvm/test/CodeGen/AMDGPU/gfx-call-non-gfx-func.ll

llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll

llvm/test/CodeGen/AMDGPU/gfx-callable-preserved-registers.ll

llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll

llvm/test/CodeGen/AMDGPU/indirect-call.ll

llvm/test/CodeGen/AMDGPU/kernel-vgpr-spill-mubuf-with-voffset.ll

llvm/test/CodeGen/AMDGPU/load-constant-i16.ll

llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll

llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll

llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll

llvm/test/CodeGen/AMDGPU/no-source-locations-in-prologue.ll

llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll

llvm/test/CodeGen/AMDGPU/scc-clobbered-sgpr-to-vmem-spill.ll

llvm/test/CodeGen/AMDGPU/sgpr-spill-dead-frame-in-dbg-value.mir

llvm/test/CodeGen/AMDGPU/sgpr-spill-fi-skip-processing-stack-arg-dbg-value.mir

llvm/test/CodeGen/AMDGPU/sgpr-spill-no-vgprs.ll

llvm/test/CodeGen/AMDGPU/sgpr-spill-partially-undef.mir

llvm/test/CodeGen/AMDGPU/sgpr-spill-update-only-slot-indexes.ll

llvm/test/CodeGen/AMDGPU/sgpr-spill-vmem-large-frame.mir

llvm/test/CodeGen/AMDGPU/sgpr-spills-split-regalloc.ll

llvm/test/CodeGen/AMDGPU/si-spill-sgpr-stack.ll

llvm/test/CodeGen/AMDGPU/sibling-call.ll

llvm/test/CodeGen/AMDGPU/spill-csr-frame-ptr-reg-copy.ll

llvm/test/CodeGen/AMDGPU/spill-offset-calculation.ll

llvm/test/CodeGen/AMDGPU/spill-reg-tuple-super-reg-use.mir

llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll

llvm/test/CodeGen/AMDGPU/spill-sgpr-csr-live-ins.mir

llvm/test/CodeGen/AMDGPU/spill-sgpr-stack-no-sgpr.ll

llvm/test/CodeGen/AMDGPU/spill-sgpr-to-virtual-vgpr.mir

llvm/test/CodeGen/AMDGPU/spill-vgpr-to-agpr-update-regscavenger.ll

llvm/test/CodeGen/AMDGPU/spill-writelane-vgprs.ll

llvm/test/CodeGen/AMDGPU/spill192.mir

llvm/test/CodeGen/AMDGPU/spill224.mir

llvm/test/CodeGen/AMDGPU/tail-call-amdgpu-gfx.ll

llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll

llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll

llvm/test/CodeGen/AMDGPU/vgpr-tuple-allocation.ll

llvm/test/CodeGen/AMDGPU/wwm-register-spill-during-regalloc.ll

llvm/test/CodeGen/AMDGPU/wwm-reserved-spill.ll

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-after-pei.ll

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-no-ir.mir
